11.5. Security Considerations

Containers use the kernel of the host system. This exposes an attack surface for malicious users. In general, full virtual machines provide better isolation. This should be considered if containers are provided to unknown or untrusted people.

To reduce the attack surface, LXC uses many security features like AppArmor, CGroups and kernel namespaces.

AppArmor profiles are used to restrict access to possibly dangerous actions. Some system calls, i.e. mount, are prohibited from execution.

To trace AppArmor activity, use:

# dmesg | grep apparmor

Although it is not recommended, AppArmor can be disabled for a container. This brings security risks with it. Some syscalls can lead to privilege escalation when executed within a container if the system is misconfigured or if a LXC or Linux Kernel vulnerability exists.

To disable AppArmor for a container, add the following line to the container configuration file located at /etc/pve/lxc/CTID.conf:

lxc.apparmor.profile = unconfined

cgroup is a kernel mechanism used to hierarchically organize processes and distribute system resources.

The main resources controlled via cgroups are CPU time, memory and swap limits, and access to device nodes. cgroups are also used to "freeze" a container before taking snapshots.

There are 2 versions of cgroups currently available, legacy and cgroupv2.

Since Proxmox VE 7.0, the default is a pure cgroupv2 environment. Previously a "hybrid" setup was used, where resource control was mainly done in cgroupv1 with an additional cgroupv2 controller which could take over some subsystems via the cgroup_no_v1 kernel command line parameter. (See the kernel parameter documentation for details.)

The main difference between pure cgroupv2 and the old hybrid environments regarding Proxmox VE is that with cgroupv2 memory and swap are now controlled independently. The memory and swap settings for containers can map directly to these values, whereas previously only the memory limit and the limit of the sum of memory and swap could be limited.

Another important difference is that the devices controller is configured in a completely different way. Because of this, file system quotas are currently not supported in a pure cgroupv2 environment.

cgroupv2 support by the container’s OS is needed to run in a pure cgroupv2 environment. Containers running systemd version 231 or newer support cgroupv2 [44], as do containers not using systemd as init system [45].

Note

CentOS 7 and Ubuntu 16.10 are two prominent Linux distributions releases, which have a systemd version that is too old to run in a cgroupv2 environment, you can either

  • Upgrade the whole distribution to a newer release. For the examples above, that could be Ubuntu 18.04 or 20.04, and CentOS 8 (or RHEL/CentOS derivatives like AlmaLinux or Rocky Linux). This has the benefit to get the newest bug and security fixes, often also new features, and moving the EOL date in the future.
  • Upgrade the Containers systemd version. If the distribution provides a backports repository this can be an easy and quick stop-gap measurement.
  • Move the container, or its services, to a Virtual Machine. Virtual Machines have a much less interaction with the host, that’s why one can install decades old OS versions just fine there.
  • Switch back to the legacy cgroup controller. Note that while it can be a valid solution, it’s not a permanent one. There’s a high likelihood that a future Proxmox VE major release, for example 8.0, cannot support the legacy controller anymore.


[44] this includes all newest major versions of container templates shipped by Proxmox VE

[45] for example Alpine Linux