Containers use the kernel of the host system. This exposes an attack surface for malicious users. In general, full virtual machines provide better isolation. This should be considered if containers are provided to unknown or untrusted people.
To reduce the attack surface, LXC uses several security features, such as AppArmor, cgroups and kernel namespaces.
AppArmor profiles are used to restrict access to possibly dangerous actions.
Some system calls, e.g. mount, are prohibited from execution.
To trace AppArmor activity, use:
# dmesg | grep apparmor
Although it is not recommended, AppArmor can be disabled for a container. Doing so carries security risks: some syscalls can lead to privilege escalation when executed within a container if the system is misconfigured or if an LXC or Linux kernel vulnerability exists.
To disable AppArmor for a container, add the following line to the container configuration file located at /etc/pve/lxc/CTID.conf:
lxc.apparmor.profile = unconfined
Please note that this is not recommended for production use.
cgroup is a kernel mechanism used to hierarchically organize processes and distribute system resources.
The main resources controlled via cgroups are CPU time, memory and swap limits, and access to device nodes. cgroups are also used to "freeze" a container before taking snapshots.
There are two versions of cgroups currently available: legacy (cgroupv1) and cgroupv2.
Since Proxmox VE 7.0, the default is a pure cgroupv2 environment. Previously a "hybrid" setup was used, where resource control was mainly done in cgroupv1 with an additional cgroupv2 controller which could take over some subsystems via the cgroup_no_v1 kernel command line parameter. (See the kernel parameter documentation for details.)
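To see which of the two layouts a host is actually running, one common approach is to look at the file system type mounted at /sys/fs/cgroup. The following is a minimal sketch, not part of Proxmox VE; the function names cgroup_mode and classify_cgroup_fs are hypothetical, and it assumes the usual mount point and a stat that supports -f with the %T format.

```shell
# Map a file system type name to the cgroup layout it usually indicates.
# (Hypothetical helper, named here for illustration only.)
classify_cgroup_fs() {
    case "$1" in
        cgroup2fs) echo "cgroupv2" ;;          # unified hierarchy mounted directly
        tmpfs)     echo "legacy or hybrid" ;;  # per-controller v1 mounts below a tmpfs
        *)         echo "unknown" ;;
    esac
}

# Report the layout found at the given path (default: /sys/fs/cgroup).
cgroup_mode() {
    fstype=$(stat -fc %T "${1:-/sys/fs/cgroup}" 2>/dev/null) || fstype=""
    classify_cgroup_fs "$fstype"
}

cgroup_mode
```

On a host with the Proxmox VE 7.0 default, this should report cgroupv2; a "legacy or hybrid" result suggests the host was switched back to the old setup.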
The main difference between pure cgroupv2 and the old hybrid environment, as far as Proxmox VE is concerned, is that with cgroupv2 memory and swap are controlled independently. The memory and swap settings for containers can map directly to these values, whereas previously only the memory limit and the limit on the sum of memory and swap could be set.
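The arithmetic behind this difference can be sketched as follows. The example assumes container settings given in MiB and uses the kernel's control file names (memory.max and memory.swap.max for cgroupv2, memory.limit_in_bytes and memory.memsw.limit_in_bytes for cgroupv1). The function limits_for is hypothetical and only prints values; how Proxmox VE writes them internally is not shown here.

```shell
# Print the byte values each cgroup version would need for a given
# memory and swap setting (both in MiB). Illustration only.
limits_for() {
    mem_bytes=$(( $1 * 1024 * 1024 ))
    swap_bytes=$(( $2 * 1024 * 1024 ))

    # cgroupv2: memory and swap are two independent limits.
    echo "cgroupv2: memory.max=$mem_bytes memory.swap.max=$swap_bytes"

    # cgroupv1: the second limit covers memory plus swap combined.
    echo "cgroupv1: memory.limit_in_bytes=$mem_bytes memory.memsw.limit_in_bytes=$(( mem_bytes + swap_bytes ))"
}

limits_for 512 256
```

For 512 MiB of memory and 256 MiB of swap, the cgroupv2 line carries the two settings unchanged, while the cgroupv1 line has to express swap indirectly as a combined limit of 768 MiB.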
Another important difference is that the devices controller is configured in a completely different way. Because of this, file system quotas are currently not supported in a pure cgroupv2 environment.
cgroupv2 support by the container’s OS is needed to run in a pure cgroupv2 environment. Containers running systemd version 231 or newer support cgroupv2 [44], as do containers not using systemd as init system [45].
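A quick way to check a systemd-based container against the version threshold above is to compare the version reported by systemctl --version with 231. The sketch below is one possible approach; the helper names systemd_major and supports_cgroupv2 are hypothetical, and it assumes the first output line has the usual form "systemd <version> ...".

```shell
# Extract the leading integer version from the first line of
# `systemctl --version` output, e.g. "systemd 249 (249.11)" -> 249.
systemd_major() {
    printf '%s\n' "$1" | awk 'NR == 1 { sub(/[^0-9].*/, "", $2); print $2 }'
}

# Succeed if the reported systemd version is 231 or newer.
supports_cgroupv2() {
    [ "$(systemd_major "$1")" -ge 231 ] 2>/dev/null
}

# Inside a container the input could come from:  systemctl --version
# From the host, for example:  pct exec CTID -- systemctl --version
if supports_cgroupv2 "systemd 249 (249.11)"; then
    echo "systemd is new enough for cgroupv2"
else
    echo "systemd is too old for cgroupv2"
fi
```

Containers that do not use systemd as their init system are not covered by this check.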
CentOS 7 and Ubuntu 16.10 are two prominent Linux distribution releases whose systemd version is too old to run in a cgroupv2 environment. You can either
If file system quotas are not required and all containers support cgroupv2, it is recommended to stick to the new default.
To switch back to the previous version, the following kernel command line parameter can be used:
systemd.unified_cgroup_hierarchy=0
See Section 3.12.6, “Editing the Kernel Commandline”, for where to add the parameter.
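After rebooting, you can confirm whether the parameter is active by inspecting the running kernel's command line in /proc/cmdline. The following is a minimal sketch; the function name cgroup_hierarchy_param is hypothetical.

```shell
# Print the value of systemd.unified_cgroup_hierarchy found in a kernel
# command line string, or "unset" if the parameter is absent.
# If the parameter appears more than once, the last value is reported.
cgroup_hierarchy_param() {
    value="unset"
    for word in $1; do
        case "$word" in
            systemd.unified_cgroup_hierarchy=*) value="${word#*=}" ;;
        esac
    done
    echo "$value"
}

# Check the command line the running kernel was booted with.
cgroup_hierarchy_param "$(cat /proc/cmdline 2>/dev/null)"
```

A result of 0 means the host was booted with the parameter shown above; "unset" means the parameter is not on the command line.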