How does it work?
Docker implements Enhanced Container Isolation by using the Sysbox container runtime. Sysbox is a fork of the standard OCI runc runtime that was modified to enhance standard container isolation and workloads. For more details, see Under the hood.
Starting with version 4.13, Docker Desktop includes a customized version of Sysbox.
When Enhanced Container Isolation is enabled, containers created by users through docker run or docker create are automatically launched using Sysbox instead of the standard OCI runc runtime. Users need not do anything else and can continue to use containers as usual. For exceptions,
Even containers that use the insecure --privileged flag can now be run securely with Enhanced Container Isolation, such that they can no longer be used to breach the Docker Desktop Virtual Machine (VM) or other containers.
When Enhanced Container Isolation is enabled in Docker Desktop, the Docker CLI "--runtime" flag is ignored. Docker's default runtime continues to be "runc", but all user containers are implicitly launched with Sysbox.
Enhanced Container Isolation is not the same as Docker Engine's userns-remap mode or Rootless Docker. This is explained further below.
Sysbox enhances container isolation by using techniques such as:
- Enabling the Linux user-namespace on all containers (root user in the container maps to an unprivileged user in the Linux VM).
- Restricting the container from mounting sensitive VM directories.
- Vetting sensitive system-calls between the container and the Linux kernel.
- Mapping filesystem user/group IDs between the container's user-namespace and the Linux VM.
- Emulating portions of the procfs and sysfs filesystems inside the container.
Some of these are made possible by recent advances in the Linux kernel that Docker Desktop now incorporates. Sysbox applies these techniques with minimal functional or performance impact on containers.
These techniques complement Docker's traditional container security mechanisms such as using other Linux namespaces, cgroups, restricted Linux capabilities, seccomp, and AppArmor. They add a strong layer of isolation between the container and the Linux kernel inside the Docker Desktop VM.
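The first technique in the list, user-namespace mapping, can be illustrated with a small sketch. On Linux, a process's user-namespace mapping is exposed in /proc/<pid>/uid_map as lines of three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. The Python sketch below parses that format and checks whether root inside the container (ID 0) maps to a non-zero, unprivileged ID outside it. The helper names and the sample values are assumptions made for illustration; they are not part of Docker Desktop or Sysbox.

```python
from typing import List, NamedTuple, Optional


class IdMapEntry(NamedTuple):
    inside_start: int   # first ID as seen inside the user namespace
    outside_start: int  # first ID as seen from the parent namespace (the Linux VM)
    length: int         # number of consecutive IDs covered by this entry


def parse_id_map(text: str) -> List[IdMapEntry]:
    """Parse the contents of a uid_map or gid_map file."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        entries.append(IdMapEntry(*(int(f) for f in fields)))
    return entries


def to_outside_id(entries: List[IdMapEntry], inside_id: int) -> Optional[int]:
    """Translate an ID inside the namespace to the ID seen outside it."""
    for e in entries:
        if e.inside_start <= inside_id < e.inside_start + e.length:
            return e.outside_start + (inside_id - e.inside_start)
    return None  # the ID is not mapped at all


def root_is_remapped(entries: List[IdMapEntry]) -> bool:
    """True if container root (ID 0) maps to a non-root ID outside."""
    outside = to_outside_id(entries, 0)
    return outside is not None and outside != 0


# Sample values are assumed for illustration, not captured from a real container.
isolated = parse_id_map("0 100000 65536\n")      # root maps to an unprivileged ID
identity = parse_id_map("0 0 4294967295\n")      # root maps to root (no remapping)

print(to_outside_id(isolated, 0))    # 100000
print(root_is_remapped(isolated))    # True
print(root_is_remapped(identity))    # False
```

To try this against a real container, the text passed to parse_id_map could be the contents of /proc/self/uid_map read from inside that container.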
For more information, see Key features and benefits.
The Docker Engine includes a feature called userns-remap mode that enables the user-namespace in all containers. However, it suffers from a few limitations and is not supported within Docker Desktop.
Userns-remap mode is similar to Enhanced Container Isolation in that both improve container isolation by leveraging the Linux user-namespace.
However, Enhanced Container Isolation goes further: it automatically assigns an exclusive user-namespace mapping to each container, and it adds several other container isolation features meant to secure Docker Desktop in organizations with stringent security requirements.
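To make "exclusive user-namespace mappings per container" concrete, the Python sketch below hands each container its own non-overlapping range of IDs and reuses a range only after its container is released. This is a hypothetical illustration of the idea, not Sysbox's actual allocation logic; the class name, the base ID of 100000, and the range size of 65536 are all assumptions.

```python
from typing import Dict, List, Tuple

BASE_ID = 100000     # assumed first ID available for mappings
RANGE_SIZE = 65536   # assumed number of IDs given to each container


class ExclusiveRangeAllocator:
    """Gives every container a distinct, non-overlapping ID range (hypothetical)."""

    def __init__(self, base: int = BASE_ID, size: int = RANGE_SIZE) -> None:
        self._base = base
        self._size = size
        self._slot_by_container: Dict[str, int] = {}
        self._free_slots: List[int] = []  # slots returned by released containers
        self._next_slot = 0               # next never-used slot

    def allocate(self, container_id: str) -> Tuple[int, int]:
        """Return (first_outside_id, length) for the container."""
        if container_id not in self._slot_by_container:
            if self._free_slots:
                slot = self._free_slots.pop()
            else:
                slot = self._next_slot
                self._next_slot += 1
            self._slot_by_container[container_id] = slot
        slot = self._slot_by_container[container_id]
        return (self._base + slot * self._size, self._size)

    def release(self, container_id: str) -> None:
        """Free the container's range so a later container can reuse it."""
        slot = self._slot_by_container.pop(container_id, None)
        if slot is not None:
            self._free_slots.append(slot)


allocator = ExclusiveRangeAllocator()
web = allocator.allocate("web")   # (100000, 65536)
db = allocator.allocate("db")     # (165536, 65536)
print(web, db)
```

Because the ranges never overlap, a process that escaped one container would hold IDs that carry no ownership over files or processes belonging to another container.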
Rootless Docker allows the Docker Engine, and by extension the containers, to run without root privileges natively on a Linux host. This allows non-root users to install and run Docker natively on Linux.
Rootless Docker is not supported within Docker Desktop. While it's a valuable feature when running Docker natively on Linux, its value within Docker Desktop is reduced since Docker Desktop runs the Docker Engine within a Linux VM. That is, Docker Desktop already allows non-root host users to run Docker and isolates the Docker Engine from the host using a virtual machine.
Unlike Rootless Docker, Enhanced Container Isolation does not run Docker Engine within a Linux user-namespace. Rather, it runs the containers generated by that engine within a user-namespace. This bypasses the limitations of Rootless Docker and creates a stronger boundary between the containers and the Docker Engine.
Enhanced Container Isolation is meant to ensure that containers launched with Docker Desktop can't easily breach the Docker Desktop Linux VM and thereby modify security settings within it.