Well, I took this up after putting myself in the spot of teaching Docker and its internals. These are my complete notes as I go on a journey of learning Docker more deeply: not just the commands, but also the history and the rationale for why things are the way they are.
First up, a bit of history to know how Docker came about. dotCloud was a YC-funded company with the ambition of changing how applications are deployed (I will talk about just that in a moment), but the company did not really take off as it would have liked to, and at PyCon 2013, Solomon Hykes gave this lightning talk...
As one of the comments says, "This video should go into the computer history museum.", and I believe it should.
But getting back to the ground, that was the first time ever that Docker was introduced to the general public. Soon after that conference, in March 2013, Docker was open-sourced, and it soon became a rage that developers, deployers and operators could not avoid. Now I am hoping to piece together the answers to the following questions (not necessarily in that order):
- What makes docker click?
- What are the technologies behind it?
- Why should I bother about docker?
It is important to know that containers are not new to this world by any means: BSD had Jails and Sun had Zones well before we had the big blue whale with containers on it. So what are containers? Think of them as chroot on steroids: minus the virtualized hardware, but with the isolation primitives of a VM. I did not bother exploring either Jails or Zones; this is just so you know the concept is not new.
It is not new to Linux either, as the building blocks for creating containers, cgroups and namespaces, have existed in Linux for a long time. LXC was introduced in 2008 and is still in active development. Since it suits the purpose, lxc reads as an acronym for LinuX Containers, but I really doubt that was the intention. Anyway, the idea of containers is not new, and they use three main kernel primitives, namespaces, cgroups and capabilities, to achieve isolation, control and privileges respectively. Here is some understanding of what each of them does.
Namespaces, on Linux, wrap around global resources and abstract the underlying resource, so that to a process inside the namespace the resource appears to be an isolated instance of the global resource. One of the overall goals of namespaces is to support container isolation, giving the process(es) within them the illusion that they are the only processes on the system. Namespaces can wrap around various types of Linux resources; here is a list. The namespace related API as
setns(), namespaces are identified using the
- Mount namespaces (
- UTS namespaces (
- IPC namespaces (
- PID namespaces (
- Network namespaces (
- User namespaces (
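To see these namespaces on a running system, here is a minimal sketch that lists the namespaces the current process belongs to. It assumes a Linux host where the kernel exposes one symlink per namespace type under `/proc/<pid>/ns`; the function name `list_namespaces` is made up for this illustration, and on systems without that directory it simply returns an empty result.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_name: identifier} for a process, or {} if unavailable.

    On Linux each entry in /proc/<pid>/ns is a symlink whose target looks
    like 'pid:[4026531836]'. Two processes that show the same target for a
    given entry share that namespace.
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        # Not Linux, no such process, or /proc is not mounted.
        return result
    for name in names:
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Entry could not be read (for example, insufficient permissions).
            continue
    return result


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"{name:12s} {ident}")
```

Running this once on the host and once inside a container should show different identifiers for the namespaces the container does not share with the host.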
Linux consists of multiple subsystems, and at the most basic level all processes share these subsystems without any limit on how much they are allowed to use. Cgroups provide a mechanism to define quotas on the use of these subsystems, and to restrict processes to those quotas. This becomes more essential when we use containers: each container is isolated by namespaces, and using cgroups we can make the processes within a container believe that they have only a defined amount of each resource to utilize. So, effectively, using cgroups we can build a tree of resource quotas on the following resources.
- cpuset: assigns individual processor(s) and memory nodes to the task(s) in a group;
- cpu: uses the scheduler to give cgroup tasks access to processor resources;
- cpuacct: generates reports about processor usage by a group;
- io: sets limits on reads from and writes to block devices;
- memory: sets limits on memory usage by the task(s) in a group;
- devices: controls access to devices by the task(s) in a group;
- freezer: allows the task(s) in a group to be suspended and resumed;
- net_cls: allows network packets from the task(s) in a group to be marked;
- net_prio: provides a way to dynamically set the priority of network traffic per network interface for a group;
- perf_event: provides access to perf events for a group;
- hugetlb: activates support for huge pages for a group;
- pids: sets a limit on the number of processes in a group.
NOTE: Cgroups are similar to processes in that they are hierarchical: a child cgroup inherits the properties of its parent cgroup, and so forth. The fundamental difference is that there are multiple such hierarchies, as opposed to the single tree of the process model. These multiple hierarchies exist because each hierarchy can have a different set of the cgroup subsystems attached to it.
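The "multiple hierarchies" idea is visible in `/proc/<pid>/cgroup`, where each line has the form `hierarchy-ID:controller-list:cgroup-path`. The sketch below parses that format. The `SAMPLE` text and the `/docker/abc` path are invented for illustration; on a real machine the content depends on the distro and on whether cgroup v1 or v2 is in use (v2 shows a single line with an empty controller list).

```python
import os


def parse_cgroup_file(text):
    """Parse /proc/<pid>/cgroup content.

    Returns a list of (hierarchy_id, [controllers], cgroup_path) tuples,
    one per hierarchy the process is a member of.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two colons only; the path may contain colons.
        hierarchy_id, controllers, path = line.split(":", 2)
        names = [c for c in controllers.split(",") if c]
        entries.append((int(hierarchy_id), names, path))
    return entries


# Hypothetical sample, shaped like a cgroup v1 host with one v2 line.
SAMPLE = """\
4:cpu,cpuacct:/docker/abc
7:memory:/docker/abc
0::/user.slice
"""

if __name__ == "__main__":
    path = "/proc/self/cgroup"
    if os.path.exists(path):
        with open(path) as fh:
            text = fh.read()
    else:
        text = SAMPLE  # fall back to the made-up sample
    for hid, controllers, cg_path in parse_cgroup_file(text):
        print(hid, ",".join(controllers) or "(unified)", cg_path)
```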
At a very basic level, privileges in Linux come in twos: regular user and root. If a regular user needs to perform a privileged operation, say binding to a port number lower than 1024, the user would need root privileges. Unfortunately, giving root privileges to a user means giving access to all resources without limitation, and when that user gets compromised the options for an attacker are unlimited and horrific. To overcome this, Linux came up with capabilities, which provide granular control over what a particular user can do on the system, hence reducing the attack surface if that user is compromised.
Since Linux capabilities are a digression from the topic, it is best to just keep in mind that capabilities can give fine-grained privileges to processes running within a container, greatly reducing the attack vectors when a container is compromised.
The above features, provided by the modern Linux kernel, make isolation, control and security (or privileges) for containers much easier to address and more flexible. With these in mind, let's step out and look at containers and Docker a little more.
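To make "fine-grained privileges" a bit more concrete: the kernel tracks capabilities as a bitmask per process, and on Linux the effective set shows up as the `CapEff:` line in `/proc/<pid>/status`. The sketch below decodes such a mask. The bit numbers are taken from the kernel's capability list, but the table is deliberately partial, and the helper names are made up for this example.

```python
# Partial table of capability bit numbers (subset only, for illustration).
CAP_NAMES = {
    0: "CAP_CHOWN",
    1: "CAP_DAC_OVERRIDE",
    5: "CAP_KILL",
    6: "CAP_SETGID",
    7: "CAP_SETUID",
    10: "CAP_NET_BIND_SERVICE",  # needed to bind ports below 1024
    12: "CAP_NET_ADMIN",
    13: "CAP_NET_RAW",
    18: "CAP_SYS_CHROOT",
    21: "CAP_SYS_ADMIN",
}


def decode_caps(hex_mask):
    """Turn a hex capability mask into a list of names, ordered by bit number."""
    mask = int(hex_mask, 16)
    names = []
    bit = 0
    while mask:
        if mask & 1:
            # Bits missing from the partial table get a generic label.
            names.append(CAP_NAMES.get(bit, f"cap_{bit}"))
        mask >>= 1
        bit += 1
    return names


def read_effective_caps(pid="self"):
    """Return the CapEff hex string for a process, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/status") as fh:
            for line in fh:
                if line.startswith("CapEff:"):
                    return line.split()[1]
    except OSError:
        pass
    return None


if __name__ == "__main__":
    mask = read_effective_caps()
    if mask is None:
        print("capability mask not available on this system")
    else:
        print(mask, decode_caps(mask))
```

A process that only needs to listen on port 80 could, in principle, run with just `CAP_NET_BIND_SERVICE` instead of full root.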
As mentioned earlier, Linux containers precede Docker by a technological epoch. They broke into mainstream Linux distros towards the end of the first decade of the new millennium. On popular Linux distros they went by the name LXC (Linux Containers), and inside Google they were called lmctfy (Let Me Contain That For You).
Here is a diagram depicting how containers compare with normal VMs.
LXC provided good abstractions over namespaces, cgroups and capabilities to contain Linux images like Ubuntu or Fedora. The project is in active development, fixing many of its security and resource management flaws as it tries to become more mainstream. But somehow it failed to captivate the imagination as much as Docker did. I feel that the reasons for this were:
- It was positioned more as virtualization model than a packaging model
- Development was slow as it was a community owned project
- The barrier to entry was pretty high, with much focus on the kernel primitives
- Interfaces were not intuitive enough for developer adoption
- Had not matured enough in the given time frame
- Developed in C
Opinions on why might differ from person to person, but I have a strong feeling the above sums it up.
Seeing the emergence of Docker, Google could no longer hide its container ship behind the curtains of closed source. In 2013, Google open-sourced its container mechanism under a very strange name, lmctfy (Let Me Contain That For You). But it failed to make any impact, as a feature-to-feature comparison did not live up to Docker's mechanics; also, the C++ codebase meant it was for the elitists.
Others in the Room
Well, there were others (vagga and rkt) who challenged the monopoly that Docker was going to be, but they fell short. As much as I really want true competition for Docker, nothing really worthy has come around yet. In fact, all that the challengers could do was influence the course of Docker, as you will see later in this writing.
Docker in recent times has been the eye-candy of the software deployment industry. If you are a software developer, chances are that you have heard about it; if you are a software deployer, your chances of remaining one are bleak if you have not. It has become the de facto delivery mechanism for modern software on the cloud, and if you are not deploying with it, you are not deploying to the cloud; sadly, I am one of those who are not. Nevertheless, the story is already etched for us to go the Docker way: it is Docker or die. Docker is several things, and the following will differentiate them for you.
- Docker Inc. is the company behind Docker
- Docker is the name of the software
- docker is the name of both the daemon (or service) and the client, and I will explicitly mention which one I mean
It is very important to know the stages of development that Docker went through in order to understand the motivation for those changes. But at the very least it is good to know that, on Linux, Docker utilizes all the kernel features described above: namespaces, cgroups and capabilities. Wherever needed, I will try to point out how it uses them.
These were the really early stages of Docker. When it was first out in the open, Docker was a wrapper, or facade, over LXC's capabilities. All the container lifecycle operations were handled by lxc, and docker was both the client and the service: the client was the CLI tool offering commands and subcommands, which talked to the docker engine through an API, and the docker engine in turn spoke to lxc to manage containers. The diagram shows the control flow.
The partnership between Docker and lxc did not last very long, as the unprecedented response to Docker led to a sea of requests for improvements. Docker had to look inwards to replace lxc, for these two reasons.
- LXC was linux specific, and had no roadmap for supporting multiple platforms
- LXC development was mostly community driven, and was not accountable to anyone
libcontainer was the answer to LXC's limitations, and it replaced lxc completely by interfacing directly with the Linux primitives: namespaces, cgroups and capabilities. With this, Docker was able to put itself on a track of faster growth.
libcontainer landed in Docker in the 0.9.0 release. Though started by Docker Inc., libcontainer became part of the Open Container Initiative, which we will come to a little later.
Breaking the Monolith
Though Docker was moving along pretty fast, there was one problem: the docker engine codebase was still a monolith, doing all of the following operations.
- Providing docker API to the client to operate on
- Providing the image management services for creating and managing images
- Managing the container lifecycle by using
- Providing the network subsystem for containers to communicate with the host as well as other containers
Well, these were some of them; actually, except for running, stopping, starting and deleting containers, everything was done by the docker daemon. This meant that during Docker upgrades the docker instances had to be stopped and started again, which was quick but not desirable. So the motivations to break this monolith were primarily:
- Ease of innovation
- Fast development model
- The ecosystem demanded the change
This led to an immense effort on the part of Docker Inc. to break the docker daemon down into independent yet cooperating pieces. There were other developments that led to some interesting changes in Docker, which will be discussed in the sections below.
OCI - Open Container Initiative
Around the same time that Docker began the exercise of breaking down its monolithic daemon, the Open Container Initiative was formed to derive some standards for the container industry, so that different players could play well with each other. So the OCI came up with a set of specifications.
These were defining standards in the course of Docker. Docker Inc. played an important role in coming up with these specifications and also contributed a good amount of code to the initiative; for example, libcontainer is now part of the OCI.
runc is Docker Inc.'s reference implementation of the OCI Runtime Spec, and by implementing it Docker Inc. removed all container runtime code from its docker daemon. runc does just one task, and does it really well: it creates and runs a container with the given parameters. runc wraps itself around libcontainer to bring up containers, and it exits as soon as it has created a container.
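The "given parameters" live in a `config.json` file defined by the OCI Runtime Spec. As a rough sketch of its shape, the snippet below builds a stripped-down config as a Python dict and serializes it. The field names follow the runtime spec, but the values (hostname, rootfs path, the `1.0.2` version string) are assumptions for illustration, and a real bundle, such as the one produced by `runc spec`, carries many more fields (mounts, capabilities, resource limits), so do not expect this minimal version to run as-is.

```python
import json


def minimal_oci_config(args, rootfs="rootfs", hostname="demo"):
    """Build a stripped-down OCI runtime config as a plain dict."""
    return {
        "ociVersion": "1.0.2",
        "process": {
            "terminal": False,
            "user": {"uid": 0, "gid": 0},
            "args": list(args),  # the command to run inside the container
            "env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],
            "cwd": "/",
        },
        # Path to the container's root filesystem, relative to the bundle.
        "root": {"path": rootfs, "readonly": True},
        "hostname": hostname,
        "linux": {
            # Each entry asks the runtime to create a new namespace of that type.
            "namespaces": [
                {"type": t} for t in ("pid", "network", "ipc", "uts", "mount")
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(minimal_oci_config(["sh"]), indent=2))
```

Notice how the `linux.namespaces` list maps directly back to the kernel primitives described earlier.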
The interface glue between the docker daemon and runc was built as docker-containerd. The idea behind docker-containerd is to abstract a larger system like the docker daemon from the details of image snapshotting, container lifecycle and, through the use of OCI runtimes, the mechanisms to launch containers. Importantly, docker-containerd provides all this with a clean API for systems like the docker daemon to harness.
Initially, the main function of docker-containerd was that of a supervisor responsible for container lifecycle operations, and that was the only function it performed. Later on, its functionality grew to include pushing and pulling images, and their management.
containerd was contributed to the CNCF and lives here on GitHub.
docker-containerd-shim allows for daemonless containers. It basically sits as the parent of the container's process to facilitate a few things.
First, it allows the runtime, i.e. runc, to exit after it starts the container. This way we don't need long-running runtime processes for containers. For example, when you start mysql you should only see the mysql process and the shim.
Second, it keeps the STDIO and other fds open for the container in case docker-containerd and/or the docker daemon die. If the shim were not running, the parent side of the pipes or the TTY master would be closed and the container would exit.
Finally it allows the container's exit status to be reported back to
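The shim's role as "the parent that holds the pipes and collects the exit status" can be modelled in a few lines. This is only a toy model under heavy assumptions: it uses an ordinary child process instead of a container, it does not model the runtime exiting or any reparenting, and `run_under_shim` is a name invented for this sketch, not part of any Docker tooling.

```python
import subprocess
import sys


def run_under_shim(argv):
    """Start a child, keep its stdout/stderr pipe open, and report its exit status.

    The caller plays the part of the shim: it is the direct parent of the
    workload, it owns the read side of the pipe, and it is the one that
    waits for the child and learns its exit code.
    """
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    # communicate() drains the pipe until the child closes it, then waits.
    output, _ = proc.communicate()
    return proc.returncode, output


if __name__ == "__main__":
    child = [sys.executable, "-c", "import sys; print('hello from child'); sys.exit(3)"]
    code, output = run_under_shim(child)
    print(f"exit status: {code}, output: {output.strip()!r}")
```

If the process holding the pipe went away while the child was still writing, the child would hit a broken pipe, which is the failure mode the real shim exists to prevent.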
Now that all of this functionality has been stripped off, what remains of the docker daemon is the Docker API for the docker client to interface with. The docker daemon is also responsible for supporting the image storage backends, such as aufs, zfs, btrfs etc.
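Since the API is what the docker client talks to, any HTTP client can play the same role. The sketch below sends a `GET /version` request over a Unix socket using only the standard library. It assumes the daemon listens on `/var/run/docker.sock` (a common default on Linux, but configurable), and the class and function names are made up for this example. If the socket is missing or not accessible, it returns `(None, None)` instead of raising.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=2.0):
        # The host name is only used for the Host header.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def engine_get(path, socket_path="/var/run/docker.sock"):
    """GET a JSON endpoint from the engine; returns (status, body) or (None, None)."""
    try:
        conn = UnixHTTPConnection(socket_path)
        try:
            conn.request("GET", path)
            resp = conn.getresponse()
            body = json.loads(resp.read())
            return resp.status, body
        finally:
            conn.close()
    except (OSError, ValueError, AttributeError, http.client.HTTPException):
        # No daemon, no permission, no Unix sockets, or a non-JSON reply.
        return None, None


if __name__ == "__main__":
    status, info = engine_get("/version")
    if status is None:
        print("docker daemon not reachable on the assumed socket path")
    else:
        print(status, info.get("Version"), info.get("ApiVersion"))
```

This is essentially what the `docker version` command does on the client side, minus the formatting.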
Putting it All Together
Here is the diagram depicting how all the above pieces fit together to give Docker the ability to create, run, start, stop and delete containers.
This is only Part one of "Docker know how". Other parts are in the making.