[Cloud native] One article to understand the core technology of Docker

Traditional layered architecture vs. microservices

Docker


• Docker builds on the Linux kernel's cgroups, namespaces, and union file system (Union FS) technologies to encapsulate and isolate processes. It is virtualization at the operating-system level. Because an isolated process is independent of the host and of other isolated processes, it is called a container.

• The initial implementation was based on LXC. Starting with version 0.9, Docker switched its default from LXC to its own libcontainer, and since 1.11 it has used runC and containerd.

• On top of containers, Docker adds further encapsulation, from file systems and networking to process isolation, which greatly simplifies creating and maintaining containers and makes Docker lighter and faster than virtual machines.

Why use Docker

More efficient use of system resources

Faster startup time

Consistent operating environment

Continuous Delivery and Deployment

Easier migration

Easier maintenance and expansion

Comparison of virtual machines and containers at run time

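Start up:

• docker run

-it interactive mode (-i keeps STDIN open, -t allocates a pseudo-terminal)

-d run in the background (detached)

-p port mapping (host port to container port)

-v mount a host directory or volume into the container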

• Start a terminated container

docker start

• Stop the container

docker stop

• View container processes

docker ps
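The commands above can be strung together into one lifecycle. The sketch below is a dry run: by default it only prints the commands, so it can be read on a machine without Docker. The container name `web`, the `nginx` image, the port pair and the host path `/srv/www` are assumptions chosen for illustration, not values from this article.

```shell
# Dry-run helper: prints each command instead of executing it.
# Set DRY_RUN=0 in the environment to really execute the commands.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical names: container "web", image nginx, host directory /srv/www.
lifecycle() {
    # start detached, map host port 8080 to container port 80, mount a directory
    run docker run -d --name web -p 8080:80 -v /srv/www:/usr/share/nginx/html nginx
    run docker ps          # the container is listed as running
    run docker stop web    # stop it ...
    run docker start web   # ... and start the terminated container again
}

lifecycle
```

Printing first and executing later is a deliberate choice here: it keeps the example harmless while still showing how the `-d`, `-p` and `-v` flags combine in one `docker run` call.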

Container operations

• View container details:

docker inspect <containerid>

• Enter the container:

docker attach
docker exec

• Via nsenter:

PID=$(docker inspect --format "{{ .State.Pid }}" <container>)
nsenter --target $PID --mount --uts --ipc --net --pid

• Copy files into the container:

docker cp file1 <containerid>:/file-to-path

Getting to know the container

• cat Dockerfile

FROM ubuntu
ENV MY_SERVICE_PORT=80
ADD bin/amd64/httpserver /httpserver
ENTRYPOINT /httpserver

• Build an image from the Dockerfile and push it

docker build -t cncamp/httpserver:${tag} .
docker push cncamp/httpserver:v1.0

• Run containers

docker run -d cncamp/httpserver:v1.0

Container main features

Namespace

• Linux Namespace is a resource isolation mechanism provided by the Linux kernel:

• The system can assign different namespaces to processes;

• Resources in different namespaces are allocated independently, so processes in different namespaces are isolated from and do not interfere with each other.

Implementation of Namespace in Linux Kernel Code

• Process data structures 

struct task_struct {
    ...
    /* namespaces */
    struct nsproxy *nsproxy;
    ...
}

• Namespace data structure

struct nsproxy {
    atomic_t count;
    struct uts_namespace *uts_ns;
    struct ipc_namespace *ipc_ns;
    struct mnt_namespace *mnt_ns;
    struct pid_namespace *pid_ns_for_children;
    struct net *net_ns;
}

How to operate on Namespace in Linux

• clone

clone is the system call that creates a new process; its flags parameter specifies which types of new namespace to create:

// CLONE_NEWCGROUP / CLONE_NEWIPC / CLONE_NEWNET / CLONE_NEWNS / CLONE_NEWPID / CLONE_NEWUSER / CLONE_NEWUTS
int clone(int (*fn)(void *), void *child_stack, int flags, void *arg)

• setns

This system call lets the calling process join an existing namespace:

int setns(int fd, int nstype)

• unshare

This system call can move the calling process to a new Namespace:

int unshare(int flags)

Isolation – Linux Namespace

Common operations on namespaces

• View the namespace of the current system:

lsns -t <type>

• View the namespace of a process:

ls -la /proc/<pid>/ns/

• Enter a namespace and run the command:

nsenter -t <pid> -n ip addr
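The entries under /proc/<pid>/ns/ are symbolic links whose targets name the namespace, so two processes are in the same namespace exactly when the link targets match. A minimal sketch of that comparison, assuming a Linux host; the function names `ns_id` and `same_ns` are made up for this example:

```shell
# Print the identifier of one namespace of a process, e.g. "net:[4026531992]".
ns_id() {
    # $1 = PID, $2 = namespace type (net, pid, mnt, uts, ipc, user, cgroup)
    readlink "/proc/$1/ns/$2"
}

# Succeed when two processes are in the same namespace of the given type.
same_ns() {
    # $1, $2 = PIDs, $3 = namespace type
    [ "$(ns_id "$1" "$3")" = "$(ns_id "$2" "$3")" ]
}

# Compare this shell with its parent process.
if same_ns "$$" "$PPID" net; then
    echo "PID $$ and PID $PPID share $(ns_id "$$" net)"
else
    echo "PID $$ and PID $PPID are in different network namespaces"
fi
```

Reading another user's /proc/<pid>/ns/ links may need extra privileges; in that case `readlink` prints nothing and the comparison is not meaningful.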

Namespace exercise

• Execute the sleep command in the new network namespace:

unshare -fn sleep 60

• View process information

ps -ef|grep sleep
root 32882 4935 0 10:00 pts/0 00:00:00 unshare -fn sleep 60
root 32883 32882 0 10:00 pts/0 00:00:00 sleep 60

• View Network Namespace

lsns -t net
4026532508 net 2 32882 root unassigned unshare

• Enter the namespace of that process and view its network configuration, which differs from the host's

nsenter -t 32882 -n ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

Cgroups

• Cgroups (Control Groups) is a Linux mechanism for controlling and monitoring the resources of a process or a group of processes;

• It can limit the resources a process uses, such as CPU time, memory and disk I/O;

• The management of each kind of resource is implemented by the corresponding cgroup subsystem;

• To limit a given kind of resource, attach the limit policy to the corresponding subsystem;

• Within each resource subsystem, cgroups are organized as a hierarchical tree (hierarchy): every cgroup can contain child cgroups, so the resources a child cgroup can use are limited not only by its own parameters but also by the limits set on its parent cgroup.
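The parent/child rule can be illustrated with a small model: the limit that applies to a cgroup is the smallest limit found on the path from that cgroup up to the root of the hierarchy. The sketch below builds a mock directory tree in a temporary directory instead of touching /sys/fs/cgroup, so it needs no root privileges. The function name `effective_limit` and the directory names are hypothetical; the kernel enforces this rule itself, and the script only mimics the outcome.

```shell
# Walk from a cgroup directory up to the hierarchy root and print the
# smallest memory.limit_in_bytes found on the way.
# A value of -1 (unlimited) is skipped.
effective_limit() {
    # $1 = cgroup directory, $2 = root directory of the hierarchy
    dir=$1
    min=-1
    while :; do
        if [ -r "$dir/memory.limit_in_bytes" ]; then
            v=$(cat "$dir/memory.limit_in_bytes")
            if [ "$v" -ge 0 ] && { [ "$min" -lt 0 ] || [ "$v" -lt "$min" ]; }; then
                min=$v
            fi
        fi
        # stop at the hierarchy root (or at / or . as a safety net)
        case "$dir" in "$2"|/|.) break ;; esac
        dir=$(dirname "$dir")
    done
    echo "$min"
}

# Mock hierarchy: root (unlimited) -> parent (100 MiB) -> child (200 MiB).
root=$(mktemp -d)
mkdir -p "$root/parent/child"
echo -1        > "$root/memory.limit_in_bytes"
echo 104857600 > "$root/parent/memory.limit_in_bytes"
echo 209715200 > "$root/parent/child/memory.limit_in_bytes"
effective_limit "$root/parent/child" "$root"   # prints 104857600
rm -rf "$root"
```

Although the child is configured with 200 MiB, the 100 MiB limit of its parent is the one that counts.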

Implementation of Cgroups in Linux Kernel Code

• Process data structures

struct task_struct {
#ifdef CONFIG_CGROUPS
    struct css_set __rcu *cgroups;
    struct list_head cg_list;
#endif
}

• css_set is a set data structure of cgroup_subsys_state objects

struct css_set {
    /*
     * Set of subsystem states, one for each subsystem. This array is
     * immutable after creation apart from the init_css_set during
     * subsystem registration (at boot time).
     */
    struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT];
};

Quota/Measurable – Control Groups (cgroups)


cgroups implement quotas and metering for resources.

• blkio: limits input/output on block devices such as disks, CDs and USB devices;

• cpu: uses the scheduler to give cgroup tasks access to the CPU;

• cpuacct: generates CPU usage reports for cgroup tasks;

• cpuset: on multi-core systems, assigns dedicated CPUs and memory nodes to cgroup tasks;

• devices: allows or denies cgroup tasks access to devices;

• freezer: suspends and resumes cgroup tasks;

• memory: sets the memory limit of each cgroup and generates memory usage reports;

• net_cls: tags network packets with a class identifier so that traffic from a cgroup can be recognized;

• ns: namespace subsystem;

• pids: process number subsystem, which limits how many processes a cgroup may contain.
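Which of these subsystems a kernel offers can be read from /proc/cgroups, whose columns are the subsystem name, hierarchy ID, number of cgroups and an enabled flag. A short sketch follows; the function name `enabled_controllers` and the sample numbers are made up so that the example also runs where /proc/cgroups is not available.

```shell
# List the subsystems whose "enabled" column is 1.
# $1 = file in /proc/cgroups format (defaults to /proc/cgroups itself)
enabled_controllers() {
    awk '!/^#/ && $4 == 1 { print $1 }' "${1:-/proc/cgroups}"
}

# Made-up sample in the same format as /proc/cgroups.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#subsys_name hierarchy num_cgroups enabled
cpuset 4 1 1
cpu 2 1 1
memory 9 1 1
net_cls 0 1 0
EOF
enabled_controllers "$sample"   # prints cpuset, cpu and memory, one per line
rm -f "$sample"
```

On a Linux host, calling `enabled_controllers` without an argument reads the real file.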

CPU subsystem

CPU Subsystem Exercise

• Create a directory structure in the cgroup cpu subsystem directory

cd /sys/fs/cgroup/cpu
mkdir cpudemo
cd cpudemo

• Run busyloop

• Run top to check the CPU usage: it is 200%

• Limit CPU via cgroups

cd /sys/fs/cgroup/cpu/cpudemo

• Add the process to the cgroup's process list

echo $(ps -ef | grep busyloop | grep -v grep | awk '{print $2}') > cgroup.procs

• Set the CPU quota

echo 10000 > cpu.cfs_quota_us

• Run top again: the CPU usage drops to 10%
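Why 10000 gives 10%: the quota is the CPU time, in microseconds, that the cgroup may consume in each scheduling period, and the period (cpu.cfs_period_us) defaults to 100000 microseconds. The arithmetic can be sketched as follows; `cfs_percent` is a made-up helper name, and the result is rounded down because shell arithmetic is integer-only.

```shell
# CPU share granted by the quota, as a percentage of one core:
#   quota / period * 100
# A quota of -1 means "no limit".
cfs_percent() {
    # $1 = cpu.cfs_quota_us, $2 = cpu.cfs_period_us (kernel default 100000)
    quota=$1
    period=${2:-100000}
    if [ "$quota" -lt 0 ]; then
        echo "unlimited"
    else
        echo "$(( quota * 100 / period ))%"
    fi
}

cfs_percent 10000 100000    # the value used in this exercise: prints 10%
cfs_percent 200000 100000   # two full cores: prints 200%
```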

cpuacct subsystem

Used to count the CPU usage of processes under a Cgroup and its sub-Cgroups.

• cpuacct.usage

Contains the CPU usage time of processes under this Cgroup and its sub-Cgroups, in ns (nanoseconds).

• cpuacct.stat

Contains the CPU time used by processes in this Cgroup and its sub-Cgroups, broken down into user-mode and kernel-mode time.

memory subsystem

• memory.usage_in_bytes

The memory currently used by processes in this cgroup and its sub-cgroups.

• memory.max_usage_in_bytes

The peak memory usage of processes in this cgroup, including its sub-cgroups.

• memory.limit_in_bytes

Sets the maximum memory that processes in the cgroup can use. A value of -1 means the cgroup's memory usage is not limited.

• memory.oom_control

Sets whether the OOM (Out of Memory) Killer is used in the cgroup; it is enabled by default. When the memory used by processes in the cgroup exceeds the limit, the OOM Killer handles them immediately.

memory subsystem exercises

• Create a directory structure in the cgroup memory subsystem directory:

cd /sys/fs/cgroup/memory
mkdir memorydemo
cd memorydemo

• run malloc (make build on a Linux machine);

• View memory usage;

watch  'ps -aux|grep malloc|grep -v grep'

• Limit memory through cgroups: 

• Add the process to the cgroup's process list:

echo $(ps -ef | grep malloc | grep -v grep | awk '{print $2}') > cgroup.procs

• Set memory.limit_in_bytes:

echo 104960000 > memory.limit_in_bytes

• Wait for the process to be killed by the OOM Killer.
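While waiting, it helps to see how close the cgroup is to its limit. The sketch below reads the two files described in the memory subsystem section and prints usage, limit and the percentage used. It runs against a mock directory with a made-up usage value; `mem_report` is a hypothetical helper, and on a cgroup v1 host the directory /sys/fs/cgroup/memory/memorydemo could be passed instead.

```shell
# Report how close a cgroup is to its memory limit.
# $1 = cgroup directory containing memory.usage_in_bytes and memory.limit_in_bytes
mem_report() {
    usage=$(cat "$1/memory.usage_in_bytes")
    limit=$(cat "$1/memory.limit_in_bytes")
    # integer arithmetic: values are rounded down
    echo "usage=$(( usage / 1048576 ))MiB limit=$(( limit / 1048576 ))MiB used=$(( usage * 100 / limit ))%"
}

# Mock cgroup directory; the usage figure is invented for the example.
demo=$(mktemp -d)
echo 104960000 > "$demo/memory.limit_in_bytes"   # the limit set in this exercise
echo 83886080  > "$demo/memory.usage_in_bytes"   # 80 MiB, made-up value
mem_report "$demo"   # prints usage=80MiB limit=100MiB used=79%
rm -rf "$demo"
```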

File system

Union FS

• A file system that mounts different directories under one virtual file system (unite several directories into a single virtual filesystem).

• Supports readonly, readwrite and whiteout-able permissions for each member directory (called a branch, as in Git).

• The file system is layered: a branch with readonly permission can still be modified logically, because changes are recorded incrementally and do not touch the readonly part.

• Union FS usually serves two purposes: mounting multiple disks under the same directory, and combining a readonly branch with a writable branch.

Docker’s filesystem

A typical Linux file system consists of:

• bootfs (boot file system)

• Bootloader: loads the kernel;

• Kernel: once the kernel has been loaded into memory, bootfs is unmounted.

• rootfs (root file system)

• Standard directories and files such as /dev, /proc, /bin and /etc.

• Across Linux distributions bootfs is essentially the same, but rootfs differs.

Docker startup

Linux

• After boot, Linux first mounts rootfs readonly, runs a series of checks, and then switches it to readwrite for users.

Docker

• During initialization, Docker also mounts rootfs readonly and checks it, but then uses a union mount to place a readwrite file system on top of the readonly rootfs;

• The lower file system can then be set to readonly again and further layers stacked on top of it;

• Such a stack of readonly layers plus one writable layer makes up the run-time state of a container, and each file system in the stack is called an FS layer.

Write operations

Because images are shared, writes to a container's writable layer rely on the copy-on-write and allocate-on-demand mechanisms provided by the storage driver. These support modifications in the writable layer and improve the utilization of storage and memory.

• Copy-on-write

One image can be used by many containers without keeping a separate copy in memory or on disk for each of them. When a file provided by the image has to be modified, it is first copied from the image's file system into the container's writable layer and modified there; the file in the image does not change. Modifications made by different containers are independent and do not affect each other.

• Allocate-on-demand

Space is allocated on demand rather than ahead of time: it is allocated only when a file is actually created.
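Copy-on-write can be mimicked with ordinary directories. In the toy model below, one directory stands for the shared read-only image and one directory per container stands for its writable layer. The copy is made only on a container's first write to a file. The names `cow_append`, `image`, `c1`, `c2` and `app.conf` are made up for this illustration; a real storage driver does the same work inside the kernel rather than with `cp`.

```shell
# Append text to a file "inside a container" using copy-on-write.
cow_append() {
    # $1 = image dir, $2 = container dir, $3 = file name, $4 = text to append
    if [ ! -e "$2/$3" ]; then
        cp "$1/$3" "$2/$3"      # copy-up happens only on the first write
    fi
    echo "$4" >> "$2/$3"
}

work=$(mktemp -d)
mkdir "$work/image" "$work/c1" "$work/c2"
echo "base" > "$work/image/app.conf"

cow_append "$work/image" "$work/c1" app.conf "changed by c1"

cat "$work/image/app.conf"     # still only "base": the image is untouched
cat "$work/c1/app.conf"        # "base" followed by "changed by c1"
ls "$work/c2"                  # empty: c2 never wrote, so it holds no copy
rm -rf "$work"
```

The second container consumes no extra space until it writes, which is also the idea behind allocate-on-demand.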

container storage driver

Take OverlayFS as an example

OverlayFS is a union file system similar to AUFS and is likewise a file-level storage driver. It comes in two versions: the original overlay and the newer, more stable overlay2.

Overlay has only two layers, upper and lower: the lower layer is the image layer, and the upper layer is the container's writable layer.

OverlayFS file system exercise

$ mkdir upper lower merged work
$ echo "from lower" > lower/in_lower.txt
$ echo "from upper" > upper/in_upper.txt
$ echo "from lower" > lower/in_both.txt
$ echo "from upper" > upper/in_both.txt
$ sudo mount -t overlay overlay -o lowerdir=$(pwd)/lower,upperdir=$(pwd)/upper,workdir=$(pwd)/work $(pwd)/merged
$ cat merged/in_both.txt
$ rm merged/in_both.txt
$ rm merged/in_lower.txt
$ rm merged/in_upper.txt
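The mount itself needs root, but the lookup rule that decides what `cat merged/in_both.txt` shows can be modelled without it: a file in the upper layer hides a file of the same name in the lower layer. The sketch below is a simplified model of that rule for regular files only. The helper name `overlay_resolve` is made up, and whiteouts (the markers OverlayFS creates when a lower file is deleted) are not modelled.

```shell
# Resolve a file name the way the merged view does for regular files:
# the upper layer wins, the lower layer is the fallback.
overlay_resolve() {
    # $1 = upper dir, $2 = lower dir, $3 = file name
    if [ -e "$1/$3" ]; then
        echo "$1/$3"
    elif [ -e "$2/$3" ]; then
        echo "$2/$3"
    else
        return 1
    fi
}

# Same files as in the exercise, in a temporary directory.
d=$(mktemp -d)
mkdir "$d/upper" "$d/lower"
echo "from lower" > "$d/lower/in_lower.txt"
echo "from upper" > "$d/upper/in_upper.txt"
echo "from lower" > "$d/lower/in_both.txt"
echo "from upper" > "$d/upper/in_both.txt"

for f in in_lower.txt in_upper.txt in_both.txt; do
    echo "$f: $(cat "$(overlay_resolve "$d/upper" "$d/lower" "$f")")"
done
rm -rf "$d"
```

For in_both.txt the model prints "from upper", which is what the merged directory of the real mount shows as well.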

Network

Null mode

• Null mode is an empty implementation: Docker configures no network for the container;

• A container can be started in Null mode and its network then configured with commands on the host.

mkdir -p /var/run/netns
find -L /var/run/netns -type l -delete
ln -s /proc/$pid/ns/net /var/run/netns/$pid
ip link add A type veth peer name B
brctl addif br0 A
ip link set A up
ip link set B netns $pid
ip netns exec $pid ip link set dev B name eth0
ip netns exec $pid ip link set eth0 up
ip netns exec $pid ip addr add $SETIP/$SETMASK dev eth0
ip netns exec $pid ip route add default via $GATEWAY

Docker advantages

Encapsulation:

• No kernel boot is needed, so applications start within seconds when scaling up or down.

• Resource utilization is high: the host kernel schedules resources directly, so the performance overhead is small.

• CPU and memory allocations are easy to adjust.

• Rollbacks take seconds.

• All dependent services start with one command: testers need not worry about setting up an environment, and PEs need not worry about the complexity of setting up a site.

• Images are built once and run anywhere.

• Test and production environments are highly consistent (except for data).

Isolation:

• An application's run-time environment is independent of the host environment and is determined entirely by the image, so images for several environments can be deployed and tested on one host.

• Multiple versions of an application can coexist on one machine.

Active community:

Docker commands are simple and easy to use, the community is very active, and the surrounding ecosystem of components is rich.

Incremental image distribution:

Because Union FS mounts different directories under one virtual file system, it provides a simple way to implement layers. Each release transfers only the layers that changed, which saves bandwidth.
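The bandwidth saving comes from a set difference: only layers that the target host does not already hold need to be transferred. A minimal sketch, assuming each side can list its layer IDs one per line; the function name `layers_to_pull` and the layer IDs are invented for the example and do not reflect the registry protocol.

```shell
# Print the layers of an image that are not yet present locally.
# $1 = file with the image's layer IDs, $2 = file with local layer IDs
layers_to_pull() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -F -x -v -f "$2" "$1"
}

t=$(mktemp -d)
# v1.0 of a hypothetical image is already on the host ...
printf 'layer-base\nlayer-runtime\nlayer-app-v1.0\n' > "$t/local"
# ... and v1.1 shares everything except the application layer.
printf 'layer-base\nlayer-runtime\nlayer-app-v1.1\n' > "$t/image"
layers_to_pull "$t/image" "$t/local"   # prints layer-app-v1.1
rm -rf "$t"
```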

A. Question: What are the disadvantages of containers?

B. Exercise: use the memory subsystem to limit a process.

The next installment covers “Kubernetes Architecture Principles and Object Design”.
