The Evolution and Development of Network IO, and an Introduction to Its Models

When it comes to networking on the Internet, discussion inevitably turns to high concurrency and millions of connections. Achieving millions of connections is inseparable from the choice of network IO, so this article, written as a personal study note, records the development and evolution of network IO as well as the network models that are widely used today.

1. Development of network IO

In this section we introduce the evolution of network IO step by step. After walking through the development process, we compare and analyze several groups of concepts that are easily confused in network IO.

1.1 Various development stages of network IO

The network IO discussed here generally targets the Linux operating system. Network IO has changed along with the evolution of the Linux kernel, so its development can be roughly divided into the following stages:

1. Blocking IO (BIO)
2. Non-blocking IO (NIO)
3. IO multiplexing version 1 (select/poll)
4. IO multiplexing version 2 (epoll)
5. Asynchronous IO (AIO)

At each stage the networking of the time had certain defects, and those defects were improved upon continuously; this is the essence of how network IO has evolved. The stages listed above are introduced below, and for each stage we analyze the problems it solved and its advantages and disadvantages.

1.2 The two stages of network IO

In network IO, data transfer can broadly be divided into the following two stages:

The first stage: from the hardware (network card) to kernel space
The second stage: from kernel space to user space

My understanding: most of the data we transfer when using the Internet travels over a physical link. For two computers to communicate over the network, data first passes from the application down to the transport layer (TCP/UDP) in the kernel, then through the network layer, data link layer, and physical layer until it reaches the network card hardware. It is then carried over the transmission medium to the peer machine's network card, passed step by step from that network card up into kernel space, and finally copied into user space.

1.3 The difference between blocking IO and non-blocking IO

Based on Section 1.2, we know that data must pass through the two stages above to travel from the transmission medium to the destination process. We can classify IO as blocking or non-blocking depending on whether the first stage (from the hardware to kernel space) involves a blocking wait. If the user issues a read or write request while the data in kernel space is not yet ready, and the call returns immediately without blocking the user, that is non-blocking IO. If the call blocks the user at this stage and does not return until the data in kernel space is ready, that is blocking IO.

Therefore, blocking IO and non-blocking IO are distinguished mainly by whether the first stage blocks the user's operation.

1.4 The difference between synchronous IO and asynchronous IO

As described above, data transfer involves two stages. If either stage blocks the user's request, the IO is called synchronous IO; if neither stage blocks, it is called asynchronous IO.

Among today's operating systems, epoll on Linux and kqueue on macOS are synchronous IO, because a blocking copy still happens in the second stage (data moving from kernel space to user space). Only IOCP on Windows is truly asynchronous IO, i.e. AIO.

2. Blocking IO

In this section we introduce the earliest stage, blocking IO, also known as BIO. As noted earlier, the "blocking" in blocking IO mainly refers to the first stage (from the network card hardware to kernel space).

2.1 The concept of blocking IO

Blocking IO, as the name implies: when the user makes a system call and the data has not yet reached kernel space from the network card, i.e. the kernel-side data is not ready, the call blocks. Only when the data is ready is it copied from kernel space to user space, and the call returns. The specific flow is shown in the diagram in Section 2.2.

2.2 The process of blocking IO

2.3 Disadvantages of blocking IO

Blocking IO is generally used together with multithreading. The most common model is blocking IO + multithreading, where each connection is handled by a separate thread.
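A minimal sketch of this thread-per-connection pattern (handle_conn and the port number are just illustrative, and error handling is mostly omitted): accept() blocks waiting for a connection, and each accepted connection is handed to its own thread whose recv() blocks until the kernel has data.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

/* Each connection gets its own thread; recv() blocks until the kernel has data (stage 1). */
static void *handle_conn(void *arg) {
    int fd = *(int *)arg;
    free(arg);
    char buf[4096];
    ssize_t n;
    while ((n = recv(fd, buf, sizeof(buf), 0)) > 0)  /* blocks in the first stage */
        send(fd, buf, n, 0);                         /* echo the data back */
    close(fd);
    return NULL;
}

int main(void) {
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);                     /* illustrative port */
    bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listen_fd, 128);

    for (;;) {
        int *fd = malloc(sizeof(int));
        *fd = accept(listen_fd, NULL, NULL);         /* blocks until a client connects */
        if (*fd < 0) { free(fd); continue; }
        pthread_t tid;
        pthread_create(&tid, NULL, handle_conn, fd); /* one thread per connection */
        pthread_detach(tid);
    }
}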

We know that the number of threads a program can create is limited, and creating threads is relatively expensive. This is exactly what limits the number of client requests an application can handle; in the face of millions of connections, this approach simply cannot cope.

Now that the problem has been found and analyzed, it must be solved. The root cause of blocking IO's problem is the blocking itself, which naturally leads to the protagonist introduced next: non-blocking IO.

3. Non-blocking IO

Non-blocking IO is introduced to solve the above-mentioned defects of blocking IO. Below we will introduce the process of non-blocking IO.

3.1 The concept of non-blocking IO

Non-blocking IO, as the name implies, does not wait when the data has not yet arrived in the first stage (network card to kernel space); the call returns immediately instead. Therefore, with non-blocking IO the user has to issue requests repeatedly to ask whether the kernel data is ready yet.

3.2 The process of non-blocking IO

Non-blocking IO requires support from the system kernel. After the connection is created, the socket can be switched to non-blocking mode, for example with fcntl() and the O_NONBLOCK flag.
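A minimal sketch of this mode, assuming an already-connected socket fd (set_nonblocking and nonblocking_read are illustrative helper names): the socket is switched to non-blocking with fcntl(), after which every recv() returns immediately, and the caller keeps re-issuing the system call until data shows up. This repeated asking is exactly the drawback discussed in Section 3.4.

#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>

/* Switch an existing socket to non-blocking mode. */
static int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Busy-polling read: each recv() is a system call; if the kernel has no data yet,
 * it returns -1 with errno set to EAGAIN/EWOULDBLOCK instead of blocking. */
static ssize_t nonblocking_read(int fd, char *buf, size_t len) {
    for (;;) {
        ssize_t n = recv(fd, buf, len, 0);
        if (n >= 0) return n;                        /* data arrived (or peer closed) */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            continue;                                /* not ready yet: ask again */
        return -1;                                   /* real error */
    }
}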

3.3 Advantages of non-blocking IO

As mentioned earlier, non-blocking IO removes blocking IO's need for one thread per connection, so its biggest advantage is that a single thread can handle multiple connections, which is exactly what its non-blocking nature makes possible.

3.4 Disadvantages of non-blocking IO

However, this mode also has a problem, that is, it requires the user to initiate system calls multiple times. Frequent system calls consume system resources.

Therefore, since there is such a problem, naturally we need to solve this problem: reduce system calls while retaining the advantages of non-blocking IO

4. IO Multiplexing First Edition

In order to solve the problem of frequent system calls in non-blocking IO, with the development of the kernel, the IO multiplexing model appeared. Then we need to understand a few questions:

  1. What exactly does IO multiplexing multiplex?
  2. How does IO multiplexing achieve multiplexing?

IO multiplexing: many people say that IO multiplexing means using one thread to manage multiple network connections, but I do not quite agree, because in non-blocking IO one thread can already handle multiple connections; that ability comes from being non-blocking.

From my personal point of view, multiplexing mainly means using a limited number of system calls to manage many network connections. Put simply: if I currently have 10 connections, I can hand all 10 to the kernel in a single system call and let the kernel tell me which connections have data ready; then I go and read the data on each ready connection. So what is being multiplexed is the system call: a limited number of system calls is used to determine, across a massive number of connections, which ones have data ready.

Whether it is select, poll, or epoll below, all of them follow this idea; in terms of implementation, select/poll can be regarded as the first version and epoll as the second version.

4.1 The concept of IO multiplexing version 1

The "first version of IO multiplexing" is a term I coined myself, mainly to make it easier to distinguish select/poll from epoll.

Therefore, the first version of IO multiplexing here mainly refers to select and poll.

select api

// nfds: highest-numbered fd plus 1; readfds: fds watched for reading; writefds: fds watched for writing; exceptfds: fds watched for exceptional conditions
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

The file descriptors monitored by select fall into three categories: readfds, writefds, and exceptfds. After the call, select blocks until a descriptor becomes ready (readable, writable, or has an exceptional condition) or the timeout expires (timeout specifies how long to wait; a zero-valued timeval makes select return immediately, and NULL makes it block indefinitely), and then the function returns. When select returns, the ready descriptors are found by traversing the fd sets.

select is supported on almost all platforms, and this good cross-platform support is one of its advantages. One disadvantage is that there is a maximum limit on the number of file descriptors a single process can monitor, generally 1024 on Linux. The limit can be raised by modifying macro definitions or even recompiling the kernel, but doing so also reduces efficiency.
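A minimal sketch of the usage pattern (conn_fds, nconn, and max_fd are illustrative: the monitored connection fds and the highest of them): the whole fd set is rebuilt and passed to the kernel on every call, and after select() returns, every fd has to be tested with FD_ISSET() to find the ready ones.

#include <sys/select.h>
#include <sys/socket.h>

void select_loop(int *conn_fds, int nconn, int max_fd) {
    char buf[4096];
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        for (int i = 0; i < nconn; i++)
            FD_SET(conn_fds[i], &readfds);          /* the whole set is copied into the kernel each call */

        int ready = select(max_fd + 1, &readfds, NULL, NULL, NULL);  /* stage 1: block until some fd is ready */
        if (ready <= 0) continue;

        for (int i = 0; i < nconn; i++)             /* the user still traverses all fds to find the ready ones */
            if (FD_ISSET(conn_fds[i], &readfds))
                recv(conn_fds[i], buf, sizeof(buf), 0);  /* stage 2: copy the data to user space */
    }
}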

poll api

int poll(struct pollfd *fds, nfds_t nfds, int timeout);

struct pollfd {
    int fd; /* file descriptor */
    short events; /* requested events to watch */
    short revents; /* returned events witnessed */
};

The pollfd structure contains both the events to monitor and the events that occurred, so poll no longer uses select's "parameter-value" style of passing fd sets. At the same time, pollfd has no hard limit on the number of descriptors (although performance still degrades when the number gets large). As with select, after poll returns the pollfd array has to be traversed to find the ready descriptors.
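A minimal sketch under the same assumptions (fds and nfds are illustrative: a pollfd array already filled with the managed sockets): the array is handed to the kernel in one call, and the caller still scans revents afterwards.

#include <poll.h>
#include <sys/socket.h>

void poll_loop(struct pollfd *fds, nfds_t nfds) {
    char buf[4096];
    for (;;) {
        for (nfds_t i = 0; i < nfds; i++)
            fds[i].events = POLLIN;                 /* interested in readability */

        int ready = poll(fds, nfds, -1);            /* stage 1: block until some fd is ready (-1 = no timeout) */
        if (ready <= 0) continue;

        for (nfds_t i = 0; i < nfds; i++)           /* still a full traversal after return */
            if (fds[i].revents & POLLIN)
                recv(fds[i].fd, buf, sizeof(buf), 0);  /* stage 2 */
    }
}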

As the above shows, both select and poll must traverse the file descriptors after returning in order to find the ready sockets. In practice, among a large number of concurrent clients only a few may be ready at any moment, so efficiency declines linearly as the number of monitored descriptors grows.

Essentially: in IO multiplexing, the functions select()/poll()/epoll_wait() correspond to the first stage; read()/recvfrom() correspond to the second stage

4.2 The process of IO multiplexing version 1

4.3 Advantages of IO Multiplexing Version 1

The key to IO multiplexing is the multiplexing itself. Through a select() or poll() system call, multiple socket fds are passed to the kernel in one batch; the kernel traverses them to determine which fds have data ready and returns the ready fds to the user. The user then iterates over the ready fds one by one and reads or writes the data.

Therefore, through IO multiplexing + non-blocking IO, on the one hand, the number of system calls is reduced, and on the other hand, very few threads can be used to handle multiple network connections.

4.4 Disadvantages of IO Multiplexing Version 1

Although the first version of IO multiplexing solved the frequent system calls mentioned earlier, it introduced a new problem: on every call the user has to pass the massive set of socket fds from user space to kernel space, so that the kernel can detect which connections have data ready.

This means the massive fd set is frequently copied from user space to kernel space, and results are copied back from kernel space to user space, so the cost here is quite high.

Since there is still this problem, we continue to start solving this problem, which leads to the second version of IO multiplexing.

The idea is actually quite simple: since copying is the problem, find a way to avoid the copy. And to avoid copying the set on every call, keep it in an area set aside inside the kernel.

4.5 Differences in IO Multiplexing Version 1

Difference between select and poll

  1. select has a maximum number of fds it can handle, 1024 by default; this can be raised by changing the configuration, but it is still a hard limit. poll, in theory, has no such limit.
  2. When select and poll manage a large number of connections, the fd set is frequently copied between user space and kernel space, which consumes considerable resources.

5. IO Multiplexing 2nd Edition

The second version of IO multiplexing mainly refers to epoll. epoll was born as the kernel iterated; as is widely noted, it has been supported since kernel version 2.6.

epoll emerged to solve the problems of the first version of IO multiplexing described above.

5.1 Concepts of IO Multiplexing Version 2

APIs provided by epoll

// Create an epoll instance: the kernel allocates a region whose underlying data structures are a red-black tree plus a (doubly linked) ready list
int epoll_create(int size); // create an epoll handle; size hints at how many descriptors will be monitored

// Add, delete, or modify the managed socket fds in the red-black tree
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

// Used to block in the first stage, waiting for ready fds
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
1. int epoll_create(int size);
Creates an epoll handle. size hints to the kernel how many descriptors will be monitored. Unlike the first parameter of select(), which is the maximum fd value plus 1, size does not limit the number of descriptors epoll can monitor; it is only a hint for the kernel's initial allocation of internal data structures.
When the epoll handle is created, it occupies an fd itself (visible under /proc/<pid>/fd/ on Linux), so after you are done with epoll you must call close() on it, otherwise fds may be exhausted.

2. int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
Performs the operation op on the specified descriptor fd.
- epfd: the return value of epoll_create().
- op: the operation, expressed with three macros: EPOLL_CTL_ADD (add), EPOLL_CTL_DEL (delete), and EPOLL_CTL_MOD (modify), which add, remove, and modify the events being listened for on fd, respectively.
- fd: the file descriptor to be monitored.
- event: tells the kernel what to monitor. The struct epoll_event is defined as follows:

struct epoll_event {
  __uint32_t events;  /* Epoll events */
  epoll_data_t data;  /* User data variable */
};

// events can be a combination (bitwise OR) of the following macros:
EPOLLIN: the corresponding file descriptor is readable (including an orderly close by the peer socket);
EPOLLOUT: the corresponding file descriptor is writable;
EPOLLPRI: the corresponding file descriptor has urgent (out-of-band) data to read;
EPOLLERR: an error occurred on the corresponding file descriptor;
EPOLLHUP: the corresponding file descriptor was hung up;
EPOLLET: use edge-triggered mode for this descriptor, as opposed to the default level-triggered mode;
EPOLLONESHOT: listen for the event only once; after it fires, the socket must be re-armed in the epoll instance if it should keep being monitored.

3. int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
Waits for IO events on epfd, returning at most maxevents events.
The events array is used to receive the set of ready events from the kernel, and maxevents tells the kernel how large that array is. timeout is the wait time in milliseconds: 0 makes the call return immediately, and -1 makes it block indefinitely. The function returns the number of events that need to be processed; a return value of 0 means the wait timed out.

Two working modes

epoll supports two modes for operating on file descriptors: LT (level-triggered) and ET (edge-triggered), with LT being the default. The difference is as follows. LT mode: when epoll_wait detects an event on a descriptor and reports it to the application, the application does not have to handle the event immediately; the next call to epoll_wait will report the event again. ET mode: when epoll_wait detects an event on a descriptor and reports it, the application must handle it immediately; if it does not, subsequent calls to epoll_wait will not report that event again.

  1. LT mode

LT (level-triggered) is the default mode and supports both blocking and non-blocking sockets. In this mode, the kernel tells you when a file descriptor is ready, and you can then perform IO on that ready fd. If you do nothing, the kernel will keep notifying you.

  2. ET mode

ET (edge-triggered) is the high-speed mode and supports only non-blocking sockets. In this mode, the kernel notifies you via epoll when a descriptor transitions from not ready to ready. It then assumes you know the descriptor is ready and will not send further readiness notifications for it until you do something that makes the descriptor not ready again (for example, a send or receive returns EWOULDBLOCK, or you send or receive less than the requested amount of data). Note that if no IO is performed on the fd (so it never becomes not-ready again), the kernel will not send more notifications; it notifies only once.

ET mode greatly reduces the number of times an epoll event is triggered repeatedly, so it is more efficient than LT mode. When epoll works in ET mode, non-blocking sockets must be used, to avoid a blocking read or write on one file descriptor starving the task that is handling many descriptors.
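As a minimal sketch of why ET mode pairs with non-blocking sockets (drain_fd is an illustrative name): since the readiness notification arrives only once per transition, the handler must keep reading until recv() reports EAGAIN/EWOULDBLOCK, otherwise data left in the kernel buffer would never be reported again.

#include <errno.h>
#include <sys/socket.h>

/* Under ET, read everything currently buffered in the kernel for this fd.
 * The fd must be non-blocking so the final recv() returns EAGAIN instead of blocking. */
void drain_fd(int fd) {
    char buf[4096];
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0) {
            /* ... process n bytes of buf ... */
            continue;
        }
        if (n == 0)                                   /* peer closed the connection */
            break;
        if (errno == EAGAIN || errno == EWOULDBLOCK)  /* kernel buffer drained: stop until the next edge */
            break;
        break;                                        /* real error */
    }
}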

5.2 Process of IO Multiplexing Version 2

When epoll_wait() is called it blocks, and when it returns it reports exactly which fds have data ready; the user only needs to traverse those ready fds and perform the reads and writes.
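A minimal sketch of this flow (connection setup is omitted; client_fd and MAX_EVENTS are illustrative): the fd is registered once with epoll_ctl(), and each epoll_wait() returns only the ready fds, so no full traversal of the managed set is needed.

#include <sys/epoll.h>
#include <sys/socket.h>

#define MAX_EVENTS 64

void epoll_read_loop(int client_fd) {
    int epfd = epoll_create(256);                    /* size is only a hint on modern kernels */

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;                             /* interested in readability, LT mode */
    ev.data.fd = client_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, client_fd, &ev);  /* register once; no per-call copy of the fd set */

    struct epoll_event events[MAX_EVENTS];
    char buf[4096];
    for (;;) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);  /* stage 1: block until something is ready */
        for (int i = 0; i < n; i++)                  /* only the ready fds come back */
            if (events[i].events & EPOLLIN)
                recv(events[i].data.fd, buf, sizeof(buf), 0);  /* stage 2 */
    }
}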

5.3 Advantages of IO Multiplexing Version 2

The advantages of IO multiplexing version 2 epoll are:

epoll allocates a region in the kernel up front to store the managed fds, so once a connection is established and handed to epoll, it only needs to be added once to that pre-allocated region; there is no need to repeatedly copy the whole set of managed fds from user space afterwards. This greatly improves performance.

So when IO multiplexing is mentioned today, it mainly refers to epoll.

5.4 Disadvantages of IO Multiplexing Version 2

A personal guess: the remaining question is how to reduce the memory that the managed fd set occupies in the kernel.

6. Asynchronous IO

6.1 The process of asynchronous IO

All the network IO described above is synchronous IO, because even when the data is ready in kernel space, there is still a short blocking wait while it is copied from kernel space to user space. Asynchronous IO means that the copy from kernel space to user space is also carried out by the system rather than by the user thread. At present only IOCP on Windows qualifies as asynchronous IO.

7. Various models of network IO

7.1 The reactor model

At present, the reactor model has the following implementation schemes:

1. Single-reactor single-thread model
2. Single-reactor multi-thread model
3. Multi-reactor multi-thread model
4. Multi-reactor multi-process model


7.1.1 Single reactor single thread model

In this model there is usually just one epoll object, and accepting client connections, reading from clients, and writing to clients are all handled in a single thread. Some middleware in production uses this model, for example Redis.
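A minimal sketch of a single-reactor, single-thread event loop (listen_fd setup is omitted; the names are illustrative): one thread, one epoll instance, and the dispatch simply distinguishes the listening fd (accept) from connection fds (read plus business logic), which is also why a slow non-IO handler delays everything else.

#include <sys/epoll.h>
#include <sys/socket.h>

#define MAX_EVENTS 64

/* One thread owns the epoll instance, accepts connections, and does all the reads. */
void single_reactor(int listen_fd) {
    int epfd = epoll_create(256);

    struct epoll_event ev = {0}, events[MAX_EVENTS];
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    char buf[4096];
    for (;;) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listen_fd) {                   /* acceptor role */
                int conn = accept(listen_fd, NULL, NULL);
                struct epoll_event cev = {0};
                cev.events = EPOLLIN;
                cev.data.fd = conn;
                epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &cev);
            } else {                                 /* handler role: IO and business logic on the same thread */
                recv(fd, buf, sizeof(buf), 0);
                /* ... business processing here blocks the whole reactor ... */
            }
        }
    }
}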

However, in the single-threaded Reactor model, not only the I/O operations but also the non-I/O business logic runs on the Reactor thread, which can greatly delay responses to I/O requests. Therefore, non-I/O business logic should be offloaded from the Reactor thread to speed up its response to I/O requests.

7.1.2 Single-reactor multi-threading model

In the worker-thread-pool mode, although non-I/O operations are handed over to the thread pool, all I/O operations are still performed by the single Reactor thread, which is still likely to become a bottleneck under high load, high concurrency, or large data volumes. The following multi-reactor multi-thread mode was therefore produced as a further optimization of the Reactor.

7.1.3 Multi-reactor multi-thread model

In this model, it is mainly divided into two parts: mainReactor and subReactors. mainReactor is mainly responsible for receiving client connections, and then distributing the established client connections to subReactors through load balancing.

subReactors are responsible for the specific read and write of each connection

For non-IO operations, it is still handed over to the worker thread pool to decouple the logic

The mainReactor corresponds to the BossGroup thread group configured in Netty and is mainly responsible for accepting client connections; generally only one service port is exposed, so the BossGroup usually works with a single thread. The subReactors correspond to Netty's WorkerGroup thread group: after the BossGroup accepts and establishes a client connection, it hands the socket over to the WorkerGroup, which selects one of its threads to perform the I/O processing. The WorkerGroup mainly handles I/O and is typically configured with about twice as many threads as there are CPU cores.

7.2 The proactor model

The Proactor model is essentially an encapsulation of asynchronous IO and requires support from the underlying operating system; currently only IOCP on Windows supports it well. For a detailed introduction, please see the references at the end of this article.

7.3 Network Models Used by Mainstream Middleware

7.4 Mainstream Network Framework

  • netty
  • gnet
  • libevent
  • evio (golang)
  • ACE(c++)
  • boost::asio(c++)
  • muduo (linux only)

For a comparison of the C and C++ libraries listed above, interested readers can look up further material on their own.

8. References

  1. IO mode and IO multiplexing
  2. Detailed explanation of Linux IO mode and select, poll, epoll
  3. Chapter 6. I/O Multiplexing: The select and poll Functions
  4. High-performance IO model analysis-Reactor mode and Proactor mode (2)
