Solving nginx's HTTP/3 Architecture Problem: Angie's Experience and the Magic of eBPF#

29.01.2026

How Angie 1.11 addressed the fundamental shortcomings of the HTTP/3 implementation in nginx: from simple hashes to building a full-fledged accept() equivalent for QUIC using BPF programs.


To the end user, switching from HTTP/2 to HTTP/3 may appear to be simply replacing TCP with UDP in the config. But for server software with a multi-process architecture, this step becomes a real headache. The classic accept() scheme that has underpinned TCP connection handling for years simply does not exist in the QUIC world. Packets arrive on a UDP port, and the OS kernel no longer knows which worker process should receive them.

In the original nginx, this resulted in HTTP/3 support remaining "experimental" and limited for a long time: it suffers from session disconnects and service degradation during configuration reloads. For many, this has been a dealbreaker for deploying the protocol in production.

In this article, we describe how Angie 1.11 addresses these fundamental shortcomings. We did not simply add protocol support — we rethought the way the server interacts with the kernel. The journey from simple hashes to building a full-fledged accept() equivalent for QUIC using BPF programs allows us to say: Angie's HTTP/3 implementation is complete, free of nginx's "teething problems", and fully ready for production use in high-load environments.

Welcome to a look under the hood of modern data transport.

---

I now hand over to Vladimir, one of the developers of the HTTP/3 module in nginx and the author of the new mechanism, who will share all the details.


Vladimir Khomutov#

An nginx developer since 2012 and an Angie developer since 2022.

Why HTTP/3 Is Nothing Like HTTP/{1,2}#

The thing is, HTTP/3 runs on top of the QUIC protocol, which is based on UDP, unlike previous HTTP versions that ran on top of TCP. In this article, we will not discuss how HTTP/3 differs in terms of semantics — there are no revolutions there. It is still URL requests with various methods, arguments, and headers, and responses with familiar status codes. Even though the wire representation has changed (it is now binary rather than text-based), the essence remains the same. What has truly changed, however, is the transport layer and how the application layer interacts with the transport protocol.

So, UDP. This means we are dealing with packets that may be lost or arrive out of order. There is also no flow control. In previous versions, all of these (and many other!) concerns were handled by the TCP stack, which is typically implemented in the OS kernel. Now the QUIC protocol takes on this responsibility. It handles packet numbering, tracks delivery order, and controls both the data transfer rate and packet sizes. Furthermore, it ensures data integrity through integrated cryptography. And today, all of this runs not in the kernel but in user space. It is possible that someday QUIC will make it into the kernel (such projects already exist) and we will return to the old paradigm, but for now Angie implements the entire QUIC and HTTP/3 stack on its own.

Thus, adopting QUIC means incorporating a large body of networking code tightly integrated with TLS into your application. The latter imposes additional requirements on the SSL library being used. The level of support for the required primitives varies widely across libraries, which can sometimes even lead to network-level incompatibilities.

Beyond the complexity, QUIC also brings new capabilities. For example, it can maintain connections when the client's or server's IP address changes (migration), provides greater privacy, enables fast session resumption (0-RTT), and supports truly independent data streams within a single connection. This is a significant improvement over HTTP/2, where a single lost packet in one stream would stall all others because they shared a single TCP connection (the Head-of-Line Blocking problem).

How a TCP Server Works#

Angie relies on multiple worker processes to make use of multi-core systems, and it is this architecture that makes implementing QUIC challenging. To understand why, we first need to look at how client handling works in a TCP server. Consider this simple configuration:

worker_processes  2;
events { }

http {

    server {
        listen 127.0.0.1:8080;
        location / { return 200 "Hello, world\n"; }
    }
}

First, the master process starts, reads the configuration, and creates a listen socket. It then uses fork() to spawn two worker processes (one per core). The worker processes wait for new connections in an infinite loop by calling the accept() system call. When a client connects, one of the processes receives a new client socket, which it uses for communication. The OS kernel ensures that data from the client reaches the correct worker process through this specific socket.
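The pre-fork model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Angie's code: it is POSIX-only (it relies on `os.fork()`), and the names `worker_loop` and `run_prefork_demo` are hypothetical. The "master" creates one listen socket, forks workers that inherit it, and each worker blocks in `accept()`; the reply carries the PID of the worker that the kernel chose for that connection.

```python
import os
import signal
import socket


def worker_loop(listen_sock):
    """Worker: block in accept() and serve each client on its own socket."""
    while True:
        # The kernel completes the TCP handshake and hands over a new socket.
        conn, _addr = listen_sock.accept()
        with conn:
            data = conn.recv(1024)
            conn.sendall(str(os.getpid()).encode() + b":" + data)


def run_prefork_demo(n_workers=2, n_clients=4):
    """Return (worker_pids, replies); each reply is (serving_pid, echoed_bytes)."""
    # "Master": create the listen socket before fork() so workers inherit it.
    listen_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listen_sock.listen(16)
    port = listen_sock.getsockname()[1]

    worker_pids = []
    for _ in range(n_workers):
        pid = os.fork()
        if pid == 0:  # child: becomes a worker and never returns
            try:
                worker_loop(listen_sock)
            finally:
                os._exit(0)
        worker_pids.append(pid)

    replies = []
    try:
        for i in range(n_clients):
            with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
                client.sendall(b"hello %d" % i)
                # The worker closes the connection after replying, so read to EOF.
                reply = client.makefile("rb").read()
                pid_text, _, echoed = reply.partition(b":")
                replies.append((int(pid_text), echoed))
    finally:
        for pid in worker_pids:
            os.kill(pid, signal.SIGKILL)
            os.waitpid(pid, 0)
        listen_sock.close()
    return worker_pids, replies
```

Calling `run_prefork_demo(2, 4)` returns the worker PIDs and, for every client, which worker served it. Whichever worker wins a given `accept()` is up to the kernel, but each connection is always served in full by exactly one worker.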

To go further, it is important to consider two more procedures that significantly affect operation: loading a new configuration and upgrading the binary without service interruption (graceful reload and graceful upgrade). If you need to change Angie's settings, you certainly do not want to stop the server and drop existing connections. The same applies to upgrading the server version itself.

How is a configuration update related to the TCP server? In Angie, to update settings, the user edits the configuration file and asks the system to apply the changes. At that point, the master process reads the new configuration and starts new worker processes that begin using it. The old processes continue to serve existing connections but no longer accept new ones, because they close their listen sockets. At any given time, there may be multiple sets of worker processes, each operating with its own version of the configuration. New connections are handled only by the current worker processes.

For example, here is what you might see during a configuration update:

$ ps aux|grep angie
root       26092   angie: master process v1.11.0 #1 [./sbin/angie]
nobody     26093   angie: worker process #1
nobody     26094   angie: worker process #1
nobody     26095   angie: worker process #1
nobody     26096   angie: worker process #1

# kill -HUP `cat logs/angie.pid`

$ ps aux|grep angie
root       26092   angie: master process v1.11.0 #2 [./sbin/angie]
nobody     26094   angie: worker process is shutting down #1
nobody     27084   angie: worker process #2
nobody     27085   angie: worker process #2
nobody     27086   angie: worker process #2
nobody     27087   angie: worker process #2

We see a master process running as the superuser and four worker processes running without privileges. The original processes were using configuration #1, while the new ones use configuration #2. One old process is still active — it has unfinished client connections.
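The process listing above can be reproduced with a toy model of configuration generations. This is a simplified sketch, not Angie's implementation: the `Master` and `Worker` classes and their attributes are hypothetical, and real workers exit asynchronously rather than inside `reload()`.

```python
class Worker:
    def __init__(self, generation):
        self.generation = generation
        self.active_connections = 0
        self.accepting = True  # still holds an open listen socket

    def title(self):
        state = "worker process" if self.accepting else "worker process is shutting down"
        return "%s #%d" % (state, self.generation)


class Master:
    def __init__(self, n_workers):
        self.n_workers = n_workers
        self.generation = 1
        self.workers = [Worker(1) for _ in range(n_workers)]

    def reload(self):
        """Model of what happens when the master receives SIGHUP."""
        self.generation += 1
        for worker in self.workers:
            worker.accepting = False  # old worker closes its listen sockets
        # Idle old workers exit at once; busy ones linger until clients finish.
        draining = [w for w in self.workers if w.active_connections > 0]
        fresh = [Worker(self.generation) for _ in range(self.n_workers)]
        self.workers = draining + fresh

    def titles(self):
        return [w.title() for w in self.workers]
```

With four workers, one of which still has an open client connection, `reload()` leaves one "shutting down" worker of generation 1 next to four fresh workers of generation 2, matching the second `ps` output.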

The binary upgrade process is conceptually similar, but in this case a new master process is started (from the new executable), which spawns its own set of worker processes. The old and new masters run in parallel, each with its own set of worker processes. Existing connections continue to be served by the old processes until they finish, while new connections may land in either instance of the system (both old and new worker processes). Here is what the process table might look like during the transition:

# ps aux|grep angie
root      101664  angie: master process v1.11.0 #1 [./sbin/angie]
nobody    101665  angie: worker process #1
nobody    101666  angie: worker process #1
nobody    101667  angie: worker process is shutting down #1
nobody    101668  angie: worker process #1
root      101676  angie: master process v1.11.0 #2 [./sbin/angie]
nobody    101753  angie: worker process #2
nobody    101754  angie: worker process #2
nobody    101755  angie: worker process #2
nobody    101756  angie: worker process #2

Here we have two master processes, each with its own set of worker processes, with the second one already using the second generation of configuration.

You can always check the configuration generation in Angie via the API.

In both of these scenarios, the rule holds: existing connections continue to be served by their original worker process, while new connections can be accepted by the new processes. All of this is possible because the kernel handles connection establishment, and new processes can claim connections from it. Existing connections operate through already-created sockets that belong to a specific process, so there is no room for confusion.

How a UDP/QUIC Server Works#

What changes with the move to QUIC? Now it is the Angie process, not the kernel, that is responsible for accepting incoming connections. Instead of a ready-made established connection obtained through an accept() call, we simply receive UDP packets (from the listening socket). Their contents must be correctly processed and attributed either to existing connections, to new connections, or to noise.

Previously, there was an atomic way to accept a client connection (accept(): the kernel performs the TCP handshake and only notifies the application upon success), but now we face a problem. To establish a connection, we need to exchange a series of packets with the client within the same worker process. However, a return packet may end up being received by a different worker process, since it listens on the same socket. As a result, in the basic scenario QUIC would only work with a single worker process (and even then with caveats), which is obviously unacceptable for high-performance systems.
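To make the failure mode concrete, here is a small simulation of several workers reading from one shared UDP socket. It is a sketch under simplifying assumptions: the handshake is reduced to two client packets ("initial", then "handshake"), and "whichever worker happens to read the datagram" is modelled as plain round-robin. The `Worker` class and `established_connections` function are hypothetical names.

```python
from itertools import cycle


class Worker:
    def __init__(self):
        self.pending = set()      # connections whose first packet we have seen
        self.established = set()  # connections whose handshake completed here

    def on_packet(self, conn_id, kind):
        if kind == "initial":
            self.pending.add(conn_id)
        elif kind == "handshake" and conn_id in self.pending:
            self.pending.discard(conn_id)
            self.established.add(conn_id)
        # Otherwise the packet belongs to a connection this worker has never
        # heard of: it cannot be processed, and there is nowhere to forward it.


def established_connections(n_workers, packets):
    """Feed packets to workers sharing one socket; count completed handshakes."""
    workers = [Worker() for _ in range(n_workers)]
    reader = cycle(workers)  # stand-in for "whoever reads the socket next"
    for conn_id, kind in packets:
        next(reader).on_packet(conn_id, kind)
    return sum(len(w.established) for w in workers)


PACKETS = [("A", "initial"), ("A", "handshake"),
           ("B", "initial"), ("B", "handshake")]
```

With a single worker, both connections in `PACKETS` are established. With two workers, every follow-up packet lands on the worker that did not see the first one, and no handshake completes.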

Binding Clients to Processes via reuseport#

So we have multiple processes listening on the same port and reading UDP packets from it. A packet read by one process may actually be intended for another. Even if we determine that a packet is "not ours", we still don't know where to forward it. This problem needs to be solved.

To solve it, we can use the reuseport parameter of the listen directive, which sets the SO_REUSEPORT socket option (SO_REUSEPORT_LB on FreeBSD). Although the option's name and history can be misleading, on modern systems it allows multiple processes to share an address:port pair. Each process must have its own socket. In other words, instead of "1 socket for N processes", we switch to "N sockets for N processes".

The kernel handles distributing incoming packets across sockets: it hashes packet data (including the client's IP address and port) and uses the result to select a socket from the reuseport group. Thanks to this, an unchanged client always lands on the same socket and therefore in the correct worker process. Here is the reuseport option used in the listen directive:

worker_processes  2;
events { }

http {

    ssl_certificate ...
    ssl_certificate_key ...

    server {
        listen 127.0.0.1:8080 quic reuseport;
        location / { return 200 "Hello, world\n"; }
    }
}

While this approach works, it is not ideal:

  • Uneven distribution: the distribution of clients across the IP address space may not be random, causing the load to be spread unevenly across worker processes.

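  • Address changes: a client's IP address or port may change either due to QUIC migration mechanisms or NAT behavior, after which the hash may select a different socket and the packets reach a worker process that knows nothing about the connection.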

  • Update process: complex scenarios involving configuration or binary updates bring back the original problem — socket sharing between multiple sets of processes. This happens because after the master process calls fork(), sockets are inherited and shared between different generations of worker processes.
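The hash-based selection and its weak spot can be illustrated with a short sketch. Note the assumption: the kernel uses its own internal hash over the packet's addresses and ports; `zlib.crc32` is used here only as a deterministic stand-in, and `select_socket` is a hypothetical name.

```python
import zlib


def select_socket(src_ip, src_port, n_sockets):
    """Pick a socket index from the client address, the way a hash-based
    reuseport group would (CRC32 stands in for the kernel's real hash)."""
    key = ("%s:%d" % (src_ip, src_port)).encode()
    return zlib.crc32(key) % n_sockets


# The same client address always maps to the same socket...
first = select_socket("198.51.100.7", 40000, 4)
again = select_socket("198.51.100.7", 40000, 4)

# ...but once NAT rebinding or migration changes the source port, the hash
# is free to pick another socket, i.e. another worker process.
rebound = select_socket("198.51.100.7", 40001, 4)
```

Here `first` and `again` are equal, while `rebound` differs: the connection is unchanged from the client's point of view, but its packets now arrive at a worker that has no state for it.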

Binding Clients to Processes via BPF#

To address these problems, the BPF module was introduced. This is a Linux-specific technology that allows an application to intervene in the kernel's socket selection process for an incoming packet. This functionality extends the capabilities of reuseport: instead of simply distributing packets by hash, it allows loading a custom socket selection algorithm into the kernel. The diagram below illustrates how this works.

Figure: Binding packets to listening sockets via BPF in nginx

How does this algorithm work? In the first version (still in nginx), for simplicity, client QUIC connections were bound to the worker process number. This was implemented as follows: each QUIC packet contains a Destination Connection ID (DCID) — a destination identifier that may change during the lifetime of a connection. We used this property to encode the socket identifier (obtained via SO_COOKIE) directly into the DCID.
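For reference, locating the DCID in a packet is straightforward. The sketch below follows the header layouts from RFC 9000: in a long header the DCID is preceded by a 4-byte version and a 1-byte length, while in a short header it follows the first byte directly and its length is known only to the server. The function name `extract_dcid` and the parameter `short_dcid_len` are hypothetical; this is not the code of the actual BPF program.

```python
from typing import Optional


def extract_dcid(packet: bytes, short_dcid_len: int) -> Optional[bytes]:
    """Return the Destination Connection ID of a QUIC packet, or None if the
    packet is too short to contain one."""
    if not packet:
        return None
    if packet[0] & 0x80:
        # Long header: flags (1) | version (4) | DCID length (1) | DCID ...
        if len(packet) < 6:
            return None
        end = 6 + packet[5]
        return packet[6:end] if len(packet) >= end else None
    # Short header: flags (1) | DCID ...; the length is not on the wire, the
    # server must remember the length of the connection IDs it issued.
    end = 1 + short_dcid_len
    return packet[1:end] if len(packet) >= end else None


# Hand-built sample packets (payloads are dummy bytes, not valid QUIC frames).
long_packet = bytes([0xC0]) + b"\x00\x00\x00\x01" + bytes([4]) + b"abcd" + b"rest"
short_packet = bytes([0x40]) + b"abcdefgh" + b"payload"
```

For these samples, `extract_dcid(long_packet, 8)` yields `b"abcd"` and `extract_dcid(short_packet, 8)` yields `b"abcdefgh"`.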

The BPF module created a table in the kernel that mapped sockets to their identifiers. The table size was fixed and determined by the number of processes in the original configuration (with a small margin). The program analyzed each QUIC packet, extracted the DCID, and used it to determine which socket should receive the packet. Packets of new connections (whose DCID does not yet carry a socket identifier) could be directed to any socket. BPF is enabled by adding the quic_bpf on directive to the configuration:

worker_processes  2;
events { }

quic_bpf on;

http {

    ssl_certificate ...
    ssl_certificate_key ...

    server {
        listen 127.0.0.1:8080 quic reuseport;
        location / { return 200 "Hello, world\n"; }
    }
}

In essence, this is analogous to the sticky cookie mechanism in load balancers. This solved the client migration problem, but still performed poorly in complex scenarios. Moreover, this scheme exposed internal worker process implementation details to the outside world. It also broke down when new master processes were started during binary upgrades. And during configuration reloads, new packets could still end up at old worker processes.

To minimize the negative effects, old worker processes were made to either not respond to new connection requests or respond with a retry packet. This was based on the assumption that the client would make several attempts, and over time (once its port changed or the old processes terminated) it would reach a new worker process where it could successfully establish a connection. Of course, this solution was not ideal and led to temporary service degradation after a configuration reload.
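The first-generation scheme described above can be modelled as follows. This is a user-space sketch of the idea, not the BPF program itself: the 8-byte cookie prefix reflects the 64-bit value returned by SO_COOKIE, while the total DCID length of 20 bytes, the byte order, and the names `make_dcid` and `select_socket` are assumptions made for illustration.

```python
import os
import struct

COOKIE_LEN = 8  # SO_COOKIE yields a 64-bit socket identifier


def make_dcid(socket_cookie, total_len=20):
    """Server side: issue a connection ID that starts with the socket cookie."""
    return struct.pack("!Q", socket_cookie) + os.urandom(total_len - COOKIE_LEN)


def select_socket(dcid, cookie_to_socket, fallback_index):
    """Kernel side: route by the cookie found in the DCID; if the cookie is
    unknown (for example, a first packet with a client-chosen DCID), fall
    back to an arbitrary socket from the table."""
    if len(dcid) >= COOKIE_LEN:
        (cookie,) = struct.unpack("!Q", dcid[:COOKIE_LEN])
        if cookie in cookie_to_socket:
            return cookie_to_socket[cookie]
    sockets = list(cookie_to_socket.values())
    return sockets[fallback_index % len(sockets)]


table = {0x1111: "worker-0", 0x2222: "worker-1"}
server_issued = make_dcid(0x2222)
client_chosen = b"\xff" * 8  # carries no known cookie
```

A packet with `server_issued` always reaches "worker-1" regardless of the fallback value, which is what makes migration work. A packet with `client_chosen` can land on any socket in the table, including one that belongs to an old worker that is already shutting down, which is exactly the weakness described above.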

Using Client Sockets#

At some point, it became clear that each client needed its own socket — just like with TCP. In effect, we needed an accept() equivalent for QUIC. This approach was implemented using BPF in recent versions of Angie. It solved the problems that arose during configuration and binary updates. The diagram below shows the new approach — as you can see, it is considerably more complex than the previous one.

Figure: Binding packets to listening sockets via BPF in Angie

Now the BPF module is aware of how many Angie instances are running and how many worker processes each one has. Each instance maintains a table of accepted connections (mapping unmodified DCIDs to specific client sockets). In addition, the kernel maintains a table of listening sockets for each instance.

Every packet destined for a port that Angie listens on with the quic option first passes through the BPF program. It runs a socket (and therefore worker process) selection procedure by checking conditions in sequence:

  1. Session ID present: if the packet contains a known session identifier, the socket of the existing connection is selected.

  2. New connection: if no session is found, the packet is treated as a new connection request.

  3. Instance selection: for new connections, an Angie instance is chosen at random to handle them (if more than one master is running).

  4. Socket selection: the client's hash is then used to select a listening socket from that specific instance.

Now, when a worker process receives a new connection request on its listening socket, it creates a new client socket and adds an entry with the corresponding DCID to the BPF table. This guarantees that all subsequent packets will be delivered to that exact socket.

When an instance shuts down, the worker process closes the listening socket, removes its entries from the BPF tables, and stops accepting new connections, while existing connections continue to work without interruption. During a configuration reload or the launch of a new master, both old and new processes correctly update the kernel tables, allowing the BPF module to route traffic accurately in any situation.
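The selection procedure and the table updates can be summarised in a small user-space model. It is a sketch of the logic as described in this section, not the actual BPF code: the `Instance` class, the `select_socket` function, and the use of CRC32 as the client hash are all assumptions made for illustration.

```python
import random
import zlib


class Instance:
    """One running Angie instance: a master with its workers (simplified)."""

    def __init__(self, listening_sockets):
        self.listening = list(listening_sockets)  # one listening socket per worker
        self.connections = {}                     # DCID -> client socket

    def accept(self, dcid, client_socket):
        """Worker side: register the client socket created for a new connection."""
        self.connections[dcid] = client_socket

    def stop_listening(self):
        """Shutdown: drop listening entries, keep established connections."""
        self.listening.clear()


def select_socket(instances, dcid, src_addr, rng=random):
    # 1. Known connection: deliver to its dedicated client socket.
    for inst in instances:
        if dcid in inst.connections:
            return inst.connections[dcid]
    # 2-3. New connection: pick one of the instances that still listen.
    candidates = [inst for inst in instances if inst.listening]
    if not candidates:
        return None
    inst = rng.choice(candidates)
    # 4. A hash of the client address selects a listening socket in that instance.
    index = zlib.crc32(repr(src_addr).encode()) % len(inst.listening)
    return inst.listening[index]
```

A short walk-through: a first packet with DCID `b"cid-1"` goes to one of the old instance's listening sockets; after the worker calls `accept(b"cid-1", "client-sock-1")`, every later packet with that DCID reaches "client-sock-1", even from a different source address. Once the old instance calls `stop_listening()` and a new instance is added, new connections go only to the new instance, while `b"cid-1"` keeps resolving to its original client socket.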

Configuring the BPF Module#

An important note: by enabling the BPF module in the configuration, you are not simply changing Angie's internal settings — you are also modifying global kernel objects (attached to the reuseport socket group). Once BPF is enabled, it cannot be disabled without a full process restart. Even if you remove it from the new configuration, the program loaded by the previous version will remain in the kernel.

The size of the active connections table is limited and is calculated as N = worker_connections × MAX_SERVER_IDS, where:

  • worker_connections is the value of the corresponding directive in the configuration at the time the BPF tables were created;

  • MAX_SERVER_IDS is the maximum number of QUIC Server IDs per connection (currently a preset value of 8).

A clarification is needed here: the Connection ID in the QUIC protocol may change multiple times during a session to make it harder to track the existence of a connection (for enhanced privacy). Therefore, at any given moment, more than one ID may be associated with a particular connection. And MAX_SERVER_IDS defines the limit on the number of such simultaneously active identifiers.
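As a quick worked example of the formula, the helper below computes the table size for a given worker_connections value. The function name is hypothetical, and the example inputs are chosen for illustration (512 is the default value of worker_connections in nginx).

```python
MAX_SERVER_IDS = 8  # preset value mentioned above


def connection_table_size(worker_connections, max_server_ids=MAX_SERVER_IDS):
    """N = worker_connections x MAX_SERVER_IDS."""
    return worker_connections * max_server_ids


default_size = connection_table_size(512)    # 512 * 8 = 4096 entries
large_size = connection_table_size(10240)    # 10240 * 8 = 81920 entries
```

Because the size is fixed when the BPF tables are created, it reflects the worker_connections value that was in effect at that moment.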

Conclusion#

In summary, the transition to HTTP/3 is not simply a protocol version change — it is a fundamental shift in the web data transport paradigm. The main challenge lies not in HTTP semantics, which have remained the same, but in the need to adapt server software to a fundamentally different connection model based on QUIC and UDP.

Here are the key architectural differences when using HTTP/1, 2, and HTTP/3 in Angie today:

| | HTTP/1.1 | HTTP/2 | HTTP/3 |
|---|---|---|---|
| Representation | Text | Binary | Binary |
| Transport | TCP | TCP | QUIC over UDP |
| Security | TLS over TCP | TLS over TCP | QUIC (TLS integrated into the protocol) |
| Network stack (transport layer) | Kernel implementation (SOCK_STREAM sockets) | Kernel implementation (SOCK_STREAM sockets) | Implemented in Angie (on top of SOCK_DGRAM sockets) |
| Streams within a connection | None | Present but may block each other (HoL) | Streams are independent |
| Encryption | Optional | Optional* | Integral part of the protocol |
| Process selection for connections | Chosen by the kernel (via the accept() system call) | Chosen by the kernel (via the accept() system call) | Chosen by the BPF module upon receiving a UDP packet, based on connection data from Angie |
| Protocol selection by client | Default port 80, ALPN list for TLS on port 443 | Default port 80, ALPN list for TLS on port 443 | Alt-Svc header in the response, protocol list in DNS records |
| Compatibility | All supported OSes | All supported OSes | BPF module is available only on Linux; on other OSes, support is limited (single worker process, configuration changes may interrupt existing connections) |

Unlike TCP, where the operating system kernel takes on all the complexity of managing connections and balancing them between processes, in the QUIC world this responsibility falls on the application itself. As we have seen with Angie, this gives rise to a number of non-trivial challenges: from initial packet balancing to supporting complex scenarios such as connection migration and seamless configuration or binary updates.

The evolution of solutions in Angie — from the primitive SO_REUSEPORT approach inherited from nginx to the sophisticated system with individual client sockets and multiple BPF tables — clearly demonstrates how new standards are integrated into time-tested multi-process architectures. The key achievement was creating an accept() equivalent for QUIC using eBPF, which allowed the system to return to the familiar and reliable connection handling model. Despite the increased complexity and dependence on Linux-specific capabilities, this approach paves the way for stable, high-performance HTTP/3 operation in high-load environments.