Network Buffers and Memory Management

时间：2005-03-29 来源：soloix

Network Buffers and Memory Management By Alan Cox on Mon, 1996-09-30 23:00.

Writing a network device driver for Linux is fundamentally simple---most of the complexity (other than talking to the hardware) involves managing network packets in memory.

Network Buffers and Memory Management

By Alan Cox Created 1996-09-30 23:00

Writing a network device driver for Linux is fundamentally simple---most of the complexity (other than talking to the hardware) involves managing network packets in memory.

The Linux operating system implements the industry-standard Berkeley socket API, which has its origins in the BSD Unix developments (4.2/4.3/4.4 BSD). In this article, we will look at the way the memory management and buffering is implemented for network layers and network device drivers under the existing Linux kernel, as well as explain how and why some things have changed over time.

Core Concepts

The networking layer is fairly object-oriented in its design, as indeed is much of the Linux kernel. The core structure of the networking code goes back to the initial networking and socket implementations by Ross Biro and Orest Zborowski respectively. The key objects are:

Device or Interface: A network interface is programming code for sending and receiving data packets. Usually an interface is used for a physical device like an Ethernet card; however, some devices are software only, e.g., the loopback device used for sending data to yourself.
Protocol: Each protocol is effectively a different networking language. Some protocols exist purely because vendors chose to use proprietary networking schemes, while others are designed for special purposes. Within the Linux kernel each protocol is a separate module of code which provides services to the socket layer.
Socket: A socket is a connection in the networking that provides Unix file I/O and exists as a file descriptor to the user program. In the kernel each socket is a pair of structures that represent the high level socket interface and the low level protocol interface.
sk_buff: All the buffers used by the networking layers are sk_buffs. The control for these buffers is provided by core low-level library routines that are available to all of the networking system. sk_buffs provide the general buffering and flow control facilities needed by network protocols.

Implementation of sk_buffs

The primary goal of the sk_buff routines is to provide a consistent and efficient buffer-handling method for all of the network layers, and by being consistent to make it possible to provide higher level sk_buff and socket handling facilities to all of the protocols.

A sk_buff is a control structure with a block of memory attached. Two primary sets of functions are provided in the sk_buff library. The first set consists of routines to manipulate doubly linked lists of sk_buffs; the second of functions for controlling the attached memory. The buffers are held on linked lists optimised for the common network operations of append to end and remove from start. As so much of the networking functionality occurs during interrupts these routines are written to use atomic memory. The small extra overhead that results is well worth the pain it saves in bug hunting.

We use the list operations to manage groups of packets as they arrive from the network, and as we send them to the physical interfaces. We use the memory manipulation routines for handling the contents of packets in a standardised and efficient manner.

At its most basic level, a list of buffers is managed using functions like this:

void append_frame(char *buf, int len) { struct sk_buff *skb=alloc_skb(len, GFP_ATOMIC); if(skb==NULL) my_dropped++; else { skb_put(skb,len); memcpy(skb->data,data,len); skb_append(&my_list, skb); } } void process_queue(void) { struct sk_buff *skb; while((skb=skb_dequeue(&my_list))!=NULL) { process_data(skb); kfree_skb(skb, FREE_READ); } }

These two fairly simplistic pieces of code actually demonstrate the receive packet mechanism quite accurately. The append_frame() function is similar to the code called from an interrupt by a device driver receiving a packet, and process_frame() is similar to the code called to feed data into the protocols. If you look in net/core/dev.c at netif_rx() and net_bh(), you will see that they manage buffers similarly. They are far more complex, as they have to feed packets to the right protocol and manage flow control, but the basic operations are the same. This is just as true if you look at buffers going from the protocol code to a user application.

The example also shows the use of one of the data control functions, skb_put(). Here it is used to reserve space in the buffer for the data we wish to pass down.

Let's look at append_frame(). The alloc_skb() function obtains a buffer of len bytes (