How SKB socket accounting works

When sockets send and receive packets, they must be properly charged for the amount of system memory each packet consumes. BSD sockets have the concept of send and receive buffers, which are used to keep track of this and make sure sockets stay within their limits.

We keep track of how many bytes of system memory are consumed by a packet in 'skb->truesize'. This is the size of the data buffer we allocated for the packet, plus the size of 'struct sk_buff' itself. This avoids two common implementation errors. First, it would not be correct to charge only for the bytes actually used by packet data. Otherwise, we could allocate a large buffer, receive only a small packet into it, and then charge the socket only for the smaller size. Second, it is essential that the 'struct sk_buff' overhead is also charged to the socket. Otherwise a socket could consume a large amount of system memory by receiving tiny 1-byte packets.
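
The arithmetic can be illustrated with a small user-space sketch. Everything here is hypothetical: 'struct model_skb', 'model_truesize()' and 'model_uncharged_by_len()' are stand-ins invented for this example, and the kernel's own computation may include further overhead that is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for 'struct sk_buff'; only its size matters here. */
struct model_skb {
	void *sk;
	void (*destructor)(struct model_skb *);
	unsigned char *data;
	unsigned int len;       /* bytes of packet data actually present */
	unsigned int truesize;  /* bytes charged to the socket */
};

/* Charge the whole allocated buffer plus the metadata struct, not just 'len'. */
static unsigned int model_truesize(size_t buf_size)
{
	return (unsigned int)(buf_size + sizeof(struct model_skb));
}

/* Bytes that would go uncharged if only the data length were accounted. */
static unsigned int model_uncharged_by_len(const struct model_skb *skb)
{
	assert(skb->truesize >= skb->len);
	return skb->truesize - skb->len;
}
```

With a 2048-byte buffer holding a 1-byte packet, charging by 'len' alone would leave 2047 bytes of buffer plus the whole metadata struct unaccounted, which is exactly the gap that 'truesize' closes.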

The total number of bytes of send packet memory a socket may use is limited by 'sk->sk_sndbuf', and likewise 'sk->sk_rcvbuf' for receive packets. Strictly speaking, this basic scheme is mostly used by datagram protocols. Stream protocols, such as TCP, use a more elaborate scheme. For this discussion, however, investigating the basic socket accounting scheme is sufficient.
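
As a rough illustration of how such a limit could be enforced on the receive side, here is a single-threaded user-space sketch. The names 'struct model_sock' and 'model_try_charge_rcv()' are hypothetical, plain integers stand in for the kernel's atomic counters, and the exact comparison the kernel performs may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two receive-side fields discussed above. */
struct model_sock {
	int rmem_alloc;  /* models 'sk->sk_rmem_alloc' */
	int rcvbuf;      /* models 'sk->sk_rcvbuf' */
};

/*
 * Charge the packet to the socket if it fits within the receive limit.
 * Returns false, leaving the counter untouched, if the packet must be dropped.
 */
static bool model_try_charge_rcv(struct model_sock *sk, int truesize)
{
	assert(truesize > 0);
	if (sk->rmem_alloc + truesize > sk->rcvbuf)
		return false;
	sk->rmem_alloc += truesize;
	return true;
}
```

With a limit of 1000 bytes, a first 600-byte packet is accepted and a second one is refused, since it would take the socket over its limit.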

When a received packet is to be charged to a socket, we invoke the function 'skb_set_owner_r()'. It sets 'skb->sk' to the socket, hooks up a destructor function, and adds the packet's 'skb->truesize' to 'sk->sk_rmem_alloc'. Often, this function is invoked by the helper routine 'sock_queue_rcv_skb()'.

static inline void skb_set_owner_r(struct sk_buff *skb, struct sock *sk)
{
	skb->sk = sk;
	skb->destructor = sock_rfree;
	atomic_add(skb->truesize, &sk->sk_rmem_alloc);
}

Later, when the packet is freed (via 'kfree_skb()', usually after the receive packet data has been copied into user space), the destructor is invoked. In the above example, the destructor is 'sock_rfree()'. It subtracts the packet's 'skb->truesize' from 'sk->sk_rmem_alloc', releasing the space that was charged to the socket.

void sock_rfree(struct sk_buff *skb)
{
	struct sock *sk = skb->sk;

	atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
}

On the packet send side, similar things happen, except that in the destructor we also have to wake up any processes waiting for send buffer space to become available. The routines used here are 'skb_set_owner_w()' and 'sock_wfree()'.

static inline void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
{
	sock_hold(sk);
	skb->sk = sk;
	skb->destructor = sock_wfree;
	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
}
...
void sock_wfree(struct sk_buff *skb)
{
	struct sock *sk = skb->sk;

	/* In case it might be waiting for more memory. */
	atomic_sub(skb->truesize, &sk->sk_wmem_alloc);
	if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE))
		sk->sk_write_space(sk);
	sock_put(sk);
}

Notice how the callback 'sk->sk_write_space()' is invoked by 'sock_wfree()'. This call is what wakes up the user processes, if any, waiting for send buffer free space.
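
The charge, free, and wakeup sequence can be traced in a simplified user-space model. All of the 'model_*' names below are hypothetical, the counters are plain integers rather than atomics, and the "wakeup" is reduced to incrementing a counter so that the order of events is observable. Reference counting and the 'SOCK_USE_WRITE_QUEUE' check are deliberately left out.

```c
#include <assert.h>

struct model_wsock {
	int wmem_alloc;                             /* models 'sk->sk_wmem_alloc' */
	int wakeups;                                /* number of write_space calls */
	void (*write_space)(struct model_wsock *);  /* models 'sk->sk_write_space' */
};

struct model_wskb {
	struct model_wsock *sk;
	int truesize;
	void (*destructor)(struct model_wskb *);
};

/* Stand-in for waking up writers: just record that it happened. */
static void model_write_space(struct model_wsock *sk)
{
	sk->wakeups++;
}

/* Destructor: give the space back, then notify anyone waiting for it. */
static void model_wfree(struct model_wskb *skb)
{
	struct model_wsock *sk = skb->sk;

	assert(sk->wmem_alloc >= skb->truesize);
	sk->wmem_alloc -= skb->truesize;
	sk->write_space(sk);
}

/* Charge the packet to the socket and arrange for model_wfree() on free. */
static void model_set_owner_w(struct model_wskb *skb, struct model_wsock *sk)
{
	skb->sk = sk;
	skb->destructor = model_wfree;
	sk->wmem_alloc += skb->truesize;
}

/* Stand-in for 'kfree_skb()': run the destructor if one is attached. */
static void model_free_skb(struct model_wskb *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
}
```

Charging a 512-byte packet raises the model's 'wmem_alloc' to 512 with no wakeup. Freeing it returns the counter to zero and triggers exactly one 'write_space' call.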

Another subtle issue is worth pointing out here. For receive buffer accounting, we do not grab a reference to the socket (via 'sock_hold()'), because the socket handling code will always make sure to free up any packets in its receive queue before allowing the socket to be destroyed. For send packets, by contrast, we have to do proper reference counting with 'sock_hold()' and 'sock_put()', because send packets can be freed asynchronously at any point in time. For example, a packet could sit in a device's transmit queue for a long time under certain conditions. If, meanwhile, the socket is closed, we have to keep the socket reference around until all SKBs referencing that socket are liberated.
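
The lifetime rule for send packets can be sketched with a toy reference count. The names 'struct model_rsock', 'model_hold()' and 'model_put()' are hypothetical, the count is a plain integer rather than an atomic, and "destroying" the socket is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>

struct model_rsock {
	int refcnt;      /* models the socket reference count */
	bool destroyed;  /* set when the last reference is dropped */
};

/* Models 'sock_hold()': take a reference on a live socket. */
static void model_hold(struct model_rsock *sk)
{
	assert(!sk->destroyed);
	sk->refcnt++;
}

/* Models 'sock_put()': drop a reference, destroying on the last one. */
static void model_put(struct model_rsock *sk)
{
	assert(sk->refcnt > 0);
	if (--sk->refcnt == 0)
		sk->destroyed = true;  /* a real implementation would free here */
}
```

Starting from one reference held by the socket's owner, a queued send packet takes a second one. If the owner then closes the socket and drops its reference, the model socket stays alive until the packet's destructor drops the last reference.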