Hacking and Other Thoughts

Sun, 21 Nov 2010

IPv4 source address selection in Linux

Often when we connect a socket up, or bind it, we really don't care what source address is used for the resulting connection. We let the kernel decide.

The usual sequence is:

	connect(fd, { AF_INET, $PORT, $DEST_IP_ADDR }, ...);

Here we leave the source port and source address to be selected by the kernel, via a facility called auto-binding. Oddly enough, TCP and UDP select the source address and the source port in different orders.
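Auto-binding is easy to observe from userspace: connect a socket without binding it, then ask the kernel what it chose with getsockname(). Below is a minimal sketch, assuming a Linux host with the loopback interface up. The helper name udp_autobound_source() is made up for illustration. A UDP socket is used because connect() on a datagram socket only records the peer and sends nothing, so no listener is needed.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect an unbound UDP socket to dest_ip:dest_port and report, via *src,
 * the source address and port the kernel auto-bound for it.
 * Returns 0 on success and -1 on failure. */
int udp_autobound_source(const char *dest_ip, unsigned short dest_port,
			 struct sockaddr_in *src)
{
	struct sockaddr_in dest;
	socklen_t len = sizeof(*src);
	int fd, ret = -1;

	memset(&dest, 0, sizeof(dest));
	dest.sin_family = AF_INET;
	dest.sin_port = htons(dest_port);
	if (inet_pton(AF_INET, dest_ip, &dest.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	/* No bind() call: source address and port are left to the kernel.
	 * connect() on a UDP socket only records the peer; nothing is sent. */
	if (connect(fd, (struct sockaddr *)&dest, sizeof(dest)) == 0 &&
	    getsockname(fd, (struct sockaddr *)src, &len) == 0)
		ret = 0;

	close(fd);
	return ret;
}
```

Pointed at 127.0.0.1, this should report 127.0.0.1 and some ephemeral port. Pointed at a remote address, it reports whatever source address the route toward that destination produced.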

UDP first selects a local port, and makes this choice in a global, machine-wide namespace of ports. TCP, on the other hand, has the source address selected first, and then tries to allocate a local port using that source address as a partial key. This latter ordering is necessary in order to handle SO_REUSEADDR correctly.
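To make the "partial key" idea concrete, here is a deliberately tiny toy model of local port allocation. It is not the kernel's allocator: struct bind_table, alloc_port() and the 40000-40010 range are hypothetical, and the real TCP code checks considerably more state than a (source address, port) pair. With keyed set to false the port is treated as a machine-wide resource, as described above for UDP; with keyed set to true the already-chosen source address becomes part of the key, as described for TCP.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_BINDS 16
#define PORT_LO   40000   /* hypothetical port range for the toy */
#define PORT_HI   40010

/* Toy table of local (source address, port) bindings, host byte order. */
struct bind_table {
	uint32_t saddr[MAX_BINDS];
	uint16_t port[MAX_BINDS];
	int n;
};

/* Allocate the lowest free port in [PORT_LO, PORT_HI] and record it.
 * keyed == false: UDP-like, any existing holder of a port blocks it.
 * keyed == true:  TCP-like, only a holder with the same saddr blocks it.
 * Returns the port, or 0 if none is free. */
uint16_t alloc_port(struct bind_table *t, uint32_t saddr, bool keyed)
{
	if (t->n >= MAX_BINDS)
		return 0;

	for (uint16_t port = PORT_LO; port <= PORT_HI; port++) {
		bool busy = false;

		for (int i = 0; i < t->n; i++) {
			if (t->port[i] == port && (!keyed || t->saddr[i] == saddr)) {
				busy = true;
				break;
			}
		}
		if (!busy) {
			t->saddr[t->n] = saddr;
			t->port[t->n] = port;
			t->n++;
			return port;
		}
	}
	return 0;
}
```

In this model two TCP-style allocations from different source addresses can both land on port 40000, while a subsequent UDP-style allocation has to move on to 40001.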

Source address selection itself happens via routing.

The route lookup is performed with the source address in the flow lookup key set to zero. Once the route for the destination address has been found, the routing code uses the next-hop interface to select an appropriate source address.
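The two-step flow can be modelled in a few lines. This is a simplified sketch, not net/ipv4/route.c: the types, route_lookup() and pick_source() are hypothetical, addresses are in host byte order, and each interface carries a single address to keep it short. The flow key is built with saddr set to zero, the route is chosen by longest-prefix match on the destination alone, and the source address then comes from the next-hop interface of the winning route.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy types; addresses are in host byte order. */
struct flow_key {
	uint32_t daddr;
	uint32_t saddr;            /* 0 = not specified by the caller */
};

struct iface {
	const char *name;
	uint32_t addr;             /* address used as the source on this device */
};

struct route {
	uint32_t prefix, mask;     /* destination prefix */
	const struct iface *dev;   /* next-hop interface */
};

/* Longest-prefix match on the destination; the zero saddr in the key
 * plays no part in choosing the route. */
static const struct route *route_lookup(const struct route *tbl, size_t n,
					const struct flow_key *fl)
{
	const struct route *best = NULL;

	for (size_t i = 0; i < n; i++)
		if ((fl->daddr & tbl[i].mask) == tbl[i].prefix &&
		    (best == NULL || tbl[i].mask > best->mask))
			best = &tbl[i];
	return best;
}

/* Source selection for an unbound socket: route on the destination with
 * saddr = 0, then take the source from the next-hop interface. */
uint32_t pick_source(const struct route *tbl, size_t n, uint32_t daddr)
{
	struct flow_key fl = { .daddr = daddr, .saddr = 0 };
	const struct route *rt = route_lookup(tbl, n, &fl);

	return rt ? rt->dev->addr : 0;
}
```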

All of these results are propagated into the routing cache entry.

It is interesting to note that the routing cache entry created in such a situation will also have a zero source address in its routing key. So the next time a routing lookup occurs to the same destination without a specified source address, we'll match this routing cache entry.

This little detail creates some minor complications when handling ICMP redirect messages. Since we must update every potentially matching routing cache entry, we have to probe the hash table multiple times: once with the explicit source address in the lookup key, and once with the source address in the key set to zero. Otherwise we won't update all of the entries that we need to.
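The cache-key subtlety and the resulting double probe can be sketched with a toy cache. Everything here is hypothetical: a linear array stands in for the hash table, the real flow key carries more fields than a destination and a source, and struct rt_entry, cache_lookup(), cache_insert() and handle_redirect() are illustrative names only. The lookup is an exact match, so an entry created with a zero source in its key is invisible to a lookup that supplies an explicit source address, which is exactly why the redirect handler has to probe twice.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIZE 8

/* Toy routing cache entry (IPv4 addresses in host byte order). */
struct rt_entry {
	bool     used;
	uint32_t key_daddr;
	uint32_t key_saddr;   /* source address from the lookup key; 0 if unspecified */
	uint32_t rt_src;      /* source address actually selected for this route */
	uint32_t gateway;
};

static struct rt_entry cache[CACHE_SIZE];

/* Exact-match lookup: a zero key_saddr only matches a zero saddr. */
struct rt_entry *cache_lookup(uint32_t daddr, uint32_t saddr)
{
	for (size_t i = 0; i < CACHE_SIZE; i++)
		if (cache[i].used && cache[i].key_daddr == daddr &&
		    cache[i].key_saddr == saddr)
			return &cache[i];
	return NULL;
}

bool cache_insert(uint32_t daddr, uint32_t key_saddr,
		  uint32_t rt_src, uint32_t gateway)
{
	for (size_t i = 0; i < CACHE_SIZE; i++)
		if (!cache[i].used) {
			cache[i] = (struct rt_entry){ true, daddr, key_saddr, rt_src, gateway };
			return true;
		}
	return false;   /* toy cache is full */
}

/* Apply a redirect for traffic to daddr that we sent from saddr.
 * Probe with the explicit source key and with the zero key; return the
 * number of entries updated. */
int handle_redirect(uint32_t daddr, uint32_t saddr, uint32_t new_gw)
{
	const uint32_t skeys[2] = { saddr, 0 };
	int nkeys = saddr ? 2 : 1;   /* avoid probing the zero key twice */
	int updated = 0;

	for (int k = 0; k < nkeys; k++) {
		struct rt_entry *rt = cache_lookup(daddr, skeys[k]);

		/* Only touch entries whose selected source matches. */
		if (rt && rt->rt_src == saddr) {
			rt->gateway = new_gw;
			updated++;
		}
	}
	return updated;
}
```

With one entry keyed on an explicit source and one keyed on zero (both with rt_src set to that same address), handle_redirect() updates both; a single probe with the explicit key would have updated only the first.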

Actual source address selection is performed by inet_select_addr(), either via direct calls made from net/ipv4/route.c, or indirectly via __fib_res_prefsrc(). This function works with a "scope" specification, which says in which realm the source address must be valid. Most of the time this is RT_SCOPE_UNIVERSE.

The linked list of IPv4 addresses configured on the interface is traversed, and the first address with an appropriate scope is selected.
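A sketch of that walk, assuming a simplified per-interface address list. The struct addr_entry type and select_addr() are hypothetical, and the real inet_select_addr() has refinements not modelled here. The scope constants copy the values of enum rt_scope_t from <linux/rtnetlink.h>, where a numerically smaller scope is a wider one, so an address fits when its scope value is less than or equal to the requested one.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from enum rt_scope_t in <linux/rtnetlink.h>;
 * smaller number = wider scope. */
enum {
	RT_SCOPE_UNIVERSE = 0,
	RT_SCOPE_SITE     = 200,
	RT_SCOPE_LINK     = 253,
	RT_SCOPE_HOST     = 254,
	RT_SCOPE_NOWHERE  = 255,
};

/* Hypothetical stand-in for one entry of an interface's address list. */
struct addr_entry {
	uint32_t           local;   /* host byte order */
	unsigned char      scope;
	struct addr_entry *next;
};

/* Return the first address whose scope is at least as wide as the one
 * requested (numerically <= scope), or 0 if there is none. */
uint32_t select_addr(const struct addr_entry *list, unsigned char scope)
{
	for (const struct addr_entry *a = list; a != NULL; a = a->next)
		if (a->scope <= scope)
			return a->local;
	return 0;
}
```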

Even though the flow key of the routing cache entry has a zero source address, the selected source address is remembered in rt->rt_src, so that users of the route can see which source address to use.

Finally, routes loaded into the kernel can have an explicit "preferred source address" attribute set by the administrator. This value fully overrides whatever inet_select_addr() would have chosen.
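With iproute2 this attribute is set through the src keyword, for example "ip route add 10.0.0.0/8 via 192.168.1.1 src 192.168.1.10". The precedence rule itself is simple enough to show directly. The struct route_info type and route_source() below are hypothetical, with prefsrc == 0 standing for "no preferred source configured".

```c
#include <stdint.h>

/* Hypothetical route record; addresses in host byte order. */
struct route_info {
	uint32_t prefsrc;    /* administrator-configured source, 0 if unset */
	uint32_t selected;   /* what scope-based selection would return */
};

/* The preferred source, when present, preempts automatic selection. */
uint32_t route_source(const struct route_info *rt)
{
	return rt->prefsrc ? rt->prefsrc : rt->selected;
}
```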