Hacking and Other Thoughts

Fri, 10 Dec 2010

Metrics, metrics, metrics...

First a quick shout-out to the Colbert Nation.

Next, let's talk about route metrics and sharing, shall we?

Metrics exist to specify attributes for a path. For example, what hoplimit should we use when using this route? What should the path MTU be? How about the TCP congestion window or RTT estimate?

Metric settings come from two places.

The administrator can attach initial metric values to routes when they get loaded into the kernel. This helps deal with specialized situations, but it's not very common at all.

There is a special metric, called "lock", which is a bitmask with one bit for each of the other metrics. If a metric's bit is set, the kernel must not modify that metric and we should always respect the value the administrator placed there.

The kernel itself will, on the fly, adjust metric values. For example, at the teardown of every TCP connection the kernel can update the metrics on the route attached to the socket. These updates are based upon measurements made during the life of the TCP socket. There is a sysctl to disable this automatic TCP metric update mechanism, for testing purposes.
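To make the interaction between the lock bitmask and the kernel's own updates concrete, here is a minimal userspace sketch. The enum, the struct and both function names are made up for illustration and are not the kernel's actual definitions. The only point is that an automatic update has to check the lock bit first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical metric indices; slot 0 holds the lock bitmask itself. */
enum metric_id {
	METRIC_LOCK = 0,
	METRIC_MTU,
	METRIC_RTT,
	METRIC_CWND,
	METRIC_HOPLIMIT,
	METRIC_MAX
};

struct metrics {
	uint32_t val[METRIC_MAX];
};

/* True if the administrator pinned this metric via the lock bitmask. */
static bool metric_locked(const struct metrics *m, enum metric_id id)
{
	assert(id > METRIC_LOCK && id < METRIC_MAX);
	return (m->val[METRIC_LOCK] & (UINT32_C(1) << id)) != 0;
}

/*
 * Automatic (kernel-side) update. A locked metric is left untouched.
 * Returns true if the value was actually written.
 */
static bool metric_update(struct metrics *m, enum metric_id id, uint32_t v)
{
	if (metric_locked(m, id))
		return false;
	m->val[id] = v;
	return true;
}
```

With the MTU bit set in the lock word, metric_update() on the MTU is a no-op, while an update of the RTT on the same route still goes through.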

For a router, metrics don't really change. So, in theory, we could take the defaults stored in the routing table and just reference those directly instead of having a private copy in every routing cache entry.

There are some barriers to this, although none insurmountable.

First of all, in order to share we have to be able to catch any dynamic update so we can unshare those read-only metrics. The net-next-2.6 tree has changes to make sure every metric write goes through a helper function. So we have the traps there and ready to go, problem solved.

Next, we actually change the metrics a little bit when we create every single routing cache entry. Essentially we pre-compute defaults. This is pretty much unnecessary, and could theoretically cause problems in some cases. The metrics in question are the hoplimit, the MTU, and the advertised MSS. For IPv4 these are set in rt_set_nexthop.

If the route table metric is simply the default (i.e. zero) we pre-calculate it. These calculations are pretty simple and could instead be done when the value of the metric is actually asked for.

Since the defaults are address-family dependent we will need to abstract the calculations behind dst_ops methods, but that's easy enough. Accesses to each of these three metrics then need to go through a helper function which essentially says:

	if (metric_value == 0)
		metric_value = dst->ops->metric_foo_default(dst);
	return metric_value;
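Fleshed out a little, that could look like the following standalone sketch for the advertised MSS. Everything here is an assumption made for illustration: the struct layouts, the default_advmss method name, and the simplified "device MTU minus IP and TCP header sizes" calculation, which leaves out any clamping a real implementation would do. What matters is that a zero in the stored metric is never overwritten; the default is computed at read time by the address family.

```c
#include <assert.h>
#include <stdint.h>

struct dst_entry;

/* Hypothetical per-address-family operations table. */
struct dst_ops {
	uint32_t (*default_advmss)(const struct dst_entry *dst);
};

struct dst_entry {
	const struct dst_ops *ops;
	uint32_t dev_mtu;	/* MTU of the output device */
	uint32_t advmss;	/* 0 means "not set, use the family default" */
};

/* IPv4: 20 byte IP header + 20 byte TCP header. */
static uint32_t ipv4_default_advmss(const struct dst_entry *dst)
{
	assert(dst->dev_mtu > 40);
	return dst->dev_mtu - 40;
}

/* IPv6: 40 byte IP header + 20 byte TCP header. */
static uint32_t ipv6_default_advmss(const struct dst_entry *dst)
{
	assert(dst->dev_mtu > 60);
	return dst->dev_mtu - 60;
}

static const struct dst_ops ipv4_ops = { .default_advmss = ipv4_default_advmss };
static const struct dst_ops ipv6_ops = { .default_advmss = ipv6_default_advmss };

/* Read-side helper: compute the default lazily, never store it. */
static uint32_t metric_advmss(const struct dst_entry *dst)
{
	uint32_t v = dst->advmss;

	if (v == 0)
		v = dst->ops->default_advmss(dst);
	return v;
}
```

Because metric_advmss() takes a const pointer, reading a defaulted metric cannot dirty the metrics storage, which is exactly what is needed if that storage is going to be shared.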

With that in place we will rarely, if ever, modify metric values in the routing cache metric arrays. Then the next step is putting unshared metrics somewhere else (e.g. the inetpeer cache), and then changing the dst_entry metrics member to be a pointer instead of an array.

Initially the pointer will point into the routing table read-only default metrics. On a COW event, we'll hook up an inetpeer entry to the route and retarget the metrics pointer to the inetpeer's metrics storage.
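Here is a rough userspace sketch of that copy-on-write scheme. It is only meant to show the pointer retargeting; struct peer is a stand-in for an inetpeer entry, the caller hands in the peer instead of looking it up in a cache, and all of the function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NMETRICS 4

/* Stand-in for an inetpeer entry with private metrics storage. */
struct peer {
	uint32_t metrics[NMETRICS];
};

struct dst_entry {
	const uint32_t *metrics;	/* shared read-only defaults, or the peer's copy */
	struct peer *peer;		/* NULL while the metrics are still shared */
};

/* A fresh entry just references the routing table's default metrics. */
static void dst_init(struct dst_entry *dst, const uint32_t *defaults)
{
	dst->metrics = defaults;
	dst->peer = NULL;
}

static uint32_t dst_metric_get(const struct dst_entry *dst, unsigned int idx)
{
	assert(idx < NMETRICS);
	return dst->metrics[idx];
}

/*
 * Every write goes through this helper, so it is the one place where
 * sharing gets broken. On the first write (the COW event) the shared
 * values are copied into the peer and the pointer is retargeted. Once
 * a peer is attached, the peer argument is ignored.
 */
static void dst_metric_set(struct dst_entry *dst, struct peer *peer,
			   unsigned int idx, uint32_t val)
{
	assert(idx < NMETRICS);
	if (!dst->peer) {
		memcpy(peer->metrics, dst->metrics, sizeof(peer->metrics));
		dst->peer = peer;
		dst->metrics = peer->metrics;
	}
	dst->peer->metrics[idx] = val;
}
```

Two entries initialized from the same defaults share one array until one of them is written. After that, only the written entry points at private storage, and the routing table defaults are never touched.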