Hacking and Other Thoughts

Thu, 27 Jul 2006

Kevent, Netchannels, and Utrace

A lot has been going on lately, where to begin? :-)

About netchannels. lwn.net posted an article in the kernel section of yesterday's weekly edition quoting my seeming abandonment of VJ netchannels. This is only half true. For sure, there are real practical issues to overcome, but I haven't given up hope entirely.

For example, while netfilter is a hard problem wrt. netchannels, someone will figure out a workable solution. Rusty started talks along those lines today. One of the ideas he is tossing around is to rework the core tuple ID used for netfilter lookups into something simpler. The idea is to use a direct key built from fields that a netchannel hw engine would need to look at anyway.

This would be a radical departure from the present netfilter mechanisms, and would require redoing the userspace tools. Rusty says the world be damned and let's do it if necessary! This guy is quite inspirational :)
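To make the direct key idea a bit more concrete, here is a rough sketch of what such a key might look like. This is purely my illustration: the struct flow_key layout, the choice of the usual 5-tuple fields, and the FNV-1a hash are all assumptions, not anything taken from netfilter or from Rusty's proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flat lookup key: fields a hardware classifier would
 * already have to parse. NOT netfilter's actual tuple layout. */
struct flow_key {
    uint32_t saddr;   /* source address */
    uint32_t daddr;   /* destination address */
    uint16_t sport;   /* source port */
    uint16_t dport;   /* destination port */
    uint8_t  proto;   /* IP protocol number */
};

/* Compare field by field so struct padding never matters. */
static int flow_key_equal(const struct flow_key *a, const struct flow_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport &&
           a->proto == b->proto;
}

/* Fold the low nbytes bytes of v into an FNV-1a hash state. */
static uint32_t fnv1a_mix(uint32_t h, uint32_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        h ^= (v >> (8 * i)) & 0xff;
        h *= 16777619u;
    }
    return h;
}

/* Hash the key directly into a bucket index in [0, nbuckets). */
static uint32_t flow_key_hash(const struct flow_key *k, uint32_t nbuckets)
{
    uint32_t h = 2166136261u;

    assert(nbuckets != 0);
    h = fnv1a_mix(h, k->saddr, 4);
    h = fnv1a_mix(h, k->daddr, 4);
    h = fnv1a_mix(h, k->sport, 2);
    h = fnv1a_mix(h, k->dport, 2);
    h = fnv1a_mix(h, k->proto, 1);
    return h % nbuckets;
}
```

The point of the sketch is only that the lookup uses nothing beyond fields read straight out of the packet headers, which is the sort of thing a hardware engine could compute on its own.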

Alexey Kuznetsov also proposed an idea wherein we netchannel to a socket, and then in socket processing context we run the netfilter code path.

I've been trying to lend some support to Evgeniy Polyakov's kernel event mechanism, KEVENT. I think the ideas are clean, and the event reporting scheme will scale very well. Although kevent exists on its own and doesn't tie directly into networking AIO, it does make implementing networking AIO much easier, and such an implementation is included in his patches.

Meanwhile, Ulrich Drepper has been thinking a lot about this topic as well. Just the other day he posted his paper and slides from his OLS presentation on this topic.

Evgeniy's and Ulrich's proposals have a lot in common, and because Evgeniy's implementation has been around a while and has been refined, we can use it to make forward progress in this area now.

One unique idea in Ulrich's proposal for network AIO is the concept of DMA user mappings. Do not confuse this with the kind of DMA mappings a device uses to transfer data to or from memory. It is logically different.

The idea is to allow the user to allocate buffers which initially live in the kernel, and which the user can ask to have mapped later. So the user refers to the buffer using opaque integer keys, or somesuch, when performing an AIO request. The user passes these keys instead of pointers, since the user doesn't necessarily have a mapping to the buffer.
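Here is a tiny userspace sketch of the "key instead of pointer" idea, just to show the shape of it. Everything in it is made up for illustration: buf_alloc(), buf_fill(), buf_map() and buf_free() are hypothetical names, and a static table stands in for whatever the kernel would really keep. It is not Ulrich's interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BUFS 16

/* One slot per buffer; the table stands in for kernel-side state. */
struct buf_slot {
    void  *mem;
    size_t len;
    int    in_use;
};

static struct buf_slot buf_table[MAX_BUFS];

/* Allocate a buffer and hand back an opaque key (>= 0), or -1. */
static int buf_alloc(size_t len)
{
    if (len == 0)
        return -1;
    for (int key = 0; key < MAX_BUFS; key++) {
        if (!buf_table[key].in_use) {
            void *mem = calloc(1, len);
            if (!mem)
                return -1;
            buf_table[key].mem = mem;
            buf_table[key].len = len;
            buf_table[key].in_use = 1;
            return key;
        }
    }
    return -1;
}

/* Stand-in for "ask for a mapping later": key in, pointer out. */
static void *buf_map(int key, size_t *len_out)
{
    if (key < 0 || key >= MAX_BUFS || !buf_table[key].in_use)
        return NULL;
    if (len_out)
        *len_out = buf_table[key].len;
    return buf_table[key].mem;
}

/* Stand-in for an AIO request: the caller names the buffer by key
 * only, and the "kernel" side fills it. Returns bytes written or -1. */
static long buf_fill(int key, unsigned char byte)
{
    if (key < 0 || key >= MAX_BUFS || !buf_table[key].in_use)
        return -1;
    memset(buf_table[key].mem, byte, buf_table[key].len);
    return (long)buf_table[key].len;
}

/* Release the buffer; the key becomes invalid. */
static int buf_free(int key)
{
    if (key < 0 || key >= MAX_BUFS || !buf_table[key].in_use)
        return -1;
    free(buf_table[key].mem);
    memset(&buf_table[key], 0, sizeof(buf_table[key]));
    return 0;
}
```

Note that buf_fill() never sees a user pointer. The request can complete before the user has any mapping at all, and buf_map() is only needed once the user actually wants to look at the data.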

My current guess is that we'll end up with something that looks like Kevent with Ulrich's DMA mapping ideas added on.

Finally, I have been staring at Roland's new debugging infrastructure called UTRACE. This stuff is very impressive. If you've ever toiled through trying to get some debugging application working under Linux using ptrace(), you'll immediately see why Roland's work is extremely cool. ptrace() can be considered one of the most bletcherous kernel APIs ever designed. Every platform implements it slightly differently, making every GDB port a royal pain in the ass. In particular, ptrace()'s signal semantics are nearly impossible to deal with cleanly.

Enter utrace. It's a debugging framework of sorts for the kernel. It's based upon the idea of debugging engines. For example, backwards compatibility for ptrace() is coded as a utrace engine. Multiple engines can attach to the same process. Furthermore, the register transferring and signal handling are done in well defined, explicit ways.
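The "multiple engines per process" part is easiest to picture with a toy model. The sketch below is NOT the utrace API. struct engine, attach_engine() and report_event() are names I invented, and the whole thing runs in userspace. It only shows the pattern of several independent engines each getting a callback for the same event on the same task.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENGINES 4

/* Hypothetical event types, for illustration only. */
enum trace_event { EV_SIGNAL, EV_EXIT };

struct engine {
    const char *name;
    /* Called once per event on every task the engine is attached to. */
    void (*report)(struct engine *self, enum trace_event ev, int arg);
    int seen;   /* number of events this engine has been shown */
};

struct traced_task {
    int pid;
    struct engine *engines[MAX_ENGINES];
    int nr_engines;
};

/* Attach one more engine to the task; 0 on success, -1 if full. */
static int attach_engine(struct traced_task *t, struct engine *e)
{
    assert(t != NULL && e != NULL && e->report != NULL);
    if (t->nr_engines >= MAX_ENGINES)
        return -1;
    t->engines[t->nr_engines++] = e;
    return 0;
}

/* Deliver an event to every attached engine, in attach order.
 * Returns how many engines were notified. */
static int report_event(struct traced_task *t, enum trace_event ev, int arg)
{
    assert(t != NULL);
    for (int i = 0; i < t->nr_engines; i++)
        t->engines[i]->report(t->engines[i], ev, arg);
    return t->nr_engines;
}

/* Trivial callback: just count what we were shown. */
static void count_event(struct engine *self, enum trace_event ev, int arg)
{
    (void)ev;
    (void)arg;
    self->seen++;
}
```

In this picture a ptrace() compatibility layer and some other debugging module would simply be two struct engine instances attached to the same task, neither needing to know about the other.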

There is a sample kernel utrace module provided on Roland's page that provides Plan 9-like semantics. Way back when I first read up on Plan 9, I thought the coolest thing was that programs didn't crash; they just froze when they got a segfault or something and waited for someone to come around and debug them. I thought this was brilliant! Roland's sample module implements this. You just pass in the pid you're interested in, and that pid's process group will get the Plan 9 semantics. The process will suspend instead of dying, but if you give it a SIGCONT it will continue on and process that fatal signal as normal.

And let's face it, sometimes you don't get all the information you really need in the core file. This can be particularly true for threaded programs when they dump core.

So I'll have to find some time to work on the sparc ports, which should not be difficult at all. Roland even provides a test suite of sorts in the "ntrace" tarball. He is one of the most complete software developers I have ever observed.