With the 2.6.18 merge window open, it's a great opportunity to break things as best I can. With this in mind I decided to take a deep breath and go through a marathon coding session to deal with some frustrating issues with the Sparc port(s).
Support for SBUS, EBUS, and ISA devices was, at best, cobbled together over time as needs arose. No real design, just adding functionality as needed. The devices driven by these ad-hoc bus layers really want a basic set of things: the physical addresses of their registers, their interrupts, and a way to set up DMA mappings.
EBUS and ISA busses are the two which Sun uses to attach "simple" devices such as serial ports, floppy drives, NVRAM, the RTC, and simpler sound devices such as the cs4231. Some more involved hierarchies of devices can sit on these busses, such as an I2C controller with I2C devices described underneath it. On 32-bit Sparc systems these devices were "OBIO" ("OnBoard I/O") devices and sat right at the root of the device tree rather than under any special bus. Their registers and interrupts were fully qualified in the PROM device tree nodes and did not require any translation.
On non-PCI systems the less simple devices, such as framebuffers, SCSI controllers, and network cards, tend to show up under SBUS. The SBUS typically has an IOMMU, and the registers and interrupts of its devices need translation. PCI works similarly on sparc64 boxes.
The existing SBUS/EBUS/ISA driver probing looked really ugly. Here is an example of this grot:
struct linux_ebus *ebus;
struct linux_ebus_device *edev;

for_each_ebus(ebus) {
	for_each_ebusdev(edev, ebus) {
		if (!strcmp(edev->prom_name, "mydevice"))
			mydevice_init_one(edev);
	}
}
There are versions of these objects and interfaces for SBUS and ISA as well, and they all look the same. What architecture! In my defense, I was just learning how to program in C when I wrote the SBUS device layer some 10+ years ago.
Things get particularly ugly in some of the serial drivers, where multiple busses apply to the device type. The Zilog serial driver is probably the worst example. True unmaintainable spaghetti code, these probing routines.
So how to clean all this crap up? First it helps to get the PROM device tree into a kernel data structure that exposes the hierarchy of the device tree (parent, child, root, siblings) and provides a way to get at a node's properties. The powerpc guys had something similar since they have to deal with OpenFirmware on their machines too. So I pulled over their asm/prom.h header definitions and wrote code to build the device tree right after we set up paging. This is done by using the PROM callbacks to traverse and probe the device tree. Once this is set up you can do fun stuff like:
struct device_node *root = find_node_by_path("/");
const char *banner_name;
int len;

banner_name = of_get_property(root, "banner-name", &len);
if (banner_name && !strcmp(banner_name, "Sun Blade 1500 (Silver)"))
	this_is_davems_workstation();
You can walk nodes by saying node->child or node->sibling
or node->parent and the root node's parent is NULL. For
convenience the "name" and "device_type" property pointers are available
directly as node->name and node->type respectively.
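The tree building and walking described above can be modeled in a few dozen lines of ordinary C. This is a minimal userspace sketch, not the kernel code: the hypothetical fake_prom table stands in for the PROM callbacks (on sparc, routines along the lines of prom_getchild() and prom_getsibling()), and struct device_node here is cut down to a name plus the parent/child/sibling links. build_tree(), find_by_path() and free_tree() are hypothetical names as well.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the firmware: PROM node handle h has a first
 * child and a next sibling, with 0 meaning "none".  The real code asks
 * the PROM for these through callbacks instead of reading a table. */
struct fake_prom_node {
	const char *name;
	int child;
	int sibling;
};

static const struct fake_prom_node fake_prom[] = {
	[1] = { "root",   2, 0 },
	[2] = { "ebus",   4, 3 },
	[3] = { "pci",    0, 0 },
	[4] = { "serial", 0, 5 },
	[5] = { "rtc",    0, 0 },
};

/* Cut-down device_node: just a name and the tree links. */
struct device_node {
	const char *name;
	struct device_node *parent, *child, *sibling;
};

/* Recursively copy PROM node 'h' and everything below it. */
static struct device_node *build_tree(int h, struct device_node *parent)
{
	struct device_node *dp, **link;
	int c;

	assert(h > 0 && (size_t)h < sizeof(fake_prom) / sizeof(fake_prom[0]));
	dp = calloc(1, sizeof(*dp));
	if (!dp)
		return NULL;
	dp->name = fake_prom[h].name;
	dp->parent = parent;
	link = &dp->child;
	for (c = fake_prom[h].child; c; c = fake_prom[c].sibling) {
		*link = build_tree(c, dp);
		if (*link)
			link = &(*link)->sibling;
	}
	return dp;
}

/* Resolve a path like "/ebus/serial" by matching one component per level. */
static struct device_node *find_by_path(struct device_node *root, const char *path)
{
	struct device_node *dp = root;

	while (dp && *path) {
		size_t len;
		struct device_node *c;

		path += strspn(path, "/");
		if (!*path)
			break;
		len = strcspn(path, "/");
		for (c = dp->child; c; c = c->sibling)
			if (strlen(c->name) == len && !strncmp(c->name, path, len))
				break;
		dp = c;
		path += len;
	}
	return dp;
}

/* Free a node, its children, and its following siblings. */
static void free_tree(struct device_node *dp)
{
	while (dp) {
		struct device_node *next = dp->sibling;

		free_tree(dp->child);
		free(dp);
		dp = next;
	}
}
```

Paths here use bare node names; real PROM paths also carry unit addresses such as ebus@800, which a fuller version would compare as well.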
A lot of code was converted to using the device tree instead of the firmware calls, even the openpromfs filesystem and /dev/openprom driver code. But we don't have enough infrastructure to use this for real device drivers just yet. For that we need a little bit more.
To integrate with the nice generic device layer in the Linux kernel, we have to define a bus type. The powerpc folks, again, had a good chunk of infrastructure code we could make use of in their of_device and of_driver abstraction. This provides an of_bus_type bus, to which I attach of_device objects representing every node in the PROM device tree.
To the of_device object I added resources (for register mappings) and an array of IRQs. In linux/mod_devicetable.h there is an of_device_id structure that drivers use to indicate which devices in the tree they handle. The table is exported in driver modules using the MODULE_DEVICE_TABLE() macro. Here is an example of how this works for the cg3 framebuffer driver:
static struct of_device_id cg3_match[] = {
	{
		.name = "cgthree",
	},
	{
		.name = "cgRDI",
	},
	{},
};
MODULE_DEVICE_TABLE(of, cg3_match);

static struct of_platform_driver cg3_driver = {
	.name = "cg3",
	.match_table = cg3_match,
	.probe = cg3_probe,
	.remove = __devexit_p(cg3_remove),
};
This says the driver can handle hardware whose device node name is either "cgthree" or "cgRDI". Here is a more involved match table from the "su" 8250 serial driver:
static struct of_device_id su_match[] = {
	{
		.name = "su",
	},
	{
		.name = "su_pnp",
	},
	{
		.name = "serial",
		.compatible = "su",
	},
	{},
};
This driver can handle devices named "su" or "su_pnp", or devices named "serial" whose "compatible" property contains the entry "su".
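The matching rules above can be sketched as follows. This is a simplified userspace model, not the kernel's matching code: struct match_entry, struct node_ids, compat_has() and match_node() are hypothetical, a NULL field stands in for an empty (wildcard) field in of_device_id, and the "compatible" property is assumed to be a run of NUL-terminated strings, any one of which may satisfy the match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified match entry: a NULL field matches anything. */
struct match_entry {
	const char *name;
	const char *type;
	const char *compatible;
};

/* The identifying properties of one device node. */
struct node_ids {
	const char *name;
	const char *type;
	const char *compat;	/* "compatible": NUL-terminated strings back to back */
	size_t compat_len;	/* total property length in bytes */
};

/* Does any string in the node's "compatible" list equal 'want'? */
static int compat_has(const struct node_ids *n, const char *want)
{
	const char *p = n->compat;
	const char *end = p ? p + n->compat_len : NULL;

	for (; p && p < end; p += strlen(p) + 1)
		if (!strcmp(p, want))
			return 1;
	return 0;
}

/* Return the first entry in 'm' that matches 'n', or NULL.  The table is
 * terminated by an all-NULL entry, playing the role of the {} entries. */
static const struct match_entry *match_node(const struct match_entry *m,
					    const struct node_ids *n)
{
	assert(m != NULL && n != NULL);
	for (; m->name || m->type || m->compatible; m++) {
		if (m->name && (!n->name || strcmp(m->name, n->name)))
			continue;
		if (m->type && (!n->type || strcmp(m->type, n->type)))
			continue;
		if (m->compatible && !compat_has(n, m->compatible))
			continue;
		return m;
	}
	return NULL;
}

/* The su table, restated in this simplified form. */
static const struct match_entry su_match[] = {
	{ .name = "su" },
	{ .name = "su_pnp" },
	{ .name = "serial", .compatible = "su" },
	{ NULL, NULL, NULL },
};
```

The first matching entry wins, so more specific entries should come before broader ones if the driver cares which entry it gets back.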
In the first stage of this of_device layer, I only computed the register physical addresses so they could be remapped and used by the driver. This is done in a mostly standard fashion by walking up the device tree to the root, applying a "ranges" property at each stage along the way.
Registers are described by a "reg" property, which is a list of one or more address, length pairs. As you walk up toward the root, each parent's "#address-cells" and "#size-cells" properties tell you how many cells make up each address and length value in its children's "reg" properties; a cell is 4 bytes.
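Decoding a "reg" property under these rules is mostly a matter of slicing cells. A minimal sketch, assuming the property has already been read into host-order 32-bit cells and that neither #address-cells nor #size-cells exceeds 2, so every value fits in 64 bits; struct reg_entry, read_cells() and decode_reg() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct reg_entry {
	uint64_t addr;
	uint64_t size;
};

/* Combine 'n' 32-bit cells, most significant first, into one value.
 * This sketch only handles n <= 2. */
static uint64_t read_cells(const uint32_t *cells, int n)
{
	uint64_t v = 0;

	assert(n >= 0 && n <= 2);
	while (n--)
		v = (v << 32) | *cells++;
	return v;
}

/* Split a "reg" property of 'ncells' cells into (address, size) pairs.
 * 'na' and 'ns' are the parent's #address-cells and #size-cells.
 * Returns the number of entries written to 'out' (at most 'max'). */
static size_t decode_reg(const uint32_t *reg, size_t ncells, int na, int ns,
			 struct reg_entry *out, size_t max)
{
	size_t stride = (size_t)(na + ns);
	size_t i, n = 0;

	assert(stride > 0);
	for (i = 0; i + stride <= ncells && n < max; i += stride, n++) {
		out[n].addr = read_cells(reg + i, na);
		out[n].size = read_cells(reg + i + na, ns);
	}
	return n;
}
```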
A parent device has a "ranges" property which is used to translate the "reg" properties of child devices into the parent's address space. You do this iteratively all the way up to the root device to get the system physical address.
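The iterative translation can be sketched like this, under stated simplifications: each multi-cell address is collapsed into one 64-bit number, each bus node's "ranges" has already been decoded into (child base, parent base, size) triples, and a bus with no entries is treated like an empty "ranges" property, meaning a 1:1 mapping. Real buses such as PCI need bus-specific handling of the high address cell, which this ignores. struct range, struct bus_node and translate_to_root() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded "ranges" entry: child addresses [child_base, child_base + size)
 * appear in the parent's space starting at parent_base. */
struct range {
	uint64_t child_base;
	uint64_t parent_base;
	uint64_t size;
};

/* A bus node with its decoded "ranges"; the root has parent == NULL. */
struct bus_node {
	const struct range *ranges;
	size_t nranges;		/* 0 is treated as an empty "ranges": 1:1 */
	const struct bus_node *parent;
};

/* Translate 'addr', expressed in the child space of 'bus', up to the root.
 * Returns 0 and stores the result in *out, or -1 if no range covers it. */
static int translate_to_root(const struct bus_node *bus, uint64_t addr,
			     uint64_t *out)
{
	assert(bus != NULL && out != NULL);
	for (; bus->parent; bus = bus->parent) {
		size_t i;

		if (bus->nranges == 0)
			continue;
		for (i = 0; i < bus->nranges; i++) {
			const struct range *r = &bus->ranges[i];

			if (addr >= r->child_base && addr - r->child_base < r->size) {
				addr = r->parent_base + (addr - r->child_base);
				break;
			}
		}
		if (i == bus->nranges)
			return -1;
	}
	*out = addr;
	return 0;
}
```

Starting from the device's parent bus, each level rewrites the address into its own parent's space, and the loop stops at the root, whose space is the system physical one. A device sitting directly under the root, like the old OBIO devices, needs no translation at all.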
Once this part was written I was able to convert several drivers, basically the ones that didn't need interrupts.
Probing interrupts for the of_device objects is more complicated. Sparc64 doesn't have the simplest interrupt scheme in the world. Every unique interrupt source has a pair of interrupt mapping registers called IMAP and ICLR. On Niagara sun4v systems these registers aren't accessed directly; instead they are abstracted via hypervisor calls. Usually these registers live in the PCI and SBUS controllers, but they can live in other bus devices such as the central FHC bus on Sunfire systems.
So the meat of the matter is that part of translating interrupts involves finding the location of these registers so they can be associated with an interrupt value and programmed. Unfortunately we have 4 or 5 different PCI controllers with differing interrupt register layouts. So for each of these controller types we have to write a small driver to do the final interrupt translation.
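One way to organize those per-controller translators is a table of function pointers keyed by controller, so the generic code only has to find the right entry and call it. This is an illustrative sketch only: the controller names "ctrl-a" and "ctrl-b", struct irq_xlate, find_xlate(), and both bit layouts are made up and do not describe any real Sun PCI or SBUS controller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-controller translator: combine controller-specific
 * data (here just a port ID) with the interrupt value found by the
 * generic code to produce the final interrupt number. */
struct irq_xlate {
	const char *controller;	/* controller node name to match */
	unsigned int (*build)(unsigned int portid, unsigned int ino);
};

/* Two made-up register layouts, for illustration only. */
static unsigned int ctrl_a_build(unsigned int portid, unsigned int ino)
{
	return (portid << 6) | (ino & 0x3f);
}

static unsigned int ctrl_b_build(unsigned int portid, unsigned int ino)
{
	return (portid << 5) | (ino & 0x1f);
}

static const struct irq_xlate xlate_table[] = {
	{ "ctrl-a", ctrl_a_build },
	{ "ctrl-b", ctrl_b_build },
};

/* Find the translator for a controller, or NULL if there is none. */
static const struct irq_xlate *find_xlate(const char *controller)
{
	size_t i;

	assert(controller != NULL);
	for (i = 0; i < sizeof(xlate_table) / sizeof(xlate_table[0]); i++)
		if (!strcmp(xlate_table[i].controller, controller))
			return &xlate_table[i];
	return NULL;
}
```

Adding support for another controller type then means adding one function and one table entry, without touching the generic walk.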
But before we get there we have to do some other adjustments. Interrupts are found in the device node's "interrupts" property. Some busses have pre-defined IRQ translation mechanisms. For example, PCI-to-PCI bridges swizzle the interrupt pin based upon the PCI slot number of the device behind them. Other bus types instead provide "interrupt-map" and "interrupt-map-mask" properties that describe the interrupt routing.
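The PCI-to-PCI swizzle is the standard one from the PCI-to-PCI bridge architecture: a device's interrupt pin is rotated by its slot number at each bridge on the way up. A small sketch, with pins numbered 1 (INTA#) through 4 (INTD#); pci_swizzle() and pci_swizzle_chain() are hypothetical helper names.

```c
#include <assert.h>

/* Pin that a device behind a PCI-to-PCI bridge shows up as on the
 * bridge's primary side.  pin: 1 = INTA# ... 4 = INTD#; slot: device number. */
static unsigned int pci_swizzle(unsigned int slot, unsigned int pin)
{
	assert(pin >= 1 && pin <= 4);
	return ((pin - 1 + slot) % 4) + 1;
}

/* Apply the swizzle at each bridge crossed on the way up.  slots[0] is the
 * device's own slot, slots[1] the slot of the bridge above it, and so on;
 * 'n' is the number of bridges crossed. */
static unsigned int pci_swizzle_chain(const unsigned int *slots, int n,
				      unsigned int pin)
{
	int i;

	for (i = 0; i < n; i++)
		pin = pci_swizzle(slots[i], pin);
	return pin;
}
```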
Interrupt map entries are matched against the child's "reg" property and its interrupt value. The "reg" property has the interrupt map mask applied to it before comparison with each entry. When you've found the matching entry, you take the resulting interrupt and "controller node" values from it. The resulting controller node is usually a parent device that acts as the interrupt controller, and it is the next node to consider for interrupt translation.
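The interrupt-map lookup can be sketched as below. It follows the description above and assumes each map entry is laid out as the child's "reg" address cells, one child interrupt cell, one controller phandle cell and one parent interrupt cell; the mask is applied to the "reg" cells only, and a returned phandle of 0 means no match. The general Open Firmware interrupt mapping binding allows wider interrupt specifiers and masks them too, so treat this as a narrow sketch. apply_interrupt_map() is a hypothetical name here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Search 'imap' (imap_cells cells) for the entry matching the child's
 * masked "reg" cells and its interrupt *irq.  Entry layout assumed:
 * na reg cells, child irq, controller phandle, parent irq.
 * On a match, stores the parent irq in *irq and returns the phandle;
 * otherwise returns 0 and leaves *irq alone. */
static uint32_t apply_interrupt_map(const uint32_t *imap, size_t imap_cells,
				    const uint32_t *imask,
				    const uint32_t *reg, int na,
				    uint32_t *irq)
{
	size_t stride = (size_t)na + 3;
	size_t off;

	assert(na >= 0 && irq != NULL);
	for (off = 0; off + stride <= imap_cells; off += stride) {
		const uint32_t *e = imap + off;
		int j, ok = 1;

		for (j = 0; j < na; j++) {
			if ((reg[j] & imask[j]) != e[j]) {
				ok = 0;
				break;
			}
		}
		if (ok && e[na] == *irq) {
			*irq = e[na + 2];
			return e[na + 1];
		}
	}
	return 0;
}
```

The returned phandle names the node to continue translating from, which is how the walk climbs toward the controller that finally owns the IMAP/ICLR registers.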
Once the final translation takes place, we pass the final interrupt number into either the sun4u or the sun4v interrupt builder, depending upon what kind of platform we are on, and we get back a virtual IRQ number which routines like request_irq(), enable_irq(), disable_irq(), and free_irq() understand. This final virtual IRQ is what gets placed in the of_device object for the driver.
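The last step hands out a small virtual IRQ number for each distinct hardware interrupt, so drivers never see the raw value. A toy sketch of that idea, assuming a fixed-size table in which virtual IRQ 0 is reserved to mean "none"; NR_VIRT_IRQS, virt_to_hw and build_virt_irq() are hypothetical and much simpler than the real sun4u and sun4v builders.

```c
#include <assert.h>

#define NR_VIRT_IRQS 8	/* hypothetical table size */

/* virt_to_hw[v] is the hardware interrupt behind virtual IRQ v.
 * Slot 0 is never handed out, so 0 can mean "no IRQ". */
static unsigned int virt_to_hw[NR_VIRT_IRQS];
static unsigned int nr_virt = 1;

/* Return the virtual IRQ for hardware interrupt 'hw', allocating one the
 * first time 'hw' is seen.  Returns 0 when the table is full. */
static unsigned int build_virt_irq(unsigned int hw)
{
	unsigned int v;

	assert(nr_virt >= 1 && nr_virt <= NR_VIRT_IRQS);
	for (v = 1; v < nr_virt; v++)
		if (virt_to_hw[v] == hw)
			return v;
	if (nr_virt == NR_VIRT_IRQS)
		return 0;
	virt_to_hw[nr_virt] = hw;
	return nr_virt++;
}
```

Handing back the same virtual IRQ for a repeated hardware interrupt matters because several devices can share one interrupt source.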
Any driver not needing DMA mappings could now be converted.
The DMA stuff will need to get done next, and once that happens all drivers can be converted and all the SBUS/EBUS/ISA stuff can get deleted. What is really nice is that all the devices show up with proper PROM path component names under sysfs. For example, this is on my T1000 Niagara box:
davem@t1000:/sys/bus/of/devices$ cd /sys/bus/of/devices/root
davem@t1000:/sys/bus/of/devices/root$ ls
aliases  cpu@12  cpu@3  cpu@a    ebus@800          subsystem
bus      cpu@13  cpu@4  cpu@b    memory@m0,800000  uevent
chosen   cpu@14  cpu@5  cpu@c    openprom          virtual-devices@100
cpu@0    cpu@15  cpu@6  cpu@d    options           virtual-memory
cpu@1    cpu@16  cpu@7  cpu@e    packages
cpu@10   cpu@17  cpu@8  cpu@f    pci@780
cpu@11   cpu@2   cpu@9  devspec  pci@7c0
davem@t1000:/sys/bus/of/devices/root$ cd ebus@800/serial@0\,c2c000/
davem@t1000:/sys/bus/of/devices/root/ebus@800/serial@0,c2c000$ ls -l
total 0
lrwxrwxrwx 1 root root    0 2006-06-30 00:13 bus -> ../../../../bus/of
-r--r--r-- 1 root root 8192 2006-06-30 00:13 devspec
lrwxrwxrwx 1 root root    0 2006-06-30 00:13 driver -> ../../../../bus/of/drivers/su
lrwxrwxrwx 1 root root    0 2006-06-30 00:13 subsystem -> ../../../../bus/of
lrwxrwxrwx 1 root root    0 2006-06-30 00:13 tty:ttyS1 -> ../../../../class/tty/ttyS1
--w------- 1 root root 8192 2006-06-30 00:13 uevent
davem@t1000:/sys/bus/of/devices/root/ebus@800/serial@0,c2c000$
Another nice part of all this is that it will now be possible to do proper automatic module loading by things like UDEV and friends.