This is just laborious and tedious work, but very beneficial. It would eliminate one huge difference between the 32-bit and 64-bit ports.
Should be very simple.
This way we can free up the unused code based upon the processor type we have.
It is supposed to be possible to mix UltraSPARC-III and IV chips in the same machine. We don't handle that so well currently, although every time I go walking around the relevant code most things look OK. The main chip differences come from the fact that UltraSPARC-IV has all the features of UltraSPARC-III+ and onward, whereas plain UltraSPARC-III does not. In particular, the ability to choose a page size for the indexed TLBs does not exist in UltraSPARC-III, whereas it does in UltraSPARC-IV.
Unfortunately I lack any UltraSPARC-IV chips on which to test any of this at all, so all efforts will be a shot in the dark until someone sends me some hardware. :(
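A minimal sketch of the constraint this implies, assuming a mixed machine may only rely on a feature that every installed chip has. The enum, the function names, and the idea of a single system-wide check are all hypothetical illustrations, not the kernel's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical chip identifiers, for illustration only. */
enum cpu_type { CPU_ULTRA3, CPU_ULTRA3_PLUS, CPU_ULTRA4 };

/* Per the text: plain UltraSPARC-III cannot choose a page size for the
 * indexed TLBs; UltraSPARC-III+ and onward (including IV) can. */
static bool cpu_has_tlb_page_size_select(enum cpu_type t)
{
    return t != CPU_ULTRA3;
}

/* Assumption: in a mixed machine the feature is usable only if
 * every chip supports it. */
static bool system_has_tlb_page_size_select(const enum cpu_type *cpus, size_t n)
{
    assert(cpus != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        if (!cpu_has_tlb_page_size_select(cpus[i]))
            return false;
    }
    return true;
}
```

With this rule, one plain UltraSPARC-III in the box is enough to turn the feature off everywhere.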
Due to how the ELF header sits at the front of the final kernel image, and the alignment we need for the trap table, we waste a lot of space. There is also some oddity that makes init/main.o force the linker to 64K-align that object when it gets hit in the final link.
I spent some time investigating all of this in 2004 and it was very painful work. I never tracked down the root cause of the init/main.o alignment issue. This is worthwhile stuff because we'll get back nearly 100K of object space if all the holes can be plugged.
Use ppc64 as a guide.
It's not easy: we are pretty much out of registers in the SMP cross call that handles TLB flushes. I've tried to do this work before, and register allocation was what kept me from getting a working implementation. One constant that we need in a register is 8192, because that number is too large to use as an immediate value in an add or subtract instruction. But that can be worked around by using two add or two subtract instructions with an immediate value of 4096.
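The immediate-value arithmetic can be modelled in a few lines of C. This is only a sketch: the helper names are made up for illustration. SPARC arithmetic immediates are signed 13-bit fields (range -4096 to 4095), so strictly speaking +4096 does not fit either; stepping forward by 4096 is done by subtracting -4096, which is what the model below does twice.

```c
#include <assert.h>
#include <stdbool.h>

/* SPARC simm13: signed 13-bit immediate, -4096..4095 inclusive. */
static bool fits_simm13(long v)
{
    return v >= -4096 && v <= 4095;
}

/* Hypothetical helper: advance an address by one 8192-byte page using
 * two immediate-sized steps instead of a constant held in a register.
 * Each step models "sub reg, -4096, reg". */
static unsigned long next_page(unsigned long addr)
{
    const long half_step = -4096;

    assert(fits_simm13(half_step));
    addr -= (unsigned long)half_step;   /* first step:  +4096 */
    addr -= (unsigned long)half_step;   /* second step: +4096 */
    return addr;
}
```

The cost is one extra instruction per step, in exchange for freeing the register that would otherwise hold 8192.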
It's very desirable though, since all my traces show that 99% of entries in the TLB flush batch arrays are sequential.
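To show why sequential entries matter, here is a rough sketch of collapsing a batch of page addresses into (start, page count) ranges. Everything in it is an assumption made for illustration: the struct, the function name, and the plain array of page-aligned addresses are not the kernel's real batch format, and the page size is taken to be the 8192 bytes mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_PAGE_SIZE 8192UL  /* assumed base page size */

/* Hypothetical range descriptor: npages pages starting at start. */
struct flush_range {
    unsigned long start;
    unsigned long npages;
};

/* Collapse runs of consecutive page addresses into ranges.
 * out must have room for n entries (worst case: no two are adjacent).
 * Returns the number of ranges written. */
static size_t coalesce_flush_batch(const unsigned long *vaddrs, size_t n,
                                   struct flush_range *out)
{
    size_t nranges = 0;

    assert(n == 0 || (vaddrs != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        if (nranges > 0) {
            struct flush_range *last = &out[nranges - 1];

            /* Extend the current range if this page directly follows it. */
            if (vaddrs[i] == last->start + last->npages * BATCH_PAGE_SIZE) {
                last->npages++;
                continue;
            }
        }
        out[nranges].start = vaddrs[i];
        out[nranges].npages = 1;
        nranges++;
    }
    return nranges;
}
```

If nearly all entries are sequential, a full batch collapses to a handful of ranges, so the cross call would have far fewer items to pass along.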
Get it on par with the Ultra-III stuff.
We currently emit a lot of code and strings unnecessarily. Use the ppc64 implementation as a guide.
I actually tried this, and it bloats up the kernel with about 80K of new text. That's pretty bad. __builtin_trap() generates significantly better code, as does the function call version of the generic DEBUG_BUGVERBOSE stuff. Ho hum. The patch I wrote is here.