All UltraSPARC chips prior to Niagara provided 5 trap levels to privileged mode code. Each trap level has a set of 3 trap state registers:
This layout generally meant that you could do things in trap handlers that would take further traps. For example, trap entry before the Niagara changes on sparc64/Linux looked roughly like:
etrap:  rdpr    %pil, %g2
        rdpr    %tstate, %g1
        sllx    %g2, 20, %g3
        andcc   %g1, TSTATE_PRIV, %g0
        or      %g1, %g3, %g1
        bne,pn  %xcc, 1f
         sub    %sp, STACKFRAME_SZ+TRACEREG_SZ-STACK_BIAS, %g2
Save away the %tstate register, and allocate new trap stack space if we are trapping from privileged mode code. In the software state we save away the trap time %pil into an unused area of the %tstate value, and later extract these bits and load them into the %pil register at return from trap time. Note that we never actually write these %pil bits into the %tstate register; they are purely software state.
This next sequence obtains the initial kernel stack pointer when we are trapping from user-mode. It also clears the FPU state by clearing out the %fprs register.
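The %pil packing can be modelled in a few lines of C. This is only a sketch of the bit manipulation, not the kernel's code: the helper names are made up for illustration, the shift of 20 comes from the "sllx %g2, 20, %g3" above, and the 4-bit mask assumes the SPARC V9 %pil range of 0 to 15.

```c
#include <assert.h>
#include <stdint.h>

/* Shift taken from the "sllx %g2, 20, %g3" in the entry code. */
#define SAVED_PIL_SHIFT 20
/* Assumption: %pil holds levels 0..15, so 4 bits are enough. */
#define SAVED_PIL_MASK  0xfULL

/* Hypothetical helper: fold the trap-time %pil into the saved %tstate copy. */
static uint64_t pack_saved_tstate(uint64_t tstate, uint64_t pil)
{
    assert(pil <= SAVED_PIL_MASK);
    /* The target bits are assumed to be unused in the hardware value. */
    assert((tstate & (SAVED_PIL_MASK << SAVED_PIL_SHIFT)) == 0);
    return tstate | (pil << SAVED_PIL_SHIFT);
}

/* Hypothetical helper: recover the %pil to restore at return from trap. */
static uint64_t saved_pil(uint64_t saved)
{
    return (saved >> SAVED_PIL_SHIFT) & SAVED_PIL_MASK;
}

/* Hypothetical helper: the value that may be written back to %tstate,
 * with the software-only %pil bits stripped out again. */
static uint64_t hardware_tstate(uint64_t saved)
{
    return saved & ~(SAVED_PIL_MASK << SAVED_PIL_SHIFT);
}
```

Packing and then calling hardware_tstate() gives back the original value, which is the property that lets the %pil bits stay purely software state.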
        wrpr    %g0, 7, %cleanwin
        sethi   %hi(TASK_REGOFF), %g2
        sethi   %hi(TSTATE_PEF), %g3
        or      %g2, %lo(TASK_REGOFF), %g2
        and     %g1, %g3, %g3
        brnz,pn %g3, 1f
         add    %g6, %g2, %g2
        wr      %g0, 0, %fprs

We now have the stack we'll use in %g2. Next we save away the trap state registers from the current trap level onto the stack.

1:      rdpr    %tpc, %g3
        stx     %g1, [%g2 + STACKFRAME_SZ + PT_V9_TSTATE]
        rdpr    %tnpc, %g1
        stx     %g3, [%g2 + STACKFRAME_SZ + PT_V9_TPC]
        rd      %y, %g3
        stx     %g1, [%g2 + STACKFRAME_SZ + PT_V9_TNPC]
        st      %g3, [%g2 + STACKFRAME_SZ + PT_V9_Y]
Ok, now we are ready to grab a register window for this trap.
save %g2, -STACK_BIAS, %sp ! Ordering here is critical
This is where things get really interesting on Niagara. On pre-Niagara systems we had 5 trap levels, and we are at trap level 1 in this piece of code. That "save" instruction can cause several layers of traps to occur, namely:
Along comes Niagara, which throws a monkey wrench into the works by only giving us 2 trap levels (MAXTL=2) in privileged mode. The rest of the 5 trap levels are there, but the ones above 2 are for the hypervisor. The long and short of this is that we have to do our trap handling a little differently in order to fit into these limits.
Firstly, we have to get rid of the virtual page tables and move over to TSB based TLB miss handling. This was discussed in a previous blog entry of mine. Virtual page table handling can take up to 2 levels of traps, which would exceed our quota immediately.
So with TSB based TLB miss handling we've got things down to 3 trap levels for the above trap entry code:
We're still over-budget by one trap level. What we need to do is get rid of the window spill trap, and the way we handle that is to check inline whether we would take a window spill trap or not (we can do this by checking some privileged state registers). If we would trap, we do the window spill handling inline in order to not take the trap.
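The budget arithmetic can be written down as a tiny C model. This is a rough sketch, not anything from the kernel: the struct and function names are invented, and the per-scheme counts are my reading of the text (one level for the trap entry itself, one for a window spill trap if it can happen, and one or two for TLB miss handling).

```c
#include <assert.h>

/* Privileged-mode trap level limit on Niagara, as stated in the text. */
#define NIAGARA_PRIV_MAXTL 2

/* Hypothetical description of a trap entry scheme. */
struct trap_scheme {
    int spill_traps;      /* 1 if "save" may take a window spill trap */
    int tlb_miss_levels;  /* trap levels a TLB miss can consume */
};

/* Worst-case trap level reached: the original trap, plus nested traps. */
static int worst_case_tl(struct trap_scheme s)
{
    return 1 + s.spill_traps + s.tlb_miss_levels;
}

static int fits_budget(struct trap_scheme s, int maxtl)
{
    return worst_case_tl(s) <= maxtl;
}

/* Assumed counts for the three schemes discussed in the text. */
static const struct trap_scheme vpt_scheme        = { 1, 2 };
static const struct trap_scheme tsb_scheme        = { 1, 1 };
static const struct trap_scheme tsb_inline_scheme = { 0, 1 };
```

Under these assumptions only the last scheme, TSB based TLB misses plus the inline spill, fits within MAXTL=2.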
So, the "save" instruction in the trap entry sequence above is replaced with the following code:
        rdpr    %cansave, %g1
        brnz,pt %g1, etrap_save
         nop
The %cansave register tells us how many register windows we can "save" into without causing a window spill trap. If it's non-zero, we can just do the "save" instruction directly because we know it will not trap.
We now have to figure out what kind of register window we need to save. It could be either a kernel or a user window, and this determines what kind of store instructions we should use. Within the userspace case there are two more sub-cases. We could be dealing with either a 64-bit or a 32-bit process, and we need to do the save accordingly.

        rdpr    %cwp, %g1
        add     %g1, 2, %g1
        wrpr    %g1, %cwp
Switch into the window we need to save by incrementing the %cwp register by 2.
        be,pt   %xcc, etrap_user_spill
         mov    ASI_AIUP, %g3

Here we are making use of the previously computed conditional test checking if TSTATE_PRIV is set in %tstate. If it is clear, we are trapping from userland and we know that it's a user window we need to save.
If we trapped from the kernel, there could still be user windows in the CPU, so we have another check to make.

        rdpr    %otherwin, %g3
        brz     %g3, etrap_kernel_spill
         mov    ASI_AIUS, %g3

Register windows live in the processor are split into two classes: "normal" and "other". The OS uses this so that it knows how to save the register window on window spill traps. If it's "other" we know it's a userspace register window. So if the %otherwin register is non-zero, we have a user window to save.

etrap_user_spill:
        wr      %g3, 0x0, %asi
        ldx     [%g6 + TI_FLAGS], %g3
        and     %g3, _TIF_32BIT, %g3
        brnz,pt %g3, etrap_user_spill_32bit
         nop
        ba,a,pt %xcc, etrap_user_spill_64bit

We check the thread state to see if we have a 32-bit or 64-bit user process.

etrap_save:
        save    %g2, -STACK_BIAS, %sp

And there's the save instruction itself. After we save the window away by hand, we execute a "saved" instruction to make the new free register window available. Thus we know the save instruction won't trap and cause trouble.
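Putting the branches together, the choice of spill path can be summarized with a small C model. This is a sketch of the decision logic only: the struct, the enums and the function names are hypothetical, and the fields simply stand in for %cansave, the TSTATE_PRIV test, %otherwin and the _TIF_32BIT thread flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the four ways to reach etrap_save. */
enum etrap_path {
    ETRAP_SAVE_DIRECT,       /* %cansave != 0, "save" cannot trap */
    ETRAP_KERNEL_SPILL,
    ETRAP_USER_SPILL_32BIT,
    ETRAP_USER_SPILL_64BIT
};

/* Which address space the user-window stores are directed at. */
enum spill_asi { SPILL_ASI_NONE, SPILL_ASI_AIUP, SPILL_ASI_AIUS };

/* Hypothetical snapshot of the state the assembly inspects. */
struct etrap_state {
    uint64_t cansave;     /* %cansave */
    bool from_priv;       /* TSTATE_PRIV set in the trap-time %tstate */
    uint64_t otherwin;    /* %otherwin */
    bool task_is_32bit;   /* _TIF_32BIT set in the thread flags */
};

static enum etrap_path etrap_choose_path(const struct etrap_state *s)
{
    if (s->cansave != 0)
        return ETRAP_SAVE_DIRECT;
    /* Trapped from the kernel with no user windows left in the CPU. */
    if (s->from_priv && s->otherwin == 0)
        return ETRAP_KERNEL_SPILL;
    return s->task_is_32bit ? ETRAP_USER_SPILL_32BIT
                            : ETRAP_USER_SPILL_64BIT;
}

static enum spill_asi etrap_choose_asi(const struct etrap_state *s)
{
    if (s->cansave != 0 || (s->from_priv && s->otherwin == 0))
        return SPILL_ASI_NONE;
    /* From userland: ASI_AIUP. From kernel with "other" windows: ASI_AIUS. */
    return s->from_priv ? SPILL_ASI_AIUS : SPILL_ASI_AIUP;
}
```

Every path ends at the same "save" instruction; the model only captures which spill routine, if any, runs first.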
The "etrap_kernel_spill", "etrap_user_spill_32bit" and "etrap_user_spill_64bit" routines are inlined into unused slots of the processor trap table and do the real work. Here is what etrap_user_spill_32bit looks like:
etrap_user_spill_32bit:
        srl     %sp, 0, %sp
        stwa    %l0, [%sp + 0x00] %asi
        stwa    %l1, [%sp + 0x04] %asi
        stwa    %l2, [%sp + 0x08] %asi
        stwa    %l3, [%sp + 0x0c] %asi
        stwa    %l4, [%sp + 0x10] %asi
        stwa    %l5, [%sp + 0x14] %asi
        stwa    %l6, [%sp + 0x18] %asi
        stwa    %l7, [%sp + 0x1c] %asi
        stwa    %i0, [%sp + 0x20] %asi
        stwa    %i1, [%sp + 0x24] %asi
        stwa    %i2, [%sp + 0x28] %asi
        stwa    %i3, [%sp + 0x2c] %asi
        stwa    %i4, [%sp + 0x30] %asi
        stwa    %i5, [%sp + 0x34] %asi
        stwa    %i6, [%sp + 0x38] %asi
        stwa    %i7, [%sp + 0x3c] %asi
        saved
        sub     %g1, 2, %g1
        ba,pt   %xcc, etrap_save
         wrpr   %g1, %cwp
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        ba,a,pt %xcc, etrap_spill_fixup_32bit
        ba,a,pt %xcc, etrap_spill_fixup_32bit
        ba,a,pt %xcc, etrap_spill_fixup_32bit

The nops are there because the trap table entries being used here are 32 instructions long. We save the window away, and reload the %cwp register with what it was before the window save sequence.
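For readers who don't speak SPARC assembly, the data movement of the 32-bit spill can be sketched in C. This is only a model: the struct and function names are invented, and frame[] stands in for the 64 bytes of user stack at the truncated %sp rather than doing real user-space stores through %asi.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_REGS 16   /* %l0-%l7 followed by %i0-%i7 */

/* Hypothetical view of the register window being spilled. */
struct reg_window {
    uint64_t l[8];   /* %l0..%l7 */
    uint64_t i[8];   /* %i0..%i7 */
};

/* Models "srl %sp, 0, %sp": keep only the low 32 bits of the stack pointer. */
static uint32_t spill32_base(uint64_t sp)
{
    return (uint32_t)sp;
}

/* Models the 16 stwa instructions: each stores the low 32 bits of one
 * register, locals at byte offsets 0x00..0x1c and ins at 0x20..0x3c. */
static void spill_window_32bit(const struct reg_window *w,
                               uint32_t frame[WINDOW_REGS])
{
    for (int n = 0; n < 8; n++) {
        frame[n]     = (uint32_t)w->l[n];   /* [%sp + 4*n]        */
        frame[8 + n] = (uint32_t)w->i[n];   /* [%sp + 0x20 + 4*n] */
    }
}
```

The 64-bit variant would presumably differ in store width and offsets; it is not shown in the text, so it is not modelled here.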
The last 3 instructions need some explanation. If we take a TLB miss trap here, we will just return if we can service it directly. If real fault processing is needed, we can't do it here, as we're already busy at the topmost trap level servicing an exception. The TLB miss handler can recognize this case by testing if the trap level is greater than 1, and when it does, it returns from the trap to one of these 3 branch instructions. This will save the userspace register window into the kernel-side thread window save area, and we'll deal with the fault later, before returning back to userspace.
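The TLB miss handler's choice can be modelled roughly as follows. This is a sketch under stated assumptions, not the kernel's implementation: all names are hypothetical, the address helper assumes each 32-instruction trap table entry is aligned on its own size (128 bytes), and which of the 3 branch slots gets used for which kind of fault is not specified in the text, so it is left as a parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SPILL_ENTRY_INSNS 32   /* trap table entry length, from the text */
#define INSN_BYTES        4    /* SPARC instructions are 4 bytes each */
#define SPILL_ENTRY_BYTES (SPILL_ENTRY_INSNS * INSN_BYTES)

/* Hypothetical outcomes of a TLB miss. */
enum tlb_miss_action {
    TLB_MISS_RETRY,         /* serviced directly, just return */
    TLB_MISS_FULL_FAULT,    /* normal fault processing is possible */
    TLB_MISS_WINDOW_FIXUP   /* return to a fixup branch in the spill entry */
};

static enum tlb_miss_action tlb_miss_action(bool serviced_directly, int tl)
{
    if (serviced_directly)
        return TLB_MISS_RETRY;
    /* Trap level > 1 means we interrupted another trap handler. */
    if (tl > 1)
        return TLB_MISS_WINDOW_FIXUP;
    return TLB_MISS_FULL_FAULT;
}

/* Address of one of the last 3 instructions of the spill entry that
 * contains tpc. Assumes 128-byte aligned entries; slot is 0, 1 or 2. */
static uint64_t fixup_branch_addr(uint64_t tpc, int slot)
{
    assert(slot >= 0 && slot < 3);
    uint64_t entry = tpc & ~(uint64_t)(SPILL_ENTRY_BYTES - 1);
    return entry + (uint64_t)(SPILL_ENTRY_INSNS - 3 + slot) * INSN_BYTES;
}
```

With the faulting store anywhere inside the entry, the helper always lands on one of the trailing branches, which is what makes the "return to the fixup branch" trick cheap for the miss handler.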
And that's how we deal with the reduced number of privileged trap levels available in the Niagara chip under Linux.
Tsk tsk tsk, all you disbelievers over at CNET.... You just had to notice the panic message at the end of the boot log I posted on Friday. :-)
Well, just to quiet any notion that the thing isn't functional, I spent this past weekend working on getting a full install on the box.
It builds kernels (in about 3 minutes, 37 seconds), GIT works, etc. etc. etc.