Today’s story is the story of Speed Racer in the Challenge of Racer X. Here goes. The really scary thing is that I still remember the details.
To this day, I can’t bear to listen to the Speed Racer theme song because I spent over a week debugging why the program froze up right after the title sequence music. The crashes were completely nonsensical and random.
Windows 95 uses the
iretd
instruction to
return from the kernel back to the application.
After days of frustrating head-scratching,
I eventually discovered
that if you use the instruction to return
from the kernel back to the application,
and the application is running 32-bit protected-mode
code on a 16-bit stack,
then only the bottom 16 bits of the esp
register
are updated by the iretd
instruction.
The upper 16 bits remain unchanged and continue
to hold the value they had while you were in kernel mode.
This behavior doesn’t appear to be documented anywhere in Intel’s
reference books.¹
The effect of this is that 32-bit protected-mode code
running on a 16-bit stack will observe that the upper 16
bits of the esp
register are spontaneously corrupted
randomly.
(Sound familiar?)
Unfortunately,
Speed Racer
was counting
on the upper 16 bits of the esp
register remaining
zero.
To fix this, I had to counter insanity with more insanity.
At the last moment before restoring all the general purpose registers and
executing the iretd
instruction,
Windows 95 does a check to see whether the troublesome
scenario is about to occur.
If so, the kernel sets up a temporary stack selector
whose base linear address matches the high 16 bits of
the kernel esp
register,
then switches to that stack while simultaneously zeroing
out the high 16 bits of its own esp
register.
This double-switch rewrites the ss:esp
value
such that it points to the same memory, but shuffles the
bits around to arrange for the high 16 bits of esp
to be zero.
In other words, it rewrote
SS:ESP = 00000000 + xxxxyyyy
as
SS:ESP = xxxx0000 + 0000yyyy
.
(Sound familiar?)
At this point, the kernel is set up to
restore the general purpose registers and
perform the iretd
.
This returns control back to the application
with the high 16 bits of the esp
register
set to zero, as the application expects.
Now, this may seem like an awful lot of work just to get a single game to work, and it’s not like Speed Racer was a blockbuster game like DOOM. However, this particular problem was not intrinsic to Speed Racer. Rather, it was a problem in the client-side library code that came with the MS-DOS extender they were using, and that MS-DOS extender was one of the major players in the MS-DOS extender market, so fixing this issue actually fixed a lot of programs. It’s just that Speed Racer was the first one discovered to exhibit the problem, so it was the one I ended up debugging.
¹Maybe I’m missing it.
You tell me if you see it in there.
The pseudocode at the
RETURN-TO-OUTER-PRIVILEGE-LEVEL
label talks about raising an exception if the stack doesn’t
have at least 8 bytes of data in it,
but it doesn’t appear to discuss what happens to the esp
register.
The discussion says
“If the return is to another privilege level,
the IRET instruction also pops the stack pointer and SS from the stack,”
but it doesn’t mention what happens if the destination stack pointer
is a different size from the current stack pointer.