diff --git a/blog/post/exception-diagnostics.md b/blog/post/exception-diagnostics.md
index 45f2eb91..c8efadca 100644
--- a/blog/post/exception-diagnostics.md
+++ b/blog/post/exception-diagnostics.md
@@ -227,19 +227,130 @@ Now we see a correct exception stack frame when we execute `make run`:

 The values look correct this time.

-However, it no longer works on a real machine! It triple faults and enters a boot loop.
+## Testing on real Hardware
+Virtual machines such as QEMU are very convenient for quickly testing our kernel. However, they might behave a bit differently from real hardware in some situations. So we should test our kernel on real hardware, too.

-## Failure on real Hardware
+Let's do it by burning it to a USB stick:

-- reproduce using `-enable-kvm`
-- debugging using `loop {}` and gdb
-- frame pointer and thus stack pointer alignment wrong
-- requirements system v
-- stack frame high level (xx bytes)
-- hacky workaround (`push 0`)
-- `extern "C" fn() -> !` not the correct handler function type
-- assembly stub required to ensure correct stack alignment
-- naked functions for handlers with and without error code (`push 0`, `call`)
+```
+> sudo dd if=build/os-x86_64.iso of=/dev/sdX; and sync
+```
+
+Replace `sdX` with the device name of your USB stick. But **be careful**! The command will erase everything on that device.
+
+When we boot from this USB stick now, we see that our computer reboots just before printing the exception message. So our code, which worked well in QEMU, causes a triple fault on real hardware. What's happening?
+
+### Reproducing in QEMU
+Debugging on a real machine is difficult. Fortunately, there is a way to reproduce this bug in QEMU: We use Linux's [Kernel-based Virtual Machine] \(KVM) by passing the `-enable-kvm` flag:
+
+[Kernel-based Virtual Machine]: https://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine
+
+```
+> qemu-system-x86_64 -cdrom build/os-x86_64.iso -enable-kvm
+```
+
+Now QEMU triple faults as well.
This should make debugging much easier.
+
+### Debugging
+
+QEMU's `-d int` option, which prints every exception, doesn't seem to work in KVM mode. However, `-d cpu_reset` still works. It prints the complete CPU state whenever the CPU resets. Let's try it:
+
+```
+> qemu-system-x86_64 -cdrom build/os-x86_64.iso -enable-kvm -d cpu_reset
+CPU Reset (CPU 0)
+EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000
+ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
+EIP=00000000 EFL=00000000 [-------] CPL=0 II=0 A20=0 SMM=0 HLT=0
+[...]
+CPU Reset (CPU 0)
+EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000663
+ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
+EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
+[...]
+CPU Reset (CPU 0)
+RAX=000000000011fac8 RBX=0000000000000800 RCX=1d1d1d1d1d1d1d1d RDX=0000000000000000
+RSI=0000000000119d70 RDI=000000000011fb58 RBP=000000000011fb48 RSP=000000000011f9c8
+R8 =0000000000000000 R9 =0000000000000100 R10=000000000011f500 R11=000000000011f800
+R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
+RIP=000000000010db23 RFL=00210002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
+[...]
+```
+The first two resets occur while the CPU is still in 32-bit mode (`EAX` instead of `RAX`), so we ignore them. The third reset is the interesting one. It tells us that the instruction pointer value was `0x10db23` just before the reset. This might be the address of the instruction that caused the triple fault.
+
+We can find the corresponding instruction by disassembling our kernel:
+
+```shell
+objdump -d build/kernel-x86_64.bin | grep "10db23:"
+  10db23:   0f 29 45 b0             movaps %xmm0,-0x50(%rbp)
+```
+The [movaps] instruction is an [SSE] instruction that moves aligned 128-bit values. It can fail for a number of reasons:
+
+[movaps]: http://x86.renejeschke.de/html/file_module_x86_id_180.html
+[SSE]: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
+
+1. 
For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.
+2. For an illegal address in the SS segment.
+3. If a memory operand is not aligned on a 16-byte boundary.
+4. For a page fault.
+5. If TS in CR0 is set.
+
+The segment registers contain no meaningful values in long mode, so they can't contain illegal addresses. We did not change the TS bit in [CR0] and there is no reason for a page fault either. So it has to be option 3.
+
+[CR0]: https://en.wikipedia.org/wiki/Control_register#CR0
+
+### 16-byte Alignment
+Some SSE instructions such as `movaps` require that memory operands are 16-byte aligned. In our case, the instruction is `movaps %xmm0,-0x50(%rbp)`, which writes to address `rbp - 0x50`. The number `0x50` is a multiple of 16, since `0x50 = 5*0x10 = 5*16`. Therefore `rbp` needs to be 16-byte aligned too.
+
+Let's look at the above `-d cpu_reset` dump again and check the value of `rbp`:
+
+```
+CPU Reset (CPU 0)
+RAX=[...] RBX=[...] RCX=[...] RDX=[...]
+RSI=[...] RDI=[...] RBP=000000000011fb48 RSP=[...]
+...
+```
+`RBP` is `0x11fb48`, which ends in `8` and is therefore _not_ 16-byte aligned. So this is the reason for the triple fault. It seems like QEMU without KVM doesn't check the alignment for `movaps`, but real hardware of course does.
+
+But how did we end up with a misaligned `rbp` register?
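The alignment argument can be double-checked with a few lines of Rust. This is only a minimal sketch that runs as a normal host program, not kernel code: the helper `is_aligned` is a hypothetical name introduced for illustration, and the values `0x11fb48` and `0x50` are copied from the register dump and the disassembly.

```rust
/// Hypothetical helper (not part of the kernel): returns true if `addr`
/// is a multiple of `align`. `align` is assumed to be a power of two.
fn is_aligned(addr: u64, align: u64) -> bool {
    assert!(align.is_power_of_two());
    addr & (align - 1) == 0
}

fn main() {
    let rbp: u64 = 0x11fb48; // RBP value from the `-d cpu_reset` dump
    let offset: u64 = 0x50; // displacement in `movaps %xmm0,-0x50(%rbp)`
    let operand = rbp - offset; // address that `movaps` writes to

    println!("offset {:#x} aligned: {}", offset, is_aligned(offset, 16));
    println!("rbp {:#x} aligned: {}", rbp, is_aligned(rbp, 16));
    println!("operand {:#x} aligned: {}", operand, is_aligned(operand, 16));
}
```

Because the offset is a multiple of 16, the operand address is aligned exactly when `rbp` is, so checking `rbp` alone is enough.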
+
+### Calling Conventions
+In order to solve this mystery, we need to look at the disassembly of the preceding code:
+
+```
+> objdump -d build/kernel-x86_64.bin | grep -B12 "10db23:"
+000000000010daf0 <_ZN7blog_os10interrupts12main_handler17he035E>:
+  10daf0:   55                      push   %rbp
+  10daf1:   48 89 e5                mov    %rsp,%rbp
+  10daf4:   48 81 ec 80 01 00 00    sub    $0x180,%rsp
+  10dafb:   48 8d 45 80             lea    -0x80(%rbp),%rax
+  10daff:   48 b9 1d 1d 1d 1d 1d    movabs $0x1d1d1d1d1d1d1d1d,%rcx
+  10db06:   1d 1d 1d
+  10db09:   48 89 4d 88             mov    %rcx,-0x78(%rbp)
+  10db0d:   48 89 4d 80             mov    %rcx,-0x80(%rbp)
+  10db11:   48 89 8d f0 fe ff ff    mov    %rcx,-0x110(%rbp)
+  10db18:   48 89 7d f8             mov    %rdi,-0x8(%rbp)
+  10db1c:   0f 10 05 8d b5 00 00    movups 0xb58d(%rip),%xmm0
+  10db23:   0f 29 45 b0             movaps %xmm0,-0x50(%rbp)
+```
+The exception occurs inside our `main_handler` function. We see that `rbp` is loaded with the value of `rsp` at the beginning. The `rbp` register now holds the so-called _base pointer_, which points to the beginning of the stack frame. The following code uses it to address variables and other values on the stack.
+
+The base pointer is initialized directly from the stack pointer (`rsp`) after pushing the old base pointer. There is no special alignment code, so the compiler blindly assumes that `(rsp - 8)`[^fn-rsp-8] is always 16-byte aligned. This seems to be wrong in our case. But why does the compiler assume this?
+
+[^fn-rsp-8]: By pushing the old base pointer, `rsp` is updated to `rsp-8`.
+
+The reason is that our exception handler is defined as an `extern "C"` function, which means that it's using the C [calling convention]. On x86_64 Linux, the C calling convention is specified by the System V AMD64 ABI ([PDF][system v abi]). Section 3.2.2 defines the following:
+
+[calling convention]: https://en.wikipedia.org/wiki/X86_calling_conventions
+[system v abi]: http://www.x86-64.org/documentation/abi.pdf
+
+> The end of the input argument area shall be aligned on a 16 byte boundary. 
In other words, the value (%rsp + 8) is always a multiple of 16 when control is transferred to the function entry point.
+
+The “end of the input argument area” refers to the last stack-passed argument (in our case there aren't any). So the stack pointer must be 16-byte aligned when we `call` a function with the C calling convention. The `call` instruction then pushes the return address on the stack so that “the value (%rsp + 8) is a multiple of 16 when control is transferred to the function entry point”.
+
+_Summary_: The calling convention requires a 16-byte aligned stack pointer before `call` instructions. The compiler relies on this requirement, but we broke it somehow. Thus the generated code triple faults due to a misaligned memory address in the `movaps` instruction.
+
+### Fixing the Alignment
+In order to fix this bug, we need to make sure that the stack pointer is correctly aligned before calling `extern "C"` functions. Let's calculate the

 ## What's next?