Move blog posts/pages into code repository

For history see the `rust_os_posts` tag in the phil-opp/phil-opp.github.io repository.
This commit is contained in:
Philipp Oppermann
2015-11-06 22:47:39 +01:00
parent fa78bd82a8
commit 885168bea4
6 changed files with 1703 additions and 0 deletions

View File

@@ -0,0 +1,46 @@
---
layout: page
title: Cross Compile Binutils
---
The [GNU Binutils] are a collection of various binary tools such as `ld`, `as`, `objdump`, or `readelf`. These tools are platform-specific, so you need to compile them again if your host system and target system are different. In our case, we need `ld` and `objdump` for the x86_64 architecture.
[GNU Binutils]: https://www.gnu.org/software/binutils/
## Building Setup
First, you need to download a current binutils version from [here][download] \(the latest one is near the bottom). After extracting, you should have a folder named `binutils-2.X` where `X` is for example `25.1`. Now can create and switch to a new folder for building (recommended):
[download]: ftp://sourceware.org/pub/binutils/snapshots
```bash
mkdir build-binutils
cd build-binutils
```
## Configuration
We execute binutils's `configure` and pass a lot of arguments to it (replace the `X` with the version number):
```bash
../binutils-2.X/configure --target=x86_64-elf --prefix="$HOME/opt/cross" \
--disable-nls --disable-werror \
--disable-gdb --disable-libdecnumber --disable-readline --disable-sim
```
- The `target` argument specifies the the x86_64 target architecture.
- The `prefix` argument selects the installation directory, you can change it if you like. But be careful that you do not overwrite your system's binutils.
- The `disable-nls` flag disables native language support (so you'll get the same english error messages). It also reduces build dependencies.
- The `disable-werror` turns all warnings into errors.
- The last line disables features we don't need to reduce compile time.
## Building it
Now we can build and install it to the location supplied as `prefix` (it will take a while):
```bash
make
make install
```
Now you should have multiple `x86_64-elf-XXX` files in `$HOME/opt/cross/bin`.
## Adding it to the PATH
To use the tools from the command line easily, you should add the `bin` folder to your PATH:
```bash
export PATH="$HOME/opt/cross/bin:$PATH"
```
If you add this line to your e.g. `.bashrc`, the `x86_64-elf-XXX` commands are always available.

View File

@@ -0,0 +1,47 @@
---
layout: page
title: "Cross Compiling: libcore"
---
So you're getting an ``error: can't find crate for `core` [E0463]`` when using `--target x86_64-unknown-linux-gnu`. That means that you're not running Linux or not using using a x86_64 processor.
**If you have an x86_64 processor and want a quick fix**, try it with `x86_64-pc-windows-gnu` or `x86_64-apple-darwin` (or simply omit the explicit `--target`).
The idiomatic alternative and the only option for non x86_64 CPUs is described below. Note that you need to [cross compile binutils], too.
[cross compile binutils]: {{ site.url }}/cross-compile-binutils.html
## Libcore
The core library is a dependency-free library that is added implicitly when using `#![no_std]`. It provides basic standard library features like Option or Iterator. The core library is installed together with the rust compiler (just like the std library). But the installed libcore is specific to your architecture. If you aren't working on x86_64 Linux and pass `target x86_64unknownlinuxgnu` to cargo, it can't find a x86_64 libcore. To fix this, you can either download it or build it using cargo.
## Download it
You need to download the 64-bit Linux Rust build corresponding to your installed nightly. You can either just update to the current nightly and download the current nightly source [here][Rust downloads]. Or you retrieve your installed version through `rustc --version` and search the corresponding subfolder [here](http://static.rust-lang.org/dist/).
[Rust downloads]: https://www.rust-lang.org/downloads.html
After extracting it and you need to copy the `x86_64-unknown-linux-gnu` folder in `rust-std-x86_64-unknown-linux-gnu/lib/rustlib` to your local Rust installation. For multirust, the right target folder is `~/.multirust/toolchains/nightly/lib/rustlib`. That's it!
## Build it using cargo
The alternative is to use cargo to build libcore. But this variant has one big disadvantage: You have to modify each crate you depend on because it needs to use the same libcore. So you can't just add a crates.io dependency anymore, you need to fork and modify it first.
If you want to build libcore anyway, you need its source code. You can either clone the [rust repository] \(makes updates easy) or manually [download the Rust source][Rust downloads] \(faster and less memory).
[rust repository]: https://github.com/rust-lang/rust
Now we create a new cargo project named `core`, but delete its `src` folder:
```bash
cargo new core
rm -r core/src
```
Then we create a symbolic link named `src` to the `rust/src/libcore` of the Rust source code:
```bash
ln -s ../rust/src/libcore core/src
```
To use our new libcore crate (instead of the one installed together with rust) in our OS, we need to add it as a local dependency in the `Cargo.toml`:
```toml
...
[dependencies.core]
path = "core"
```
Now cargo compiles libcore for all Rust targets automatically.

View File

@@ -0,0 +1,300 @@
---
layout: post
title: 'A minimal x86 kernel'
---
This post explains how to create a minimal x86 operating system kernel. In fact, it will just boot and print `OK` to the screen. The following blog posts we will extend it using the [Rust] programming language.
I tried to explain everything in detail and to keep the code as simple as possible. If you have any questions, suggestions or other issues, please leave a comment or [create an issue] on Github. The source code is available in a [repository][source code], too.
[Rust]: http://www.rust-lang.org/
[create an issue]: https://github.com/phil-opp/blog_os/issues
[source code]: https://github.com/phil-opp/blog_os/tree/multiboot_bootstrap/src/arch/x86_64
## Overview
When you turn on a computer, it loads the [BIOS] from some special flash memory. The BIOS runs self test and initialization routines of the hardware, then it looks for bootable devices. If it finds one, the control is transferred to its _bootloader_, which is a small portion of executable code stored at the device's beginning. The bootloader has to determine the location of the kernel image on the device and load it into memory. It also needs to switch the CPU to the so-called [protected mode] because x86 CPUs start in the very limited [real mode] by default (to be compatible to programs from 1978).
[BIOS]: https://en.wikipedia.org/wiki/BIOS
[protected mode]: https://en.wikipedia.org/wiki/Protected_mode
[real mode]: http://wiki.osdev.org/Real_Mode
We won't write a bootloader because that would be a complex project on its own (if you really want to do it, check out [_Rolling Your Own Bootloader_]). Instead we will use one of the [many well-tested bootloaders][bootloader comparison] out there. But which one?
[_Rolling Your Own Bootloader_]: http://wiki.osdev.org/Rolling_Your_Own_Bootloader
[bootloader comparison]: https://en.wikipedia.org/wiki/Comparison_of_boot_loaders
## Multiboot
Fortunately there is a bootloader standard: the [Multiboot Specification][multiboot]. Our kernel just needs to indicate that it supports Multiboot and every Multiboot-compliant bootloader can boot it. We will use the Multiboot 2 specification ([PDF][Multiboot 2]) together with the well-known [GRUB 2] bootloader.
[multiboot]: https://en.wikipedia.org/wiki/Multiboot_Specification
[multiboot 2]: http://nongnu.askapache.com/grub/phcoder/multiboot.pdf
[grub 2]: http://wiki.osdev.org/GRUB_2
To indicate our Multiboot 2 support to the bootloader, our kernel must start with a _Multiboot Header_, which has the following format:
Field | Type | Value
------------- | --------------- | ----------------------------------------
magic number | u32 | `0xE85250D6`
architecture | u32 | `0` for i386, `4` for MIPS
header length | u32 | total header size, including tags
checksum | u32 | `-(magic + architecture + header_length)`
tags | variable |
end tag | (u16, u16, u32) | `(0, 0, 8)`
Converted to a x86 assembly file it looks like this (Intel syntax):
```nasm
section .multiboot_header
header_start:
dd 0xe85250d6 ; magic number (multiboot 2)
dd 0 ; architecture 0 (protected mode i386)
dd header_end - header_start ; header length
; checksum
dd 0x100000000 - (0xe85250d6 + 0 + (header_end - header_start))
; insert optional multiboot tags here
; required end tag
dw 0 ; type
dw 0 ; flags
dd 8 ; size
header_end:
```
If you don't know x86 assembly, here is some quick guide:
- the header will be written to a section named `.multiboot_header` (we need this later)
- `header_start` and `header_end` are _labels_ that mark a memory location, we use them to calculate the header length easily
- `dd` stands for `define double` (32bit) and `dw` stands for `define word` (16bit). They just output the specified 32bit/16bit constant.
- the additional `0x100000000` in the checksum calculation is a small hack[^fn-checksum_hack] to avoid a compiler warning
[^fn-checksum_hack]: The formula from the table, `-(magic + architecture + header_length)`, creates a negative value that doesn't fit into 32bit. By subtracting from `0x100000000` (= 2^(32)) instead, we keep the value positive without changing its truncated value. Without the additional sign bit(s) the result fits into 32bit and the compiler is happy :).
We can already _assemble_ this file (which I called `multiboot_header.asm`) using `nasm`. It produces a flat binary by default, so the resulting file just contains our 24 bytes (in little endian if you work on a x86 machine):
```
> nasm multiboot_header.asm
> hexdump -x multiboot_header
0000000 50d6 e852 0000 0000 0018 0000 af12 17ad
0000010 0000 0000 0008 0000
0000018
```
## The Boot Code
To boot our kernel, we must add some code that the bootloader can call. Let's create a file named `boot.asm`:
```nasm
global start
section .text
bits 32
start:
; print `OK` to screen
mov dword [0xb8000], 0x2f4b2f4f
hlt
```
There are some new commands:
- `global` exports a label (makes it public). As `start` will be the entry point of our kernel, it needs to be public.
- the `.text` section is the default section for executable code
- `bits 32` specifies that the following lines are 32-bit instructions. It's needed because the CPU is still in [Protected mode] when GRUB starts our kernel. When we switch to [Long mode] in the [next post] we can use `bits 64` (64-bit instructions).
- the `mov dword` instruction moves the 32bit constant `0x2f4f2f4b` to the memory at address `b8000` (it prints `OK` to the screen, an explanation follows in the next posts)
- `hlt` is the halt instruction and causes the CPU to stop
Through assembling, viewing and disassembling we can see the CPU [Opcodes] in action:
[Opcodes]: https://en.wikipedia.org/wiki/Opcode
```
> nasm boot.asm
> hexdump -x boot
0000000 05c7 8000 000b 2f4b 2f4f 00f4
000000b
> ndisasm -b 32 boot
00000000 C70500800B004B2F mov dword [dword 0xb8000],0x2f4b2f4f
-4F2F
0000000A F4 hlt
```
## Building the Executable
To boot our executable later through GRUB, it should be an [ELF] executable. So we want `nasm` to create ELF [object files] instead of plain binaries. To do that, we simply pass the `f elf64` argument to it.
[ELF]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
[object files]: http://wiki.osdev.org/Object_Files
To create the ELF _executable_, we need to [link] the object files together. We use a custom [linker script] named `linker.ld`:
[link]: https://en.wikipedia.org/wiki/Linker_(computing)
[linker script]: https://sourceware.org/binutils/docs/ld/Scripts.html
```
ENTRY(start)
SECTIONS {
. = 1M;
.boot :
{
/* ensure that the multiboot header is at the beginning */
*(.multiboot_header)
}
.text :
{
*(.text)
}
}
```
Let's translate it:
- `start` is the entry point, the bootloader will jump to it after loading the kernel
- `. = 1M;` sets the load address of the first section to 1 MiB, which is a conventional place to load a kernel[^Linker 1M]
- the executable will have two sections: `.boot` at the beginning and `.text` afterwards
- the `.text` output section contains all input sections named `.text`
- Sections named `.multiboot_header` are added to the first output section (`.boot`) to ensure they are at the beginning of the executable. This is necessary because GRUB expects to find the Multiboot header very early in the file.
[^Linker 1M]: We don't want to load the kernel to e.g. `0x0` because there are many special memory areas below the 1MB mark (for example the so-called VGA buffer at `0xb8000`, that we use to print `OK` to the screen).
So let's create the ELF object files and link them using our new linker script. It's important to pass the `-n` flag to the linker, which disables the automatic section alignment in the executable. Otherwise the linker may page align the `.boot` section in the executable file. If that happens, GRUB isn't able to find the Multiboot header because it isn't at the beginning anymore. We can use `objdump` to print the sections of the generated executable and verify that the `.boot` section has a low file offset.
```
> nasm -f elf64 multiboot_header.asm
> nasm -f elf64 boot.asm
> ld -n -o kernel.bin -T linker.ld multiboot_header.o boot.o
> objdump -h kernel.bin
kernel.bin: file format elf64-x86-64
Sections:
Idx Name Size VMA LMA File off Algn
0 .boot 00000018 0000000000100000 0000000000100000 00000080 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .text 0000000b 0000000000100020 0000000000100020 000000a0 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
```
_Note_: The `ld` and `objdump` commands are platform specific. If you're _not_ working on x86_64 architecture, you will need to [cross compile binutils]. Then use `x86_64elfld` and `x86_64elfobjdump` instead of `ld` and `objdump`.
[cross compile binutils]: {{ site.url }}/cross-compile-binutils.html
## Creating the ISO
The last step is to create a bootable ISO image with GRUB. We need to create the following directory structure and copy the `kernel.bin` to the right place:
```
isofiles
└── boot
├── grub
│ └── grub.cfg
└── kernel.bin
```
The `grub.cfg` specifies the file name of our kernel and its Multiboot 2 compliance. It looks like this:
```
set timeout=0
set default=0
menuentry "my os" {
multiboot2 /boot/kernel.bin
boot
}
```
Now we can create a bootable image using the command:
```
grub-mkrescue -o os.iso isofiles
```
_Note_: If it does not work for you, make sure `xorriso` is installed and try to run it with `--verbose`.
## Booting
Now it's time to boot our OS. We will use [QEMU]:
[QEMU]: https://en.wikipedia.org/wiki/QEMU
```
qemu-system-x86_64 -hda os.iso
```
![qemu output]({{ site.url }}/images/qemu-ok.png)
Notice the green `OK` in the upper left corner. If it does not work for you, take a look at the comment section.
Let's summarize what happens:
1. the BIOS loads the bootloader (GRUB) from the virtual hard drive (the ISO)
2. the bootloader reads the kernel executable and finds the Multiboot header
3. it copies the `.boot` and `.text` sections to memory (to addresses `0x100000` and `0x100020`)
4. it jumps to the entry point (`0x100020`, you can obtain it through `objdump -f`)
5. our kernel prints the green `OK` and stops the CPU
You can test it on real hardware, too. Just burn the ISO to a disk or USB stick and boot from it.
## Build Automation
Right now we need to execute 4 commands in the right order everytime we change a file. That's bad. So let's automate the build using a [Makefile][Makefile tutorial]. But first we should create some clean directory structure for our source files to separate the architecture specific files:
[Makefile tutorial]: http://mrbook.org/blog/tutorials/make/
```
├── Makefile
└── src
└── arch
└── x86_64
├── multiboot_header.asm
├── boot.asm
├── linker.ld
└── grub.cfg
```
The Makefile looks like this (but indented with tabs instead of spaces):
```Makefile
arch ?= x86_64
kernel := build/kernel-$(arch).bin
iso := build/os-$(arch).iso
linker_script := src/arch/$(arch)/linker.ld
grub_cfg := src/arch/$(arch)/grub.cfg
assembly_source_files := $(wildcard src/arch/$(arch)/*.asm)
assembly_object_files := $(patsubst src/arch/$(arch)/%.asm, \
build/arch/$(arch)/%.o, $(assembly_source_files))
.PHONY: all clean run iso
all: $(kernel)
clean:
@rm -r build
run: $(iso)
@qemu-system-x86_64 -hda $(iso)
iso: $(iso)
$(iso): $(kernel) $(grub_cfg)
@mkdir -p build/isofiles/boot/grub
@cp $(kernel) build/isofiles/boot/kernel.bin
@cp $(grub_cfg) build/isofiles/boot/grub
@grub-mkrescue -o $(iso) build/isofiles 2> /dev/null
@rm -r build/isofiles
$(kernel): $(assembly_object_files) $(linker_script)
@ld -n -T $(linker_script) -o $(kernel) $(assembly_object_files)
# compile assembly files
build/arch/$(arch)/%.o: src/arch/$(arch)/%.asm
@mkdir -p $(shell dirname $@)
@nasm -felf64 $< -o $@
```
Some comments (see the [Makefile tutorial] if you don't know `make`):
- the `$(wildcard src/arch/$(arch)/*.asm)` chooses all assembly files in the src/arch/$(arch)` directory, so you don't have to update the Makefile when you add a file
- the `patsubst` operation for `assembly_object_files` just translates `src/arch/$(arch)/XYZ.asm` to `build/arch/$(arch)/XYZ.o`
- the `$<` and `$@` in the assembly target are [automatic variables]
- if you're using [cross-compiled binutils][cross compile binutils] just replace `ld` with `x86_64elfld`
[automatic variables]: https://www.gnu.org/software/make/manual/html_node/Automatic-Variables.html
Now we can invoke `make` and all updated assembly files are compiled and linked. The `make iso` command also creates the ISO image and `make run` will additionally start QEMU.
## What's next?
In the [next post] we will create a page table and do some CPU configuration to switch to the 64-bit [long mode].
[next post]: {{ site.url }}{{ page.next.url }}
[long mode]: https://en.wikipedia.org/wiki/Long_mode

View File

@@ -0,0 +1,495 @@
---
layout: post
title: 'Entering Long Mode'
updated: 2015-10-29 00:00:00 +0000
---
In the [previous post] we created a minimal multiboot kernel. It just prints `OK` and hangs. The goal is to extend it and call 64-bit [Rust] code. But the CPU is currently in [protected mode] and allows only 32-bit instructions and up to 4GiB memory. So we need to setup _Paging_ and switch to the 64-bit [long mode] first.
I tried to explain everything in detail and to keep the code as simple as possible. If you have any questions, suggestions, or issues, please leave a comment or [create an issue] on Github. The source code is available in a [repository][source code], too.
[previous post]: {{ site.url }}{{ page.previous.url }}
[Rust]: http://www.rust-lang.org/
[protected mode]: https://en.wikipedia.org/wiki/Protected_mode
[long mode]: https://en.wikipedia.org/wiki/Long_mode
[create an issue]: https://github.com/phil-opp/blog_os/issues
[source code]: https://github.com/phil-opp/blog_os/tree/entering_longmode/src/arch/x86_64
_Notable Changes_: We don't use 1GiB pages anymore, since they have [compatibility problems][1GiB page problems]. The identity mapping is now done through 2MiB pages.
[1GiB page problems]: https://github.com/phil-opp/blog_os/issues/17
## Some Tests
To avoid bugs and strange errors on old CPUs we should test if the processor supports every needed feature. If not, the kernel should abort and display an error message. To handle errors easily, we create an error procedure in `boot.asm`. It prints a rudimentary `ERR: X` message, where X is an error code letter, and hangs:
```nasm
; Prints `ERR: ` and the given error code to screen and hangs.
; parameter: error code (in ascii) in al
error:
mov dword [0xb8000], 0x4f524f45
mov dword [0xb8004], 0x4f3a4f52
mov dword [0xb8008], 0x4f204f20
mov byte [0xb800a], al
hlt
```
At address `0xb8000` begins the so-called [VGA text buffer]. It's an array of screen characters that are displayed by the graphics card. A [future post] will cover the VGA buffer in detail and create a Rust interface to it. But for now, manual bit-fiddling is the easiest option.
[VGA text buffer]: https://en.wikipedia.org/wiki/VGA-compatible_text_mode
[future post]: {{ site.url }}{{ page.next.next.url }}
A screen character consists of a 8 byte color code and a 8 byte [ASCII] character. We used the color code `4f` for all characters, which means white text on red background. `0x52` is an ASCII `R`, `0x45` is an `E`, `0x3a` is a `:`, and `0x20` is a space. The second space is overwritten by the given ASCII byte. Finally the CPU is stopped with the `hlt` instruction.
[ASCII]: https://en.wikipedia.org/wiki/ASCII
Now we can add some test _functions_. A function is just a normal label with an `ret` (return) instruction at the end. The `call` instruction can be used to call it. Unlike the `jmp` instruction that just jumps to a memory address, the `call` instruction will push a return address to the stack (and the `ret` will jump to this address). But we don't have a stack yet. The [stack pointer] in the esp register could point to some important data or even invalid memory. So we need to update it and point it to some valid stack memory.
[stack pointer]: http://stackoverflow.com/a/1464052/866447
### Creating a Stack
To create stack memory we reserve some bytes at the end of our `boot.asm`:
```nasm
...
section .bss
stack_bottom:
resb 64
stack_top:
```
A stack doesn't need to be initialized because we will `pop` only when we `pushed` before. So storing the stack memory in the executable file would make it unnecessary large. By using the [.bss] section and the `resb` (reserve byte) command, we just store the length of the uninitialized data (= 64). When loading the executable, GRUB will create the section of required size in memory.
[.bss]: https://en.wikipedia.org/wiki/.bss
To use the new stack, we update the stack pointer register right after `start`:
```nasm
global start
section .text
bits 32
start:
mov esp, stack_top
; print `OK` to screen
...
```
We use `stack_top` because the stack grows downwards: A `push eax` subtracts 4 from `esp` and does a `mov [esp], eax` afterwards (`eax` is a general purpose register).
Now we have a valid stack pointer and are able to call functions. The following test functions are just here for completeness and I won't explain details. Basically they all work the same: They will test a feature and jump to `error` if it's not available.
### Multiboot test
We rely on some Multiboot features in the next posts. To make sure the kernel was really loaded by a Multiboot compliant bootloader, we can check the `eax` register. According to the Multiboot specification ([PDF][Multiboot specification]), the bootloader must write the magic value `0x36d76289` to it before loading a kernel. To verify that we can add a simple function:
```nasm
test_multiboot:
cmp eax, 0x36d76289
jne .no_multiboot
ret
.no_multiboot:
mov al, "0"
jmp error
```
We compare the value in `eax` with the magic value and jump to the label `no_multiboot` if they're not equal (`jne` “jump if not equal”). In `no_multiboot`, we jump to the error function with error code `0`.
[Multiboot specification]: http://nongnu.askapache.com/grub/phcoder/multiboot.pdf
### CPUID test
[CPUID] is a CPU instruction that can be used to get various information about the CPU. But not every processor supports it. CPUID detection is quite laborious, so we just copy a detection function from the [OSDev wiki][CPUID detection]:
[CPUID]: http://wiki.osdev.org/CPUID
[CPUID detection]: http://wiki.osdev.org/Setting_Up_Long_Mode#Detection_of_CPUID
```nasm
test_cpuid:
pushfd ; Store the FLAGS-register.
pop eax ; Restore the A-register.
mov ecx, eax ; Set the C-register to the A-register.
xor eax, 1 << 21 ; Flip the ID-bit, which is bit 21.
push eax ; Store the A-register.
popfd ; Restore the FLAGS-register.
pushfd ; Store the FLAGS-register.
pop eax ; Restore the A-register.
push ecx ; Store the C-register.
popfd ; Restore the FLAGS-register.
xor eax, ecx ; Do a XOR-operation on the A-register and the C-register.
jz .no_cpuid ; The zero flag is set, no CPUID.
ret ; CPUID is available for use.
.no_cpuid:
mov al, "1"
jmp error
```
### Long Mode test
Now we can use CPUID to detect whether long mode can be used. I use code from [OSDev][long mode detection] again:
[long mode detection]: http://wiki.osdev.org/Setting_Up_Long_Mode#x86_or_x86-64
```nasm
test_long_mode:
mov eax, 0x80000000 ; Set the A-register to 0x80000000.
cpuid ; CPU identification.
cmp eax, 0x80000001 ; Compare the A-register with 0x80000001.
jb .no_long_mode ; It is less, there is no long mode.
mov eax, 0x80000001 ; Set the A-register to 0x80000001.
cpuid ; CPU identification.
test edx, 1 << 29 ; Test if the LM-bit, which is bit 29, is set in the D-register.
jz .no_long_mode ; They aren't, there is no long mode.
ret
.no_long_mode:
mov al, "2"
jmp error
```
### Putting it together
We just call these test functions right after start:
```nasm
global _start
section .text
bits 32
_start:
mov esp, stack_top
call test_multiboot
call test_cpuid
call test_long_mode
; print `OK` to screen
...
```
When the CPU doesn't support a needed feature, we get an error message with an unique error code. Now we can start the real work.
## Paging
_Paging_ is a memory management scheme that separates virtual and physical memory. The address space is split into equal sized _pages_ and a _page table_ specifies which virtual page points to which physical page. If you never heard of paging, you might want to look at the paging introduction ([PDF][paging chapter]) of the [Three Easy Pieces] OS book.
[paging chapter]: http://pages.cs.wisc.edu/~remzi/OSTEP/vm-paging.pdf
[Three Easy Pieces]: http://pages.cs.wisc.edu/~remzi/OSTEP/
In long mode, x86 uses a page size of 4096 bytes and a 4 level page table that consists of:
- the Page-Map Level-4 Table (PML4),
- the Page-Directory Pointer Table (PDP),
- the Page-Directory Table (PD),
- and the Page Table (PT).
As I don't like these names, I will call them P4, P3, P2, and P1 from now on.
Each page table contains 512 entries and one entry is 8 bytes, so they fit exactly in one page (`512*8 = 4096`). To translate a virtual address to a physical address the CPU[^hardware_lookup] will do the following[^virtual_physical_translation_source]:
[^hardware_lookup]: In the x86 architecture, the page tables are _hardware walked_, so the CPU will look at the table on its own when it needs a translation. Other architectures, for example MIPS, just throw an exception and let the OS translate the virtual address.
[^virtual_physical_translation_source]: Image source: [Wikipedia](https://commons.wikimedia.org/wiki/File:X86_Paging_64bit.svg), with modified font size, page table naming, and removed sign extended bits. The modified file is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.
![translation of virtual to physical addresses in 64 bit mode]({{ site.url }}/images/X86_Paging_64bit.svg)
1. Get the address of the P4 table from the CR3 register
2. Use bits 39-47 (9 bits) as an index into P4 (`2^9 = 512 = number of entries`)
3. Use the following 9 bits as an index into P3
4. Use the following 9 bits as an index into P2
5. Use the following 9 bits as an index into P1
6. Use the last 12 bits as page offset (`2^12 = 4096 = page size`)
But what happens to bits 48-63 of the 64-bit virtual address? Well, they can't be used. The “64-bit” long mode is in fact just a 48-bit mode. The bits 48-63 must be copies of bit 47, so each valid virtual address is still unique. For more information see [Wikipedia][wikipedia_48bit_mode].
[wikipedia_48bit_mode]: https://en.wikipedia.org/wiki/X86-64#Virtual_address_space_details
An entry in the P4, P3, P2, and P1 tables consists of the page aligned 52-bit _physical_ address of the page/next page table and the following bits that can be OR-ed in:
Bit(s) | Name | Meaning
--------------------- | ------ | ----------------------------------
0 | present | the page is currently in memory
1 | writable | it's allowed to write to this page
2 | user accessible | if not set, only kernel mode code can access this page
3 | write through caching | writes go directly to memory
4 | disable cache | no cache is used for this page
5 | accessed | the CPU sets this bit when this page is used
6 | dirty | the CPU sets this bit when a write to this page occurs
7 | huge page/null | must be 0 in P1 and P4, creates a 1GiB page in P3, creates a 2MiB page in P2
8 | global | page isn't flushed from caches on address space switch (PGE bit of CR4 register must be set)
9-11 | available | can be used freely by the OS
52-62 | available | can be used freely by the OS
63 | no execute | forbid executing code on this page (the NXE bit in the EFER register must be set)
### Setup Identity Paging
When we switch to long mode, paging will be activated automatically. The CPU will then try to read the instruction at the following address, but this address is now a virtual address. So we need to do _identity mapping_, i.e. map a physical address to the same virtual address.
The `huge page` bit is now very useful to us. It creates a 2MiB (when used in P2) or even a 1GiB page (when used in P3). So we could map the first _gigabytes_ of the kernel with only one P4 and one P3 table by using 1GiB pages. Unfortunately 1GiB pages are relatively new feature, for example Intel introduced it 2010 in the [Westmere architecture]. Therefore we will use 2MiB pages instead to make our kernel compatible to older computers, too.
[Westmere architecture]: https://en.wikipedia.org/wiki/Westmere_(microarchitecture)#Technology
To identity map the first gigabyte of our kernel with 512 2MiB pages, we need one P4, one P3, and one P2 table. Of course we will replace them with finer-grained tables later. But now that we're stuck with assembly, we choose the easiest way.
We can add these two tables at the beginning[^page_table_alignment] of the `.bss` section:
[^page_table_alignment]: Page tables need to be page-aligned as the bits 0-11 are used for flags. By putting these tables at the beginning of `.bss`, the linker can just page align the whole section and we don't have unused padding bytes in between.
```nasm
...
section .bss
align 4096
p4_table:
resb 4096
p3_table:
resb 4096
p2_table:
resb 4096
stack_bottom:
resb 64
stack_top:
```
The `resb` command reserves the specified amount of bytes without initializing them, so the 8KiB don't need to be saved in the executable. The `align 4096` ensures that the page tables are page aligned.
When GRUB creates the `.bss` section in memory, it will initialize it to `0`. So the `p4_table` is already valid (it contains 512 non-present entries) but not very useful. To be able to map 2MiB pages, we need to link P4's first entry to the `p3_table` and P3's first entry to the the `p2_table`:
```nasm
setup_page_tables:
; map first P4 entry to P3 table
mov eax, p3_table
or eax, 0b11 ; present + writable
mov [p4_table], eax
; map first P3 entry to P2 table
mov eax, p2_table
or eax, 0b11 ; present + writable
mov [p3_table], eax
; TODO map each P2 entry to a huge 2MiB page
ret
```
We just set the present and writable bits (`0b11` is a binary number) in the aligned P3 table address and move it to the first 4 bytes of the P4 table. Then we do the same to link the first P3 entry to the `p2_table`.
Now we need to map P2's first entry to a huge page starting at 0, P2's second entry to a huge page starting at 2MiB, P3's third entry to a huge page starting at 4MiB, and so on. It's time for our first (and only) assembly loop:
```nasm
setup_page_tables:
...
; map each P2 entry to a huge 2MiB page
mov ecx, 0 ; counter variable
.map_p2_table:
; map ecx-th P2 entry to a huge page that starts at address 2MiB*ecx
mov eax, 0x200000 ; 2MiB
mul ecx ; start address of ecx-th page
or eax, 0b10000011 ; present + writable + huge
mov [p2_table + ecx * 8], eax ; map ecx-th entry
inc ecx ; increase counter
cmp ecx, 512 ; if counter == 512, the whole P2 table is mapped
jne .map_p2_table ; else map the next entry
ret
```
Maybe I first explain how an assembly loop works. We use the `ecx` register as a counter variable, just like `i` in a for loop. After mapping the `ecx-th` entry, we increase `ecx` by one and jump to `.map_p2_table` again if it's still smaller 512.
To map a P2 entry we first calculate the start address of its page in `eax`: The `ecx-th` entry needs to be mapped to `ecx * 2MiB`. We use the `mul` operation for that, which multiplies `eax` with the given register and stores the result in `eax`. Then we set the `present`, `writable`, and `huge page` bits and write it to the P2 entry. The address of the `ecx-th` entry in P2 is `p2_table + ecx * 8`, because each entry is 8 bytes large.
Now the first gigabyte (512 * 2MiB) of our kernel is identity mapped and thus accessible through the same physical and virtual addresses.
### Enable Paging
To enable paging and enter long mode, we need to do the following:
1. write the address of the P4 table to the CR3 register (the CPU will look there, see the [paging section](#paging))
2. long mode is an extension of [Physical Address Extension] \(PAE), so we need to enable PAE first
3. Set the long mode bit in the EFER register
4. Enable Paging
[Physical Address Extension]: https://en.wikipedia.org/wiki/Physical_Address_Extension
The assembly function looks like this (some boring bit-moving to various registers):
```nasm
enable_paging:
; load P4 to cr3 register (cpu uses this to access the P4 table)
mov eax, p4_table
mov cr3, eax
; enable PAE-flag in cr4 (Physical Address Extension)
mov eax, cr4
or eax, 1 << 5
mov cr4, eax
; set the long mode bit in the EFER MSR (model specific register)
mov ecx, 0xC0000080
rdmsr
or eax, 1 << 8
wrmsr
; enable paging in the cr0 register
mov eax, cr0
or eax, 1 << 31
or eax, 1 << 16
mov cr0, eax
ret
```
The `or eax, 1 << X` is a common pattern. It sets the bit `X` in the eax register (`<<` is a left shift). Through `rdmsr` and `wrmsr` it's possible to read/write to the so-called model specific registers at address `ecx` (in this case `ecx` points to the EFER register).
Finally we need to call our new functions in `start`:
```nasm
...
start:
mov esp, stack_top
call test_multiboot
call test_cpuid
call test_long_mode
call setup_page_tables ; new
call enable_paging ; new
; print `OK` to screen
mov dword [0xb8000], 0x2f4b2f4f
hlt
...
```
To test it we execute `make run`. If the green OK is still printed, we have successfully enabled paging!
## The Global Descriptor Table
After enabling Paging, the processor is in long mode. So we can use 64-bit instructions now, right? Wrong. The processor is still in some 32-bit compatibility submode. To actually execute 64-bit code, we need to setup a new Global Descriptor Table.
The Global Descriptor Table (GDT) was used for _Segmentation_ in old operating systems. I won't explain Segmentation but the [Three Easy Pieces] OS book has good introduction ([PDF][Segmentation chapter]) again.
[Segmentation chapter]: http://pages.cs.wisc.edu/~remzi/OSTEP/vm-segmentation.pdf
Today almost everyone uses Paging instead of Segmentation (and so do we). But on x86, a GDT is always required, even when you're not using Segmentation. GRUB has set up a valid 32-bit GDT for us but now we need to switch to a long mode GDT.
A GDT always starts with an 0-entry and contains a arbitrary number of segment entries afterwards. An entry has the following format:
Bit(s) | Name | Meaning
--------------------- | ------ | ----------------------------------
0-15 | limit 0-15 | the first 2 byte of the segment's limit
16-39 | base 0-23 | the first 3 byte of the segment's base address
40 | accessed | set by the CPU when the segment is accessed
41 | read/write | reads allowed for code segments / writes allowed for data segments
42 | direction/conforming | the segment grows down (i.e. base>limit) for data segments / the current privilege level can be higher than the specified level for code segments (else it must match exactly)
43 | executable | if set, it's a code segment, else it's a data segment
44 | descriptor type | should be 1 for code and data segments
45-46 | privilege | the [ring level]: 0 for kernel, 3 for user
47 | present | must be 1 for valid selectors
48-51 | limit 16-19 | bits 16 to 19 of the segment's limit
52 | available | freely available to the OS
53 | 64-bit | should be set for 64-bit code segments
54 | 32-bit | should be set for 32-bit segments
55 | granularity | if it's set, the limit is the number of pages, else it's a byte number
56-63 | base 24-31 | the last byte of the base address
[ring level]: http://wiki.osdev.org/Security#Rings
We need one code and one data segment. They have the following bits set: _descriptor type_, _present_, and _read/write_. The code segment has additionally the _executable_ and the _64-bit_ flag. In Long mode, it's not possible to actually use the GDT entries for Segmentation and thus the base and limit fields must be 0. Translated to assembly the long mode GDT looks like this:
```nasm
section .rodata
gdt64:
dq 0 ; zero entry
dq (1<<44) | (1<<47) | (1<<41) | (1<<43) | (1<<53) ; code segment
dq (1<<44) | (1<<47) | (1<<41) ; data segment
```
We chose the `.rodata` section here because it's initialized read-only data. The `dq` command stands for `define quad` and outputs a 64-bit constant (similar to `dw` and `dd`). And the `(1<<44)` is a [bit shift] that sets bit 44.
[bit shift]: http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/BitOp/bitshift.html
### Loading the GDT
To load our new 64-bit GDT, we have to tell the CPU its address and length. We do this by passing the memory location of a special pointer structure to the `lgdt` (load GDT) instruction. The pointer structure looks like this:
```nasm
gdt64:
...
dq (1<<44) | (1<<47) | (1<<41) ; data segment
.pointer:
dw $ - gdt64 - 1
dq gdt64
```
The first 2 bytes specify the (GDT length - 1). The `$` is a special symbol that is replaced with the current address (it's equal to `.pointer` in our case). The following 8 bytes specify the GDT address. Labels that start with a point (such as `.pointer`) are sub-labels of the last label without point. To access them, they must be prefixed with the parent label (e.g., `gdt64.pointer`).
Now we can load the GDT in `start`:
```nasm
start:
...
call enable_paging
; load the 64-bit GDT
lgdt [gdt64.pointer]
; print `OK` to screen
...
```
When you still see the green `OK`, everything went fine and the new GDT is loaded. But we still can't execute 64-bit code: The selector registers such as the code selector `cs` and the data selector `ds` still have the values from the old GDT. To update them, we need to load them with the GDT offset (in bytes) of the desired segment. In our case the code segment starts at byte 8 of the GDT and the data segment at byte 16. Let's try it:
```nasm
...
lgdt [gdt64.pointer]
; update selectors
mov ax, 16
mov ss, ax ; stack selector
mov ds, ax ; data selector
mov es, ax ; extra selector
; print `OK` to screen
...
```
It should still work. The segment selectors are only 16-bits large, so we use the 16-bit `ax` subregister. Notice that we didn't update the code selector `cs`. We will do that later. First we should replace this hardcoded `16` by adding some labels to our GDT:
```nasm
section .rodata
gdt64:
dq 0 ; zero entry
.code: equ $ - gdt64 ; new
dq (1<<44) | (1<<47) | (1<<41) | (1<<43) | (1<<53) ; code segment
.data: equ $ - gdt64 ; new
dq (1<<44) | (1<<47) | (1<<41) ; data segment
.pointer:
...
```
We can't just use normal labels here, as we need the table offset. We calculate this offset using the current address `$` and set the labels to this value using [equ]. Now we can use `gdt64.data` instead of 16 and `gdt64.code` instead of 8 and these labels will still work if we modify the GDT.
[equ]: http://www.nasm.us/doc/nasmdoc3.html#section-3.2.4
Now there is just one last step left to enter the true 64-bit mode: We need to load `cs` with `gdt64.code`. But we can't do it through `mov`. The only way to reload the code selector is a _far jump_ or a _far return_. These instructions work like a normal jump/return but change the code selector. We use a far jump to a long mode label:
```nasm
global start
extern long_mode_start
...
start:
...
lgdt [gdt64.pointer]
; update selectors
mov ax, gdt64.data
mov ss, ax
mov ds, ax
mov es, ax
jmp gdt64.code:long_mode_start
...
```
The actual `long_mode_start` label is defined as `extern`, so it's part of another file. The `jmp gdt64.code:long_mode_start` is the mentioned far jump.
I put the 64-bit code into a new file to separate it from the 32-bit code, thereby we can't call the (now invalid) 32-bit code accidentally. The new file (I named it `long_mode_init.asm`) looks like this:
```nasm
global long_mode_start
section .text
bits 64
long_mode_start:
; print `OKAY` to screen
mov rax, 0x2f592f412f4b2f4f
mov qword [0xb8000], rax
hlt
```
You should see a green `OKAY` on the screen. Some notes on this last step:
- As the CPU expects 64-bit instructions now, we use `bits 64`
- We can now use the extended registers. Instead of the 32-bit `eax`, `ebx`, etc. we now have the 64-bit `rax`, `rbx`, …
- and we can write these 64-bit registers directly to memory using `mov qword` (quad word)
_Congratulations_! You have successfully wrestled through this CPU configuration and compatibility mode mess :).
## What's next?
It's time to finally leave assembly behind[^leave_assembly_behind] and switch to some higher level language. We won't use C or C++ (not even a single line). Instead we will use the relatively new [Rust] language. It's a systems language without garbage collections but with guaranteed memory safety. Through a real type system and many abstractions it feels like a high-level language but can still be low-level enough for OS development. The [next post] describes the Rust setup.
[^leave_assembly_behind]: Actually we will still need some assembly in the future, but I'll try to minimize it.
[Rust]: https://www.rust-lang.org/
[next post]: {{ site.url }}{{ page.next.url }}

View File

@@ -0,0 +1,380 @@
---
layout: post
title: 'Setup Rust'
---
In the previous posts we created a [minimal Multiboot kernel][multiboot post] and [switched to Long Mode][long mode post]. Now we can finally switch to [Rust] code. Rust is a high-level language without runtime. It allows us to not link the standard library and write bare metal code. Unfortunately the setup is not quite hassle-free yet.
This blog post tries to setup Rust step-by-step and point out the different problems. If you have any questions, problems, or suggestions please [file an issue] or create a comment at the bottom. The code from this post is in a [Github repository], too.
[multiboot post]: {{ site.url }}{{ page.previous.previous.url }}
[long mode post]: {{ site.url }}{{ page.previous.url }}
[Rust]: https://www.rust-lang.org/
[file an issue]: https://github.com/phil-opp/blog_os/issues
[Github repository]: https://github.com/phil-opp/blog_os/tree/setup_rust
## Installing Rust
We need a nightly compiler, as we will use many unstable features. To manage Rust installations I highly recommend brson's [multirust]. It allows you to install nightly, beta, and stable compilers side-by-side and makes it easy to update them. After installing it you can just run `multirust update nightly` to install or update to the latest Rust nightly.
[multirust]: https://github.com/brson/multirust
## Creating a Cargo project
[Cargo] is Rust excellent package manager. Normally you would call `cargo new` when you want to create a new project folder. We can't use it because our folder already exists, so we need to do it manually. Fortunately we only need to add a cargo configuration file named `Cargo.toml`:
[Cargo]: http://doc.crates.io/guide.html
```toml
[package]
name = "blog_os"
version = "0.1.0"
authors = ["Philipp Oppermann <dev@phil-opp.com>"]
[lib]
crate-type = ["staticlib"]
```
The `package` section contains required project metadata such as the [semantic crate version]. The `lib` section specifies that we want to build a static library, i.e. a library that contains all of its dependencies. This is required to link the Rust project with our kernel.
[semantic crate version]: http://doc.crates.io/manifest.html#the-package-section
Now we place our root source file in `src/lib.rs`:
```rust
#![feature(no_std, lang_items)]
#![no_std]
#[no_mangle]
pub extern fn rust_main() {}
#[lang = "eh_personality"] extern fn eh_personality() {}
#[lang = "panic_fmt"] extern fn panic_fmt() -> ! {loop{}}
```
Let's break it down:
- `#!` defines an [attribute] of the current module. Since we are at the root module, they apply to the crate itself.
- The `feature` attribute is used to allow the specified _feature-gated_ attributes in this crate. You can't do that in a stable/beta compiler, so this is one reason we need a Rust nighly.
- The `no_std` attribute prevents the automatic linking of the standard library. We can't use `std` because it relies on operating system features like files, system calls, and various device drivers. Remember that currently the only “feature” of our OS is printing `OKAY` :).
- A `#` without a `!` afterwards defines an attribute for the _following_ item (a function in our case).
- The `no_mangle` attribute disables the automatic [name mangling] that Rust uses to get unique function names. We want to do a `call rust_main` from our assembly code, so this function name must stay as it is.
- We mark our main function as `extern` to make it compatible to the standard C [calling convention].
- The `lang` attribute defines a Rust [language item].
- The `eh_personality` function is used for Rust's [unwinding] on `panic!`. We can leave it empty since we don't have any unwinding support in our OS yet.
- The `panic_fmt` function is the entry point on panic. Right now we can't do anything useful, so we just make sure that it doesn't return (required by the `!` return type).
[attribute]: https://doc.rust-lang.org/book/attributes.html
[name mangling]: https://en.wikipedia.org/wiki/Name_mangling
[calling convention]: https://en.wikipedia.org/wiki/Calling_convention
[language item]: https://doc.rust-lang.org/book/lang-items.html
[unwinding]: https://doc.rust-lang.org/std/rt/unwind/
## Building Rust
We can now build it using `cargo build`. To make sure, we are building it for the x86_64 architecture, we can pass an explicit target:
```bash
cargo build --target=x86_64-unknown-linux-gnu
```
It creates a static library at `target/x86_64-unknown-linux-gnu/debug/libblog_os.a`, which can be linked with our assembly kernel. If you're getting an error about a missing `core` crate, [look here][cross compile libcore].
[cross compile libcore]: {{ site.url }}/cross-compile-libcore.html
To build and link the rust library on `make`, we extend our `Makefile`([full file][github makefile]):
```make
# ...
target ?= $(arch)-unknown-linux-gnu
rust_os := target/$(target)/debug/libblog_os.a
# ...
$(kernel): cargo $(rust_os) $(assembly_object_files) $(linker_script)
@ld -n -T $(linker_script) -o $(kernel) $(assembly_object_files) $(rust_os)
cargo:
@cargo build --target $(target)
```
We added a new `cargo` target that just executes `cargo build` and modified the `$(kernel)` target to link the created static lib .
But now `cargo build` is executed on every `make`, even if no source file was changed. And the ISO is recreated on every `make iso`/`make run`, too. We could try to avoid this by adding dependencies on all rust source and cargo configuration files to the `cargo` target, but the ISO creation takes only half a second on my machine and most of the time we will have changed a Rust file when we run `make`. So we keep it simple for now and let cargo do the bookkeeping of changed files (it does it anyway).
[github makefile]: https://github.com/phil-opp/blog_os/blob/setup_rust/Makefile
## Calling Rust
Now we can call the main method in `long_mode_start`:
```nasm
bits 64
long_mode_start:
; call the rust main
extern rust_main ; new
call rust_main ; new
; print `OKAY` to screen
mov rax, 0x2f592f412f4b2f4f
mov qword [0xb8000], rax
hlt
```
By defining `rust_main` as `extern` we tell nasm that the function is defined in another file. As the linker takes care of linking them together, we'll get a linker error if we have a typo in the name or forget to mark the rust function as `pub extern`.
If we've done everything right, we should still see the green `OKAY` when executing `make run`. That means that we successfully called the Rust function and returned back to assembly.
## Fixing Linker Errors
Now we can try some Rust code:
```rust
pub extern fn rust_main() {
let x = ["Hello", " ", "World", "!"];
}
```
When we test it using `make run`, it fails with `undefined reference to 'memcpy'`. The `memcpy` function is one of the basic functions of the C library (`libc`). Usually the `libc` crate is linked to every Rust program together with the standard library, but we opted out through `#![no_std]`. We could try to fix this by adding the [libc crate] as `extern crate`. But `libc` is just a wrapper for the system `libc`, for example `glibc` on Linux, so this won't work for us. Instead we need to recreate the basic `libc` functions such as `memcpy`, `memmove`, `memset`, and `memcmp` in Rust.
[libc crate]: https://doc.rust-lang.org/nightly/libc/index.html
### rlibc
Fortunately there already is a crate for that: [rlibc]. When we look at its [source code][rlibc source] we see that it contains no magic, just some [raw pointer] operations in a while loop. To add `rlibc` as a dependency we just need to add two lines to the `Cargo.toml`:
```toml
...
[dependencies]
rlibc = "*"
```
and an `extern crate` definition in our `src/lib.rs`:
```rust
...
extern crate rlibc;
#[no_mangle]
pub extern fn rust_main() {
...
```
Now `make run` doesn't complain about `memcpy` anymore. Instead it will show a pile of new errors:
```
target/debug/libblog_os.a(core-35017696.0.o): In function `ops::f32.Rem::rem::hfcbbcbe5711a6e6emxm':
core.0.rs:(.text._ZN3ops7f32.Rem3rem20hfcbbcbe5711a6e6emxmE+0x1): undefined reference to `fmodf'
target/debug/libblog_os.a(core-35017696.0.o): In function `ops::f64.Rem::rem::hbf225030671c7a35Txm':
core.0.rs:(.text._ZN3ops7f64.Rem3rem20hbf225030671c7a35TxmE+0x1): undefined reference to `fmod'
...
```
[rlibc]: https://crates.io/crates/rlibc
[rlibc source]: https://github.com/rust-lang/rlibc/blob/master/src/lib.rs
[raw pointer]: https://doc.rust-lang.org/book/raw-pointers.html
[crates.io]: https://crates.io
### --gc-sections
The new errors are linker errors about missing `fmod` and `fmodf` functions. These functions are used for the modulo operation (`%`) on floating point numbers in [libcore]. The core library is added implicitly when using `#![no_std]` and provides basic standard library features like `Option` or `Iterator`. According to the documentation it is “dependency-free”. But it actually has some dependencies, for example on `fmod` and `fmodf`.
[libcore]: https://doc.rust-lang.org/core/
So how do we fix this problem? We don't use any floating point operations, so we could just provide our own implementations of `fmod` and `fmodf` that just do a `loop{}`. But there's a better way that doesn't fail silently when we use float modulo some day: We tell the linker to remove unused sections. That's generally a good idea as it reduces kernel size. And we don't have any references to `fmod` and `fmodf` anymore until we use floating point modulo. The magic linker flag is `--gc-sections`, which stands for “garbage collect sections”. Let's add it to the `$(kernel)` target in our `Makefile`:
```make
$(kernel): cargo $(rust_os) $(assembly_object_files) $(linker_script)
@ld -n --gc-sections -T $(linker_script) -o $(kernel) $(assembly_object_files) $(rust_os)
```
Now we can do a `make run` again and… it doesn't boot anymore:
```
GRUB error: no multiboot header found.
```
What happened? Well, the linker removed unused sections. And since we don't use the Multiboot section anywhere, `ld` removes it, too. So we need to tell the linker explicitely that it should keep this section. The `KEEP` command does exactly that, so we add it to the linker script (`linker.ld`):
```
.boot :
{
/* ensure that the multiboot header is at the beginning */
KEEP(*(.multiboot_header))
}
```
Now everything should work again (the green `OKAY`). But there is another linking issue, which is triggered by some other example code.
### no-landing-pads
The following snippet still fails:
```rust
...
let test = (0..3).flat_map(|x| 0..x).zip(0..);
```
The error is a linker error again (hence the ugly error message):
```
target/debug/libblog_os.a(blog_os.0.o): In function `blog_os::iter::Iterator::zip<core::iter::FlatMap<core::ops::Range<i32>, core::ops::Range<i32>, closure>,core::ops::RangeFrom<i32>>':
/home/.../src/libcore/iter.rs:654: undefined reference to `_Unwind_Resume'
```
So the linker can't find a function named `_Unwind_Resume` that is referenced in `iter.rs:654` in libcore. This reference is not really there at [line 654 of libcore's `iter.rs`][iter.rs:654]. Instead, it is a compiler inserted _landing pad_, which is used for exception handling.
The easiest way of fixing this problem is to disable the landing pad creation since we don't supports panics anyway right now. We can do this by passing a `-Z no-landing-pads` flag to `rustc` (the actual Rust compiler below cargo). To do this we replace the `cargo build` command in our Makefile with the `cargo rustc` command, which does the same but allows passing flags to `rustc`:
```make
cargo:
@cargo rustc --target $(target) -- -Z no-landing-pads
```
Now we fixed all linking issues.
## The final problem
Unfortunately there is one last problem left, that gets triggered by the following code:
```rust
let mut a = 42;
a += 1;
```
When we add that code to `rust_main` and test it using `make run`, the OS will constantly reboot itself. Let's try to debug it.
[iter.rs:654]: https://doc.rust-lang.org/nightly/src/core/iter.rs.html#654
### Debugging
Such a boot loop is most likely caused by some [CPU exception][exception table]. When these exceptions aren't handled, a [Triple Fault] occurs and the processor resets itself. We can look at generated CPU interrupts/exceptions using QEMU:
[exception table]: http://wiki.osdev.org/Exceptions
[Triple Fault]: http://wiki.osdev.org/Triple_Fault
```
> qemu-system-x86_64 -d int -no-reboot -hda build/os-x86_64.iso
SMM: enter
...
SMM: after RSM
...
check_exception old: 0xffffffff new 0x6
0: v=06 e=0000 i=0 cpl=0 IP=0008:0000000000100200 pc=0000000000100200
SP=0010:0000000000102fd0 env->regs[R_EAX]=0000000080010010
...
check_exception old: 0xffffffff new 0xd
1: v=0d e=0062 i=0 cpl=0 IP=0008:0000000000100200 pc=0000000000100200
SP=0010:0000000000102fd0 env->regs[R_EAX]=0000000080010010
...
check_exception old: 0xd new 0xd
2: v=08 e=0000 i=0 cpl=0 IP=0008:0000000000100200 pc=0000000000100200
SP=0010:0000000000102fd0 env->regs[R_EAX]=0000000080010010
...
check_exception old: 0x8 new 0xd
```
Let me first explain the QEMU arguments: The `-d int` logs CPU interrupts to the console and the `-no-reboot` flag closes QEMU instead of constant rebooting. But what does the cryptical output mean? I already omitted most of it as we don't need it here. Let's break down the rest:
- The first two blocks, `SMM: enter` and `SMM: after RSM` are created before our OS boots, so we just ignore them.
- The next block, `check_exception old: 0xffffffff new 0x6` is the interesting one. It says: “a new CPU exception with number `0x6` occurred“.
- The last blocks indicate further exceptions. They were thrown because we didn't handle the `0x6` exception, so we're going to ignore them, too.
So let's look at the first exception: `old:0xffffffff` means that the CPU wasn't handling an interrupt when the exception occurred. The new exception has number `0x6`. By looking at an [exception table] we learn that `0x6` indicates a [Invalid Opcode] fault. So the lastly executed instruction was invalid. The register dump tells us that the current instruction was `0x100200` (through `IP` (instruction pointer) or `pc` (program counter)). Therefore the instruction at `0x100200` seems to be invalid. We can look at it using `objdump`:
[Invalid Opcode]: http://wiki.osdev.org/Exceptions#Invalid_Opcode
```
> objdump -D build/kernel-x86_64.bin | grep "100200:"
100200: 0f 28 05 49 01 00 00 movaps 0x149(%rip),%xmm0 ...
```
Through `objdump -D` we disassemble our whole kernel and `grep` picks the relevant line. The instruction at `0x100200` seems to be a valid `movaps` instruction. It's a [SSE] instruction that moves 128 bit between memory and SSE-registers (e.g. `xmm0`). But why the `Invalid Opcode` exception? The answer is hidden behind the [movaps documentation][movaps]: The section _Protected Mode Exceptions_ lists the conditions for the various exceptions. The short code of the `Invalid Opcode` is `#UD`, so the exception occurs
> For an unmasked Streaming SIMD Extensions 2 instructions numeric exception (CR4.OSXMMEXCPT =0). If EM in CR0 is set. If OSFXSR in CR4 is 0. If CPUID feature flag SSE2 is 0.
[SSE]: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
[movaps]: http://www.c3se.chalmers.se/Common/VTUNE-9.1/doc/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc181.htm
The rough translation of this cryptic definition is: _If SSE isn't enabled_. So apparently Rust uses SSE instructions by default and we didn't enable SSE before. So the fix for this bug is enabling SSE.
### Enabling SSE
To enable SSE, assembly code is needed again. We want to add a function that tests if SSE is available and enables it then. Else we want to print an error message. But we can't use our existing `error` procedure because it uses (now invalid) 32-bit instructions. So we need a new one (in `long_mode_init.asm`):
```nasm
; Prints `ERROR: ` and the given error code to screen and hangs.
; parameter: error code (in ascii) in al
error:
mov rbx, 0x4f4f4f524f524f45
mov [0xb8000], rbx
mov rbx, 0x4f204f204f3a4f52
mov [0xb8008], rbx
mov byte [0xb800e], al
hlt
jmp error
```
It's the nearly the same as the 32-bit procedure in the [previous post][32-bit error function] (instead of `ERR:` we print `ERROR:` here).
Now we can add a function that checks for SSE and enables it:
```nasm
; Check for SSE and enable it. If it's not supported throw error "a".
setup_SSE:
; check for SSE
mov rax, 0x1
cpuid
test edx, 1<<25
jz .no_SSE
; enable SSE
mov rax, cr0
and ax, 0xFFFB ; clear coprocessor emulation CR0.EM
or ax, 0x2 ; set coprocessor monitoring CR0.MP
mov cr0, rax
mov rax, cr4
or ax, 3 << 9 ; set CR4.OSFXSR and CR4.OSXMMEXCPT at the same time
mov cr4, rax
ret
.no_SSE:
mov al, "a"
jmp error
```
The code is from the great [OSDev Wiki][osdev sse] again. Notice that it sets/unsets exactly the bits that can cause the `Invalid Opcode` exception.
When we insert a `call setup_SSE` right before calling `rust_main`, our Rust code will finally work.
[32-bit error function]: {{ site.url }}{{ page.previous.url }}#some-tests
[osdev sse]: http://wiki.osdev.org/SSE#Checking_for_SSE
### “OS returned!”
Now that we're editing assembly anyway, we should change the `OKAY` message to something more meaningful. My suggestion is a red `OS returned!`:
```nasm
...
call rust_main
.os_returned:
; rust main returned, print `OS returned!`
mov rax, 0x4f724f204f534f4f
mov [0xb8000], rax
mov rax, 0x4f724f754f744f65
mov [0xb8008], rax
mov rax, 0x4f214f644f654f6e
mov [0xb8010], rax
hlt
```
Ok, that's enough assembly for now. Let's switch back to Rust.
## Hello World!
Finally, it's time for a `Hello World!` from Rust:
```rust
pub extern fn rust_main() {
// ATTENTION: we have a very small stack and no guard page
let hello = b"Hello World!";
let color_byte = 0x1f; // white foreground, blue background
let mut hello_colored = [color_byte; 24];
for (i, char_byte) in hello.into_iter().enumerate() {
hello_colored[i*2] = *char_byte;
}
// write `Hello World!` to the center of the VGA text buffer
let buffer_ptr = (0xb8000 + 1988) as *mut _;
unsafe { *buffer_ptr = hello_colored };
loop{}
}
```
Some notes:
- The `b` prefix creates a [byte string], which is just an array of `u8`
- [enumerate] is an `Iterator` method that adds the current index `i` to elements
- `buffer_ptr` is a [raw pointer] that points to the center of the VGA text buffer
- Rust doesn't know the VGA buffer and thus can't guarantee that writing to the `buffer_ptr` is safe (it could point to important data). So we need to tell Rust that we know what we are doing by using an [unsafe block].
[byte string]: https://doc.rust-lang.org/reference.html#characters-and-strings
[enumerate]: https://doc.rust-lang.org/nightly/core/iter/trait.Iterator.html#method.enumerate
[unsafe block]: https://doc.rust-lang.org/book/unsafe.html
### Stack Overflows
Since we still use the small 64 byte [stack from the last post], we must be careful not to [overflow] it. Normally, Rust tries to avoid stack overflows through _guard pages_: The page below the stack isn't mapped and such a stack overflow triggers a page fault (instead of silently overwriting random memory). But we can't unmap the page below our stack right now since we currently use only a single big page. Fortunately the stack is located just above the page tables. So some important page table entry would probably get overwritten on stack overflow and then a page fault occurs, too.
[stack from the last post]: {{ site.url }}{{ page.previous.url }}#creating-a-stack
[overflow]: https://en.wikipedia.org/wiki/Stack_overflow
## What's next?
Until now we write magic bits to some memory location when we want to print something to screen. In the [next post] we create a abstraction for the VGA text buffer that allows us to print strings in different colors and provides a simple interface.
[next post]: {{ site.url }}{{ page.next.url }}

View File

@@ -0,0 +1,435 @@
---
layout: post
title: 'Printing to Screen'
---
In the [previous post] we switched from assembly to [Rust], a systems programming language that provides great safety. But so far we are using unsafe features like [raw pointers] whenever we want to print to screen. In this post we will create a Rust module that provides a safe and easy-to-use interface for the VGA text buffer. It will support Rust's [formatting macros], too.
[previous post]: {{ site.url }}{{ page.previous.url }}
[Rust]: https://www.rust-lang.org/
[raw pointers]: https://doc.rust-lang.org/book/raw-pointers.html
[formatting macros]: https://doc.rust-lang.org/std/fmt/#related-macros
This post uses recent unstable features, so you need an up-to-date nighly compiler. If you have any questions, problems, or suggestions please [file an issue] or create a comment at the bottom. The code from this post is also available on [Github][code repository].
[file an issue]: https://github.com/phil-opp/blog_os/issues
[code repository]: https://github.com/phil-opp/blog_os/tree/printing_to_screen/src
## The VGA Text Buffer
The text buffer starts at physical address `0xb8000` and contains the characters displayed on screen. It has 25 rows and 80 columns. Each screen character has the following format:
Bit(s) | Value
------ | ----------------
0-7 | ASCII code point
8-11 | Foreground color
12-14 | Background color
15 | Blink
The following colors are available:
Number | Color | Number + Bright Bit | Bright Color
------ | ---------- | ------------------- | -------------
0x0 | Black | 0x8 | Dark Gray
0x1 | Blue | 0x9 | Light Blue
0x2 | Green | 0xa | Light Green
0x3 | Cyan | 0xb | Light Cyan
0x4 | Red | 0xc | Light Red
0x5 | Magenta | 0xd | Pink
0x6 | Brown | 0xe | Yellow
0x7 | Light Gray | 0xf | White
Bit 4 is the _bright bit_, which turns for example blue into light blue. It is unavailable in background color as the bit is used to enable blinking. However, it's possible to disable blinking through a [BIOS function][disable blinking]. Then the full 16 colors can be used as background.
[disable blinking]: http://www.ctyme.com/intr/rb-0117.htm
## A basic Rust Module
Now that we know how the VGA buffer works, we can create a Rust module to handle printing. To create a new module named `vga_buffer`, we just need to create a file named `src/vga_buffer.rs` and add a `mod vga_buffer` line to `src/lib.rs`.
### Colors
First, we represent the different colors using an enum:
```rust
#[repr(u8)]
pub enum Color {
Black = 0,
Blue = 1,
Green = 2,
Cyan = 3,
Red = 4,
Magenta = 5,
Brown = 6,
LightGray = 7,
DarkGray = 8,
LightBlue = 9,
LightGreen = 10,
LightCyan = 11,
LightRed = 12,
Pink = 13,
Yellow = 14,
White = 15,
}
```
We use a [C-like enum] here to explicitly specify the number for each color. Because of the `repr(u8)` attribute each enum variant is stored as an `u8`. Actually 4 bits would be sufficient, but Rust doesn't have an `u4` type.
[C-like enum]: http://rustbyexample.com/custom_types/enum/c_like.html
To represent a full color code that specifies foreground and background color, we create a [newtype] on top of `u8`:
[newtype]: https://aturon.github.io/features/types/newtype.html
```rust
struct ColorCode(u8);
impl ColorCode {
const fn new(foreground: Color, background: Color) -> ColorCode {
ColorCode((background as u8) << 4 | (foreground as u8))
}
}
```
The `ColorCode` contains the full color byte, containing foreground and background color. Blinking is enabled implicitly by using a bright background color (soon we will disable blinking anyway). The `new` function is a [const function] to allow it in static initializers. As `const` functions are unstable we need to add the `const_fn` feature in `src/lib.rs`.
[const function]: https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md
### The Text Buffer
Now we can add structures to represent a screen character and the text buffer:
```rust
#[repr(C)]
struct ScreenChar {
ascii_character: u8,
color_code: ColorCode,
}
const BUFFER_HEIGHT: usize = 25;
const BUFFER_WIDTH: usize = 80;
struct Buffer {
chars: [[ScreenChar; BUFFER_WIDTH]; BUFFER_HEIGHT],
}
```
Since the field ordering in default structs is undefined in Rust, we need the [repr(C)] attribute. It guarantees that the struct's fields are laid out exactly like in a C struct and thus guarantees the correct field ordering.
[repr(C)]: https://doc.rust-lang.org/nightly/nomicon/other-reprs.html#reprc
To actually write to screen, we now create a writer type:
```rust
pub struct Writer {
column_position: usize,
color_code: ColorCode,
buffer: Unique<Buffer>,
}
```
The writer will always write to the last line and shift lines up when a line is full (or on `\n`). The `column_position` field keeps track of the current position in the last row. The current foreground and background colors are specified by `color_code` and a pointer to the VGA buffer is stored in `buffer`. To make it possible to create a `static` Writer later, the `buffer` field stores an `Unique<Buffer>` instead of a plain `*mut Buffer`. [Unique] is a wrapper that implements Send/Sync and is thus usable as a `static`. Since it's unstable, you may need to add the `unique` feature to `lib.rs`.
[Unique]: https://doc.rust-lang.org/nightly/core/ptr/struct.Unique.html
## Printing to Screen
Now we can use the `Writer` to modify the buffer's characters. First we create a method to write a single ASCII byte (it doesn't compile yet):
```rust
impl Writer {
pub fn write_byte(&mut self, byte: u8) {
match byte {
b'\n' => self.new_line(),
byte => {
if self.column_position >= BUFFER_WIDTH {
self.new_line();
}
let row = BUFFER_HEIGHT - 1;
let col = self.column_position;
self.buffer().chars[row][col] = ScreenChar {
ascii_character: byte,
color_code: self.color_code,
};
self.column_position += 1;
}
}
}
fn buffer(&mut self) -> &mut Buffer {
unsafe{ self.buffer.get_mut() }
}
fn new_line(&mut self) {/* TODO */}
}
```
If the byte is the [newline] byte `\n`, the writer does not print anything. Instead it calls a `new_line` method, which we'll implement later. Other bytes get printed to the screen in the second match case.
[newline]: https://en.wikipedia.org/wiki/Newline
When printing a byte, the writer checks if the current line is full. In that case, a `new_line` call is required before to wrap the line. Then it writes a new `ScreenChar` to the buffer at the current position. Finally, the current column position is advanced.
The `buffer()` auxiliary method converts the raw pointer in the `buffer` field into a safe mutable buffer reference. The unsafe block is needed because the [get_mut()] method of `Unique` is unsafe. But our `buffer()` method itself isn't marked as unsafe, so it must not introduce any unsafety (e.g. cause segfaults). To guarantee that, it's very important that the `buffer` field always points to a valid `Buffer`. It's like a contract that we must stand to every time we create a `Writer`. To ensure that it's not possible to create an invalid `Writer` from outside of the module, the struct must have at least one private field and public creation functions are not allowed either.
[get_mut()]: https://doc.rust-lang.org/nightly/core/ptr/struct.Unique.html#method.get_mut
### Cannot Move out of Borrowed Content
When we try to compile it, we get the following error:
```
error: cannot move out of borrowed content [E0507]
color_code: self.color_code,
^~~~
```
The reason it that Rust _moves_ values by default instead of copying them like other languages. And we cannot move `color_code` out of `self` because we only borrowed `self`. For more information check out the [ownership section] in the Rust book. To fix it, we can implement the [Copy trait] for the `ColorCode` type by adding `#[derive(Clone, Copy)]` to its struct.
[ownership section]: https://doc.rust-lang.org/book/ownership.html
[Copy trait]: https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html
### Try it out!
To write some characters to the screen, you can create a temporary function:
```rust
pub fn print_something() {
let mut writer = Writer {
column_position: 0,
color_code: ColorCode::new(Color::LightGreen, Color::Black),
buffer: Unique::new(0xb8000 as *mut _),
}
writer.write_byte(b'H');
}
```
It just creates a new Writer that points to the VGA buffer at `0xb8000`. Then it writes the byte `b'H'` to it. The `b` prefix creates a [byte character], which represents an ASCII code point. When we call `vga_buffer::print_something` in main, a `H` should be printed in the _lower_ left corner of the screen in light green.
[byte character]: https://doc.rust-lang.org/reference.html#characters-and-strings
### Printing Strings
To print whole strings, we can convert them to bytes and print them one-by-one:
```rust
pub fn write_str(&mut self, s: &str) {
for byte in s.bytes() {
self.write_byte(byte)
}
}
```
You can try it yourself in the `print_something` function. Note that you need to add the `core_str_ext` feature, since `core` is [still unstable][core tracking issue].
When you print strings with some special characters like `ä` or `λ`, you'll notice that they cause weird symbols on screen. That's because they are represented by multiple bytes in [UTF-8]. By converting them to bytes, we of course get strange results. But since the VGA buffer doesn't support UTF-8, it's not possible to display these characters anyway. To ensure that a string contains only ASCII characters, you can prefix a `b` to create a [Byte String].
[core tracking issue]: https://github.com/rust-lang/rust/issues/27701
[UTF-8]: http://www.fileformat.info/info/unicode/utf8.htm
[Byte String]: https://doc.rust-lang.org/reference.html#characters-and-strings
### Support Formatting Macros
It would be nice to support Rust's formatting macros, too. That way, we can easily print different types like integers or floats. To support them, we need to implement the [core::fmt::Write] trait. The only required method of this trait is `write_str` that looks quite similar to our `write_str` method. To implement the trait, we just need to move it into an `impl ::core::fmt::Write for Writer` block and add a return type:
```rust
impl ::core::fmt::Write for Writer {
fn write_str(&mut self, s: &str) -> ::core::fmt::Result {
for byte in s.bytes() {
self.write_byte(byte)
}
Ok(())
}
}
```
The `Ok(())` is just a `Ok` Result containing the `()` type. We can drop the `pub` because trait methods are always public.
Now we can use Rust's built-in `write!`/`writeln!` formatting macros:
```rust
...
let mut writer = Writer::new(Color::Blue, Color::LightGreen);
writer.write_byte(b'H');
writer.write_str("ello! ");
write!(writer, "The numbers are {} and {}", 42, 1.0/3.0);
```
Now you should see a `Hello! The numbers are 42 and 0.3333333333333333` in strange colors at the bottom of the screen.
[core::fmt::Write]: https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html
### Newlines
Right now, we just ignore newlines and characters that don't fit into the line anymore. Instead we want to move every character one line up (the top line gets deleted) and start at the beginning of the last line again. To do this, we add an implementation for the `new_line` method of `Writer`:
```rust
fn new_line(&mut self) {
for row in 0..(BUFFER_HEIGHT-1) {
let buffer = self.buffer();
buffer.chars[row] = buffer.chars[row + 1]
}
self.clear_row(BUFFER_HEIGHT-1);
self.column_position = 0;
}
fn clear_row(&mut self, row: usize) {/* TODO */}
```
We just move each line to the line above. Notice that the range notation (`..`) is exclusive the upper bound. But when we try to compile it, we get an borrow checker error again:
```
error: cannot move out of indexed content [E0507]
buffer.chars[row] = buffer.chars[row + 1]
^~~~~~~~~~~~~~~~~~~~~
```
It's because of Rust's move semantics again: We try to move out the `ScreenChar` at `row+1`. If Rust would allow that, the array would become invalid as it would contain some valid and some moved out values. Fortunately, the `ScreenChar` type meets all criteria for the [Copy trait], so we can fix the problem by adding `#derive(Clone, Copy)` to `ScreenChar`.
Now we only need to implement the `clear_row` method to finish the newline code:
```rust
fn clear_row(&mut self, row: usize) {
let blank = ScreenChar {
ascii_character: b' ',
color_code: self.color_code,
};
self.buffer().chars[row] = [blank; BUFFER_WIDTH];
}
```
## Providing an Interface
To provide a global writer that can used as an interface from other modules, we can add a `static` writer:
```rust
pub static WRITER: Writer = Writer {
column_position: 0,
color_code: ColorCode::new(Color::LightGreen, Color::Black),
buffer: Unique::new(0xb8000 as *mut _),
};
```
But we can't use it to print anything! You can try it yourself in the `print_something` function. The reason is that we try to take a mutable reference (`&mut`) to a immutable `static` when calling `WRITER.print_byte`.
To resolve it, we could use a [mutable static]. But then every read and write to it would be unsafe since it could easily introduce data races and other bad things. Using `static mut` is highly discouraged, there are even proposals to [remove it][remove static mut].
[mutable static]: https://doc.rust-lang.org/book/const-and-static.html#mutability
[remove static mut]: https://internals.rust-lang.org/t/pre-rfc-remove-static-mut/1437
But what are the alternatives? We could try to use a cell type like [RefCell] or even [UnsafeCell] to provide [interior mutability]. But these types aren't [Sync] (with good reason), so we can't use them in statics.
[RefCell]: https://doc.rust-lang.org/nightly/core/cell/struct.RefCell.html
[UnsafeCell]: https://doc.rust-lang.org/nightly/core/cell/struct.UnsafeCell.html
[interior mutability]: https://doc.rust-lang.org/book/mutability.html#interior-vs.-exterior-mutability
[Sync]: https://doc.rust-lang.org/nightly/core/marker/trait.Sync.html
To get synchronized interior mutability, users of the standard library can use [Mutex]. It provides mutual exclusion by blocking threads when the resource is already locked. But our basic kernel does not have any blocking support or even a concept of threads, so we can't use it either. However there is a really basic kind of mutex in computer science that requires no operating system features: the [spinlock]. Instead of blocking, the threads simply try to lock it again and again in a tight loop and thus burn CPU time until the mutex is free again.
[Mutex]: https://doc.rust-lang.org/nightly/std/sync/struct.Mutex.html
[spinlock]: https://en.wikipedia.org/wiki/Spinlock
To use a spinning mutex, we can add the [spin crate] as a dependency in Cargo.toml:
[spin crate]: https://crates.io/crates/spin
```toml
...
[dependencies]
rlibc = "*"
spin = "*"
```
and a `extern crate spin;` definition in `src/lib.rs`. Then we can use the spinning Mutex to provide interior mutability to our static writer:
```rust
use spin::Mutex;
...
pub static WRITER: Mutex<Writer> = Mutex::new(Writer {
column_position: 0,
color_code: ColorCode::new(Color::LightGreen, Color::Black),
buffer: Unique::new(0xb8000 as *mut _),
});
```
[Mutex::new] is a const function, too, so it can be used in statics.
Now we can easily print from our main function:
[Mutex::new]: https://mvdnes.github.io/rust-docs/spinlock-rs/spin/struct.Mutex.html#method.new
```rust
pub extern fn rust_main() {
use core::fmt::Write;
vga_buffer::WRITER.lock().write_str("Hello again");
write!(vga_buffer::WRITER.lock(), ", some numbers: {} {}", 42, 1.337);
loop{}
}
```
Note that we need to import the `Write` trait if we want to use its functions.
## A println macro
Rust's [macro syntax] is a bit strange, so we won't try to write a macro from scratch. Instead we look at the source of the [`println!` macro] in the standard library:
[macro syntax]: https://doc.rust-lang.org/nightly/book/macros.html
[`println!` macro]: https://doc.rust-lang.org/nightly/std/macro.println!.html
```
macro_rules! println {
($fmt:expr) => (print!(concat!($fmt, "\n")));
($fmt:expr, $($arg:tt)*) => (print!(concat!($fmt, "\n"), $($arg)*));
}
```
It just adds a `\n` and then invokes the [`print!` macro], which is defined as:
[`print!` macro]: https://doc.rust-lang.org/nightly/std/macro.print!.html
```
macro_rules! print {
($($arg:tt)*) => ($crate::io::_print(format_args!($($arg)*)));
}
```
It calls the `_print` method in the `io` module of the current crate (`$crate`), which is `std`. The [`_print` function] in libstd is rather complicated, as it supports different `Stdout` devices.
[`_print` function]: https://doc.rust-lang.org/nightly/src/std/io/stdio.rs.html#578
To print to the VGA buffer, we just copy the `println!` macro and modify the `print!` macro to use our static `WRITER` instead of `_print`:
```
macro_rules! print {
($($arg:tt)*) => ({
use core::fmt::Write;
$crate::vga_buffer::WRITER.lock().write_fmt(format_args!($($arg)*)).unwrap();
});
}
```
Instead of a `_print` function, we call the `write_fmt` method of our static `Writer`. Since we're using a method from the `Write` trait, we need to import it before. The additional `unwrap()` at the end panics if printing isn't successful. But since we always return `Ok` in `write_str`, that should not happen.
Note the additional `{}` scope around the macro: I wrote `=> ({…})` instead of `=> (…)`. The additional `{}` avoids that the `Write` trait is silently imported when `print` is used.
### Clearing the screen
We can now use `println!` to add a rather trivial function to clear the screen:
```rust
pub fn clear_screen() {
for _ in 0..BUFFER_HEIGHT {
println!("");
}
}
```
### Hello World using `println`
To use `println` in `lib.rs`, we need to import the macros of the VGA buffer module first. Therefore we add a `#[macro_use]` attribute to the module declaration:
```rust
#[macro_use]
mod vga_buffer;
#[no_mangle]
pub extern fn rust_main() {
// ATTENTION: we have a very small stack and no guard page
vga_buffer::clear_screen();
println!("Hello World{}", "!");
loop{}
}
```
Since we imported the macros at crate level, they are available in all modules and thus provide an easy and safe interface to the VGA buffer.
## What's next?
In the next posts we will map the kernel pages correctly so that accessing `0x0` or writing to `.rodata` is not possible anymore. To obtain the loaded kernel sections we will read the multiboot information structure. Then we will create a paging module and use it to switch to a new page table where the kernel sections are mapped correctly.
## Other Rust OS Projects
Now that you know the very basics of OS development in Rust, you should also check out the following projects:
- [Rust Bare-Bones Kernel]: A basic kernel with roughly the same functionality as ours. Writes output to the serial port instead of the VGA buffer and maps the kernel to the [higher half] \(instead of our identity mapping).
_Note_: You need to [cross compile binutils] to build it (or you create some symbolic links[^fn-symlink] if you're on x86_64).
[Rust Bare-Bones Kernel]: https://github.com/thepowersgang/rust-barebones-kernel
[higher half]: http://wiki.osdev.org/Higher_Half_Kernel
[cross compile binutils]: {{ site.url }}/cross-compile-binutils.html
[^fn-symlink]: You will need to symlink `x86_64-none_elf-XXX` to `/usr/bin/XXX` where `XXX` is in {`as`, `ld`, `objcopy`, `objdump`, `strip`}. The `x86_64-none_elf-XXX` files must be in some folder that is in your `$PATH`. But then you can only build for your x86_64 host architecture, so use this hack only for testing.
- [RustOS]: More advanced kernel that supports allocation, keyboard inputs, and threads. It also has a scheduler and a basic network driver.
[RustOS]: https://github.com/RustOS-Fork-Holding-Ground/RustOS
- ["Tifflin" Experimental Kernel]: Big kernel project by thepowersgang, that is actively developed and has over 650 commits. It has a separate userspace and supports multiple file systems, even a GUI is included. Needs a cross compiler.
["Tifflin" Experimental Kernel]:https://github.com/thepowersgang/rust_os
- [Redox]: Probably the most complete Rust OS today. It has an active community and over 1000 Github stars. File systems, network, an audio player, a picture viewer, and much more. Just take a look at the [screenshots][redox screenshots].
[Redox]: https://github.com/redox-os/redox
[redox screenshots]: https://github.com/redox-os/redox#what-it-looks-like