From 5937ec2e04665b02a185d842920b551d4e7a8b3e Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Wed, 8 Jan 2020 11:02:42 +0100 Subject: [PATCH 01/41] Reintroduce allocator designs post The post was split off the Heap Allocations post because it became too large. To keep the tree clean, it was then temporarily removed. This commit restores the post by reverting the removal commit. This reverts commit 029d77ef21ec54f3f16c4bbad260d6f46e7a413c. --- .../allocation-fragmentation.svg | 2 + .../posts/11-allocator-designs/index.md | 726 ++++++++++++++++++ .../linked-list-allocation.svg | 2 + .../linked-list-allocator-push.svg | 2 + .../linked-list-allocator-remove-region.svg | 2 + .../qemu-bump-allocator.png | Bin 0 -> 9526 bytes blog/templates/second-edition/index.html | 1 + 7 files changed, 735 insertions(+) create mode 100644 blog/content/second-edition/posts/11-allocator-designs/allocation-fragmentation.svg create mode 100644 blog/content/second-edition/posts/11-allocator-designs/index.md create mode 100644 blog/content/second-edition/posts/11-allocator-designs/linked-list-allocation.svg create mode 100644 blog/content/second-edition/posts/11-allocator-designs/linked-list-allocator-push.svg create mode 100644 blog/content/second-edition/posts/11-allocator-designs/linked-list-allocator-remove-region.svg create mode 100644 blog/content/second-edition/posts/11-allocator-designs/qemu-bump-allocator.png diff --git a/blog/content/second-edition/posts/11-allocator-designs/allocation-fragmentation.svg b/blog/content/second-edition/posts/11-allocator-designs/allocation-fragmentation.svg new file mode 100644 index 00000000..b9c74784 --- /dev/null +++ b/blog/content/second-edition/posts/11-allocator-designs/allocation-fragmentation.svg @@ -0,0 +1,2 @@ + +
[figure text from allocation-fragmentation.svg: heap start, heap end, next, allocated, time; rows labeled 1 to 5]
\ No newline at end of file diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md new file mode 100644 index 00000000..0ccefad1 --- /dev/null +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -0,0 +1,726 @@ ++++ +title = "Allocator Designs" +weight = 11 +path = "allocator-designs" +date = 0000-01-01 ++++ + +TODO + + + +This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-10`][post branch] branch. + +[GitHub]: https://github.com/phil-opp/blog_os +[at the bottom]: #comments +[post branch]: https://github.com/phil-opp/blog_os/tree/post-10 + + + +TODO optional + +## Introduction + +TODO + +## Design Goals + +The responsibility of an allocator is to manage the available heap memory. It needs to return unused memory on `alloc` calls and keep track of memory freed by `dealloc` so that it can be reused again. Most importantly, it must never hand out memory that is already in use somewhere else because this would cause undefined behavior. + +Apart from correctness, there are many secondary design goals. For example, it should effectively utilize the available memory and keep [fragmentation] low. Furthermore, it should work well for concurrent applications and scale to any number of processors. For maximal performance, it could even optimize the memory layout with respect to the CPU caches to improve [cache locality] and avoid [false sharing]. + +[cache locality]: http://docs.cray.com/books/S-2315-50/html-S-2315-50/qmeblljm.html +[fragmentation]: https://en.wikipedia.org/wiki/Fragmentation_(computing) +[false sharing]: http://mechanical-sympathy.blogspot.de/2011/07/false-sharing.html + +These requirements can make good allocators very complex. For example, [jemalloc] has over 30.000 lines of code. This complexity often undesired in kernel code where a single bug can lead to severe security vulnerabilities. Fortunately, the allocation patterns of kernel code are often much simpler compared to userspace code, so that relatively simple allocator design often suffice. In the following we explain three possible kernel allocator designs and explain their advantages and drawbacks. + +[jemalloc]: http://jemalloc.net/ + +## Bump Allocator + +The most simple allocator design is a _bump allocator_. It allocates memory linearly and only keeps track of the number of allocated bytes and the number of allocations. It is only useful in very specific use cases because it has a severe limitation: it can only free all memory at once. + +The base type looks like this: + +```rust +// in src/allocator.rs + +pub struct BumpAllocator { + heap_start: usize, + heap_end: usize, + next: usize, + allocations: usize, +} + +impl BumpAllocator { + /// Creates a new bump allocator with the given heap bounds. + /// + /// This method is unsafe because the caller must ensure that the given + /// memory range is unused. + pub const unsafe fn new(heap_start: usize, heap_size: usize) -> Self { + BumpAllocator { + heap_start, + heap_end: heap_start + heap_size, + next: heap_start, + allocations: 0, + } + } +} +``` + +Instead of using the `HEAP_START` and `HEAP_SIZE` constants directly, we use separate `heap_start` and `heap_end` fields. This makes the type more flexible, for example it also works when we only want to assign a part of the heap region. 
The purpose of the `next` field is to always point to the first unused byte of the heap, i.e. the start address of the next allocation. The `allocations` field is a simple counter for the active allocations with the goal of resetting the allocator after the last allocation was freed. + +We provide a simple constructor function that creates a new `BumpAllocator`. It initializes the `heap_start` and `heap_end` fields using the given start address and size. The `allocations` counter is initialized with 0. The `next` field is set to `heap_start` since the whole heap should be unused at this point. Since this is something that the caller must guarantee, the function needs to be unsafe. Given an invalid memory range, the planned implementation of the `GlobalAlloc` trait would cause undefined behavior when it is used as global allocator. + +### A `Locked` Wrapper + +Implementing the [`GlobalAlloc`] trait directly for the `BumpAllocator` struct is not possible. The problem is that the `alloc` and `dealloc` methods of the trait only take an immutable `&self` reference, but we need to update the `next` and `allocations` fields for every allocation, which is only possible with an exclusive `&mut self` reference. The reason that the `GlobalAlloc` trait is specified this way is that the global allocator needs to be stored in an immutable `static` that only allows `&self` references. + +To be able to implement the trait for our `BumpAllocator` struct, we need to add synchronized [interior mutability] to get mutable field access through the `&self` reference. A type that adds the required synchronization and allows interior mutabilty is the [`spin::Mutex`] spinlock that we already used multiple times for our kernel, for example [for our VGA buffer writer][vga-mutex]. To use it, we create a `Locked` wrapper type: + +[interior mutability]: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html +[`spin::Mutex`]: https://docs.rs/spin/0.5.0/spin/struct.Mutex.html +[vga-mutex]: ./second-edition/posts/03-vga-text-buffer/index.md#spinlocks + +```rust +// in src/allocator.rs + +pub struct Locked { + inner: spin::Mutex, +} + +impl Locked { + pub const fn new(inner: A) -> Self { + Locked { + inner: spin::Mutex::new(inner), + } + } +} +``` + +The type is a generic wrapper around a `spin::Mutex`. It imposes no restrictions on the wrapped type `A`, so it can be used to wrap all kinds of types, not just allocators. It provides a simple `new` constructor function that wraps a given value. + +### Implementing `GlobalAlloc` + +With the help of the `Locked` wrapper type we now can implement the `GlobalAlloc` trait for our bump allocator. The trick is to implement the trait not for the `BumpAllocator` directly, but for the wrapped `Locked` type. 
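As a small optional addition (not something the draft above strictly needs, and assuming the `Locked` wrapper is generic over the wrapped type `A` as described), the wrapper could also expose a `lock` method that forwards to the inner spinlock, so callers can write `self.lock()` instead of reaching into `self.inner`:

```rust
// in src/allocator.rs

impl<A> Locked<A> {
    /// Convenience helper: lock the inner spinlock and return its guard,
    /// which dereferences to the wrapped value of type `A`.
    pub fn lock(&self) -> spin::MutexGuard<A> {
        self.inner.lock()
    }
}
```

The trait implementations below keep calling `self.inner.lock()` directly, so this helper is purely a convenience.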
The implementation looks like this: + +```rust +// in src/allocator.rs + +unsafe impl GlobalAlloc for Locked { + unsafe fn alloc(&self, layout: Layout) -> *mut u8 { + let mut bump = self.inner.lock(); + + let alloc_start = align_up(bump.next, layout.align()); + let alloc_end = alloc_start + layout.size(); + + if alloc_end > bump.heap_end { + null_mut() // out of memory + } else { + bump.next = alloc_end; + bump.allocations += 1; + alloc_start as *mut u8 + } + } + + unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) { + let mut bump = self.inner.lock(); + + bump.allocations -= 1; + if bump.allocations == 0 { + bump.next = bump.heap_start; + } + } +} +``` + +The first step for both `alloc` and `dealloc` is to call the [`Mutex::lock`] method to get a mutable reference to the wrapped allocator type. The instance remains locked until the end of the method, so that no data race can occur in multithreaded contexts (we will add threading support soon). + +[`Mutex::lock`]: https://docs.rs/spin/0.5.0/spin/struct.Mutex.html#method.lock + +The `alloc` implementation first performs the required alignment on the `next` address, as specified by the given [`Layout`]. This yields the start address of the allocation. The code for the `align_up` function is shown below. Next, we add the requested allocation size to `alloc_start` to get the end address of the allocation. If it is larger than the end address of the heap, we return a null pointer since there is not enough memory available. Otherwise, we update the `next` address (the next allocation should start after the current allocation), increase the `allocations` counter by 1, and return the `alloc_start` address converted to a `*mut u8` pointer. + +The `dealloc` function ignores the given pointer and `Layout` arguments. Instead, it just decreases the `allocations` counter. If the counter reaches `0` again, it means that all allocations were freed again. In this case, it resets the `next` address to the `heap_start` address to make the complete heap memory available again. + +The last remaining part of the implementation is the `align_up` function, which looks like this: + +```rust +// in src/allocator.rs + +fn align_up(addr: usize, align: usize) -> usize { + let remainder = addr % align; + if remainder == 0 { + addr // addr already aligned + } else { + addr - remainder + align + } +} +``` + +The function first computes the [remainder] of the division of `addr` by `align`. If the remainder is `0`, the address is already aligned with the given alignment. Otherwise, we align the address by subtracting the remainder (so that the new remainder is 0) and then adding the alignment (so that the address does not become smaller than the original address). + +[remainder]: https://en.wikipedia.org/wiki/Euclidean_division + +### Using It + +To use the bump allocator instead of the dummy allocator, we need to update the `ALLOCATOR` static in `lib.rs`: + +```rust +// in src/lib.rs + +use allocator::{Locked, BumpAllocator, HEAP_START, HEAP_SIZE}; + +#[global_allocator] +static ALLOCATOR: Locked = + Locked::new(BumpAllocator::new(HEAP_START, HEAP_SIZE)); +``` + +Here it becomes important that we declared both the `Locked::new` and the `BumpAllocator::new` as [`const` functions]. If they were normal functions, a compilation error would occur because the initialization expression of a `static` must evaluable at compile time. 
+ +[`const` functions]: https://doc.rust-lang.org/reference/items/functions.html#const-functions + +Now we can use `Box` and `Vec` without runtime errors: + +```rust +// in src/main.rs + +use alloc::{boxed::Box, vec::Vec, collections::BTreeMap}; + +fn kernel_main(boot_info: &'static BootInfo) -> ! { + // […] initialize interrupts, mapper, frame_allocator, heap + + // allocate a number on the heap + let heap_value = Box::new(41); + println!("heap_value at {:p}", heap_value); + + // create a dynamically sized vector + let mut vec = Vec::new(); + for i in 0..500 { + vec.push(i); + } + println!("vec at {:p}", vec.as_slice()); + + // try to create one million boxes + for _ in 0..1_000_000 { + let _ = Box::new(1); + } + + // […] call `test_main` in test context + println!("It did not crash!"); + blog_os::hlt_loop(); +} +``` + +This code example only uses the `Box` and `Vec` types, but there are many more allocation and collection types in the `alloc` crate that we can now all use in our kernel, including: + +- the reference counted pointers [`Rc`] and [`Arc`] +- the owned string type [`String`] and the [`format!`] macro +- [`LinkedList`] +- the growable ring buffer [`VecDeque`] +- [`BinaryHeap`] +- [`BTreeMap`] and [`BTreeSet`] + +[`Rc`]: https://doc.rust-lang.org/alloc/rc/ +[`Arc`]: https://doc.rust-lang.org/alloc/arc/ +[`String`]: https://doc.rust-lang.org/collections/string/struct.String.html +[`format!`]: https://doc.rust-lang.org/alloc/macro.format.html +[`LinkedList`]: https://doc.rust-lang.org/collections/linked_list/struct.LinkedList.html +[`VecDeque`]: https://doc.rust-lang.org/collections/vec_deque/struct.VecDeque.html +[`BinaryHeap`]: https://doc.rust-lang.org/collections/binary_heap/struct.BinaryHeap.html +[`BTreeMap`]: https://doc.rust-lang.org/collections/btree_map/struct.BTreeMap.html +[`BTreeSet`]: https://doc.rust-lang.org/collections/btree_set/struct.BTreeSet.html + +When we run our project now, we see the following: + +![QEMU printing ` +heap_value at 0x444444440000 +vec at 0x4444444408000 +panicked at 'allocation error: Layout { size_: 4, align_: 4 }', src/lib.rs:91:5 +](qemu-bump-allocator.png) + +As expected, we see that the `Box` and `Vec` values live on the heap, as indicated by the pointer starting with `0x_4444_4444`. The reason that the vector starts at offset `0x800` is not that the boxed value is `0x800` bytes large, but the [reallocations] that occur when the vector needs to increase its capacity. For example, when the vector's capacity is 32 and we try to add the next element, the vector allocates a new backing array with capacity 64 behind the scenes and copies all elements over. Then it frees the old allocation, which in our case is equivalent to leaking it since our bump allocator doesn't reuse freed memory. + +[reallocations]: https://doc.rust-lang.org/alloc/vec/struct.Vec.html#capacity-and-reallocation + +While the basic `Box` and `Vec` examples work as expected, our loop that tries to create one million boxes causes a panic. The reason is that the bump allocator never reuses freed memory, so that for each created `Box` a few bytes are leaked. This makes the bump allocator unsuitable for many applications in practice, apart from some very specific use cases. + +### When to use a Bump Allocator + +The big advantage of bump allocation is that it's very fast. 
Compared to other allocator designs (see below) that need to actively look for a fitting memory block and perform various bookkeeping tasks on `alloc` and `dealloc`, a bump allocator can be optimized to just a few assembly instructions. This makes bump allocators useful for optimizing the allocation performance, for example when creating a [virtual DOM library]. + +[virtual DOM library]: https://hacks.mozilla.org/2019/03/fast-bump-allocated-virtual-doms-with-rust-and-wasm/ + +While a bump allocator is seldom used as the global allocator, the principle of bump allocation is often applied in form of [arena allocation], which basically batches individual allocations together to improve performance. An example for an arena allocator for Rust is the [`toolshed`] crate. + +[arena allocation]: https://mgravell.github.io/Pipelines.Sockets.Unofficial/docs/arenas.html +[`toolshed`]: https://docs.rs/toolshed/0.8.1/toolshed/index.html + +### Reusing Freed Memory? + +The main limitation of a bump allocator is that it never reuses deallocated memory. The question is: Can we extend our bump allocator somehow to remove this limitation? + +As we learned at the beginning of this post, allocations can live arbitarily long and can be freed in an arbitrary order. This means that we need to keep track of a potentially unbounded number of non-continuous, unused memory regions, as illustrated by the following example: + +![](allocation-fragmentation.svg) + +The graphic shows the heap over the course of time. At the beginning, the complete heap is unused and the `next` address is equal to `heap_start` (line 1). Then the first allocation occurs (line 2). In line 3, a second memory block is allocated and the first allocation is freed. Many more allocations are added in line 4. Half of them are very short-lived and already get freed in line 5, where also another new allocation is added. + +Line 5 shows the fundamental problem: We have five unused memory regions with different sizes in total, but the `next` pointer can only point to the beginning of the last region. While we could store the start addresses and sizes of the other unused memory regions in an array of size 4 for this example, this isn't a general solution since we could easily create an example with 8, 16, or 1000 unused memory regions. + +Normally when we have a potentially unbounded number of items, we can just use a heap allocated collection. This isn't really possible in our case, since the heap allocator can't depend on itself (it would cause endless recursion or deadlocks). So we need to find a different solution. + +## LinkedList Allocator + +A common trick to keep track of an arbitrary number of free memory areas is to use these areas itself as backing storage. This utilizes the fact that the regions are still mapped to a virtual address and backed by a physical frame, but the stored information is not needed anymore. By storing the information about the freed region in the region itself, we can keep track of an unbounded number of freed regions without needing additional memory. + +The most common implementation approach is to construct a single linked list in the freed memory, with each node being a freed memory region: + +![](linked-list-allocation.svg) + +Each list node contains two fields: The size of the memory region and a pointer to the next unused memory region. With this approach, we only need a pointer to the first unused region (called `head`), independent of the number of memory regions. 
+ +In the following, we will create a simple `LinkedListAllocator` type that uses the above approach for keeping track of freed memory regions. Since the implementation is a bit longer, we will start with a simple placeholder type before we start to implement the `alloc` and `dealloc` operations. + +### The Allocator Type + +We start by creating a private `ListNode` struct: + +```rust +// in src/allocator.rs + +struct ListNode { + size: usize, + next: Option<&'static mut ListNode>, +} + +impl ListNode { + const fn new(size: usize) -> Self { + ListNode { + size, + next: None, + } + } + + fn start_addr(&self) -> usize { + self as *const Self as usize + } + + fn end_addr(&self) -> usize { + self.start_addr() + self.size + } +} +``` + +Like in the graphic, a list node has a `size` field and an optional pointer to the next node. The type has a simple constructor function and methods to calculate the start and end addresses of the represented region. + +With the `ListNode` struct as building block, we can now create the `LinkedListAllocator` struct: + +```rust +// in src/allocator.rs + +pub struct LinkedListAllocator { + head: ListNode, +} + +impl LinkedListAllocator { + pub const fn new() -> Self { + Self { + head: ListNode::new(0), + } + } + + /// Initialize the allocator with the given heap bounds. + /// + /// This function is unsafe because the caller must guarantee that the given + /// heap bounds are valid and that the heap is unused. This method must be + /// called only once. + pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) { + self.add_free_region(heap_start, heap_size); + } + + /// Adds the given memory region to the front of the list. + unsafe fn add_free_region(&mut self, addr: usize, size: usize) { + unimplemented!(); + } +} +``` + +The struct contains a `head` node that points to the first heap region. We are only interested in the value of the `next` pointer, so we set the `size` to 0 in the `new` function. Making `head` a `ListNode` instead of just a `&'static mut ListNode` has the advantage that the implementation of the `alloc` method will be simpler. + +In contrast to the bump allocator, the `new` function doesn't initialize the allocator with the heap bounds. The reason is that the initialization requires to write a node to the heap memory, which can only happen at runtime. The `new` function, however, needs to be a [`const` function] that can be evaluated at compile time, because it will be used for initializing the `ALLOCATOR` static. To work around this, we provide a separate `init` method that can be called at runtime. + +[`const` function]: https://doc.rust-lang.org/reference/items/functions.html#const-functions + +The `init` method uses a `add_free_region` method, whose implementation will be shown in a moment. For now, we use the [`unimplemented!`] macro to provide a placeholder implementation that always panics. + +[`unimplemented!`]: https://doc.rust-lang.org/core/macro.unimplemented.html + +Our first goal is to set a prototype of the `LinkedListAllocator` as the global allocator. 
In order to be able to do that, we need to provide a placeholder implementation of the `GlobalAlloc` trait: + +```rust +// in src/allocator.rs + +unsafe impl GlobalAlloc for Locked { + unsafe fn alloc(&self, layout: Layout) -> *mut u8 { + unimplemented!(); + } + + unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) { + unimplemented!(); + } +} +``` + +Like with the bump allocator, we don't implement the trait directly for the `LinkedListAllocator`, but only for a wrapped `Locked`. The [`Locked` wrapper] adds interior mutability through a spinlock, which allows us to modify the allocator instance even though the `alloc` and `dealloc` methods only take `&self` references. Instead of providing an implementation, we use the [`unimplemented!`] macro again to get a minimal prototype. + +[`Locked` wrapper]: ./second-edition/posts/10-heap-allocation/index.md#a-locked-wrapper + +With this placeholder implementation, we can now change the global allocator to a `LinkedListAllocator`: + +```rust +// in src/lib.rs + +use allocator::{Locked, LinkedListAllocator}; + +#[global_allocator] +static ALLOCATOR: Locked = + Locked::new(LinkedListAllocator::new()); +``` + +Since the `new` method creates an empty allocator, we also need to update our `allocator::init` function to call `LinkedListAllocator::init` with the heap bounds: + +```rust +// in src/allocator.rs + +pub fn init_heap( + mapper: &mut impl Mapper, + frame_allocator: &mut impl FrameAllocator, +) -> Result<(), MapToError> { + // […] map all heap pages + + // new + unsafe { + super::ALLOCATOR.inner.lock().init(HEAP_START, HEAP_SIZE); + } + + Ok(()) +} +``` + +It's important to call the `init` function after the mapping of the heap pages, because the function will already write to the heap (once we'll properly implement it). The `unsafe` block is safe here because we just mapped the heap region to unused frames, so that the passed heap region is valid. + +When we run our code now, it will of course panic since it runs into the `unimplemented!` in `add_free_region`. Let's fix that by providing a proper implementation for that method. + +### The `add_free_region` Method + +The `add_free_region` method provides the fundamental _push_ operation on the linked list. We currently only call this method from `init`, but it will also be the central method in our `dealloc` implementation. Remember, the `dealloc` method is called when an allocated memory region is freed again. To keep track of this freed memory region, we want to push it to the linked list. + +The implementation of the `add_free_region` method looks like this: + +```rust +// in src/allocator.rs + +impl LinkedListAllocator { + /// Adds the given memory region to the front of the list. + unsafe fn add_free_region(&mut self, addr: usize, size: usize) { + // ensure that the freed region is capable of holding ListNode + assert!(align_up(addr, mem::align_of::()) == addr); + assert!(size >= mem::size_of::()); + + // create a new list node and append it at the start of the list + let mut node = ListNode::new(size); + node.next = self.head.next.take(); + let node_ptr = addr as *mut ListNode; + node_ptr.write(node); + self.head.next = Some(&mut *node_ptr) + } +} +``` + +The method takes a memory region represented by an address and size as argument and adds it to the front of the list. First, it ensures that the given region has the neccessary size and alignment for storing a `ListNode`. 
Then it creates the node and inserts it to the list through the following steps: + +![](linked-list-allocator-push.svg) + +Step 0 shows the state of the heap before `add_free_region` is called. In step 1, the method is called with the memory region marked as `freed` in the graphic. After the initial checks, the method creates a new `node` on its stack with the size of the freed region. It then uses the [`Option::take`] method to set the `next` pointer of the node to the current `head` pointer, thereby resetting the `head` pointer to `None`. + +[`Option::take`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.take + +In step 2, the method writes the newly created `node` to the beginning of the freed memory region through the [`write`] method. It then points the `head` pointer to the new node. The resulting pointer structure looks a bit chaotic because the freed region is always inserted at the beginning of the list, but if we follow the pointers we see that each free region is still reachable from the `head` pointer. + +[`write`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write + +### The `find_region` Method + +The second fundamental operation on a linked list is finding an entry and removing it from the list. This is the central operation needed for implementing the `alloc` method. We implement the operation as a `find_region` method in the following way: + +```rust +// in src/allocator.rs + +impl LinkedListAllocator { + /// Looks for a free region with the given size and alignment and removes + /// it from the list. + /// + /// Returns a tuple of the list node and the start address of the allocation. + fn find_region(&mut self, size: usize, align: usize) + -> Option<(&'static mut ListNode, usize)> + { + // reference to current list node, updated for each iteration + let mut current = &mut self.head; + // look for a large enough memory region in linked list + while let Some(ref mut region) = current.next { + if let Ok(alloc_start) = Self::alloc_from_region(®ion, size, align) { + // region suitable for allocation -> remove node from list + let next = region.next.take(); + let ret = Some((current.next.take().unwrap(), alloc_start)); + current.next = next; + return ret; + } else { + // region not suitable -> continue with next region + current = current.next.as_mut().unwrap(); + } + } + + // no suitable region found + None + } +} +``` + +The method uses a `current` variable and a [`while let` loop] to iterate over the list elements. At the beginning, `current` is set to the (dummy) `head` node. On each iteration, it is then updated to to the `next` field of the current node (in the `else` block). If the region is suitable for an allocation with the given size and alignment, the region is removed from the list and returned together with the `alloc_start` address. + +[`while let` loop]: https://doc.rust-lang.org/reference/expressions/loop-expr.html#predicate-pattern-loops + +When the `current.next` pointer becomes `None`, the loop exits. This means that we iterated over the whole list but found no region that is suitable for an allocation. In that case, we return `None`. The check whether a region is suitable is done by a `alloc_from_region` function, whose implementation will be shown in a moment. + +Let's take a more detailed look at how a suitable region is removed from the list: + +![](linked-list-allocator-remove-region.svg) + +Step 0 shows the situation before any pointer adjustments. 
The `region` and `current` regions and the `region.next` and `current.next` pointers are marked in the graphic. In step 1, both the `region.next` and `current.next` pointers are reset to `None` by using the [`Option::take`] method. The original pointers are stored in local variables called `next` and `ret`. + +In step 2, the `current.next` pointer is set to the local `next` pointer, which is the original `region.next` pointer. The effect is that `current` now directly points to the region after `region`, so that `region` is no longer element of the linked list. The function then returns the pointer to `region` stored in the local `ret` variable. + +### The `alloc_from_region` Function + +The `alloc_from_region` function returns whether a region is suitable for an allocation with given size and alignment. It is defined like this: + +```rust +// in src/allocator.rs + +impl LinkedListAllocator { + /// Try to use the given region for an allocation with given size and alignment. + /// + /// Returns the allocation start address on success. + fn alloc_from_region(region: &ListNode, size: usize, align: usize) + -> Result + { + let alloc_start = align_up(region.start_addr(), align); + let alloc_end = alloc_start + size; + + if alloc_end > region.end_addr() { + // region too small + return Err(()); + } + + let excess_size = region.end_addr() - alloc_end; + if excess_size > 0 && excess_size < mem::size_of::() { + // rest of region too small to hold a ListNode (required because the + // allocation splits the region in a used and a free part) + return Err(()); + } + + // region suitable for allocation + Ok(alloc_start) + } +} +``` + +First, the function calculates the start and end address of a potential allocation, using the `align_up` function we defined earlier. If the end address is behind the end address of the region, the allocation doesn't fit in the region and we return an error. + +The function performs a less obvious check after that. This check is neccessary because most of the time an allocation does not fit a suitable region perfectly, so that a part of the region remains usable after the allocation. This part of the region must store its own `ListNode` after the allocation, so it must be large enough to do so. The check verifies exactly that: either the allocation fits perfectly (`excess_size == 0`) or the excess size is large enough to store a `ListNode`. 
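To make the excess-size check more concrete, here is a small stand-alone illustration. The numbers and the helper function are hypothetical and assume a 16-byte `ListNode` (two `usize`-sized fields on x86_64); the real method uses `mem::size_of::<ListNode>()` instead of a hard-coded constant:

```rust
// Hypothetical illustration of the excess-size check in `alloc_from_region`.
// Assumes alloc_size <= region_size, i.e. the basic size check already passed.
fn region_usable(region_size: usize, alloc_size: usize) -> bool {
    let excess_size = region_size - alloc_size;
    // either the allocation fills the region exactly, or the remaining part
    // is large enough to be turned into a new ListNode (assumed 16 bytes)
    excess_size == 0 || excess_size >= 16
}

fn main() {
    assert!(region_usable(100, 100)); // perfect fit -> usable
    assert!(region_usable(100, 84));  // 16 bytes remain -> usable
    assert!(!region_usable(100, 90)); // only 10 bytes remain -> rejected
}
```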
+ +### Implementing `GlobalAlloc` + +With the fundamental operations provided by the `add_free_region` and `find_region` methods, we can now finally implement the `GlobalAlloc` trait: + +```rust +// in src/allocator.rs + +unsafe impl GlobalAlloc for Locked { + unsafe fn alloc(&self, layout: Layout) -> *mut u8 { + // perform layout adjustments + let (size, align) = LinkedListAllocator::size_align(layout); + let mut allocator = self.inner.lock(); + + if let Some((region, alloc_start)) = allocator.find_region(size, align) { + let alloc_end = alloc_start + size; + let excess_size = region.end_addr() - alloc_end; + if excess_size > 0 { + allocator.add_free_region(alloc_end, excess_size); + } + alloc_start as *mut u8 + } else { + null_mut() + } + } + + unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) { + // perform layout adjustments + let (size, _) = LinkedListAllocator::size_align(layout); + + self.inner.lock().add_free_region(ptr as usize, size) + } +} +``` + +Let's start with the `dealloc` method because it is simpler: First, it performs some layout adjustments, which we will explain in a moment, and retrieves a `&mut LinkedListAllocator` reference by calling the [`Mutex::lock`] function on the [`Locked` wrapper]. Then it calls the `add_free_region` function to add the deallocated region to the free list. + +The `alloc` method is a bit more complex. It starts with the same layout adjustments and also calls the [`Mutex::lock`] function to receive a mutable allocator reference. Then it uses the `find_region` method to find a suitable memory region for the allocation and remove it from the list. If this doesn't succeed and `None` is returned, it returns `null_mut` to signal an error as there is no suitable memory region. + +In the success case, the `find_region` method returns a tuple of the suitable region (no longer in the list) and the start address of the allocation. Using `alloc_start`, the allocation size, and the end address of the region, it calculates the end address of the allocation and the excess size again. If the excess size is not null, it calls `add_free_region` to add the excess size of the memory region back to the free list. Finally, it returns the `alloc_start` address casted as a `*mut u8` pointer. + +### Layout Adjustments + +```rust +// in src/allocator.rs + +impl LinkedListAllocator { + /// Adjust the given layout so that the resulting allocated memory + /// region is also capable of storing a `ListNode`. + /// + /// Returns the adjusted size and alignment as a (size, align) tuple. + fn size_align(layout: Layout) -> (usize, usize) { + let layout = layout.align_to(mem::align_of::()) + .and_then(|l| l.pad_to_align()) + .expect("adjusting alignment failed"); + let size = layout.size().max(mem::size_of::()); + (size, layout.align()) + } +} +``` + + + + + + + + + + +##### Allocation +In order to allocate a block of memory, we need to find a hole that satisfies the size and alignment requirements. If the found hole is larger than required, we split it into two smaller holes. For example, when we allocate a 24 byte block right after initialization, we split the single hole into a hole of size 24 and a hole with the remaining size: + +![split hole](split-hole.svg) + +Then we use the new 24 byte hole to perform the allocation: + +![24 bytes allocated](allocate.svg) + +To find a suitable hole, we can use several search strategies: + +- **best fit**: Search the whole list and choose the _smallest_ hole that satisfies the requirements. 
+- **worst fit**: Search the whole list and choose the _largest_ hole that satisfies the requirements. +- **first fit**: Search the list from the beginning and choose the _first_ hole that satisfies the requirements. + +Each strategy has its advantages and disadvantages. Best fit uses the smallest hole possible and leaves larger holes for large allocations. But splitting the smallest hole might create a tiny hole, which is too small for most allocations. In contrast, the worst fit strategy always chooses the largest hole. Thus, it does not create tiny holes, but it consumes the large block, which might be required for large allocations. + +For our use case, the best fit strategy is better than worst fit. The reason is that we have a minimal hole size of 16 bytes, since each hole needs to be able to store a size (8 bytes) and a pointer to the next hole (8 bytes). Thus, even the best fit strategy leads to holes of usable size. Furthermore, we will need to allocate very large blocks occasionally (e.g. for [DMA] buffers). + +[DMA]: https://en.wikipedia.org/wiki/Direct_memory_access + +However, both best fit and worst fit have a significant problem: They need to scan the whole list for each allocation in order to find the optimal block. This leads to long allocation times if the list is long. The first fit strategy does not have this problem, as it returns as soon as it finds a suitable hole. It is fairly fast for small allocations and might only need to scan the whole list for large allocations. + +### Deallocation +To deallocate a block of memory, we can just insert its corresponding hole somewhere into the list. However, we need to merge adjacent holes. Otherwise, we are unable to reuse the freed memory for larger allocations. For example: + +![deallocate memory, which leads to adjacent holes](deallocate.svg) + +In order to use these adjacent holes for a large allocation, we need to merge them to a single large hole first: + +![merge adjacent holes and allocate large block](merge-holes-and-allocate.svg) + +The easiest way to ensure that adjacent holes are always merged, is to keep the hole list sorted by address. Thus, we only need to check the predecessor and the successor in the list when we free a memory block. If they are adjacent to the freed block, we merge the corresponding holes. Else, we insert the freed block as a new hole at the correct position. + +## Implementation +The detailed implementation would go beyond the scope of this post, since it contains several hidden difficulties. For example: + +- Several merge cases: Merge with the previous hole, merge with the next hole, merge with both holes. +- We need to satisfy the alignment requirements, which requires additional splitting logic. +- The minimal hole size of 16 bytes: We must not create smaller holes when splitting a hole. + +I created the [linked_list_allocator] crate to handle all of these cases. It consists of a [Heap struct] that provides an `allocate_first_fit` and a `deallocate` method. It also contains a [LockedHeap] type that wraps `Heap` into spinlock so that it's usable as a static system allocator. If you are interested in the implementation details, check out the [source code][linked_list_allocator source]. 
+ +[linked_list_allocator]: https://docs.rs/crate/linked_list_allocator/0.4.1 +[Heap struct]: https://docs.rs/linked_list_allocator/0.4.1/linked_list_allocator/struct.Heap.html +[LockedHeap]: https://docs.rs/linked_list_allocator/0.4.1/linked_list_allocator/struct.LockedHeap.html +[linked_list_allocator source]: https://github.com/phil-opp/linked-list-allocator + +We need to add the extern crate to our `Cargo.toml` and our `lib.rs`: + +``` shell +> cargo add linked_list_allocator +``` + +```rust +// in src/lib.rs +extern crate linked_list_allocator; +``` + +Now we can change our global allocator: + +```rust +use linked_list_allocator::LockedHeap; + +#[global_allocator] +static HEAP_ALLOCATOR: LockedHeap = LockedHeap::empty(); +``` + +We can't initialize the linked list allocator statically, since it needs to initialize the first hole (like described [above](#initialization)). This can't be done at compile time, so the function can't be a `const` function. Therefore we can only create an empty heap and initialize it later at runtime. For that, we add the following lines to our `rust_main` function: + +```rust +// in src/lib.rs + +#[no_mangle] +pub extern "C" fn rust_main(multiboot_information_address: usize) { + […] + + // set up guard page and map the heap pages + memory::init(boot_info); + + // initialize the heap allocator + unsafe { + HEAP_ALLOCATOR.lock().init(HEAP_START, HEAP_START + HEAP_SIZE); + } + […] +} +``` + +It is important that we initialize the heap _after_ mapping the heap pages, since the init function writes to the heap memory (the first hole). + +Our kernel uses the new allocator now, so we can deallocate memory without leaking it. The example from above should work now without causing an OOM situation: + +```rust +// in rust_main in src/lib.rs + +for i in 0..10000 { + format!("Some String"); +} +``` + +## Performance +The linked list based approach has some performance problems. Each allocation or deallocation might need to scan the complete list of holes in the worst case. However, I think it's good enough for now, since our heap will stay relatively small for the near future. When our allocator becomes a performance problem eventually, we can just replace it with a faster alternative. + +## Summary +Now we're able to use heap storage in our kernel without leaking memory. This allows us to effectively process dynamic data such as user supplied strings in the future. We can also use `Rc` and `Arc` to create types with shared ownership. And we have access to various data structures such as `Vec` or `Linked List`, which will make our lives much easier. We even have some well tested and optimized [binary heap] and [B-tree] implementations! + +[binary heap]:https://en.wikipedia.org/wiki/Binary_heap +[B-tree]: https://en.wikipedia.org/wiki/B-tree + + +--- + +TODO: update date + +--- diff --git a/blog/content/second-edition/posts/11-allocator-designs/linked-list-allocation.svg b/blog/content/second-edition/posts/11-allocator-designs/linked-list-allocation.svg new file mode 100644 index 00000000..a24ac160 --- /dev/null +++ b/blog/content/second-edition/posts/11-allocator-designs/linked-list-allocation.svg @@ -0,0 +1,2 @@ + +
[figure text from linked-list-allocation.svg: heap start, heap end, size, next pointer, head]
\ No newline at end of file diff --git a/blog/content/second-edition/posts/11-allocator-designs/linked-list-allocator-push.svg b/blog/content/second-edition/posts/11-allocator-designs/linked-list-allocator-push.svg new file mode 100644 index 00000000..734bfaaf --- /dev/null +++ b/blog/content/second-edition/posts/11-allocator-designs/linked-list-allocator-push.svg @@ -0,0 +1,2 @@ + +
[figure text from linked-list-allocator-push.svg: operation steps 0 to 2; heap start, heap end, size, next pointer, head, node, freed]
\ No newline at end of file diff --git a/blog/content/second-edition/posts/11-allocator-designs/linked-list-allocator-remove-region.svg b/blog/content/second-edition/posts/11-allocator-designs/linked-list-allocator-remove-region.svg new file mode 100644 index 00000000..229130b9 --- /dev/null +++ b/blog/content/second-edition/posts/11-allocator-designs/linked-list-allocator-remove-region.svg @@ -0,0 +1,2 @@ + +
[figure text from linked-list-allocator-remove-region.svg: operation steps 0 to 2; heap start, heap end, size, next pointer, head, region, current, region.next, current.next, next, ret]
\ No newline at end of file diff --git a/blog/content/second-edition/posts/11-allocator-designs/qemu-bump-allocator.png b/blog/content/second-edition/posts/11-allocator-designs/qemu-bump-allocator.png new file mode 100644 index 0000000000000000000000000000000000000000..71397c2970bb068afcf9a424d6094c46da573e73 GIT binary patch literal 9526 zcmeAS@N?(olHy`uVBq!ia0y~yV7kP>z_^x!je&u|`rBk>1_lO}VkgfK4h{~E8jh3> z1_lO+64!{5;QX|b^2DN4hTO!GRNdm_qSVy9;*9)~6VpC;F)%1Fc)B=-RLpsMw{k-2 z^+fq+_iLl`jhhcTG06mo_&5l%<|$~t+rZ$()~K-LY1iW<74M>@LjMvi`{ta`3V$>y z`Kaddkad&Rt_TVf+n5wNZOu_J-P0lmW-x_&PC0Y*jwh#xroe568MV89f31Fh@6X3w z)m2qhm9@LlmsLOey-Mmu<@-Cu=VR|x?=Sy#r+9VFgL~We@4We`D?D( zmtADmX8d)-J1_eCnWft*tyXT++xYgAxah7eU(QY2|E05Y+Wkd+S-Y8p_2a`THhl}< z9aHy~NnC&9=cm{I7auo`32_h1HoGr7`FiNvlH2+>KTOiyzhYf(T-5FBn~uKDdK8xP zH*C}2Ptkft4+XVq1jA?Bn+C@2j;?FI_U2QCn&+&m|DQa%Ulw)zR(yHHY1PyDpPsOk z=EPTZ3!U|gz8CZ9{HJ+Kp4x6dy>12f>FJ?c-j&=ajFr~6udKDs|EKmm@8C0YX_@nL zb@%^kjGgQIT29Aivw3U|`}!@M{pmkHz3$H4aDb&WCnn{Qc^ju$@oS;my8o{PpI6yw z5wQMW(fix?@7$?)eDUop1_p-fYbUGwuWfs&CE5OBv3vfGU2B;b8kiqkS$X;A4|Ti9 zt?SC185kZYta?}bd-iPUJd2{Ow;34@FkUn9UA4+9_WH`%R$o_mPOhrVD4M*)D|Btl z`{1RSX8YcyKmE8YGwk)=fQr{uZHF(qe!`MvzA`HReAr@%ggDPB6AfO z7(_B=srg<`3Aw(~eK8N);VX+T7Olv;v}zi2_U^fP+vkR?{rKaN+UkWBOXq2YUNXs? zWz~Ezqp9lH)77iIvKRa8zOydT)+;uikv_ zeP7plzdE@~!A7cg)0Qn$_Lp4WxaZ}!*+s`M9^U!&g3mnL?Gy6nSk1Pt`Q&o%|8f7N z=2yLUJx(nXTDH3S{=a9>K0I{ZeXDWtMUC$6zh}?j*^e7tS8Uz_ggx3$;8;?d_5>gi@#{wU|!>%6`@Hq&2b)!qzymS;1^GTxh^ z;q{5E4!Pb-@ls1}zkT-a_q)@+=CWrqLefoE1z$hBC2H=L+iSPnUOUI4P)q*b33nq2 zo=dOGrdKaHe%Nf*vXco09=q?Jx_!ScKX32XqbpzczOMZDQ~cA${#qOF>{lyabK728 z{LnyuUf%QN`=3v>EX&aHwe@;sg<-;Q-$s7 z|NjhsZk=a6=h~x_e{N2i<1D|3rTO_y1>}KTk6bHGMh%<@2repPzq9USF^G|Cjpz=X?Gi znPb4q5EI^Z+Szi~+`M}~&#_;67b2;{dufU1X0;n(YqdgGl~mvDY4eq<{n9Mi$M&*h zSLW7LPX%m!RxJ;%(nqyK3(I zz5m~xF=@Wy-93BT{;RXkIbA(_|HobZCpQx($8Db;mi^oF-}$-L`*$yt?e6Zjto!p* z#>!;+%$YMwsvlpt^Zxtg_uoI?`KFJjVWc@%r8B>wny@x7z< z+I5oGT-JB@Dx=c#yFab7|L*?tYx*STyGta$ZT_;U{Q3Gl9|KlpO<8PyZt2SDr*88x z6!2!xQaiiiWJ0Fq_uBJ&GiN1=J!@ZWdsW*$G<5Ahp7!RF-Fx@E&CNg6ZujM3Nz~(w z=SpMKSKdEwz3=A{W)azmFM!?vf}r9^CQVQL{rT_vzT5tM)4yMTeqP!!?%sK)-2WZbf4cZ@*)8pOjazrm_-;3z zwZ7=fGU-cMv#zX-{#;xC(f8-u`F}&)=K8UH4z9fX{r>#h(Y~{mOpFd7vZ5dZe_sTRJ-<%b4TAV8^ zbn3x`88c_zl-d0^VqW#PimJD9Zq0#tyWeh;nKe6loqFY)jr}PWGI=}RvCI9~$HUfq zwW?3ve%{J&A#H~f<@V+3>E}H&wkXx|EUKx~Jg?7F^!=>3TVl=ICpCPfw?TP#aqH%t zZ!cW@vs~YD%K!I2mwwdf=G0+-IrsgJqequs@tyqg<;*`@<$pfB^SA5$iJzaJPf}T5 z_q6{Pv`&r)KAScIC1Du(y6w7p3M>U?s65!y)Vc8@~V4WD}Szduk+4@b5(qQ z&Humk&!gk<)BnHxcl~+e_CHboUX)+gi%!2FpL$30?x`;mUoP={Wo21(_+iN6yT*H0SoB7+j zd|qybf`BzKDL*D9>z_Py$jNQ-#ZQmFX6b+G;@*4mG2_bvlKg^} zWpB3W==HshjQrBbzVG{_D!J-6izRv3PG5d`wv=KkqEzf$Y}+t=~aZZA8Y+V=eXlYio8pOnvw z^~gVDv;9u~-p^mmSGm6|wSHM*gu&1L+Sue`o>&&gwkRHT zd}`RMz`6Oe*9Fa=?;m{Kb5i);9~pIz?tXtN|MxckbN%~;Ci{Q(UjDhU{eMkd>|D>< zZz+?OtaGn?dB^8f;9O7R*pC+3c|Mb=LCTfR5%`CEs}D#5`s{VwUu zIsfNVa`wK-DFu0U-Y0)7IyP1PmH4Nf$L;2-hqalTKmR;+^V^bDR;7JUKEzC2R&V#b z@4sW+@9+PT{=M7F|70J(>`B}Fy^81S-(39j&)B%~tij4>e5va^&#jZqzqe+~!kY>5 z(zh5HV(e$BlwMW|4Qo#b37vZX_g??JU1x6J`}XYWy;mz=mc4qK6Rywx^kBk(Nc zr7hhrCLitM9-6e*C=|FKfPijz5uL z;kEjz-}1|vJTpRP6{i<(^IWCuUaaEl*V7+5Y01jjKC-rxR+?B==h>Y-CA@g~=g#=Q z(|?|LbV}l-*4>qHyVeGZ8BN}KQ|s)Xnsezhj?4e_*jIV~`*Z#OYi}2?_}u&F)Z+Nl z8-Dp4?Vk`*Ts(92tT~x&l036jF3+4R$q*GSegVcJH|1AKLpqZii9v zv0w9Eh?d6c`3uOaU3J;yYu3;8e{Kfz+Ax?&&vgV(fnZ)PFkv?<4(Xck`BBf1Q$+wkkt5bL%C$&tGfaR?NM&@%pn~{gs<7 zA}Z^q9Q~Jb@58a(mwfH(KV4PVQ+dPfp7V40xdr>DBwk)}(&t&)Did$-$W>7;C-s`1p9gygmDm#>B+@t*!f41w{Cvfa@&X7-X~=oKXb;XPj0W_d7IBARd@61-kR6h z{{MFW@9rg@&%~cye*HD*x^LdzuiL)-?zSv`cBb$8#E_Sz^G>wO{mhd4wLWg*`~MgJ 

From 620958a8a29b6a21df2bb5e66624e0559dd0c6ed Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Wed, 8 Jan 2020 11:06:33 +0100 Subject: [PATCH 02/41] Fix interal links --- .../second-edition/posts/11-allocator-designs/index.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 0ccefad1..4c802b0e 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -81,7 +81,7 @@ To be able to implement the trait for our `BumpAllocator` struct, we need to add [interior mutability]: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html [`spin::Mutex`]: https://docs.rs/spin/0.5.0/spin/struct.Mutex.html -[vga-mutex]: ./second-edition/posts/03-vga-text-buffer/index.md#spinlocks +[vga-mutex]: @/second-edition/posts/03-vga-text-buffer/index.md#spinlocks ```rust // in src/allocator.rs @@ -374,7 +374,7 @@ unsafe impl GlobalAlloc for Locked { Like with the bump allocator, we don't implement the trait directly for the `LinkedListAllocator`, but only for a wrapped `Locked`. The [`Locked` wrapper] adds interior mutability through a spinlock, which allows us to modify the allocator instance even though the `alloc` and `dealloc` methods only take `&self` references. Instead of providing an implementation, we use the [`unimplemented!`] macro again to get a minimal prototype. -[`Locked` wrapper]: ./second-edition/posts/10-heap-allocation/index.md#a-locked-wrapper +[`Locked` wrapper]: #a-locked-wrapper With this placeholder implementation, we can now change the global allocator to a `LinkedListAllocator`: From 06ea0caeced655de83edaa8eb3d8153f25e5e4ea Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Wed, 8 Jan 2020 12:00:04 +0100 Subject: [PATCH 03/41] Code will be available in post-11 branch --- .../second-edition/posts/11-allocator-designs/index.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 4c802b0e..044671e2 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -9,11 +9,11 @@ TODO -This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-10`][post branch] branch. +This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-11`][post branch] branch. 
[GitHub]: https://github.com/phil-opp/blog_os [at the bottom]: #comments -[post branch]: https://github.com/phil-opp/blog_os/tree/post-10 +[post branch]: https://github.com/phil-opp/blog_os/tree/post-11 From 26fc3ba62663a1d960fd93a522402f698eccffe0 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Wed, 8 Jan 2020 17:37:28 +0100 Subject: [PATCH 04/41] Add a small abstract --- blog/content/second-edition/posts/11-allocator-designs/index.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 044671e2..2b1fddab 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -5,6 +5,8 @@ path = "allocator-designs" date = 0000-01-01 +++ +This post explains how to implement heap allocators from scratch. It presents different allocator designs and explains their advantages and drawbacks. We then use this knowledge to create a kernel allocator with improved performance. + TODO From ed157c8a7535c7a7b5c6d594c1030fd416d133c6 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Wed, 8 Jan 2020 17:37:40 +0100 Subject: [PATCH 05/41] Write an introduction --- .../second-edition/posts/11-allocator-designs/index.md | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 2b1fddab..b474162d 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -23,9 +23,15 @@ TODO optional ## Introduction -TODO +In the [previous post] we added basic support for heap allocations to our kernel. For that, we [created a new memory region][map-heap] in the page tables and [used the `linked_list_allocator` crate][use-alloc-crate] to manage that memory. While we have a working heap now, we left most of the work to the allocator crate without understanding how it works. -## Design Goals +[previous post]: @/second-edition/posts/10-heap-allocation/index.md +[map-heap]: @/second-edition/posts/10-heap-allocation/index.md#creating-a-kernel-heap +[use-alloc-crate]: @/second-edition/posts/10-heap-allocation/index.md#using-an-allocator-crate + +In this post, we will show how to create our own heap allocator from scratch instead of relying on an existing allocator crate. We will discuss different allocator designs, including a simplistic _bump allocator_ and a basic _fixed-size block allocator_, and use this knowledge to implement an allocator with improved performance. + +### Design Goals The responsibility of an allocator is to manage the available heap memory. It needs to return unused memory on `alloc` calls and keep track of memory freed by `dealloc` so that it can be reused again. Most importantly, it must never hand out memory that is already in use somewhere else because this would cause undefined behavior. 
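For orientation, the interface that such an allocator ultimately implements is the existing `GlobalAlloc` trait from `core::alloc`, shown here as a simplified excerpt (this is the standard trait, not new kernel code; its provided methods are omitted):

```rust
// Simplified excerpt of the existing core::alloc::GlobalAlloc trait:
// `alloc` must return a pointer to unused memory that fits `layout`,
// and `dealloc` marks that memory as reusable again.
pub unsafe trait GlobalAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8;
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);
    // provided methods `alloc_zeroed` and `realloc` omitted
}
```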
From c3feb6a9e679de381de2b0826847d3918a92a9ab Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Wed, 8 Jan 2020 17:38:06 +0100 Subject: [PATCH 06/41] Reword design section --- .../second-edition/posts/11-allocator-designs/index.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index b474162d..2655ae3e 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -41,10 +41,12 @@ Apart from correctness, there are many secondary design goals. For example, it s [fragmentation]: https://en.wikipedia.org/wiki/Fragmentation_(computing) [false sharing]: http://mechanical-sympathy.blogspot.de/2011/07/false-sharing.html -These requirements can make good allocators very complex. For example, [jemalloc] has over 30.000 lines of code. This complexity often undesired in kernel code where a single bug can lead to severe security vulnerabilities. Fortunately, the allocation patterns of kernel code are often much simpler compared to userspace code, so that relatively simple allocator design often suffice. In the following we explain three possible kernel allocator designs and explain their advantages and drawbacks. +These requirements can make good allocators very complex. For example, [jemalloc] has over 30.000 lines of code. This complexity often undesired in kernel code where a single bug can lead to severe security vulnerabilities. Fortunately, the allocation patterns of kernel code are often much simpler compared to userspace code, so that relatively simple allocator designs often suffice. [jemalloc]: http://jemalloc.net/ +In the following we present three possible kernel allocator designs and explain their advantages and drawbacks. + ## Bump Allocator The most simple allocator design is a _bump allocator_. It allocates memory linearly and only keeps track of the number of allocated bytes and the number of allocations. It is only useful in very specific use cases because it has a severe limitation: it can only free all memory at once. From 37aa01958ad2d108a014487231594076802097d9 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Wed, 8 Jan 2020 17:39:26 +0100 Subject: [PATCH 07/41] Start rewriting bump allocator section Remove the `Locked` wrapper type as we can just use `spin::Mutex` directly. --- .../11-allocator-designs/bump-allocation.svg | 3 + .../posts/11-allocator-designs/index.md | 151 +++++++++++++----- 2 files changed, 113 insertions(+), 41 deletions(-) create mode 100644 blog/content/second-edition/posts/11-allocator-designs/bump-allocation.svg diff --git a/blog/content/second-edition/posts/11-allocator-designs/bump-allocation.svg b/blog/content/second-edition/posts/11-allocator-designs/bump-allocation.svg new file mode 100644 index 00000000..ed47bbb5 --- /dev/null +++ b/blog/content/second-edition/posts/11-allocator-designs/bump-allocation.svg @@ -0,0 +1,3 @@ + + +
[The SVG markup of bump-allocation.svg was garbled during extraction; only its text labels survive. The figure shows the heap region at three points in time, marked "Heap Start" and "Heap End", with a "next" pointer that advances after each allocation.]
\ No newline at end of file diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 2655ae3e..d99e7df7 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -51,11 +51,28 @@ In the following we present three possible kernel allocator designs and explain The most simple allocator design is a _bump allocator_. It allocates memory linearly and only keeps track of the number of allocated bytes and the number of allocations. It is only useful in very specific use cases because it has a severe limitation: it can only free all memory at once. -The base type looks like this: +### Idea + +TODO + + +![TODO](bump-allocation.svg) + +### Implementation + +We start by creating a new `allocator::bump` submodule: ```rust // in src/allocator.rs +pub mod bump; +``` + +Now we can create the base type in a `src/allocator/bump.rs` file, which looks like this: + +```rust +// in src/allocator/bump.rs + pub struct BumpAllocator { heap_start: usize, heap_end: usize, @@ -68,10 +85,10 @@ impl BumpAllocator { /// /// This method is unsafe because the caller must ensure that the given /// memory range is unused. - pub const unsafe fn new(heap_start: usize, heap_size: usize) -> Self { + pub const unsafe fn new(heap_start: usize, heap_end: usize) -> Self { BumpAllocator { heap_start, - heap_end: heap_start + heap_size, + heap_end, next: heap_start, allocations: 0, } @@ -79,48 +96,94 @@ impl BumpAllocator { } ``` -Instead of using the `HEAP_START` and `HEAP_SIZE` constants directly, we use separate `heap_start` and `heap_end` fields. This makes the type more flexible, for example it also works when we only want to assign a part of the heap region. The purpose of the `next` field is to always point to the first unused byte of the heap, i.e. the start address of the next allocation. The `allocations` field is a simple counter for the active allocations with the goal of resetting the allocator after the last allocation was freed. +The `heap_start` and `heap_end` fields keep track of the lower and upper bound of the heap memory region. The caller need to ensure that these addresses are valid, otherwise the allocator would return invalid memory. For this reason, the `new` function needs to be `unsafe` to call. -We provide a simple constructor function that creates a new `BumpAllocator`. It initializes the `heap_start` and `heap_end` fields using the given start address and size. The `allocations` counter is initialized with 0. The `next` field is set to `heap_start` since the whole heap should be unused at this point. Since this is something that the caller must guarantee, the function needs to be unsafe. Given an invalid memory range, the planned implementation of the `GlobalAlloc` trait would cause undefined behavior when it is used as global allocator. +The purpose of the `next` field is to always point to the first unused byte of the heap, i.e. the start address of the next allocation. It is set to `heap_start` in the `new` function because at the beginning the complete heap is unused. On each allocation, this field will be increased by the allocation size (_"bumped"_) to ensure that we don't return the same memory region twice. -### A `Locked` Wrapper +The `allocations` field is a simple counter for the active allocations with the goal of resetting the allocator after the last allocation was freed. It is initialized with 0. 
-Implementing the [`GlobalAlloc`] trait directly for the `BumpAllocator` struct is not possible. The problem is that the `alloc` and `dealloc` methods of the trait only take an immutable `&self` reference, but we need to update the `next` and `allocations` fields for every allocation, which is only possible with an exclusive `&mut self` reference. The reason that the `GlobalAlloc` trait is specified this way is that the global allocator needs to be stored in an immutable `static` that only allows `&self` references. +### Implementing `GlobalAlloc` -To be able to implement the trait for our `BumpAllocator` struct, we need to add synchronized [interior mutability] to get mutable field access through the `&self` reference. A type that adds the required synchronization and allows interior mutabilty is the [`spin::Mutex`] spinlock that we already used multiple times for our kernel, for example [for our VGA buffer writer][vga-mutex]. To use it, we create a `Locked` wrapper type: +As [explained in the previous post][global-alloc], all heap allocators need to implement the [`GlobalAlloc`] trait, which is defined like this: -[interior mutability]: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html -[`spin::Mutex`]: https://docs.rs/spin/0.5.0/spin/struct.Mutex.html -[vga-mutex]: @/second-edition/posts/03-vga-text-buffer/index.md#spinlocks +[global-alloc]: @/second-edition/posts/10-heap-allocation/index.md#the-allocator-interface +[`GlobalAlloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html ```rust -// in src/allocator.rs +pub unsafe trait GlobalAlloc { + unsafe fn alloc(&self, layout: Layout) -> *mut u8; + unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout); -pub struct Locked
{ - inner: spin::Mutex, + unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 { ... } + unsafe fn realloc( + &self, + ptr: *mut u8, + layout: Layout, + new_size: usize + ) -> *mut u8 { ... } } +``` -impl Locked { - pub const fn new(inner: A) -> Self { - Locked { - inner: spin::Mutex::new(inner), - } +Only the `alloc` and `dealloc` methods are required, the other two methods have default implementations and can be omitted. + +#### First Implementation Attempt + +Let's try to implement the `alloc` method for our `BumpAllocator`: + +```rust +// in src/allocator/bump.rs + +unsafe impl GlobalAlloc for BumpAllocator { + unsafe fn alloc(&self, layout: Layout) -> *mut u8 { + // TODO alignment and bounds check + let alloc_start = self.next; + self.next = alloc_start + layout.size(); + self.allocations += 1; + alloc_start as *mut u8 + } + + unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) { + unimplemented!(); } } ``` -The type is a generic wrapper around a `spin::Mutex`. It imposes no restrictions on the wrapped type `A`, so it can be used to wrap all kinds of types, not just allocators. It provides a simple `new` constructor function that wraps a given value. +First, we use the `next` field as the start address for our allocation. Then we update the `next` field to point at the end address of the allocation, which is the next unused address on the heap. Before returning the start address of the allocation as a `*mut u8` pointer, we increase the `allocations` counter by 1. -### Implementing `GlobalAlloc` +Note that we don't perform any bounds checks or alignment adjustments, so this implementation is not safe yet. This does not matter much because it fails to compile anyway with the following error: -With the help of the `Locked` wrapper type we now can implement the `GlobalAlloc` trait for our bump allocator. The trick is to implement the trait not for the `BumpAllocator` directly, but for the wrapped `Locked` type. The implementation looks like this: +TODO + +This error occurs because the [`alloc`] and [`dealloc`] methods of the `GlobalAlloc` trait only operate on an immutable `&self` reference, so updating the `next` and `allocations` fields is not possible. This is problematic because updating `next` on every allocation is the essential principle of a bump allocator. + +[`alloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.alloc +[`dealloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.dealloc + +#### `GlobalAlloc` and Mutability + +Before we look at a possible solution to this, let's try to understand why the `GlobalAlloc` trait is defined this way: As we saw [in the previous post][global-allocator], the global heap allocator is defined by adding the `#[global_allocator]` attribute to a `static` that implements the `GlobalAlloc` trait. Static variables are immutable in Rust, so there is no way to call a method that takes `&mut self` on the allocator `static`. For this reason, all the methods of `GlobalAlloc` only take an immutable `&self` reference. + +[global-allocator]: @/second-edition/posts/10-heap-allocation/index.md#the-global-allocator-attribute + +Fortunately there is a way how to get a `&mut self` reference from a `&self` reference: We can use synchronized [interior mutability] by wrapping the allocator in a [`spin::Mutex`] spinlock. This type provides a `lock` method that performs [mutual exclusion] and thus safely turns a `&self` reference to a `&mut self` reference. 
We already used the wrapper type multiple times in our kernel, for example for the [VGA text buffer][vga-mutex]. + +[interior mutability]: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html +[vga-mutex]: @/second-edition/posts/03-vga-text-buffer/index.md#spinlocks +[`spin::Mutex`]: https://docs.rs/spin/0.5.0/spin/struct.Mutex.html +[mutual exclusion]: https://en.wikipedia.org/wiki/Mutual_exclusion + +#### Implementation for `Spin` + +With the help of the `spin::Mutex` wrapper type we now can implement the `GlobalAlloc` trait for our bump allocator. The trick is to implement the trait not for the `BumpAllocator` directly, but for the wrapped `Spin` type. The full implementation looks like this: ```rust -// in src/allocator.rs +// in src/allocator/bump.rs -unsafe impl GlobalAlloc for Locked { +use super::align_up; + +unsafe impl GlobalAlloc for Spin { unsafe fn alloc(&self, layout: Layout) -> *mut u8 { - let mut bump = self.inner.lock(); + let mut bump = self.lock(); // get a mutable let alloc_start = align_up(bump.next, layout.align()); let alloc_end = alloc_start + layout.size(); @@ -135,7 +198,7 @@ unsafe impl GlobalAlloc for Locked { } unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) { - let mut bump = self.inner.lock(); + let mut bump = self.lock(); bump.allocations -= 1; if bump.allocations == 0 { @@ -149,11 +212,13 @@ The first step for both `alloc` and `dealloc` is to call the [`Mutex::lock`] met [`Mutex::lock`]: https://docs.rs/spin/0.5.0/spin/struct.Mutex.html#method.lock -The `alloc` implementation first performs the required alignment on the `next` address, as specified by the given [`Layout`]. This yields the start address of the allocation. The code for the `align_up` function is shown below. Next, we add the requested allocation size to `alloc_start` to get the end address of the allocation. If it is larger than the end address of the heap, we return a null pointer since there is not enough memory available. Otherwise, we update the `next` address (the next allocation should start after the current allocation), increase the `allocations` counter by 1, and return the `alloc_start` address converted to a `*mut u8` pointer. +Compared to the previous prototype, the `alloc` implementation now respects alignment requirements and performs a bounds check to ensure that the allocations stay inside the heap memory region. The first step is to round up the `next` address to the alignment specified by the `Layout` argument. The code for the `align_up` function is shown in a moment. Like before, we then add the requested allocation size to `alloc_start` to get the end address of the allocation. If it is larger than the end address of the heap, we return a null pointer to signal an out-of-memory situation. Otherwise, we update the `next` address and increase the `allocations` counter by 1 like before. Finally, we return the `alloc_start` address converted to a `*mut u8` pointer. + +[`Layout`]: https://doc.rust-lang.org/alloc/alloc/struct.Layout.html The `dealloc` function ignores the given pointer and `Layout` arguments. Instead, it just decreases the `allocations` counter. If the counter reaches `0` again, it means that all allocations were freed again. In this case, it resets the `next` address to the `heap_start` address to make the complete heap memory available again. -The last remaining part of the implementation is the `align_up` function, which looks like this: +The `align_up` function is general enough that we can put it into the parent `allocator` module. 
It looks like this: ```rust // in src/allocator.rs @@ -174,23 +239,27 @@ The function first computes the [remainder] of the division of `addr` by `align` ### Using It -To use the bump allocator instead of the dummy allocator, we need to update the `ALLOCATOR` static in `lib.rs`: +To use the bump allocator instead of the `linked_list_allocator` crate, we need to update the `ALLOCATOR` static in `allocator.rs`: ```rust -// in src/lib.rs +// in src/allocator.rs -use allocator::{Locked, BumpAllocator, HEAP_START, HEAP_SIZE}; +use allocator::{BumpAllocator, HEAP_START, HEAP_SIZE}; +use spin::Mutex; #[global_allocator] -static ALLOCATOR: Locked = - Locked::new(BumpAllocator::new(HEAP_START, HEAP_SIZE)); +static ALLOCATOR: Spin = + Spin::new(BumpAllocator::new(HEAP_START, HEAP_SIZE)); ``` -Here it becomes important that we declared both the `Locked::new` and the `BumpAllocator::new` as [`const` functions]. If they were normal functions, a compilation error would occur because the initialization expression of a `static` must evaluable at compile time. +Here it becomes important that we declared `BumpAllocator::new` as a [`const` function]. If it was normal functions, a compilation error would occur because the initialization expression of a `static` must evaluable at compile time. -[`const` functions]: https://doc.rust-lang.org/reference/items/functions.html#const-functions +[`const` function]: https://doc.rust-lang.org/reference/items/functions.html#const-functions -Now we can use `Box` and `Vec` without runtime errors: +--- + + +TODO: Now we can use `Box` and `Vec` without runtime errors: ```rust // in src/main.rs @@ -371,7 +440,7 @@ Our first goal is to set a prototype of the `LinkedListAllocator` as the global ```rust // in src/allocator.rs -unsafe impl GlobalAlloc for Locked { +unsafe impl GlobalAlloc for Spin { unsafe fn alloc(&self, layout: Layout) -> *mut u8 { unimplemented!(); } @@ -382,7 +451,7 @@ unsafe impl GlobalAlloc for Locked { } ``` -Like with the bump allocator, we don't implement the trait directly for the `LinkedListAllocator`, but only for a wrapped `Locked`. The [`Locked` wrapper] adds interior mutability through a spinlock, which allows us to modify the allocator instance even though the `alloc` and `dealloc` methods only take `&self` references. Instead of providing an implementation, we use the [`unimplemented!`] macro again to get a minimal prototype. +Like with the bump allocator, we don't implement the trait directly for the `LinkedListAllocator`, but only for a wrapped `Spin`. The [`Locked` wrapper] adds interior mutability through a spinlock, which allows us to modify the allocator instance even though the `alloc` and `dealloc` methods only take `&self` references. Instead of providing an implementation, we use the [`unimplemented!`] macro again to get a minimal prototype. 
[`Locked` wrapper]: #a-locked-wrapper @@ -394,8 +463,8 @@ With this placeholder implementation, we can now change the global allocator to use allocator::{Locked, LinkedListAllocator}; #[global_allocator] -static ALLOCATOR: Locked = - Locked::new(LinkedListAllocator::new()); +static ALLOCATOR: Spin = + Spin::new(LinkedListAllocator::new()); ``` Since the `new` method creates an empty allocator, we also need to update our `allocator::init` function to call `LinkedListAllocator::init` with the heap bounds: @@ -557,7 +626,7 @@ With the fundamental operations provided by the `add_free_region` and `find_regi ```rust // in src/allocator.rs -unsafe impl GlobalAlloc for Locked { +unsafe impl GlobalAlloc for Spin { unsafe fn alloc(&self, layout: Layout) -> *mut u8 { // perform layout adjustments let (size, align) = LinkedListAllocator::size_align(layout); From 851460fe124e6c1c8a47efaa3835cb79bd61e442 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Wed, 8 Jan 2020 18:03:16 +0100 Subject: [PATCH 08/41] Fix typos --- .../second-edition/posts/11-allocator-designs/index.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index d99e7df7..c6b71057 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -339,7 +339,7 @@ While a bump allocator is seldom used as the global allocator, the principle of The main limitation of a bump allocator is that it never reuses deallocated memory. The question is: Can we extend our bump allocator somehow to remove this limitation? -As we learned at the beginning of this post, allocations can live arbitarily long and can be freed in an arbitrary order. This means that we need to keep track of a potentially unbounded number of non-continuous, unused memory regions, as illustrated by the following example: +As we learned at the beginning of this post, allocations can live arbitrarily long and can be freed in an arbitrary order. This means that we need to keep track of a potentially unbounded number of non-continuous, unused memory regions, as illustrated by the following example: ![](allocation-fragmentation.svg) @@ -517,7 +517,7 @@ impl LinkedListAllocator { } ``` -The method takes a memory region represented by an address and size as argument and adds it to the front of the list. First, it ensures that the given region has the neccessary size and alignment for storing a `ListNode`. Then it creates the node and inserts it to the list through the following steps: +The method takes a memory region represented by an address and size as argument and adds it to the front of the list. First, it ensures that the given region has the necessary size and alignment for storing a `ListNode`. Then it creates the node and inserts it to the list through the following steps: ![](linked-list-allocator-push.svg) @@ -617,7 +617,7 @@ impl LinkedListAllocator { First, the function calculates the start and end address of a potential allocation, using the `align_up` function we defined earlier. If the end address is behind the end address of the region, the allocation doesn't fit in the region and we return an error. -The function performs a less obvious check after that. This check is neccessary because most of the time an allocation does not fit a suitable region perfectly, so that a part of the region remains usable after the allocation. 
This part of the region must store its own `ListNode` after the allocation, so it must be large enough to do so. The check verifies exactly that: either the allocation fits perfectly (`excess_size == 0`) or the excess size is large enough to store a `ListNode`. +The function performs a less obvious check after that. This check is necessary because most of the time an allocation does not fit a suitable region perfectly, so that a part of the region remains usable after the allocation. This part of the region must store its own `ListNode` after the allocation, so it must be large enough to do so. The check verifies exactly that: either the allocation fits perfectly (`excess_size == 0`) or the excess size is large enough to store a `ListNode`. ### Implementing `GlobalAlloc` From e4652090a8cc6e52ff0e38aeaa5a5835d01e5b17 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 9 Jan 2020 13:46:37 +0100 Subject: [PATCH 09/41] Finish rewrite of bump allocator section --- .../posts/11-allocator-designs/index.md | 267 +++++++++++------- .../qemu-bump-allocator.png | Bin 9526 -> 0 bytes 2 files changed, 165 insertions(+), 102 deletions(-) delete mode 100644 blog/content/second-edition/posts/11-allocator-designs/qemu-bump-allocator.png diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index c6b71057..793b6aea 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -53,14 +53,20 @@ The most simple allocator design is a _bump allocator_. It allocates memory line ### Idea -TODO +The idea behind a bump allocator is to linearly allocate memory by increasing (_"bumping"_) a `next` variable, which points at the beginning of the unused memory. At the beginning, `next` is equal to the start address of the heap. On each allocation, `next` is increased by the allocation so that it always points to the boundary between used and unused memory: +![The heap memory area at three points in time: + 1: A single allocation exists at the start of the heap; the `next` pointer points to its end + 2: A second allocation was added right after the first; the `next` pointer points to the end of the second allocation + 3: A third allocation was added right after the second one; the `next pointer points to the end of the third allocation](bump-allocation.svg) -![TODO](bump-allocation.svg) +The `next` pointer only moves in a single direction and thus never hands out the same memory region twice. When it reaches the end of the heap, no more memory can be allocated, which results in an out-of-memory error. -### Implementation +A bump allocator is often implemented with an allocation counter, which is inreased by 1 on each `alloc` call and decreased by 1 on each `dealloc` call. When the allocation counter reaches zero it means that all allocations on the heap were deallocated so that the complete heap is unused again. In this case, the `next` pointer can be reset to the start address of the heap, so that the complete heap memory is available to allocations again. -We start by creating a new `allocator::bump` submodule: +### Type Implementation + +We start our implementation by creating a new `allocator::bump` submodule: ```rust // in src/allocator.rs @@ -81,24 +87,31 @@ pub struct BumpAllocator { } impl BumpAllocator { - /// Creates a new bump allocator with the given heap bounds. 
- /// - /// This method is unsafe because the caller must ensure that the given - /// memory range is unused. - pub const unsafe fn new(heap_start: usize, heap_end: usize) -> Self { + /// Creates a new empty bump allocator. + pub const fn new() -> Self { BumpAllocator { - heap_start, - heap_end, - next: heap_start, + heap_start: 0, + heap_end: 0, + next: 0, allocations: 0, } } + + /// Initializes the bump allocator with the given heap bounds. + /// + /// This method is unsafe because the caller must ensure that the given + /// memory range is unused. Also, this method must be called only once. + pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) { + self.heap_start = heap_start; + self.heap_end = heap_start + heap_size; + self.next = heap_start; + } } ``` -The `heap_start` and `heap_end` fields keep track of the lower and upper bound of the heap memory region. The caller need to ensure that these addresses are valid, otherwise the allocator would return invalid memory. For this reason, the `new` function needs to be `unsafe` to call. +The `heap_start` and `heap_end` fields keep track of the lower and upper bound of the heap memory region. The caller need to ensure that these addresses are valid, otherwise the allocator would return invalid memory. For this reason, the `init` function needs to be `unsafe` to call. -The purpose of the `next` field is to always point to the first unused byte of the heap, i.e. the start address of the next allocation. It is set to `heap_start` in the `new` function because at the beginning the complete heap is unused. On each allocation, this field will be increased by the allocation size (_"bumped"_) to ensure that we don't return the same memory region twice. +The purpose of the `next` field is to always point to the first unused byte of the heap, i.e. the start address of the next allocation. It is set to `heap_start` in the `init` function because at the beginning the complete heap is unused. On each allocation, this field will be increased by the allocation size (_"bumped"_) to ensure that we don't return the same memory region twice. The `allocations` field is a simple counter for the active allocations with the goal of resetting the allocator after the last allocation was freed. It is initialized with 0. @@ -133,6 +146,8 @@ Let's try to implement the `alloc` method for our `BumpAllocator`: ```rust // in src/allocator/bump.rs +use alloc::alloc::{GlobalAlloc, Layout}; + unsafe impl GlobalAlloc for BumpAllocator { unsafe fn alloc(&self, layout: Layout) -> *mut u8 { // TODO alignment and bounds check @@ -152,16 +167,29 @@ First, we use the `next` field as the start address for our allocation. Then we Note that we don't perform any bounds checks or alignment adjustments, so this implementation is not safe yet. This does not matter much because it fails to compile anyway with the following error: -TODO +``` +error[E0594]: cannot assign to `self.next` which is behind a `&` reference + --> src/allocator/bump.rs:29:9 + | +26 | unsafe fn alloc(&self, layout: Layout) -> *mut u8 { + | ----- help: consider changing this to be a mutable reference: `&mut self` +... +29 | self.next = alloc_start + layout.size(); + | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `self` is a `&` reference, so the data it refers to cannot be written +``` -This error occurs because the [`alloc`] and [`dealloc`] methods of the `GlobalAlloc` trait only operate on an immutable `&self` reference, so updating the `next` and `allocations` fields is not possible. 
This is problematic because updating `next` on every allocation is the essential principle of a bump allocator. +(The same error also occurs for the `self.allocations += 1` line. We omitted it here for brevity.) + +The error occurs because the [`alloc`] and [`dealloc`] methods of the `GlobalAlloc` trait only operate on an immutable `&self` reference, so updating the `next` and `allocations` fields is not possible. This is problematic because updating `next` on every allocation is the essential principle of a bump allocator. [`alloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.alloc [`dealloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.dealloc +Note that the compiler suggestion to change `&self` to `&mut self` in the method declaration does not work here. The reason is that the method signature is defined by the `GlobalAlloc` trait and can't be changed on the implementation side. I opened an [issue](https://github.com/rust-lang/rust/issues/68049) in the Rust repository about the invalid suggestion. + #### `GlobalAlloc` and Mutability -Before we look at a possible solution to this, let's try to understand why the `GlobalAlloc` trait is defined this way: As we saw [in the previous post][global-allocator], the global heap allocator is defined by adding the `#[global_allocator]` attribute to a `static` that implements the `GlobalAlloc` trait. Static variables are immutable in Rust, so there is no way to call a method that takes `&mut self` on the allocator `static`. For this reason, all the methods of `GlobalAlloc` only take an immutable `&self` reference. +Before we look at a possible solution to this mutability problem, let's try to understand why the `GlobalAlloc` trait methods are defined with `&self` arguments: As we saw [in the previous post][global-allocator], the global heap allocator is defined by adding the `#[global_allocator]` attribute to a `static` that implements the `GlobalAlloc` trait. Static variables are immutable in Rust, so there is no way to call a method that takes `&mut self` on the allocator `static`. For this reason, all the methods of `GlobalAlloc` only take an immutable `&self` reference. [global-allocator]: @/second-edition/posts/10-heap-allocation/index.md#the-global-allocator-attribute @@ -172,24 +200,74 @@ Fortunately there is a way how to get a `&mut self` reference from a `&self` ref [`spin::Mutex`]: https://docs.rs/spin/0.5.0/spin/struct.Mutex.html [mutual exclusion]: https://en.wikipedia.org/wiki/Mutual_exclusion -#### Implementation for `Spin` +#### A `Locked` Wrapper Type -With the help of the `spin::Mutex` wrapper type we now can implement the `GlobalAlloc` trait for our bump allocator. The trick is to implement the trait not for the `BumpAllocator` directly, but for the wrapped `Spin` type. The full implementation looks like this: +With the help of the `spin::Mutex` wrapper type we can implement the `GlobalAlloc` trait for our bump allocator. 
The trick is to implement the trait not for the `BumpAllocator` directly, but for the wrapped `spin::Mutex` type: + +```rust +unsafe impl GlobalAlloc for spin::Mutex {…} +``` + +Unfortunatly, the Rust compiler does not permit trait implementations for types defined in other crates: + +``` +error[E0117]: only traits defined in the current crate can be implemented for arbitrary types + --> src/allocator/bump.rs:28:1 + | +28 | unsafe impl GlobalAlloc for spin::Mutex { + | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^-------------------------- + | | | + | | `spin::mutex::Mutex` is not defined in the current crate + | impl doesn't use only types from inside the current crate + | + = note: define and implement a trait or new type instead +``` + +To fix this, we need to create our own wrapper type around `spin::Mutex`: + +```rust +// in src/allocator.rs + +/// A wrapper around spin::Mutex to permit trait implementations. +pub struct Locked { + inner: spin::Mutex, +} + +impl Locked { + pub const fn new(inner: A) -> Self { + Locked { + inner: spin::Mutex::new(inner), + } + } + + pub fn lock(&self) -> spin::MutexGuard { + self.inner.lock() + } +} +``` + +The type is a generic wrapper around a `spin::Mutex`. It imposes no restrictions on the wrapped type `A`, so it can be used to wrap all kinds of types, not just allocators. It provides a simple `new` constructor function that wraps a given value. For convenience, it also provides a `lock` function that calls `lock` on the wrapped `Mutex`. Since the `Locked` type is general enough to be useful for other allocator implementations too, we put it in the parent `allocator` module. + +#### Implementation for `Locked` + +The `Locked` type is defined in our own crate (in contrast to `spin::Mutex`), so we can use it to implement `GlobalAlloc` for our bump allocator. The full implementation looks like this: ```rust // in src/allocator/bump.rs -use super::align_up; +use super::{align_up, Locked}; +use alloc::alloc::{GlobalAlloc, Layout}; +use core::ptr; -unsafe impl GlobalAlloc for Spin { +unsafe impl GlobalAlloc for Locked { unsafe fn alloc(&self, layout: Layout) -> *mut u8 { - let mut bump = self.lock(); // get a mutable + let mut bump = self.lock(); // get a mutable reference let alloc_start = align_up(bump.next, layout.align()); let alloc_end = alloc_start + layout.size(); if alloc_end > bump.heap_end { - null_mut() // out of memory + ptr::null_mut() // out of memory } else { bump.next = alloc_end; bump.allocations += 1; @@ -198,7 +276,7 @@ unsafe impl GlobalAlloc for Spin { } unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) { - let mut bump = self.lock(); + let mut bump = self.lock(); // get a mutable reference bump.allocations -= 1; if bump.allocations == 0 { @@ -208,7 +286,7 @@ unsafe impl GlobalAlloc for Spin { } ``` -The first step for both `alloc` and `dealloc` is to call the [`Mutex::lock`] method to get a mutable reference to the wrapped allocator type. The instance remains locked until the end of the method, so that no data race can occur in multithreaded contexts (we will add threading support soon). +The first step for both `alloc` and `dealloc` is to call the [`Mutex::lock`] method through the `inner` field to get a mutable reference to the wrapped allocator type. The instance remains locked until the end of the method, so that no data race can occur in multithreaded contexts (we will add threading support soon). 
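The `align_up` helper imported and used in the implementation above is not visible in the hunks shown here because its body is unchanged context. For orientation, a minimal sketch of such a helper, assuming the simple remainder-based approach that the surrounding text describes (the post's actual version may differ), could look like this:

```rust
// in src/allocator.rs
//
// Sketch only: round `addr` up to the next multiple of `align`.
// Assumes `align` is a power of two, which `Layout::align()` guarantees.
fn align_up(addr: usize, align: usize) -> usize {
    let remainder = addr % align;
    if remainder == 0 {
        addr // addr is already aligned
    } else {
        addr - remainder + align
    }
}
```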
[`Mutex::lock`]: https://docs.rs/spin/0.5.0/spin/struct.Mutex.html#method.lock @@ -244,90 +322,37 @@ To use the bump allocator instead of the `linked_list_allocator` crate, we need ```rust // in src/allocator.rs -use allocator::{BumpAllocator, HEAP_START, HEAP_SIZE}; -use spin::Mutex; +use bump::BumpAllocator; #[global_allocator] -static ALLOCATOR: Spin = - Spin::new(BumpAllocator::new(HEAP_START, HEAP_SIZE)); +static ALLOCATOR: Locked = + Locked::new(BumpAllocator::new(HEAP_START, HEAP_SIZE)); ``` -Here it becomes important that we declared `BumpAllocator::new` as a [`const` function]. If it was normal functions, a compilation error would occur because the initialization expression of a `static` must evaluable at compile time. +Here it becomes important that we declared `BumpAllocator::new` and `Locked::new` as [`const` functions]. If they were normal functions, a compilation error would occur because the initialization expression of a `static` must evaluable at compile time. -[`const` function]: https://doc.rust-lang.org/reference/items/functions.html#const-functions +[`const` functions]: https://doc.rust-lang.org/reference/items/functions.html#const-functions ---- +We don't need to change the `ALLOCATOR.lock().init(HEAP_START, HEAP_SIZE)` call in our `init_heap` function because the bump allocator provides the same interface as the allocator provided by the `linked_list_allocator`. +Now our kernel uses our bump allocator! Everything should still work, including the [`heap_allocation` tests] that we created in the previous post: -TODO: Now we can use `Box` and `Vec` without runtime errors: +[`heap_allocation` tests]: @/second-edition/posts/10-heap-allocation/index.md#adding-a-test -```rust -// in src/main.rs - -use alloc::{boxed::Box, vec::Vec, collections::BTreeMap}; - -fn kernel_main(boot_info: &'static BootInfo) -> ! { - // […] initialize interrupts, mapper, frame_allocator, heap - - // allocate a number on the heap - let heap_value = Box::new(41); - println!("heap_value at {:p}", heap_value); - - // create a dynamically sized vector - let mut vec = Vec::new(); - for i in 0..500 { - vec.push(i); - } - println!("vec at {:p}", vec.as_slice()); - - // try to create one million boxes - for _ in 0..1_000_000 { - let _ = Box::new(1); - } - - // […] call `test_main` in test context - println!("It did not crash!"); - blog_os::hlt_loop(); -} +``` +> cargo xtest --test heap_allocation +[…] +Running 3 tests +simple_allocation... [ok] +large_vec... [ok] +many_boxes... 
[ok] ``` -This code example only uses the `Box` and `Vec` types, but there are many more allocation and collection types in the `alloc` crate that we can now all use in our kernel, including: +### Discussion -- the reference counted pointers [`Rc`] and [`Arc`] -- the owned string type [`String`] and the [`format!`] macro -- [`LinkedList`] -- the growable ring buffer [`VecDeque`] -- [`BinaryHeap`] -- [`BTreeMap`] and [`BTreeSet`] - -[`Rc`]: https://doc.rust-lang.org/alloc/rc/ -[`Arc`]: https://doc.rust-lang.org/alloc/arc/ -[`String`]: https://doc.rust-lang.org/collections/string/struct.String.html -[`format!`]: https://doc.rust-lang.org/alloc/macro.format.html -[`LinkedList`]: https://doc.rust-lang.org/collections/linked_list/struct.LinkedList.html -[`VecDeque`]: https://doc.rust-lang.org/collections/vec_deque/struct.VecDeque.html -[`BinaryHeap`]: https://doc.rust-lang.org/collections/binary_heap/struct.BinaryHeap.html -[`BTreeMap`]: https://doc.rust-lang.org/collections/btree_map/struct.BTreeMap.html -[`BTreeSet`]: https://doc.rust-lang.org/collections/btree_set/struct.BTreeSet.html - -When we run our project now, we see the following: - -![QEMU printing ` -heap_value at 0x444444440000 -vec at 0x4444444408000 -panicked at 'allocation error: Layout { size_: 4, align_: 4 }', src/lib.rs:91:5 -](qemu-bump-allocator.png) - -As expected, we see that the `Box` and `Vec` values live on the heap, as indicated by the pointer starting with `0x_4444_4444`. The reason that the vector starts at offset `0x800` is not that the boxed value is `0x800` bytes large, but the [reallocations] that occur when the vector needs to increase its capacity. For example, when the vector's capacity is 32 and we try to add the next element, the vector allocates a new backing array with capacity 64 behind the scenes and copies all elements over. Then it frees the old allocation, which in our case is equivalent to leaking it since our bump allocator doesn't reuse freed memory. - -[reallocations]: https://doc.rust-lang.org/alloc/vec/struct.Vec.html#capacity-and-reallocation - -While the basic `Box` and `Vec` examples work as expected, our loop that tries to create one million boxes causes a panic. The reason is that the bump allocator never reuses freed memory, so that for each created `Box` a few bytes are leaked. This makes the bump allocator unsuitable for many applications in practice, apart from some very specific use cases. - -### When to use a Bump Allocator - -The big advantage of bump allocation is that it's very fast. Compared to other allocator designs (see below) that need to actively look for a fitting memory block and perform various bookkeeping tasks on `alloc` and `dealloc`, a bump allocator can be optimized to just a few assembly instructions. This makes bump allocators useful for optimizing the allocation performance, for example when creating a [virtual DOM library]. +The big advantage of bump allocation is that it's very fast. Compared to other allocator designs (see below) that need to actively look for a fitting memory block and perform various bookkeeping tasks on `alloc` and `dealloc`, a bump allocator [can be optimized][bump downwards] to just a few assembly instructions. This makes bump allocators useful for optimizing the allocation performance, for example when creating a [virtual DOM library]. 
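To make the "just a few assembly instructions" claim more concrete, here is a rough sketch (not part of the post's code; the name and signature are made up for illustration) of the essential fast path of a bump allocation once locking and the allocation counter are stripped away:

```rust
// Sketch only: the core of a bump allocation, assuming `align` is a power of two.
// Round the bump pointer up, check the heap bound, and advance ("bump") the pointer.
fn bump_alloc(next: &mut usize, heap_end: usize, size: usize, align: usize) -> Option<usize> {
    let alloc_start = (*next + align - 1) & !(align - 1); // round up to the alignment
    let alloc_end = alloc_start.checked_add(size)?;       // reject address overflow
    if alloc_end > heap_end {
        None // out of memory
    } else {
        *next = alloc_end; // bump the pointer past the new allocation
        Some(alloc_start)  // start address of the allocation
    }
}
```

A deallocation in this model is essentially free, which is why arena-style allocators built on this principle can be so fast.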
+[bump downwards]: https://fitzgeraldnick.com/2019/11/01/always-bump-downwards.html [virtual DOM library]: https://hacks.mozilla.org/2019/03/fast-bump-allocated-virtual-doms-with-rust-and-wasm/ While a bump allocator is seldom used as the global allocator, the principle of bump allocation is often applied in form of [arena allocation], which basically batches individual allocations together to improve performance. An example for an arena allocator for Rust is the [`toolshed`] crate. @@ -335,9 +360,47 @@ While a bump allocator is seldom used as the global allocator, the principle of [arena allocation]: https://mgravell.github.io/Pipelines.Sockets.Unofficial/docs/arenas.html [`toolshed`]: https://docs.rs/toolshed/0.8.1/toolshed/index.html -### Reusing Freed Memory? +#### The Drawback of a Bump Allocator -The main limitation of a bump allocator is that it never reuses deallocated memory. The question is: Can we extend our bump allocator somehow to remove this limitation? +The main limitation of a bump allocator is that it can only reuse deallocated memory after all allocations have been freed. This means that a single long-lived allocation suffices to prevent memory reuse. We can see this when we add a variation of the `many_boxes` test: + +```rust +// in tests/heap_allocation.rs + +#[test_case] +fn many_boxes_long_lived() { + serial_print!("many_boxes_long_lived... "); + let long_lived = Box::new(1); // new + for i in 0..HEAP_SIZE { + let x = Box::new(i); + assert_eq!(*x, i); + } + assert_eq!(*long_lived, 1); // new + serial_println!("[ok]"); +} +``` + +Like the `many_boxes` test, this test creates a large number of allocations to provoke an out-of-memory failure if the allocator does not reuse freed memory. Additionally, the test creates a `long_lived` allocation, which lives for the whole loop execution. + +When we try run our new test, we see that it indeed fails: + +``` +> cargo xtest --test heap_allocation +Running 4 tests +simple_allocation... [ok] +large_vec... [ok] +many_boxes... [ok] +many_boxes_long_lived... [failed] + +Error: panicked at 'allocation error: Layout { size_: 8, align_: 8 }', src/lib.rs:86:5 +``` + +Let's try to understand why this failure occurs in detail: First, the `long_lived` allocation is created at the start of the heap, thereby increasing the `allocations` counter by 1. For each iteration of the loop, a short lived allocation is created and directly freed again before the next iteration starts. This means that the `allocations` counter is temporarily increased to 2 at the beginning of an iteration and decreased to 1 at the end of it. The problem now is that the bump allocator can only reuse memory when _all_ allocations have been freed, i.e. the `allocations` counter falls to 0. Since this doesn't happen before the end of the loop, each loop iteration allocates a new region of memory, leading to an out-of-memory error after a number of iterations. + + +#### Reusing Freed Memory? + +The question is: Can we extend our bump allocator somehow to remove this limitation? As we learned at the beginning of this post, allocations can live arbitrarily long and can be freed in an arbitrary order. 
This means that we need to keep track of a potentially unbounded number of non-continuous, unused memory regions, as illustrated by the following example: @@ -440,7 +503,7 @@ Our first goal is to set a prototype of the `LinkedListAllocator` as the global ```rust // in src/allocator.rs -unsafe impl GlobalAlloc for Spin { +unsafe impl GlobalAlloc for Locked { unsafe fn alloc(&self, layout: Layout) -> *mut u8 { unimplemented!(); } @@ -451,7 +514,7 @@ unsafe impl GlobalAlloc for Spin { } ``` -Like with the bump allocator, we don't implement the trait directly for the `LinkedListAllocator`, but only for a wrapped `Spin`. The [`Locked` wrapper] adds interior mutability through a spinlock, which allows us to modify the allocator instance even though the `alloc` and `dealloc` methods only take `&self` references. Instead of providing an implementation, we use the [`unimplemented!`] macro again to get a minimal prototype. +Like with the bump allocator, we don't implement the trait directly for the `LinkedListAllocator`, but only for a wrapped `Locked`. The [`Locked` wrapper] adds interior mutability through a spinlock, which allows us to modify the allocator instance even though the `alloc` and `dealloc` methods only take `&self` references. Instead of providing an implementation, we use the [`unimplemented!`] macro again to get a minimal prototype. [`Locked` wrapper]: #a-locked-wrapper @@ -463,7 +526,7 @@ With this placeholder implementation, we can now change the global allocator to use allocator::{Locked, LinkedListAllocator}; #[global_allocator] -static ALLOCATOR: Spin = +static ALLOCATOR: Locked = Spin::new(LinkedListAllocator::new()); ``` @@ -626,7 +689,7 @@ With the fundamental operations provided by the `add_free_region` and `find_regi ```rust // in src/allocator.rs -unsafe impl GlobalAlloc for Spin { +unsafe impl GlobalAlloc for Locked { unsafe fn alloc(&self, layout: Layout) -> *mut u8 { // perform layout adjustments let (size, align) = LinkedListAllocator::size_align(layout); diff --git a/blog/content/second-edition/posts/11-allocator-designs/qemu-bump-allocator.png b/blog/content/second-edition/posts/11-allocator-designs/qemu-bump-allocator.png deleted file mode 100644 index 71397c2970bb068afcf9a424d6094c46da573e73..0000000000000000000000000000000000000000 GIT binary patch literal 0 HcmV?d00001 literal 9526 zcmeAS@N?(olHy`uVBq!ia0y~yV7kP>z_^x!je&u|`rBk>1_lO}VkgfK4h{~E8jh3> z1_lO+64!{5;QX|b^2DN4hTO!GRNdm_qSVy9;*9)~6VpC;F)%1Fc)B=-RLpsMw{k-2 z^+fq+_iLl`jhhcTG06mo_&5l%<|$~t+rZ$()~K-LY1iW<74M>@LjMvi`{ta`3V$>y z`Kaddkad&Rt_TVf+n5wNZOu_J-P0lmW-x_&PC0Y*jwh#xroe568MV89f31Fh@6X3w z)m2qhm9@LlmsLOey-Mmu<@-Cu=VR|x?=Sy#r+9VFgL~We@4We`D?D( zmtADmX8d)-J1_eCnWft*tyXT++xYgAxah7eU(QY2|E05Y+Wkd+S-Y8p_2a`THhl}< z9aHy~NnC&9=cm{I7auo`32_h1HoGr7`FiNvlH2+>KTOiyzhYf(T-5FBn~uKDdK8xP zH*C}2Ptkft4+XVq1jA?Bn+C@2j;?FI_U2QCn&+&m|DQa%Ulw)zR(yHHY1PyDpPsOk z=EPTZ3!U|gz8CZ9{HJ+Kp4x6dy>12f>FJ?c-j&=ajFr~6udKDs|EKmm@8C0YX_@nL zb@%^kjGgQIT29Aivw3U|`}!@M{pmkHz3$H4aDb&WCnn{Qc^ju$@oS;my8o{PpI6yw z5wQMW(fix?@7$?)eDUop1_p-fYbUGwuWfs&CE5OBv3vfGU2B;b8kiqkS$X;A4|Ti9 zt?SC185kZYta?}bd-iPUJd2{Ow;34@FkUn9UA4+9_WH`%R$o_mPOhrVD4M*)D|Btl z`{1RSX8YcyKmE8YGwk)=fQr{uZHF(qe!`MvzA`HReAr@%ggDPB6AfO z7(_B=srg<`3Aw(~eK8N);VX+T7Olv;v}zi2_U^fP+vkR?{rKaN+UkWBOXq2YUNXs? 

Date: Thu, 9 Jan 2020 15:43:35 +0100 Subject: [PATCH 10/41] Fixes and improvements to bump allocator section --- .../posts/11-allocator-designs/index.md | 27 ++++++++++--------- 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 793b6aea..564efee4 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -60,13 +60,13 @@ The idea behind a bump allocator is to linearly allocate memory by increasing (_ 2: A second allocation was added right after the first; the `next` pointer points to the end of the second allocation 3: A third allocation was added right after the second one; the `next pointer points to the end of the third allocation](bump-allocation.svg) -The `next` pointer only moves in a single direction and thus never hands out the same memory region twice. When it reaches the end of the heap, no more memory can be allocated, which results in an out-of-memory error. +The `next` pointer only moves in a single direction and thus never hands out the same memory region twice. When it reaches the end of the heap, no more memory can be allocated, resulting in an out-of-memory error on the next allocation. -A bump allocator is often implemented with an allocation counter, which is inreased by 1 on each `alloc` call and decreased by 1 on each `dealloc` call. When the allocation counter reaches zero it means that all allocations on the heap were deallocated so that the complete heap is unused again. In this case, the `next` pointer can be reset to the start address of the heap, so that the complete heap memory is available to allocations again. +A bump allocator is often implemented with an allocation counter, which is inreased by 1 on each `alloc` call and decreased by 1 on each `dealloc` call. When the allocation counter reaches zero it means that all allocations on the heap were deallocated. In this case, the `next` pointer can be reset to the start address of the heap, so that the complete heap memory is available to allocations again. -### Type Implementation +### Implementation -We start our implementation by creating a new `allocator::bump` submodule: +We start our implementation by declaring a new `allocator::bump` submodule: ```rust // in src/allocator.rs @@ -74,7 +74,7 @@ We start our implementation by creating a new `allocator::bump` submodule: pub mod bump; ``` -Now we can create the base type in a `src/allocator/bump.rs` file, which looks like this: +The content of the submodule lives in a new `src/allocator/bump.rs` file, which we create with the following content: ```rust // in src/allocator/bump.rs @@ -109,12 +109,14 @@ impl BumpAllocator { } ``` -The `heap_start` and `heap_end` fields keep track of the lower and upper bound of the heap memory region. The caller need to ensure that these addresses are valid, otherwise the allocator would return invalid memory. For this reason, the `init` function needs to be `unsafe` to call. +The `heap_start` and `heap_end` fields keep track of the lower and upper bound of the heap memory region. The caller needs to ensure that these addresses are valid, otherwise the allocator would return invalid memory. For this reason, the `init` function needs to be `unsafe` to call. The purpose of the `next` field is to always point to the first unused byte of the heap, i.e. the start address of the next allocation. 
It is set to `heap_start` in the `init` function because at the beginning the complete heap is unused. On each allocation, this field will be increased by the allocation size (_"bumped"_) to ensure that we don't return the same memory region twice. The `allocations` field is a simple counter for the active allocations with the goal of resetting the allocator after the last allocation was freed. It is initialized with 0. +We chose to create a separate `init` function instead of performing the initialization directly in `new` in order to keep the interface identical to the allocator provided by the `linked_list_allocator` crate. This way, the allocators can be switched without additional code changes. + ### Implementing `GlobalAlloc` As [explained in the previous post][global-alloc], all heap allocators need to implement the [`GlobalAlloc`] trait, which is defined like this: @@ -185,11 +187,11 @@ The error occurs because the [`alloc`] and [`dealloc`] methods of the `GlobalAll [`alloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.alloc [`dealloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.dealloc -Note that the compiler suggestion to change `&self` to `&mut self` in the method declaration does not work here. The reason is that the method signature is defined by the `GlobalAlloc` trait and can't be changed on the implementation side. I opened an [issue](https://github.com/rust-lang/rust/issues/68049) in the Rust repository about the invalid suggestion. +Note that the compiler suggestion to change `&self` to `&mut self` in the method declaration does not work here. The reason is that the method signature is defined by the `GlobalAlloc` trait and can't be changed on the implementation side. (I opened an [issue](https://github.com/rust-lang/rust/issues/68049) in the Rust repository about the invalid suggestion.) #### `GlobalAlloc` and Mutability -Before we look at a possible solution to this mutability problem, let's try to understand why the `GlobalAlloc` trait methods are defined with `&self` arguments: As we saw [in the previous post][global-allocator], the global heap allocator is defined by adding the `#[global_allocator]` attribute to a `static` that implements the `GlobalAlloc` trait. Static variables are immutable in Rust, so there is no way to call a method that takes `&mut self` on the allocator `static`. For this reason, all the methods of `GlobalAlloc` only take an immutable `&self` reference. +Before we look at a possible solution to this mutability problem, let's try to understand why the `GlobalAlloc` trait methods are defined with `&self` arguments: As we saw [in the previous post][global-allocator], the global heap allocator is defined by adding the `#[global_allocator]` attribute to a `static` that implements the `GlobalAlloc` trait. Static variables are immutable in Rust, so there is no way to call a method that takes `&mut self` on the static allocator. For this reason, all the methods of `GlobalAlloc` only take an immutable `&self` reference. 
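As a small aside (not from the original post), the immutability of statics is easy to see in isolation; the same restriction is what rules out `&mut self` methods on the allocator `static`:

```rust
// Sketch only: a non-`mut` static can be read, but not modified or borrowed mutably.
static COUNTER: u32 = 0;

fn demo() {
    let _value = COUNTER;     // reading (copying) the value is fine
    // COUNTER += 1;          // would not compile: the static is immutable
    // let r = &mut COUNTER;  // would not compile for the same reason
}
```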
[global-allocator]: @/second-edition/posts/10-heap-allocation/index.md#the-global-allocator-attribute @@ -208,7 +210,7 @@ With the help of the `spin::Mutex` wrapper type we can implement the `GlobalAllo unsafe impl GlobalAlloc for spin::Mutex {…} ``` -Unfortunatly, the Rust compiler does not permit trait implementations for types defined in other crates: +Unfortunatly, this still doesn't work because the Rust compiler does not permit trait implementations for types defined in other crates: ``` error[E0117]: only traits defined in the current crate can be implemented for arbitrary types @@ -325,8 +327,7 @@ To use the bump allocator instead of the `linked_list_allocator` crate, we need use bump::BumpAllocator; #[global_allocator] -static ALLOCATOR: Locked = - Locked::new(BumpAllocator::new(HEAP_START, HEAP_SIZE)); +static ALLOCATOR: Locked = Locked::new(BumpAllocator::new()); ``` Here it becomes important that we declared `BumpAllocator::new` and `Locked::new` as [`const` functions]. If they were normal functions, a compilation error would occur because the initialization expression of a `static` must evaluable at compile time. @@ -402,7 +403,9 @@ Let's try to understand why this failure occurs in detail: First, the `long_live The question is: Can we extend our bump allocator somehow to remove this limitation? -As we learned at the beginning of this post, allocations can live arbitrarily long and can be freed in an arbitrary order. This means that we need to keep track of a potentially unbounded number of non-continuous, unused memory regions, as illustrated by the following example: +As we learned [in the previous post][heap-intro], allocations can live arbitrarily long and can be freed in an arbitrary order. This means that we need to keep track of a potentially unbounded number of non-continuous, unused memory regions, as illustrated by the following example: + +[heap-intro]: @/second-edition/posts/10-heap-allocation/index.md#dynamic-memory ![](allocation-fragmentation.svg) From dda99166d97ea196278d19965dec0f9815cd50dc Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 9 Jan 2020 16:32:26 +0100 Subject: [PATCH 11/41] Start rewriting linked list allocator section --- .../posts/11-allocator-designs/index.md | 146 ++++++++++-------- 1 file changed, 78 insertions(+), 68 deletions(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 564efee4..742da89b 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -160,7 +160,7 @@ unsafe impl GlobalAlloc for BumpAllocator { } unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) { - unimplemented!(); + todo!(); } } ``` @@ -417,27 +417,44 @@ Normally when we have a potentially unbounded number of items, we can just use a ## LinkedList Allocator -A common trick to keep track of an arbitrary number of free memory areas is to use these areas itself as backing storage. This utilizes the fact that the regions are still mapped to a virtual address and backed by a physical frame, but the stored information is not needed anymore. By storing the information about the freed region in the region itself, we can keep track of an unbounded number of freed regions without needing additional memory. +A common trick to keep track of an arbitrary number of free memory areas when implementing allocators is to use these areas itself as backing storage. 
This utilizes the fact that the regions are still mapped to a virtual address and backed by a physical frame, but the stored information is not needed anymore. By storing the information about the freed region in the region itself, we can keep track of an unbounded number of freed regions without needing additional memory. The most common implementation approach is to construct a single linked list in the freed memory, with each node being a freed memory region: ![](linked-list-allocation.svg) -Each list node contains two fields: The size of the memory region and a pointer to the next unused memory region. With this approach, we only need a pointer to the first unused region (called `head`), independent of the number of memory regions. +Each list node contains two fields: The size of the memory region and a pointer to the next unused memory region. With this approach, we only need a pointer to the first unused region (called `head`) to keep track of all unused regions, independent of their number. As you can guess from the name, this is the technique that the `linked_list_allocator` crate uses. -In the following, we will create a simple `LinkedListAllocator` type that uses the above approach for keeping track of freed memory regions. Since the implementation is a bit longer, we will start with a simple placeholder type before we start to implement the `alloc` and `dealloc` operations. +In the following, we will create our own simple `LinkedListAllocator` type that uses the above approach for keeping track of freed memory regions. This part of the post isn't required for future posts, so you can skip the details if you like. ### The Allocator Type -We start by creating a private `ListNode` struct: +We start by creating a private `ListNode` struct in a new `allocator::linked_list` submodule: ```rust // in src/allocator.rs +pub mod linked_list; +``` + +```rust +// in src/allocator/linked_list.rs + struct ListNode { size: usize, next: Option<&'static mut ListNode>, } +``` + +Like in the graphic, a list node has a `size` field and an optional pointer to the next node, represented by the `Option<&'static mut ListNode>` type. The `&'static mut` type semantically describes an [owned] object behind a pointer. Basically, it's a [`Box`] without a destructor that frees the object at the end of the scope. + +[owned]: https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html +[`Box`]: https://doc.rust-lang.org/alloc/boxed/index.html + +We implement the following set of methods for `ListNode`: + +```rust +// in src/allocator/linked_list.rs impl ListNode { const fn new(size: usize) -> Self { @@ -457,18 +474,19 @@ impl ListNode { } ``` -Like in the graphic, a list node has a `size` field and an optional pointer to the next node. The type has a simple constructor function and methods to calculate the start and end addresses of the represented region. +The type has a simple constructor function and methods to calculate the start and end addresses of the represented region. With the `ListNode` struct as building block, we can now create the `LinkedListAllocator` struct: ```rust -// in src/allocator.rs +// in src/allocator/linked_list.rs pub struct LinkedListAllocator { head: ListNode, } impl LinkedListAllocator { + /// Creates an empty LinkedListAllocator. pub const fn new() -> Self { Self { head: ListNode::new(0), @@ -486,76 +504,20 @@ impl LinkedListAllocator { /// Adds the given memory region to the front of the list. 
unsafe fn add_free_region(&mut self, addr: usize, size: usize) { - unimplemented!(); + todo!(); } } ``` -The struct contains a `head` node that points to the first heap region. We are only interested in the value of the `next` pointer, so we set the `size` to 0 in the `new` function. Making `head` a `ListNode` instead of just a `&'static mut ListNode` has the advantage that the implementation of the `alloc` method will be simpler. +The struct contains a `head` node that points to the first heap region. We are only interested in the value of the `next` pointer, so we set the `size` to 0 in the `ListNone::new` function. Making `head` a `ListNode` instead of just a `&'static mut ListNode` has the advantage that the implementation of the `alloc` method will be simpler. -In contrast to the bump allocator, the `new` function doesn't initialize the allocator with the heap bounds. The reason is that the initialization requires to write a node to the heap memory, which can only happen at runtime. The `new` function, however, needs to be a [`const` function] that can be evaluated at compile time, because it will be used for initializing the `ALLOCATOR` static. To work around this, we provide a separate `init` method that can be called at runtime. +Like for the bump allocator, the `new` function doesn't initialize the allocator with the heap bounds. In addition to maintaining API compatibility, the reason is that the initialization routine requires to write a node to the heap memory, which can only happen at runtime. The `new` function, however, needs to be a [`const` function] that can be evaluated at compile time, because it will be used for initializing the `ALLOCATOR` static. For this reason, we again provide a separate, non-constant `init` method. [`const` function]: https://doc.rust-lang.org/reference/items/functions.html#const-functions -The `init` method uses a `add_free_region` method, whose implementation will be shown in a moment. For now, we use the [`unimplemented!`] macro to provide a placeholder implementation that always panics. +The `init` method uses a `add_free_region` method, whose implementation will be shown in a moment. For now, we use the [`todo!`] macro to provide a placeholder implementation that always panics. -[`unimplemented!`]: https://doc.rust-lang.org/core/macro.unimplemented.html - -Our first goal is to set a prototype of the `LinkedListAllocator` as the global allocator. In order to be able to do that, we need to provide a placeholder implementation of the `GlobalAlloc` trait: - -```rust -// in src/allocator.rs - -unsafe impl GlobalAlloc for Locked { - unsafe fn alloc(&self, layout: Layout) -> *mut u8 { - unimplemented!(); - } - - unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) { - unimplemented!(); - } -} -``` - -Like with the bump allocator, we don't implement the trait directly for the `LinkedListAllocator`, but only for a wrapped `Locked`. The [`Locked` wrapper] adds interior mutability through a spinlock, which allows us to modify the allocator instance even though the `alloc` and `dealloc` methods only take `&self` references. Instead of providing an implementation, we use the [`unimplemented!`] macro again to get a minimal prototype. 
- -[`Locked` wrapper]: #a-locked-wrapper - -With this placeholder implementation, we can now change the global allocator to a `LinkedListAllocator`: - -```rust -// in src/lib.rs - -use allocator::{Locked, LinkedListAllocator}; - -#[global_allocator] -static ALLOCATOR: Locked = - Spin::new(LinkedListAllocator::new()); -``` - -Since the `new` method creates an empty allocator, we also need to update our `allocator::init` function to call `LinkedListAllocator::init` with the heap bounds: - -```rust -// in src/allocator.rs - -pub fn init_heap( - mapper: &mut impl Mapper, - frame_allocator: &mut impl FrameAllocator, -) -> Result<(), MapToError> { - // […] map all heap pages - - // new - unsafe { - super::ALLOCATOR.inner.lock().init(HEAP_START, HEAP_SIZE); - } - - Ok(()) -} -``` - -It's important to call the `init` function after the mapping of the heap pages, because the function will already write to the heap (once we'll properly implement it). The `unsafe` block is safe here because we just mapped the heap region to unused frames, so that the passed heap region is valid. - -When we run our code now, it will of course panic since it runs into the `unimplemented!` in `add_free_region`. Let's fix that by providing a proper implementation for that method. +[`todo!`]: https://doc.rust-lang.org/core/macro.todo.html ### The `add_free_region` Method @@ -727,6 +689,8 @@ In the success case, the `find_region` method returns a tuple of the suitable re ### Layout Adjustments +TODO + ```rust // in src/allocator.rs @@ -745,6 +709,52 @@ impl LinkedListAllocator { } ``` +TODO + +--- + + +### Setting the Global Allocator + +Our first goal is to set a prototype of the `LinkedListAllocator` as the global allocator. In order to be able to do that, we need to provide a placeholder implementation of the `GlobalAlloc` trait: + +```rust +// in src/allocator/linked_list.rs + +use super::Locked; + +unsafe impl GlobalAlloc for Locked { + unsafe fn alloc(&self, layout: Layout) -> *mut u8 { + todo!(); + } + + unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) { + todo!(); + } +} +``` + +As with the bump allocator, we don't implement the trait directly for the `LinkedListAllocator`, but only for a wrapped `Locked`. The [`Locked` wrapper] adds interior mutability through a spinlock, which allows us to modify the allocator instance even though the `alloc` and `dealloc` methods only take `&self` references. Instead of providing an implementation, we use the [`todo!`] macro again to get a minimal prototype. + +[`Locked` wrapper]: @second-edition/posts/11-allocator-designs/index.md#a-locked-wrapper + +With this placeholder implementation, we can now change the global allocator to a `LinkedListAllocator`: + +```rust +// in src/allocator.rs + +use linked_list::LinkedListAllocator; + +#[global_allocator] +static ALLOCATOR: Locked = Locked::new(LinkedListAllocator::new()); +``` + +Since the `init` function behaves the same for the bump and linked list allocators, we don't need to modify the `init` call in `init_heap`. + +When we run our code now, it will of course panic since it runs into the `todo!` in `add_free_region`. Let's fix that by providing a proper implementation for that method. 
+ +--- + From 6cc344918385668e0361a166a3998935126dca5e Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 9 Jan 2020 16:52:47 +0100 Subject: [PATCH 12/41] Restructure headings --- .../posts/11-allocator-designs/index.md | 42 +++++++++++++------ 1 file changed, 30 insertions(+), 12 deletions(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 742da89b..5df7c203 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -425,9 +425,11 @@ The most common implementation approach is to construct a single linked list in Each list node contains two fields: The size of the memory region and a pointer to the next unused memory region. With this approach, we only need a pointer to the first unused region (called `head`) to keep track of all unused regions, independent of their number. As you can guess from the name, this is the technique that the `linked_list_allocator` crate uses. +### Implementation + In the following, we will create our own simple `LinkedListAllocator` type that uses the above approach for keeping track of freed memory regions. This part of the post isn't required for future posts, so you can skip the details if you like. -### The Allocator Type +#### The Allocator Type We start by creating a private `ListNode` struct in a new `allocator::linked_list` submodule: @@ -519,7 +521,7 @@ The `init` method uses a `add_free_region` method, whose implementation will be [`todo!`]: https://doc.rust-lang.org/core/macro.todo.html -### The `add_free_region` Method +#### The `add_free_region` Method The `add_free_region` method provides the fundamental _push_ operation on the linked list. We currently only call this method from `init`, but it will also be the central method in our `dealloc` implementation. Remember, the `dealloc` method is called when an allocated memory region is freed again. To keep track of this freed memory region, we want to push it to the linked list. @@ -557,7 +559,7 @@ In step 2, the method writes the newly created `node` to the beginning of the fr [`write`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write -### The `find_region` Method +#### The `find_region` Method The second fundamental operation on a linked list is finding an entry and removing it from the list. This is the central operation needed for implementing the `alloc` method. We implement the operation as a `find_region` method in the following way: @@ -608,7 +610,7 @@ Step 0 shows the situation before any pointer adjustments. The `region` and `cur In step 2, the `current.next` pointer is set to the local `next` pointer, which is the original `region.next` pointer. The effect is that `current` now directly points to the region after `region`, so that `region` is no longer element of the linked list. The function then returns the pointer to `region` stored in the local `ret` variable. -### The `alloc_from_region` Function +##### The `alloc_from_region` Function The `alloc_from_region` function returns whether a region is suitable for an allocation with given size and alignment. It is defined like this: @@ -647,7 +649,7 @@ First, the function calculates the start and end address of a potential allocati The function performs a less obvious check after that. 
This check is necessary because most of the time an allocation does not fit a suitable region perfectly, so that a part of the region remains usable after the allocation. This part of the region must store its own `ListNode` after the allocation, so it must be large enough to do so. The check verifies exactly that: either the allocation fits perfectly (`excess_size == 0`) or the excess size is large enough to store a `ListNode`. -### Implementing `GlobalAlloc` +#### Implementing `GlobalAlloc` With the fundamental operations provided by the `add_free_region` and `find_region` methods, we can now finally implement the `GlobalAlloc` trait: @@ -687,7 +689,7 @@ The `alloc` method is a bit more complex. It starts with the same layout adjustm In the success case, the `find_region` method returns a tuple of the suitable region (no longer in the list) and the start address of the allocation. Using `alloc_start`, the allocation size, and the end address of the region, it calculates the end address of the allocation and the excess size again. If the excess size is not null, it calls `add_free_region` to add the excess size of the memory region back to the free list. Finally, it returns the `alloc_start` address casted as a `*mut u8` pointer. -### Layout Adjustments +#### Layout Adjustments TODO @@ -711,10 +713,8 @@ impl LinkedListAllocator { TODO ---- - -### Setting the Global Allocator +#### Setting the Global Allocator Our first goal is to set a prototype of the `LinkedListAllocator` as the global allocator. In order to be able to do that, we need to provide a placeholder implementation of the `GlobalAlloc` trait: @@ -738,7 +738,8 @@ As with the bump allocator, we don't implement the trait directly for the `Linke [`Locked` wrapper]: @second-edition/posts/11-allocator-designs/index.md#a-locked-wrapper -With this placeholder implementation, we can now change the global allocator to a `LinkedListAllocator`: + +### Using it ```rust // in src/allocator.rs @@ -751,7 +752,24 @@ static ALLOCATOR: Locked = Locked::new(LinkedListAllocator: Since the `init` function behaves the same for the bump and linked list allocators, we don't need to modify the `init` call in `init_heap`. -When we run our code now, it will of course panic since it runs into the `todo!` in `add_free_region`. Let's fix that by providing a proper implementation for that method. +### Discussion + +#### Implementation Drawbacks + +#### The `linked_list_allocator` Crate + + +advantages and drawback (performance!) + +-> fixed size blocks + +## Fixed-Size Block Allocator + + +## Summary + +## What's next? + --- @@ -762,7 +780,7 @@ When we run our code now, it will of course panic since it runs into the `todo!` - +# Old ##### Allocation In order to allocate a block of memory, we need to find a hole that satisfies the size and alignment requirements. If the found hole is larger than required, we split it into two smaller holes. 
For example, when we allocate a 24 byte block right after initialization, we split the single hole into a hole of size 24 and a hole with the remaining size: From 231b5d587b95753a9eac908359788e8e265705dc Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Fri, 10 Jan 2020 13:09:20 +0100 Subject: [PATCH 13/41] Update implementation section of linked list allocator --- .../posts/11-allocator-designs/index.md | 96 ++++++++++--------- 1 file changed, 52 insertions(+), 44 deletions(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 5df7c203..e3b3c184 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -460,10 +460,7 @@ We implement the following set of methods for `ListNode`: impl ListNode { const fn new(size: usize) -> Self { - ListNode { - size, - next: None, - } + ListNode { size, next: None } } fn start_addr(&self) -> usize { @@ -476,7 +473,7 @@ impl ListNode { } ``` -The type has a simple constructor function and methods to calculate the start and end addresses of the represented region. +The type has a simple constructor function named `new` and methods to calculate the start and end addresses of the represented region. We make the `new` function a [const function], which will be required later when constructing a static linked list allocator. Note that any use of mutable references in const functions (including setting the `next` field to `None`) is still unstable. In order to get it to compile, we need to add **`#![feature(const_fn)]`** to the beginning of our `lib.rs`. With the `ListNode` struct as building block, we can now create the `LinkedListAllocator` struct: @@ -528,7 +525,10 @@ The `add_free_region` method provides the fundamental _push_ operation on the li The implementation of the `add_free_region` method looks like this: ```rust -// in src/allocator.rs +// in src/allocator/linked_list.rs + +use super::align_up; +use core::mem; impl LinkedListAllocator { /// Adds the given memory region to the front of the list. @@ -564,7 +564,7 @@ In step 2, the method writes the newly created `node` to the beginning of the fr The second fundamental operation on a linked list is finding an entry and removing it from the list. This is the central operation needed for implementing the `alloc` method. We implement the operation as a `find_region` method in the following way: ```rust -// in src/allocator.rs +// in src/allocator/linked_list.rs impl LinkedListAllocator { /// Looks for a free region with the given size and alignment and removes @@ -615,10 +615,11 @@ In step 2, the `current.next` pointer is set to the local `next` pointer, which The `alloc_from_region` function returns whether a region is suitable for an allocation with given size and alignment. It is defined like this: ```rust -// in src/allocator.rs +// in src/allocator/linked_list.rs impl LinkedListAllocator { - /// Try to use the given region for an allocation with given size and alignment. + /// Try to use the given region for an allocation with given size and + /// alignment. /// /// Returns the allocation start address on success. fn alloc_from_region(region: &ListNode, size: usize, align: usize) @@ -651,10 +652,18 @@ The function performs a less obvious check after that. 
This check is necessary b #### Implementing `GlobalAlloc` -With the fundamental operations provided by the `add_free_region` and `find_region` methods, we can now finally implement the `GlobalAlloc` trait: +With the fundamental operations provided by the `add_free_region` and `find_region` methods, we can now finally implement the `GlobalAlloc` trait. As with the bump allocator, we don't implement the trait directly for the `LinkedListAllocator`, but only for a wrapped `Locked`. The [`Locked` wrapper] adds interior mutability through a spinlock, which allows us to modify the allocator instance even though the `alloc` and `dealloc` methods only take `&self` references. + +[`Locked` wrapper]: @second-edition/posts/11-allocator-designs/index.md#a-locked-wrapper + +The implementation looks like this: ```rust -// in src/allocator.rs +// in src/allocator/linked_list.rs + +use super::Locked; +use alloc::alloc::{GlobalAlloc, Layout}; +use core::ptr; unsafe impl GlobalAlloc for Locked { unsafe fn alloc(&self, layout: Layout) -> *mut u8 { @@ -670,7 +679,7 @@ unsafe impl GlobalAlloc for Locked { } alloc_start as *mut u8 } else { - null_mut() + ptr::null_mut() } } @@ -691,67 +700,66 @@ In the success case, the `find_region` method returns a tuple of the suitable re #### Layout Adjustments -TODO +So what are these layout adjustments that we do at the beginning of both `alloc` and `dealloc`? They ensure that each allocated block is capable of storing a `ListNode`. This is important because the memory block is going to be deallocated at some point, where we want to write a `ListNode` to it. If the block is smaller than a `ListNode` or does not have the correct alignment, undefined behavior can occur. + +The layout adjustments are performed by a `size_align` function, which is defined like this: ```rust -// in src/allocator.rs +// in src/allocator/linked_list.rs impl LinkedListAllocator { /// Adjust the given layout so that the resulting allocated memory /// region is also capable of storing a `ListNode`. /// - /// Returns the adjusted size and alignment as a (size, align) tuple. + /// Returns the adjusted size and alignment as a (size, align) tuple. fn size_align(layout: Layout) -> (usize, usize) { - let layout = layout.align_to(mem::align_of::()) - .and_then(|l| l.pad_to_align()) - .expect("adjusting alignment failed"); + let layout = layout + .align_to(mem::align_of::()) + .expect("adjusting alignment failed") + .pad_to_align(); let size = layout.size().max(mem::size_of::()); (size, layout.align()) } } ``` -TODO +First, the function uses the [`align_to`] method on the passed [`Layout`] to increase the alignment to the alignment of a `ListNode` if necessary. It then uses the [`pad_to_align`] method to round up the size to a multiple of the alignment to ensure that the start address of the next memory block will have the correct alignment for storing a `ListNode` too. +In the second step it uses the [`max`] method to enforce a minimum allocation size of `mem::size_of::`. This way, the `dealloc` function can safetly write a `ListNode` to the freed memory block. +[`align_to`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.align_to +[`pad_to_align`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.pad_to_align +[`max`]: https://doc.rust-lang.org/std/cmp/trait.Ord.html#method.max -#### Setting the Global Allocator - -Our first goal is to set a prototype of the `LinkedListAllocator` as the global allocator. 
In order to be able to do that, we need to provide a placeholder implementation of the `GlobalAlloc` trait: - -```rust -// in src/allocator/linked_list.rs - -use super::Locked; - -unsafe impl GlobalAlloc for Locked { - unsafe fn alloc(&self, layout: Layout) -> *mut u8 { - todo!(); - } - - unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) { - todo!(); - } -} -``` - -As with the bump allocator, we don't implement the trait directly for the `LinkedListAllocator`, but only for a wrapped `Locked`. The [`Locked` wrapper] adds interior mutability through a spinlock, which allows us to modify the allocator instance even though the `alloc` and `dealloc` methods only take `&self` references. Instead of providing an implementation, we use the [`todo!`] macro again to get a minimal prototype. - -[`Locked` wrapper]: @second-edition/posts/11-allocator-designs/index.md#a-locked-wrapper - +Both the `align_to` and the `pad_to_align` methods are still unstable. To enable then, we need to add **`#![feature(alloc_layout_extra)]`** to the beginning of our `lib.rs`. ### Using it +We can now update the `ALLOCATOR` static in the `allocator` module to use our new `LinkedListAllocator`: + ```rust // in src/allocator.rs use linked_list::LinkedListAllocator; #[global_allocator] -static ALLOCATOR: Locked = Locked::new(LinkedListAllocator::new()); +static ALLOCATOR: Locked = + Locked::new(LinkedListAllocator::new()); ``` Since the `init` function behaves the same for the bump and linked list allocators, we don't need to modify the `init` call in `init_heap`. +When we now run our `heap_allocation` tests again, we see that all tests pass now, including the `many_boxes_long_lived` test that failed with the bump allocator: + +``` +> cargo xtest --test heap_allocation +simple_allocation... [ok] +large_vec... [ok] +many_boxes... [ok] +many_boxes_long_lived... [ok] +``` + +This shows that our linked list allocator is able to reuse freed memory for subsequent allocations. 
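As a reminder of what this test checks, it is roughly of the following shape (only a sketch — the actual test lives in the `heap_allocation` integration test, and `HEAP_SIZE` is assumed to be the heap size constant from the `allocator` module): a single long-lived allocation stays alive while many short-lived boxes are created and dropped, so the allocator must actually reuse freed memory.

```rust
// Rough sketch of the many_boxes_long_lived test; the real test is defined
// in the heap_allocation integration test.
use alloc::boxed::Box;

#[test_case]
fn many_boxes_long_lived() {
    // this allocation stays alive for the whole test
    let long_lived = Box::new(1);
    for i in 0..HEAP_SIZE {
        // each short-lived box is freed at the end of its iteration, so its
        // memory must be reused, otherwise the heap runs out of space
        let x = Box::new(i);
        assert_eq!(*x, i);
    }
    assert_eq!(*long_lived, 1);
}
```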
+ ### Discussion #### Implementation Drawbacks From 14c0cc7ece010dc84d17e52acf4f9d5085901289 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Fri, 10 Jan 2020 13:09:31 +0100 Subject: [PATCH 14/41] Fix typo --- blog/content/second-edition/posts/11-allocator-designs/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index e3b3c184..c531041e 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -210,7 +210,7 @@ With the help of the `spin::Mutex` wrapper type we can implement the `GlobalAllo unsafe impl GlobalAlloc for spin::Mutex {…} ``` -Unfortunatly, this still doesn't work because the Rust compiler does not permit trait implementations for types defined in other crates: +Unfortunately, this still doesn't work because the Rust compiler does not permit trait implementations for types defined in other crates: ``` error[E0117]: only traits defined in the current crate can be implemented for arbitrary types From 6a4fdf94fca3920a19312e73759dd9cdcd231c83 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Mon, 13 Jan 2020 10:07:25 +0100 Subject: [PATCH 15/41] Write discussion section --- .../posts/11-allocator-designs/index.md | 122 +++--------------- 1 file changed, 20 insertions(+), 102 deletions(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index c531041e..230a6601 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -762,12 +762,29 @@ This shows that our linked list allocator is able to reuse freed memory for subs ### Discussion -#### Implementation Drawbacks +In contrast to the bump allocator, the linked list allocator is much more suitable as a general purpose allocator, mainly because it is able to directly reuse freed memory. However, it also has some drawbacks. Some of them are only caused by our basic implementation, but there are also fundamental drawbacks of the allocator design itself. -#### The `linked_list_allocator` Crate +#### Merging Freed Blocks +The main problem of our implementation is that it only splits the heap into smaller blocks, but never merges them back together. Consider this example: -advantages and drawback (performance!) +TODO + +In the first line, three allocations are created on the heap. Two of them are freed again in line 2 and the third is freed in line 3. Now the complete heap is unused again, but it is still split into four individual blocks. At this point, a large allocation might not be possible anymore because none of the four blocks is large enough. Over time, the process continues and the heap is split into smaller and smaller blocks. At some point, the heap is so fragmented that even normal sized allocations will fail. + +To fix this problem, we need to merge adjacent freed blocks back together. For the above example, this would mean the following: + +TODO + +In line 3, we merge the rightmost allocation, which was just freed, together with the adjacent block representing the unused rest of the heap. In line TODO, we can merge all three unused blocks together because they're adjacent, with the result that the unused heap is represented by a single block again. 
+ +The `linked_list_allocator` crate implements this merging strategy in the following way: Instead of inserting freed memory blocks at the beginning of the linked list on `deallocate`, it always keeps the list sorted by start address. This way, merging can be performed directly on the `deallocate` call by examining the addresses and sizes of the two neighbor blocks in the list. Of course, the deallocation operation is slower this way, but it prevents the heap fragmentation we saw above. + +#### Performance + +As we learned above, the bump allocator is extremely fast and can be optimized to just a few assembly operations. The linked list allocator performs much worse in this category. The problem is that an allocation request might need to traverse the complete linked list until it finds a suitable block. Since the list length depends on the number of unused memory blocks, the performance can vary extremely for different programs. A program that only creates a couple of allocations will experience a relatively fast allocation performance. For a program that fragments the heap with many allocations, however, will experience a very bad allocation performance. + +This isn't a problem with our allocation, but a fundamental disadvantage of the linked list approach. -> fixed size blocks @@ -790,105 +807,6 @@ advantages and drawback (performance!) # Old -##### Allocation -In order to allocate a block of memory, we need to find a hole that satisfies the size and alignment requirements. If the found hole is larger than required, we split it into two smaller holes. For example, when we allocate a 24 byte block right after initialization, we split the single hole into a hole of size 24 and a hole with the remaining size: - -![split hole](split-hole.svg) - -Then we use the new 24 byte hole to perform the allocation: - -![24 bytes allocated](allocate.svg) - -To find a suitable hole, we can use several search strategies: - -- **best fit**: Search the whole list and choose the _smallest_ hole that satisfies the requirements. -- **worst fit**: Search the whole list and choose the _largest_ hole that satisfies the requirements. -- **first fit**: Search the list from the beginning and choose the _first_ hole that satisfies the requirements. - -Each strategy has its advantages and disadvantages. Best fit uses the smallest hole possible and leaves larger holes for large allocations. But splitting the smallest hole might create a tiny hole, which is too small for most allocations. In contrast, the worst fit strategy always chooses the largest hole. Thus, it does not create tiny holes, but it consumes the large block, which might be required for large allocations. - -For our use case, the best fit strategy is better than worst fit. The reason is that we have a minimal hole size of 16 bytes, since each hole needs to be able to store a size (8 bytes) and a pointer to the next hole (8 bytes). Thus, even the best fit strategy leads to holes of usable size. Furthermore, we will need to allocate very large blocks occasionally (e.g. for [DMA] buffers). - -[DMA]: https://en.wikipedia.org/wiki/Direct_memory_access - -However, both best fit and worst fit have a significant problem: They need to scan the whole list for each allocation in order to find the optimal block. This leads to long allocation times if the list is long. The first fit strategy does not have this problem, as it returns as soon as it finds a suitable hole. 
It is fairly fast for small allocations and might only need to scan the whole list for large allocations. - -### Deallocation -To deallocate a block of memory, we can just insert its corresponding hole somewhere into the list. However, we need to merge adjacent holes. Otherwise, we are unable to reuse the freed memory for larger allocations. For example: - -![deallocate memory, which leads to adjacent holes](deallocate.svg) - -In order to use these adjacent holes for a large allocation, we need to merge them to a single large hole first: - -![merge adjacent holes and allocate large block](merge-holes-and-allocate.svg) - -The easiest way to ensure that adjacent holes are always merged, is to keep the hole list sorted by address. Thus, we only need to check the predecessor and the successor in the list when we free a memory block. If they are adjacent to the freed block, we merge the corresponding holes. Else, we insert the freed block as a new hole at the correct position. - -## Implementation -The detailed implementation would go beyond the scope of this post, since it contains several hidden difficulties. For example: - -- Several merge cases: Merge with the previous hole, merge with the next hole, merge with both holes. -- We need to satisfy the alignment requirements, which requires additional splitting logic. -- The minimal hole size of 16 bytes: We must not create smaller holes when splitting a hole. - -I created the [linked_list_allocator] crate to handle all of these cases. It consists of a [Heap struct] that provides an `allocate_first_fit` and a `deallocate` method. It also contains a [LockedHeap] type that wraps `Heap` into spinlock so that it's usable as a static system allocator. If you are interested in the implementation details, check out the [source code][linked_list_allocator source]. - -[linked_list_allocator]: https://docs.rs/crate/linked_list_allocator/0.4.1 -[Heap struct]: https://docs.rs/linked_list_allocator/0.4.1/linked_list_allocator/struct.Heap.html -[LockedHeap]: https://docs.rs/linked_list_allocator/0.4.1/linked_list_allocator/struct.LockedHeap.html -[linked_list_allocator source]: https://github.com/phil-opp/linked-list-allocator - -We need to add the extern crate to our `Cargo.toml` and our `lib.rs`: - -``` shell -> cargo add linked_list_allocator -``` - -```rust -// in src/lib.rs -extern crate linked_list_allocator; -``` - -Now we can change our global allocator: - -```rust -use linked_list_allocator::LockedHeap; - -#[global_allocator] -static HEAP_ALLOCATOR: LockedHeap = LockedHeap::empty(); -``` - -We can't initialize the linked list allocator statically, since it needs to initialize the first hole (like described [above](#initialization)). This can't be done at compile time, so the function can't be a `const` function. Therefore we can only create an empty heap and initialize it later at runtime. For that, we add the following lines to our `rust_main` function: - -```rust -// in src/lib.rs - -#[no_mangle] -pub extern "C" fn rust_main(multiboot_information_address: usize) { - […] - - // set up guard page and map the heap pages - memory::init(boot_info); - - // initialize the heap allocator - unsafe { - HEAP_ALLOCATOR.lock().init(HEAP_START, HEAP_START + HEAP_SIZE); - } - […] -} -``` - -It is important that we initialize the heap _after_ mapping the heap pages, since the init function writes to the heap memory (the first hole). - -Our kernel uses the new allocator now, so we can deallocate memory without leaking it. 
The example from above should work now without causing an OOM situation: - -```rust -// in rust_main in src/lib.rs - -for i in 0..10000 { - format!("Some String"); -} -``` ## Performance The linked list based approach has some performance problems. Each allocation or deallocation might need to scan the complete list of holes in the worst case. However, I think it's good enough for now, since our heap will stay relatively small for the near future. When our allocator becomes a performance problem eventually, we can just replace it with a faster alternative. From e4c07e035665adccbc95986f39fa0e595af1c590 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Mon, 13 Jan 2020 13:21:58 +0100 Subject: [PATCH 16/41] Fill in images related to merging freed blocks --- .../second-edition/posts/11-allocator-designs/index.md | 4 ++-- .../linked-list-allocator-fragmentation-on-dealloc.svg | 3 +++ .../linked-list-allocator-merge-on-dealloc.svg | 3 +++ 3 files changed, 8 insertions(+), 2 deletions(-) create mode 100644 blog/content/second-edition/posts/11-allocator-designs/linked-list-allocator-fragmentation-on-dealloc.svg create mode 100644 blog/content/second-edition/posts/11-allocator-designs/linked-list-allocator-merge-on-dealloc.svg diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 230a6601..e657afd8 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -768,13 +768,13 @@ In contrast to the bump allocator, the linked list allocator is much more suitab The main problem of our implementation is that it only splits the heap into smaller blocks, but never merges them back together. Consider this example: -TODO +![](linked-list-allocator-fragmentation-on-dealloc.svg) In the first line, three allocations are created on the heap. Two of them are freed again in line 2 and the third is freed in line 3. Now the complete heap is unused again, but it is still split into four individual blocks. At this point, a large allocation might not be possible anymore because none of the four blocks is large enough. Over time, the process continues and the heap is split into smaller and smaller blocks. At some point, the heap is so fragmented that even normal sized allocations will fail. To fix this problem, we need to merge adjacent freed blocks back together. For the above example, this would mean the following: -TODO +![](linked-list-allocator-merge-on-dealloc.svg) In line 3, we merge the rightmost allocation, which was just freed, together with the adjacent block representing the unused rest of the heap. In line TODO, we can merge all three unused blocks together because they're adjacent, with the result that the unused heap is represented by a single block again. diff --git a/blog/content/second-edition/posts/11-allocator-designs/linked-list-allocator-fragmentation-on-dealloc.svg b/blog/content/second-edition/posts/11-allocator-designs/linked-list-allocator-fragmentation-on-dealloc.svg new file mode 100644 index 00000000..5b890563 --- /dev/null +++ b/blog/content/second-edition/posts/11-allocator-designs/linked-list-allocator-fragmentation-on-dealloc.svg @@ -0,0 +1,3 @@ + + +
[SVG figure data: linked-list-allocator-fragmentation-on-dealloc.svg — three numbered heap snapshots (1–3) showing how deallocations fragment the free list; labels: "heap start", "heap end", "size", "next pointer", "head"]
\ No newline at end of file diff --git a/blog/content/second-edition/posts/11-allocator-designs/linked-list-allocator-merge-on-dealloc.svg b/blog/content/second-edition/posts/11-allocator-designs/linked-list-allocator-merge-on-dealloc.svg new file mode 100644 index 00000000..02ec01e7 --- /dev/null +++ b/blog/content/second-edition/posts/11-allocator-designs/linked-list-allocator-merge-on-dealloc.svg @@ -0,0 +1,3 @@ + + +
[SVG figure data: linked-list-allocator-merge-on-dealloc.svg — heap snapshots 1–3 for the merge-on-dealloc example; labels: "heap start", "heap end", "size", "next pointer", "head"]
\ No newline at end of file From b34cad7c61860000703602c29f0e64b31e319e87 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Tue, 14 Jan 2020 10:50:00 +0100 Subject: [PATCH 17/41] Improve example for merging freed blocks --- blog/content/second-edition/posts/11-allocator-designs/index.md | 2 +- .../linked-list-allocator-merge-on-dealloc.svg | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index e657afd8..ec0cb6ec 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -776,7 +776,7 @@ To fix this problem, we need to merge adjacent freed blocks back together. For t ![](linked-list-allocator-merge-on-dealloc.svg) -In line 3, we merge the rightmost allocation, which was just freed, together with the adjacent block representing the unused rest of the heap. In line TODO, we can merge all three unused blocks together because they're adjacent, with the result that the unused heap is represented by a single block again. +Like before, two of the three allocations are freed in line `2`. Instead of keeping the fragmented heap, we now perform an additional step in line `2a` to merge the two rightmost blocks back together. In line `3`, the third allocation is freed (like before), resulting in a completely unused heap represented by three distinct blocks. In an additional merging step in line `3a` we then merge the three adjacent blocks back together. The `linked_list_allocator` crate implements this merging strategy in the following way: Instead of inserting freed memory blocks at the beginning of the linked list on `deallocate`, it always keeps the list sorted by start address. This way, merging can be performed directly on the `deallocate` call by examining the addresses and sizes of the two neighbor blocks in the list. Of course, the deallocation operation is slower this way, but it prevents the heap fragmentation we saw above. diff --git a/blog/content/second-edition/posts/11-allocator-designs/linked-list-allocator-merge-on-dealloc.svg b/blog/content/second-edition/posts/11-allocator-designs/linked-list-allocator-merge-on-dealloc.svg index 02ec01e7..d9b408a2 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/linked-list-allocator-merge-on-dealloc.svg +++ b/blog/content/second-edition/posts/11-allocator-designs/linked-list-allocator-merge-on-dealloc.svg @@ -1,3 +1,3 @@ -
[SVG figure data (previous version of linked-list-allocator-merge-on-dealloc.svg): heap snapshots 1–3; labels: "heap start", "heap end", "size", "next pointer", "head"]
\ No newline at end of file +
[SVG figure data (updated version of linked-list-allocator-merge-on-dealloc.svg): heap snapshots 1, 2, 2a, 3, 3a showing the additional merging steps; labels: "heap start", "heap end", "size", "next pointer", "head"]
\ No newline at end of file From 8f80378e65e8c199f2805bc19f6bb701f88065e7 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Tue, 14 Jan 2020 10:50:32 +0100 Subject: [PATCH 18/41] Improve section about performance of linked list allocator --- .../second-edition/posts/11-allocator-designs/index.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index ec0cb6ec..5c301b3a 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -782,11 +782,11 @@ The `linked_list_allocator` crate implements this merging strategy in the follow #### Performance -As we learned above, the bump allocator is extremely fast and can be optimized to just a few assembly operations. The linked list allocator performs much worse in this category. The problem is that an allocation request might need to traverse the complete linked list until it finds a suitable block. Since the list length depends on the number of unused memory blocks, the performance can vary extremely for different programs. A program that only creates a couple of allocations will experience a relatively fast allocation performance. For a program that fragments the heap with many allocations, however, will experience a very bad allocation performance. +As we learned above, the bump allocator is extremely fast and can be optimized to just a few assembly operations. The linked list allocator performs much worse in this category. The problem is that an allocation request might need to traverse the complete linked list until it finds a suitable block. -This isn't a problem with our allocation, but a fundamental disadvantage of the linked list approach. +Since the list length depends on the number of unused memory blocks, the performance can vary extremely for different programs. A program that only creates a couple of allocations will experience a relatively fast allocation performance. For a program that fragments the heap with many allocations, however, will experience a very bad allocation performance. --> fixed size blocks +It's worth noting that this performance issue isn't a problem with our implementation, but a fundamental disadvantage of the linked list approach. Since allocation performance can be very important for kernel-level code, we explore a third allocator design in the following that trades improved performance for reduced memory effiency. ## Fixed-Size Block Allocator From 1915e6feb474aa54edeb54668bb316fd711b3864 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Tue, 14 Jan 2020 11:48:09 +0100 Subject: [PATCH 19/41] Minor improvements to linked list section --- .../second-edition/posts/11-allocator-designs/index.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 5c301b3a..3ab8dcfc 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -784,9 +784,9 @@ The `linked_list_allocator` crate implements this merging strategy in the follow As we learned above, the bump allocator is extremely fast and can be optimized to just a few assembly operations. The linked list allocator performs much worse in this category. 
The problem is that an allocation request might need to traverse the complete linked list until it finds a suitable block. -Since the list length depends on the number of unused memory blocks, the performance can vary extremely for different programs. A program that only creates a couple of allocations will experience a relatively fast allocation performance. For a program that fragments the heap with many allocations, however, will experience a very bad allocation performance. +Since the list length depends on the number of unused memory blocks, the performance can vary extremely for different programs. A program that only creates a couple of allocations will experience a relatively fast allocation performance. For a program that fragments the heap with many allocations, however, will experience a very bad allocation performance because the linked list will be very long and mostly contain very small blocks. -It's worth noting that this performance issue isn't a problem with our implementation, but a fundamental disadvantage of the linked list approach. Since allocation performance can be very important for kernel-level code, we explore a third allocator design in the following that trades improved performance for reduced memory effiency. +It's worth noting that this performance issue isn't a problem with our implementation, but a fundamental disadvantage of the linked list approach. Since allocation performance can be very important for kernel-level code, we explore a third allocator design in the following that trades improved performance for reduced memory utilization. ## Fixed-Size Block Allocator From c92b0d46dc32d4e644e6ef396e2e0440eca57a0b Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Tue, 14 Jan 2020 11:48:28 +0100 Subject: [PATCH 20/41] Begin introduction for fixed-size block allocator --- .../posts/11-allocator-designs/index.md | 37 +++++++++++++++++++ 1 file changed, 37 insertions(+) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 3ab8dcfc..1fe4925f 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -790,6 +790,43 @@ It's worth noting that this performance issue isn't a problem with our implement ## Fixed-Size Block Allocator +In the following, we present an allocator design that uses fixed-size memory blocks for fulfilling allocation requests. This way, the allocator often returns blocks that are larger than needed for allocations, which results in wasted memory. On the other hand, it drastly reduces the time required to find a suitable block (compared to the linked list allocator), resulting in much better allocation performance. + +### Introduction + +The idea behind a _fixed-size block allocator_ is the following: Instead of allocating exactly as much memory as requested, we define a small number of block sizes and round up each allocation to the next block size. For example, with block sizes of 16, 64, and 512, an allocation of 4 bytes would return a 16-byte block, an allocation of 48 bytes a 64-byte block, and an allocation of 128 bytes an 512-byte block. + +Like the linked list allocator, we keep track of the unused memory by creating a linked list in the unused memory. Howver, instead of using a single list with different block sizes, we create a separate list for each block size. Each list then only stores blocks of a single size. 
For example, with block sizes 16, 64, and 512 there would be three separate linked lists in memory: + +![](fixed-size-block-example.svg). + +Instead of a single `head` pointer, we have the three head pointers `head_16`, `head_64`, and `head_512` that each point to the first unused block of the corresponding size. All nodes in a single list have the same size. For example, the list started by the `head_16` pointer only contains 16-byte blocks. This means that we no longer need to store the size in each list node since it is already specified by the name of the head pointer. + +Since each element in a list has the same size, each list element is equally suitable for an allocation request. This means that we can very efficiently perform an allocation using the following steps: + +- Round up the requested allocation size to the next block size. For example, when an allocation of 12 bytes is requested, we would choose the block size 16 in the above example. +- Retrieve the head pointer for the list, e.g. from an array. For block size 16, we need to use `head_16`. +- Remove the first block from the list and return it. + +Most notably, we can always return the first element of the list and no longer need to traverse the full list. Thus, allocations are much faster than on the linked list allocator. + +#### Block Sizes and Wasted Memory + +Depending on the block sizes, we lose a lot of memory by rounding up. For example, when a 512-byte block is returned for a 128 byte allocation, three quarters of the allocated memory are unused. By defining reasonable block sizes, it is possible to limit the amount of wasted memory to some degree. For example, when using the powers of 2 (4, 8, 16, 32, 64, 128, …) as block sizes, we can limit the memory waste to half of the allocation size in the worst case and a quarter of the allocation size in the average case. + +It is also common to optimize block sizes based on common allocation sizes in a program. For example, we could additionally add block size 24 to improve memory usage for programs that often perform allocations of 24 bytes. This way, the amount of wasted memory can be often reduced without losing the performance benefits. + +#### Deallocation + +Like allocation, deallocation is also very performant. It involves the following steps: + +- Round up the freed allocation size to the next block size. This is required since the compiler only passes the requested allocation size to `dealloc`, not the size of the block that was returned by `alloc`. By using the same size-adjustment function in both `alloc` and `dealloc` we can make sure that we always free the correct amount of memory. +- Retrieve the head pointer for the list, e.g. from an array. +- Add the freed block to the front of the list by updating the head pointer. + +Most notably, no traversal of the list is required for deallocation either. This means that the time required for a `dealloc` call stays the same regardless of the list length. 
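To make these steps more tangible, here is a rough sketch of how the allocation and deallocation paths could look in code. The names (`BLOCK_SIZES`, `list_heads`, `ListNode`, `FixedSizeBlockSketch`) and the exact structure are assumptions for illustration, not the implementation we build below:

```rust
// Rough sketch of the (de)allocation steps; names like `BLOCK_SIZES`,
// `list_heads`, and `ListNode` are assumptions for illustration only.
const BLOCK_SIZES: &[usize] = &[16, 64, 512];

struct ListNode {
    next: Option<&'static mut ListNode>,
}

struct FixedSizeBlockSketch {
    // one list head per block size, in the same order as BLOCK_SIZES
    list_heads: [Option<&'static mut ListNode>; 3],
}

impl FixedSizeBlockSketch {
    /// Round the request up to the next block size and return the index of
    /// the corresponding list (None if the request is too large).
    fn list_index(size: usize, align: usize) -> Option<usize> {
        let required = size.max(align);
        BLOCK_SIZES.iter().position(|&block_size| block_size >= required)
    }

    /// Allocation: pop the first block of the matching list.
    fn alloc_block(&mut self, size: usize, align: usize) -> Option<*mut u8> {
        let index = Self::list_index(size, align)?;
        let node = self.list_heads[index].take()?;
        self.list_heads[index] = node.next.take();
        Some(node as *mut ListNode as *mut u8)
    }

    /// Deallocation: push the freed block to the front of the matching list.
    unsafe fn dealloc_block(&mut self, ptr: *mut u8, size: usize, align: usize) {
        if let Some(index) = Self::list_index(size, align) {
            // write a new list node into the freed block and make it the new head
            let node_ptr = ptr as *mut ListNode;
            node_ptr.write(ListNode {
                next: self.list_heads[index].take(),
            });
            self.list_heads[index] = Some(&mut *node_ptr);
        }
    }
}
```

A real allocator additionally has to handle requests that are larger than the largest block size and lists that are currently empty, which is where the fallback allocator described next comes in.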
+ +### Fall-back Allocator ## Summary From 64e5b67f3508c40bb262cc5bfc8fdd36d8dc9f93 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Tue, 14 Jan 2020 11:49:07 +0100 Subject: [PATCH 21/41] Remove old content --- .../posts/11-allocator-designs/index.md | 18 ------------------ 1 file changed, 18 deletions(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 1fe4925f..0c498b21 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -833,28 +833,10 @@ Most notably, no traversal of the list is required for deallocation either. This ## What's next? ---- - - - - -# Old - - -## Performance -The linked list based approach has some performance problems. Each allocation or deallocation might need to scan the complete list of holes in the worst case. However, I think it's good enough for now, since our heap will stay relatively small for the near future. When our allocator becomes a performance problem eventually, we can just replace it with a faster alternative. - -## Summary -Now we're able to use heap storage in our kernel without leaking memory. This allows us to effectively process dynamic data such as user supplied strings in the future. We can also use `Rc` and `Arc` to create types with shared ownership. And we have access to various data structures such as `Vec` or `Linked List`, which will make our lives much easier. We even have some well tested and optimized [binary heap] and [B-tree] implementations! - -[binary heap]:https://en.wikipedia.org/wiki/Binary_heap -[B-tree]: https://en.wikipedia.org/wiki/B-tree - - --- TODO: update date From f042761ada991f44056e16b2008d0b091e97edd9 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Tue, 14 Jan 2020 11:50:28 +0100 Subject: [PATCH 22/41] Fix typo --- blog/content/second-edition/posts/11-allocator-designs/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 0c498b21..bb1910e8 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -796,7 +796,7 @@ In the following, we present an allocator design that uses fixed-size memory blo The idea behind a _fixed-size block allocator_ is the following: Instead of allocating exactly as much memory as requested, we define a small number of block sizes and round up each allocation to the next block size. For example, with block sizes of 16, 64, and 512, an allocation of 4 bytes would return a 16-byte block, an allocation of 48 bytes a 64-byte block, and an allocation of 128 bytes an 512-byte block. -Like the linked list allocator, we keep track of the unused memory by creating a linked list in the unused memory. Howver, instead of using a single list with different block sizes, we create a separate list for each block size. Each list then only stores blocks of a single size. For example, with block sizes 16, 64, and 512 there would be three separate linked lists in memory: +Like the linked list allocator, we keep track of the unused memory by creating a linked list in the unused memory. However, instead of using a single list with different block sizes, we create a separate list for each block size. Each list then only stores blocks of a single size. 
For example, with block sizes 16, 64, and 512 there would be three separate linked lists in memory: ![](fixed-size-block-example.svg). From ad671a3a92948b563d8f3c46924a5cb2e58d49a3 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Tue, 14 Jan 2020 12:34:26 +0100 Subject: [PATCH 23/41] Continue fixed-size block allocator section --- .../posts/11-allocator-designs/index.md | 37 ++++++++++++++++++- 1 file changed, 36 insertions(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index bb1910e8..af9a3a4c 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -826,7 +826,42 @@ Like allocation, deallocation is also very performant. It involves the following Most notably, no traversal of the list is required for deallocation either. This means that the time required for a `dealloc` call stays the same regardless of the list length. -### Fall-back Allocator +#### Fallback Allocator + +Given that large allocations (>1KB) are often rare, especially in operating system kernels, it might make sense to fall back to a different allocator for these allocations. For example, we could fall back to a linked list allocator for allocations greater than 512 bytes in order to reduce memory waste. Since only very few allocations of that size are expected, the linked list would stay small so that (de)allocations would still be reasonably fast. + +#### Creating new Blocks + +Above, we assumed that there are always enough blocks of a specific size in the list to fulfill all allocation requests. However, at some point the linked list for a block size becomes empty. At this point, there are two ways to create new unused blocks of a specific size to fulfill an allocation request: + +- Allocate a new block from the fallback allocator (if there is one). +- Split a larger block from a different list. This works best if block sizes are powers of two. For example, a 32-byte block can be split into two 16-byte blocks. + +For our implementation, we will allocate new blocks from the fallback allocator since the implementation is much simpler. + +### Implementation + +Now that we know how a fixed-size block allocator works, we can start our implementation. We won't depend on the implementation of the linked list allocator created in the previous section, so you can follow this part even if you skipped the linked list allocator implementation. + +We start our implementation in a new `allocator::fixed_size_block` module: + +```rust +// in src/allocator.rs + +pub mod fixed_size_block; +``` + +```rust +// in src/allocator/fixed_size_block.rs + +/// The block sizes to use. +/// +/// The sizes must each be power of 2 because they are also used as +/// the block alignment (alignments must be always powers of 2).
+const BLOCK_SIZES: &[usize] = &[8, 16, 32, 64, 128, 256, 512]; +``` + + ## Summary From 07b1ee01993dbe33e55dfa8e7aed2df244fc8736 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Wed, 15 Jan 2020 13:42:08 +0100 Subject: [PATCH 24/41] Fix internal link --- blog/content/second-edition/posts/11-allocator-designs/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index af9a3a4c..9dc0d08e 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -654,7 +654,7 @@ The function performs a less obvious check after that. This check is necessary b With the fundamental operations provided by the `add_free_region` and `find_region` methods, we can now finally implement the `GlobalAlloc` trait. As with the bump allocator, we don't implement the trait directly for the `LinkedListAllocator`, but only for a wrapped `Locked`. The [`Locked` wrapper] adds interior mutability through a spinlock, which allows us to modify the allocator instance even though the `alloc` and `dealloc` methods only take `&self` references. -[`Locked` wrapper]: @second-edition/posts/11-allocator-designs/index.md#a-locked-wrapper +[`Locked` wrapper]: @/second-edition/posts/11-allocator-designs/index.md#a-locked-wrapper-type The implementation looks like this: From db20a407453f5edc3ad7a74e1789163c0311df54 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Wed, 15 Jan 2020 13:42:39 +0100 Subject: [PATCH 25/41] Write implementation section for fixed-size block allocator --- .../posts/11-allocator-designs/index.md | 250 +++++++++++++++++- 1 file changed, 249 insertions(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 9dc0d08e..f6f668fd 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -843,7 +843,9 @@ For our implementation, we will allocate new blocks from the fallback allocator Now that we know how a fixed-size block allocator works, we can start our implementation. We won't depend on the implementation of the linked list allocator created in the previous section, so you can follow this part even if you skipped the linked list allocator implementation. -We start our implementation in a new `allocator::fixed_size_block` module: +#### List Node + +We start our implementation by creating a `ListNode` type in a new `allocator::fixed_size_block` module: ```rust // in src/allocator.rs @@ -854,6 +856,28 @@ pub mod fixed_size_block; ```rust // in src/allocator/fixed_size_block.rs +struct ListNode { + next: Option<&'static mut ListNode>, +} + +impl ListNode { + const fn new() -> Self { + Self { next: None } + } +} +``` + +This type is similar to the `ListNode` type of our [linked list allocator implementation], with the difference that we don't have a second `size` field. The `size` field isn't needed because every block in a list has the same size with the fixed-size block allocator design. + +[linked list allocator implementation]: #the-allocator-type + +#### Block Sizes + +Next, we define a constant `BLOCK_SIZES` slice with the block sizes used for our implementation: + +```rust +// in src/allocator/fixed_size_block.rs + /// The block sizes to use. 
/// /// The sizes must each be power of 2 because they are also used as @@ -861,7 +885,231 @@ pub mod fixed_size_block; const BLOCK_SIZES: &[usize] = &[8, 16, 32, 64, 128, 256, 512]; ``` +As block sizes, we use powers of 2 starting from 8 up to 512. We don't define any block sizes smaller than 8 because each block must be capable of storing a 64-bit pointer to the next block when freed. For allocations greater than 512 bytes we will fall back to a linked list allocator. +To simplify the implementation, we define that the size of a block is also its required alignment in memory. So a 16 byte block is always aligned on a 16-byte boundary and a 512 byte block is aligned on a 512-byte boundary. Since alignments always need to be powers of 2, this rules out any other block sizes. If we need block sizes that are not powers of 2 in the future, we can still adjust our implementation for this (e.g. by defining a second `BLOCK_ALIGNMENTS` array). + +#### The Allocator Type + +Using the `ListNode` type and the `BLOCK_SIZES` slice, we can now define our allocator type: + +```rust +// in src/allocator/fixed_size_block.rs + +pub struct FixedSizeBlockAllocator { + list_heads: [ListNode; BLOCK_SIZES.len()], + fallback_allocator: linked_list_allocator::Heap, +} +``` + +The `list_heads` field is an array of `head` pointers, one for each block size. This is implemented by using the `len()` of the `BLOCK_SIZES` slice as the array length. As a backup allocator for allocations larger than the largest block size we use the allocator provided by the `linked_list_allocator` as fallback. We could also used the `LinkedListAllocator` we implemented ourselves instead, but it has the disadvantage that it does not [merge freed blocks]. + +[merge freed blocks]: #merging-freed-blocks + +For constructing a `FixedSizeBlockAllocator`, we provide the same `new` and `init` functions that we implemented for the other allocator types too: + +```rust +// in src/allocator/fixed_size_block.rs + +impl FixedSizeBlockAllocator { + /// Creates an empty FixedSizeBlockAllocator. + pub const fn new() -> Self { + FixedSizeBlockAllocator { + list_heads: [ListNode::new(); BLOCK_SIZES.len()], + fallback_allocator: linked_list_allocator::Heap::empty(), + } + } + + /// Initialize the allocator with the given heap bounds. + /// + /// This function is unsafe because the caller must guarantee that the given + /// heap bounds are valid and that the heap is unused. This method must be + /// called only once. + pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) { + self.fallback_allocator.init(heap_start, heap_size); + } +} +``` + +The `new` function just initializes the `list_heads` array with empty nodes and creates an [`empty`] linked list allocator as `fallback_allocator`. The unsafe `init` function only calls the [`init`] function of the `fallback_allocator` without doing any additional initialization of the `list_heads` array. + +[`empty`]: https://docs.rs/linked_list_allocator/0.6.4/linked_list_allocator/struct.Heap.html#method.empty +[`init`]: https://docs.rs/linked_list_allocator/0.6.4/linked_list_allocator/struct.Heap.html#method.init + +For convenience, we also create a private `fallback_alloc` method that allocates using the `fallback_allocator`: + +```rust +// in src/allocator/fixed_size_block.rs + +use alloc::alloc::Layout; +use core::ptr; + +impl FixedSizeBlockAllocator { + /// Allocates using the fallback allocator. 
+ fn fallback_alloc(&mut self, layout: Layout) -> *mut u8 { + match self.fallback_allocator.allocate_first_fit(layout) { + Ok(ptr) => ptr.as_ptr(), + Err(_) => ptr::null_mut(), + } + } +} +``` + +The [`Heap`] type of the `linked_list_allocator` crate does not implement [`GlobalAlloc`] (as it's [not possible without locking]). Instead, it provides an [`allocate_first_fit`] method that has a slightly different interface. Instead of returning a `*mut u8` and using a null pointer to signal an error, it returns a `Result<NonNull<u8>, AllocErr>`. The [`NonNull`] type is an abstraction for a raw pointer that is guaranteed to not be a null pointer. The [`AllocErr`] type is a marker type for signaling an allocation error. By mapping the `Ok` case to the [`NonNull::as_ptr`] method and the `Err` case to a null pointer, we can easily translate this back to a `*mut u8` type. + +[`Heap`]: https://docs.rs/linked_list_allocator/0.6.4/linked_list_allocator/struct.Heap.html +[not possible without locking]: #globalalloc-and-mutability +[`allocate_first_fit`]: https://docs.rs/linked_list_allocator/0.6.4/linked_list_allocator/struct.Heap.html#method.allocate_first_fit +[`NonNull`]: https://doc.rust-lang.org/nightly/core/ptr/struct.NonNull.html +[`AllocErr`]: https://doc.rust-lang.org/nightly/core/alloc/struct.AllocErr.html +[`NonNull::as_ptr`]: https://doc.rust-lang.org/nightly/core/ptr/struct.NonNull.html#method.as_ptr + +#### Calculating the List Index + +Before we implement the `GlobalAlloc` trait, we define a `list_index` helper function that returns the lowest possible block size for a given [`Layout`]: + +```rust +// in src/allocator/fixed_size_block.rs + +/// Choose an appropriate block size for the given layout. +/// +/// Returns an index into the `BLOCK_SIZES` array. +fn list_index(layout: &Layout) -> Option<usize> { + let required_block_size = layout.size().max(layout.align()); + BLOCK_SIZES.iter().position(|&s| s >= required_block_size) +} +``` + +The block must have at least the size and alignment required by the given `Layout`. Since we defined that the block size is also its alignment, this means that the `required_block_size` is the [maximum] of the layout's [`size()`] and [`align()`] attributes. To find the next-larger block in the `BLOCK_SIZES` slice, we first use the [`iter()`] method to get an iterator and then the [`position()`] method to find the index of the first block that is at least as large as the `required_block_size`. + +[maximum]: https://doc.rust-lang.org/core/cmp/trait.Ord.html#method.max +[`size()`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.size +[`align()`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.align +[`iter()`]: https://doc.rust-lang.org/core/primitive.slice.html#method.iter +[`position()`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.position + +Note that we don't return the block size itself, but the index into the `BLOCK_SIZES` slice. The reason is that we want to use the returned index as an index into the `list_heads` array.
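To make the mapping more tangible, the following (purely illustrative) snippet shows the indices we would expect `list_index` to return for a few example layouts, given the `BLOCK_SIZES` slice defined above:

```rust
// illustration only, e.g. inside a test function in src/allocator/fixed_size_block.rs
fn list_index_examples() {
    let small = Layout::from_size_align(12, 8).unwrap();
    assert_eq!(list_index(&small), Some(1)); // rounded up to the 16-byte class

    let aligned = Layout::from_size_align(4, 32).unwrap();
    assert_eq!(list_index(&aligned), Some(2)); // the alignment dominates: 32-byte class

    let large = Layout::from_size_align(4096, 8).unwrap();
    assert_eq!(list_index(&large), None); // no fitting class, handled by the fallback allocator
}
```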
+ +#### Implementing `GlobalAlloc` + +The last step is to implement the `GlobalAlloc` trait: + +```rust +// in src/allocator/fixed_size_block.rs + +use super::Locked; +use alloc::alloc::GlobalAlloc; + +unsafe impl GlobalAlloc for Locked<FixedSizeBlockAllocator> { + unsafe fn alloc(&self, layout: Layout) -> *mut u8 { + todo!(); + } + + unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) { + todo!(); + } +} +``` + +Like for the other allocators, we don't implement the `GlobalAlloc` trait directly for our allocator type, but use the [`Locked` wrapper] to add synchronized interior mutability. Since the `alloc` and `dealloc` implementations are relatively large, we introduce them one by one in the following. + +##### `alloc` + +The implementation of the `alloc` method looks like this: + +```rust +// in `impl` block in src/allocator/fixed_size_block.rs + +unsafe fn alloc(&self, layout: Layout) -> *mut u8 { + let mut allocator = self.lock(); + match list_index(&layout) { + Some(index) => { + match allocator.list_heads[index].next.take() { + Some(node) => { + allocator.list_heads[index].next = node.next.take(); + node as *mut ListNode as *mut u8 + } + None => { + // no block exists in list => allocate new block + let block_size = BLOCK_SIZES[index]; + // only works if all block sizes are a power of 2 + let block_align = block_size; + let layout = Layout::from_size_align(block_size, block_align) + .unwrap(); + allocator.fallback_alloc(layout) + } + } + } + None => allocator.fallback_alloc(layout), + } +} +``` + +Let's go through it step by step: + +First, we use the `Locked::lock` method to get a mutable reference to the wrapped allocator instance. Next, we call the `list_index` function we just defined to calculate the appropriate block size for the given layout and get the corresponding index into the `list_heads` array. If this index is `None`, no block size fits for the allocation, therefore we use the `fallback_allocator` using the `fallback_alloc` function. + +If the list index is `Some`, we try to remove the first node in the corresponding list started by `list_heads[index]`. For that, we call the [`Option::take`] method on the `next` field of the list head. If the list is not empty, we enter the `Some(node)` branch of the `match` statement, where we point the head pointer of the list to the successor of the popped `node` (by using [`take`][`Option::take`] again). Finally, we return the popped `node` pointer as a `*mut u8`. + +[`Option::take`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.take + +TODO graphic + +If the `next` pointer of the list head is `None`, it indicates that the list of blocks is empty. This means that we need to construct a new block as [described above](#creating-new-blocks). For that, we first get the current block size from the `BLOCK_SIZES` slice and use it as both the size and the alignment for the new block. Then we create a new `Layout` from it and call the `fallback_alloc` method to perform the allocation. The reason for adjusting the layout and alignment is that the block will be added to the block list on deallocation.
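As a hypothetical walkthrough of the code above: for a request of 100 bytes with 8-byte alignment, `list_index` returns `Some(4)` because 128 is the first block size that fits. If `list_heads[4]` contains a node, that node is popped and returned directly; otherwise a fresh 128-byte block is requested from the fallback allocator:

```rust
// illustration only: the layout that `alloc` builds for a new 128-byte block
let block_size = BLOCK_SIZES[4];  // 128
let block_align = block_size;     // blocks are aligned to their own size
let layout = Layout::from_size_align(block_size, block_align).unwrap();
// allocator.fallback_alloc(layout) then returns the new block
```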
+ +TODO graphic + +##### `dealloc` + +The implementation of the `dealloc` method looks like this: + +```rust +// in `impl` block in src/allocator/fixed_size_block.rs + +#[allow(unused_unsafe)] +unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) { + let mut allocator = self.lock(); + match list_index(&layout) { + Some(index) => { + let new_node = ListNode { + next: allocator.list_heads[index].next.take(), + }; + // verify that block has size and alignment required for storing node + assert!(mem::size_of::<ListNode>() <= BLOCK_SIZES[index]); + assert!(mem::align_of::<ListNode>() <= BLOCK_SIZES[index]); + let new_node_ptr = ptr as *mut ListNode; + new_node_ptr.write(new_node); + allocator.list_heads[index].next = Some(unsafe { &mut *new_node_ptr }); + } + None => { + let ptr = NonNull::new(ptr).unwrap(); + unsafe { + allocator.fallback_allocator.deallocate(ptr, layout); + } + } + } +} +``` + +Like in `alloc`, we first use the `lock` method to get a mutable allocator reference and then the `list_index` function to get the block list corresponding to the given `Layout`. If the index is `None`, no fitting block size exists in `BLOCK_SIZES`, which indicates that the allocation was created by the fallback allocator. Therefore we use its [`deallocate`][`Heap::deallocate`] method to free the memory again. The method expects a [`NonNull`] instead of a `*mut u8`, so we need to convert the pointer first. (The `unwrap` call only fails when the pointer is null, which should never happen when the compiler calls `dealloc`.) + +[`Heap::deallocate`]: https://docs.rs/linked_list_allocator/0.6.4/linked_list_allocator/struct.Heap.html#method.deallocate + +If `list_index` returns a block index, we need to add the freed memory block to the list. For that, we first create a new `ListNode` that points to the current list head (by using [`Option::take`] again). Before we write the new node into the freed memory block, we first assert that the current block size specified by `index` has the required size and alignment for storing a `ListNode`. Then we perform the write by converting the given `*mut u8` pointer to a `*mut ListNode` pointer and then calling the [`write`][`pointer::write`] method on it. The last step is to set the head pointer of the list, which is currently `None` since we called `take` on it, to our newly written `ListNode`. + +[`pointer::write`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write + +TODO graphic + +There are a few things worth noting: + +- We don't differentiate between blocks allocated from a block list and blocks allocated from the fallback allocator. This means that new blocks created in `alloc` are added to the block list on `dealloc`, thereby increasing the number of blocks of that size. +- The `alloc` method is the only place where new blocks are created in our implementation. This means that we initially start with empty block lists and only fill the lists lazily when allocations for that block size are performed. +- We use `unsafe` blocks in `dealloc`, even though they are not required in functions that are themselves declared as `unsafe`. However, using explicit `unsafe` blocks has the advantage that it's obvious which operations are unsafe and which are not. There is also a [proposed RFC](https://github.com/rust-lang/rfcs/pull/2585) to change this behavior. Since the compiler normally warns on unneeded `unsafe` blocks, we add the `#[allow(unused_unsafe)]` attribute to the method to silence that warning.
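One way to convince ourselves that the free lists behave as described is a small sanity check like the following. This is only a hypothetical sketch: it assumes that this allocator is registered as the global allocator (as we do in the next section) and that no other allocation happens in between:

```rust
// hypothetical check: a freed block is reused for the next allocation of the same class
use alloc::boxed::Box;

fn block_reuse_check() {
    let first = Box::new(0u64);
    let first_addr = &*first as *const u64 as usize;
    drop(first); // dealloc pushes the 8-byte block to the front of its list

    let second = Box::new(0u64);
    let second_addr = &*second as *const u64 as usize;
    assert_eq!(first_addr, second_addr); // alloc pops the same block again
}
```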
+ +### Using it + +### Discussion ## Summary From 5d1472260134ccc69e999f6dbf2aedb2fe516cab Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Wed, 15 Jan 2020 15:34:13 +0100 Subject: [PATCH 26/41] Store a `Option<&mut ListNode` in head array instead of a dummy ListNode --- .../posts/11-allocator-designs/index.md | 22 +++++++------------ 1 file changed, 8 insertions(+), 14 deletions(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index f6f668fd..7eb6bf39 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -859,12 +859,6 @@ pub mod fixed_size_block; struct ListNode { next: Option<&'static mut ListNode>, } - -impl ListNode { - const fn new() -> Self { - Self { next: None } - } -} ``` This type is similar to the `ListNode` type of our [linked list allocator implementation], with the difference that we don't have a second `size` field. The `size` field isn't needed because every block in a list has the same size with the fixed-size block allocator design. @@ -897,7 +891,7 @@ Using the `ListNode` type and the `BLOCK_SIZES` slice, we can now define our all // in src/allocator/fixed_size_block.rs pub struct FixedSizeBlockAllocator { - list_heads: [ListNode; BLOCK_SIZES.len()], + list_heads: [Option<&'static mut ListNode>; BLOCK_SIZES.len()], fallback_allocator: linked_list_allocator::Heap, } ``` @@ -915,7 +909,7 @@ impl FixedSizeBlockAllocator { /// Creates an empty FixedSizeBlockAllocator. pub const fn new() -> Self { FixedSizeBlockAllocator { - list_heads: [ListNode::new(); BLOCK_SIZES.len()], + list_heads: [None; BLOCK_SIZES.len()], fallback_allocator: linked_list_allocator::Heap::empty(), } } @@ -1024,9 +1018,9 @@ unsafe fn alloc(&self, layout: Layout) -> *mut u8 { let mut allocator = self.lock(); match list_index(&layout) { Some(index) => { - match allocator.list_heads[index].next.take() { + match allocator.list_heads[index].take() { Some(node) => { - allocator.list_heads[index].next = node.next.take(); + allocator.list_heads[index] = node.next.take(); node as *mut ListNode as *mut u8 } None => { @@ -1049,13 +1043,13 @@ Let's go through it step by step: First, we use the `Locked::lock` method to get a mutable reference to the wrapped allocator instance. Next, we call the `list_index` function we just defined to calculate the appropriate block size for the given layout and get the corresponding index into the `list_heads` array. If this index is `None`, no block size fits for the allocation, therefore we use the `fallback_allocator` using the `fallback_alloc` function. -If the list index is `Some`, we try to remove the first node in the corresponding list started by `list_heads[index]`. For that, we call the [`Option::take`] method on the `next` field of the list head. If the list is not empty, we enter the `Some(node)` branch of the `match` statement, where we point the head pointer of the list to the successor of the popped `node` (by using [`take`][`Option::take`] again). Finally, we return the popped `node` pointer as a `*mut u8`. +If the list index is `Some`, we try to remove the first node in the corresponding list started by `list_heads[index]` using the [`Option::take`] method. If the list is not empty, we enter the `Some(node)` branch of the `match` statement, where we point the head pointer of the list to the successor of the popped `node` (by using [`take`][`Option::take`] again). 
Finally, we return the popped `node` pointer as a `*mut u8`. [`Option::take`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.take TODO graphic -If the `next` pointer of the list head is `None`, it indicates that the list of blocks is empty. This means that we need to construct a new block as [described above](#creating-new-blocks). For that, we first get the current block size from the `BLOCK_SIZES` slice and use it as both the size and the alignment for the new block. Then we create a new `Layout` from it and call the `fallback_alloc` method to perform the allocation. The reason for adjusting the layout and alignment is that the block will be added to the block list on deallocation. +If the list head is `None`, it indicates that the list of blocks is empty. This means that we need to construct a new block as [described above](#creating-new-blocks). For that, we first get the current block size from the `BLOCK_SIZES` slice and use it as both the size and the alignment for the new block. Then we create a new `Layout` from it and call the `fallback_alloc` method to perform the allocation. The reason for adjusting the layout and alignment is that the block will be added to the block list on deallocation. TODO graphic @@ -1072,14 +1066,14 @@ unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) { match list_index(&layout) { Some(index) => { let new_node = ListNode { - next: allocator.list_heads[index].next.take(), + next: allocator.list_heads[index].take(), }; // verify that block has size and alignment required for storing node assert!(mem::size_of::<ListNode>() <= BLOCK_SIZES[index]); assert!(mem::align_of::<ListNode>() <= BLOCK_SIZES[index]); let new_node_ptr = ptr as *mut ListNode; new_node_ptr.write(new_node); - allocator.list_heads[index].next = Some(unsafe { &mut *new_node_ptr }); + allocator.list_heads[index] = Some(unsafe { &mut *new_node_ptr }); } None => { let ptr = NonNull::new(ptr).unwrap(); From 04857a063d589ef732166a4c2835a32106448e07 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Wed, 15 Jan 2020 15:42:16 +0100 Subject: [PATCH 27/41] Write short 'Using it' section --- .../posts/11-allocator-designs/index.md | 23 +++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 7eb6bf39..ed91e990 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -1103,6 +1103,29 @@ There are a few things worth noting: ### Using it +To use our new `FixedSizeBlockAllocator`, we need to update the `ALLOCATOR` static in the `allocator` module: + +```rust +// in src/allocator.rs + +use fixed_size_block::FixedSizeBlockAllocator; + +#[global_allocator] +static ALLOCATOR: Locked<FixedSizeBlockAllocator> = Locked::new(FixedSizeBlockAllocator::new()); +``` + +Since the `init` function behaves the same for all allocators we implemented, we don't need to modify the `init` call in `init_heap`. + +When we now run our `heap_allocation` tests again, all tests should still pass: + +``` +> cargo xtest --test heap_allocation +simple_allocation... [ok] +large_vec... [ok] +many_boxes... [ok] +many_boxes_long_lived...
[ok] +``` + ### Discussion ## Summary From 658212c1f57eac2111e87667688c7458bc0cb876 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Wed, 15 Jan 2020 17:17:27 +0100 Subject: [PATCH 28/41] Add bullet points for discussion section --- .../posts/11-allocator-designs/index.md | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index ed91e990..c963ddd2 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -1126,8 +1126,25 @@ many_boxes... [ok] many_boxes_long_lived... [ok] ``` +Our new allocator seems to work! + ### Discussion +While the fixed-size block approach has a much better performance than the linked list approach, it wastes up to half of the memory when using powers of 2 as block sizes. Whether this tradeoff is worth it heavily depends on the application type. For an operating system kernel + ++ Better performance +- Memory waste + + Only half of memory in worst case, quarter of memory in average case ++ Fallback allocator makes implementation simple + - Performance not fully predictable ++ Fixed-block approach used in Redox +- Implementation only permits power-of-2 block sizes + +#### Variations + +- Buddy allocator +- Slab allocator + ## Summary ## What's next? From 687c81eedb4a09ab7fdf59a55f31bc992c8dd509 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Wed, 15 Jan 2020 18:06:30 +0100 Subject: [PATCH 29/41] Array initialization using non-Copy types requires feature gate --- .../second-edition/posts/11-allocator-designs/index.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index c963ddd2..a8d5d532 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -925,9 +925,12 @@ impl FixedSizeBlockAllocator { } ``` -The `new` function just initializes the `list_heads` array with empty nodes and creates an [`empty`] linked list allocator as `fallback_allocator`. The unsafe `init` function only calls the [`init`] function of the `fallback_allocator` without doing any additional initialization of the `list_heads` array. +The `new` function just initializes the `list_heads` array with empty nodes and creates an [`empty`] linked list allocator as `fallback_allocator`. Since array initializations using non-`Copy` types are still unstable, we need to add **`#![feature(const_in_array_repeat_expressions)]`** to the beginning of our `lib.rs`. The reason that `None` is not `Copy` in this case is that `ListNode` does not implement `Copy`. Thus, the `Option` wrapper and its `None` variant are not `Copy` either. [`empty`]: https://docs.rs/linked_list_allocator/0.6.4/linked_list_allocator/struct.Heap.html#method.empty + +The unsafe `init` function only calls the [`init`] function of the `fallback_allocator` without doing any additional initialization of the `list_heads` array. Instead, we will initialize the lists lazily on `alloc` and `dealloc` calls. 
+ [`init`]: https://docs.rs/linked_list_allocator/0.6.4/linked_list_allocator/struct.Heap.html#method.init For convenience, we also create a private `fallback_alloc` method that allocates using the `fallback_allocator`: From 2cfa13a48fce79b999289a51c245b03cc1e59522 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Wed, 15 Jan 2020 18:13:43 +0100 Subject: [PATCH 30/41] Add missing imports --- .../second-edition/posts/11-allocator-designs/index.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index a8d5d532..de3601c3 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -1061,7 +1061,11 @@ TODO graphic The implementation of the `dealloc` method looks like this: ```rust -// in `impl` block in src/allocator/fixed_size_block.rs +// in src/allocator/fixed_size_block.rs + +use core::{mem, ptr::NonNull}; + +// inside the `unsafe impl GlobalAlloc` block #[allow(unused_unsafe)] unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) { From deca65eb1f851c8d20d26e7ecbbde24461112cc6 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Wed, 15 Jan 2020 18:23:59 +0100 Subject: [PATCH 31/41] Break long line in code excerpt --- .../content/second-edition/posts/11-allocator-designs/index.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index de3601c3..aa216c60 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -1118,7 +1118,8 @@ To use our new `FixedSizeBlockAllocator`, we need to update the `ALLOCATOR` stat use fixed_size_block::FixedSizeBlockAllocator; #[global_allocator] -static ALLOCATOR: Locked<FixedSizeBlockAllocator> = Locked::new(FixedSizeBlockAllocator::new()); +static ALLOCATOR: Locked<FixedSizeBlockAllocator> = Locked::new( + FixedSizeBlockAllocator::new()); ``` Since the `init` function behaves the same for all allocators we implemented, we don't need to modify the `init` call in `init_heap`. From 6368938b9e5b0ffc4bfd88dc779942c818d55c31 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Mon, 20 Jan 2020 10:08:33 +0100 Subject: [PATCH 32/41] Link to `free list` wikipedia article --- .../second-edition/posts/11-allocator-designs/index.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index aa216c60..2095fcec 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -423,7 +423,7 @@ The most common implementation approach is to construct a single linked list in ![](linked-list-allocation.svg) -Each list node contains two fields: The size of the memory region and a pointer to the next unused memory region. With this approach, we only need a pointer to the first unused region (called `head`) to keep track of all unused regions, independent of their number. As you can guess from the name, this is the technique that the `linked_list_allocator` crate uses. +Each list node contains two fields: The size of the memory region and a pointer to the next unused memory region.
With this approach, we only need a pointer to the first unused region (called `head`) to keep track of all unused regions, independent of their number. The resulting data structure is often called a [_free list_]. + +[_free list_]: https://en.wikipedia.org/wiki/Free_list + +As you might guess from the name, this is the technique that the `linked_list_allocator` crate uses. ### Implementation From dc8d0a833b3c169208113fd7db422c133a08cde0 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Mon, 20 Jan 2020 10:09:08 +0100 Subject: [PATCH 33/41] Add image --- .../posts/11-allocator-designs/fixed-size-block-example.svg | 3 +++ 1 file changed, 3 insertions(+) create mode 100644 blog/content/second-edition/posts/11-allocator-designs/fixed-size-block-example.svg diff --git a/blog/content/second-edition/posts/11-allocator-designs/fixed-size-block-example.svg b/blog/content/second-edition/posts/11-allocator-designs/fixed-size-block-example.svg new file mode 100644 index 00000000..2a2de226 --- /dev/null +++ b/blog/content/second-edition/posts/11-allocator-designs/fixed-size-block-example.svg @@ -0,0 +1,3 @@ + + +
heap start
heap start
head_16
<div>head_16</div>
head_64
<div>head_64</div>
head_512
<div>head_512</div>
\ No newline at end of file From 987138f5bfe38885b72419c7080fb788ece1001a Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Mon, 20 Jan 2020 11:04:21 +0100 Subject: [PATCH 34/41] Write discussion section for fixed-size block allocator --- .../posts/11-allocator-designs/index.md | 48 ++++++++++++++----- 1 file changed, 36 insertions(+), 12 deletions(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 2095fcec..868378f0 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -794,7 +794,7 @@ It's worth noting that this performance issue isn't a problem with our implement ## Fixed-Size Block Allocator -In the following, we present an allocator design that uses fixed-size memory blocks for fulfilling allocation requests. This way, the allocator often returns blocks that are larger than needed for allocations, which results in wasted memory. On the other hand, it drastly reduces the time required to find a suitable block (compared to the linked list allocator), resulting in much better allocation performance. +In the following, we present an allocator design that uses fixed-size memory blocks for fulfilling allocation requests. This way, the allocator often returns blocks that are larger than needed for allocations, which results in wasted memory due to [internal fragmentation]. On the other hand, it drastly reduces the time required to find a suitable block (compared to the linked list allocator), resulting in much better allocation performance. ### Introduction @@ -1142,20 +1142,44 @@ Our new allocator seems to work! ### Discussion -While the fixed-size block approach has a much better performance than the linked list approach, it wastes up to half of the memory when using powers of 2 as block sizes. Whether this tradeoff is worth it heavily depends on the application type. For an operating system kernel +While the fixed-size block approach has a much better performance than the linked list approach, it wastes up to half of the memory when using powers of 2 as block sizes. Whether this tradeoff is worth it heavily depends on the application type. For an operating system kernel, where performance is critical, the fixed-size block approach seems to be the better choice. -+ Better performance -- Memory waste - + Only half of memory in worst case, quarter of memory in average case -+ Fallback allocator makes implementation simple - - Performance not fully predictable -+ Fixed-block approach used in Redox -- Implementation only permits power-of-2 block sizes +On the implementation side, there are various things that we could improve in our current implementation: -#### Variations +- Instead of only allocating blocks lazily using the fallback allocator, it might be better to pre-fill the lists to improve the performance of initial allocations. +- To simplify the implementation, we only allowed block sizes that are powers of 2 so that we could use them also as the block alignment. By storing (or calculating) the alignment in a different way, we could also allow arbitrary other block sizes. This way, we could introduce blocks for common allocation sizes to minimize the wasted memory. +- Instead of falling back to a linked list allocator, we could a special allocator for allocations greater than 4KiB. 
The idea is to utilize [paging], which operates on 4KiB pages, to map a continuous block of virtual memory to non-continuos physical frames. This way, fragmentation of unused memory is no longer a problem for large allocations. +- With such a page allocator, it might make sense to add block sizes up to 4KiB and drop the linked list allocator completely. The main advantage of this would be performance predictability, i.e. improved worse-case performance. + +[paging]: @/second-edition/posts/08-paging-introduction/index.md + +It's important to note that the implementation improvements outlined above are only suggestions. Allocators used in operating system kernels are typically highly optimized to the specific workload of the kernel, which is only possible through extensive profiling. + +### Variations + +There are also many variations of the fixed-size block allocator design. Two popular examples are the _slab allocator_ and the _buddy allocator_, which are also used in popular kernels such as Linux. In the following, we give a short introduction to these two designs. + +#### Slab Allocator + +The idea behind a [slab allocator] is to use block sizes that directly correspond to selected types in the kernel. This way, allocations of those types fit a block size exactly and no memory is wasted. Sometimes, it might be even possible to preinitialize type instances in unused blocks to further improve performance. + +[slab allocator]: https://en.wikipedia.org/wiki/Slab_allocation + +Slab allocation is often combined with other allocators. For example, it can be used together with a fixed-size allocator to further split an allocated block in order to reduce memory waste. It is also often used to implement an [object pool pattern] on top of a single large allocation. + +[object pool pattern]: https://en.wikipedia.org/wiki/Object_pool_pattern + +#### Buddy Allocator + +Instead of using a linked list to manage freed blocks, the [buddy allocator] design uses a [binary tree] data structure together with power-of-2 block sizes. When a new block of a certain size is required, it splits a larger sized block into two halves, thereby creating two child nodes in the tree. Whenever a block is freed again, the neighbor block in the tree is analyzed. If the neighbor is also free, the two blocks are joined back together to a block of twice the size. + +The advantage of this merge process is that [external fragmentation] is reduced so that small freed blocks can be reused for a large allocation. It also does not use a fallback allocator, so the performance is more predictable. The biggest drawback is that only power-of-2 block sizes are possible, which might result in a large amount of wasted memory due to [internal fragmentation]. For this reason, buddy allocators are often combined with a slab allocator to further split an allocated block into multiple smaller blocks. 
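To give a rough idea of why this merging can be implemented efficiently: with power-of-2 block sizes, the offset of a block's "buddy" can be computed directly by flipping a single bit, so no search through the tree levels is needed. The following snippet is only a sketch of that idea (offsets are relative to the heap start and assumed to be aligned to the block size), not a complete buddy allocator:

```rust
/// Sketch: the buddy of a block is found by toggling the bit that
/// corresponds to the block size.
fn buddy_offset(block_offset: usize, block_size: usize) -> usize {
    block_offset ^ block_size
}

// e.g. with 64-byte blocks, the blocks at offsets 128 and 192 are buddies:
// buddy_offset(128, 64) == 192 and buddy_offset(192, 64) == 128
```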
+ +[buddy allocator]: https://en.wikipedia.org/wiki/Buddy_memory_allocation +[binary tree]: https://en.wikipedia.org/wiki/Binary_tree +[external fragmentation]: https://en.wikipedia.org/wiki/Fragmentation_(computing)#External_fragmentation +[internal fragmentation]: https://en.wikipedia.org/wiki/Fragmentation_(computing)#Internal_fragmentation -- Buddy allocator -- Slab allocator ## Summary From 9d25a18f9e55bcce80381b9a2cf01ef157b251d0 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Mon, 20 Jan 2020 11:38:02 +0100 Subject: [PATCH 35/41] Write summary --- .../posts/11-allocator-designs/index.md | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 868378f0..f24fd2c4 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -1183,6 +1183,25 @@ The advantage of this merge process is that [external fragmentation] is reduced ## Summary +This post gave an overview over different allocator designs. We learned how to implement a basic [bump allocator], which hands out memory linearly by increasing a single `next` pointer. While bump allocation is very fast, it can only reuse memory after all allocations have been freed. For this reason, it is seldom used as a global allocator. + +[bump allocator]: @/second-edition/posts/11-allocator-designs/index.md#bump-allocator + +Next, we created a [linked list allocator] that uses the freed memory blocks itself to create a linked list, the so-called [free list]. This list makes it possible to store an arbitrary number of freed blocks of different sizes. While no [internal fragmentation] occurs, the approach suffers from poor performance because an allocation request might require a complete traversal of the list. Our implementation also suffers from [external fragmentation] because it does not merge adjacent freed blocks back together. + +[linked list allocator]: @/second-edition/posts/11-allocator-designs/index.md#linkedlist-allocator +[free list]: https://en.wikipedia.org/wiki/Free_list + +To fix the performance problems of the linked list approach, we created a [fixed-size block allocator] that predefines a fixed set of block sizes. For each block size, a separate [free list] exists so that allocations and deallocations only need to insert/pop at the front of the list and are thus very fast. Since each allocation is rounded up to the next larger block size, some memory is wasted due to [internal fragmentation]. + +[fixed-size block allocator]: @/second-edition/posts/11-allocator-designs/index.md#fixed-size-block-allocator + +There are many more allocator designs with different tradeoffs. [Slab allocation] works well to optimize the allocation of common fixed-size structures, but is not applicable in all situations. [Buddy allocation] uses a binary tree to merge freed blocks back together, but wastes a large amount of memory because it only supports power-of-2 block sizes. It's also important to remember that each kernel implementation has a unique workload, so there is no "best" allocator design that fits all cases. + +[Slab allocation]: @/second-edition/posts/11-allocator-designs/index.md#slab-allocator +[Buddy allocation]: @/second-edition/posts/11-allocator-designs/index.md#buddy-allocator + + ## What's next? 
From bd94c52c363707bfc2b90b7ee9cc9d783c43f1c7 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Mon, 20 Jan 2020 11:45:29 +0100 Subject: [PATCH 36/41] What's next? --- .../second-edition/posts/11-allocator-designs/index.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index f24fd2c4..4b7c2ca2 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -1204,8 +1204,13 @@ There are many more allocator designs with different tradeoffs. [Slab allocation ## What's next? +With this post, we conclude our memory management implementation for now. Next, we will start exploring [_multitasking_], starting with [_threads_]. In subsequent post we will then explore [_multiprocessing_], [_processes_], and cooperative multitasking in the form of [_async/await_]. - +[_multitasking_]: https://en.wikipedia.org/wiki/Computer_multitasking +[_threads_]: https://en.wikipedia.org/wiki/Thread_(computing) +[_processes_]: https://en.wikipedia.org/wiki/Process_(computing) +[_multiprocessing_]: https://en.wikipedia.org/wiki/Multiprocessing +[_async/await_]: https://rust-lang.github.io/async-book/01_getting_started/04_async_await_primer.html From 0b9a80684a87c6bf7ec3d46b9e3da1563db42dd8 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Mon, 20 Jan 2020 11:47:54 +0100 Subject: [PATCH 37/41] Fix typo --- blog/content/second-edition/posts/11-allocator-designs/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 4b7c2ca2..8029e357 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -1148,7 +1148,7 @@ On the implementation side, there are various things that we could improve in ou - Instead of only allocating blocks lazily using the fallback allocator, it might be better to pre-fill the lists to improve the performance of initial allocations. - To simplify the implementation, we only allowed block sizes that are powers of 2 so that we could use them also as the block alignment. By storing (or calculating) the alignment in a different way, we could also allow arbitrary other block sizes. This way, we could introduce blocks for common allocation sizes to minimize the wasted memory. -- Instead of falling back to a linked list allocator, we could a special allocator for allocations greater than 4KiB. The idea is to utilize [paging], which operates on 4KiB pages, to map a continuous block of virtual memory to non-continuos physical frames. This way, fragmentation of unused memory is no longer a problem for large allocations. +- Instead of falling back to a linked list allocator, we could a special allocator for allocations greater than 4KiB. The idea is to utilize [paging], which operates on 4KiB pages, to map a continuous block of virtual memory to non-continuous physical frames. This way, fragmentation of unused memory is no longer a problem for large allocations. - With such a page allocator, it might make sense to add block sizes up to 4KiB and drop the linked list allocator completely. The main advantage of this would be performance predictability, i.e. improved worse-case performance. 
[paging]: @/second-edition/posts/08-paging-introduction/index.md From 13d65e64b54c751f358a0f45ff50d0c2665f5d22 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Mon, 20 Jan 2020 13:09:46 +0100 Subject: [PATCH 38/41] Improve post introduction --- .../second-edition/posts/11-allocator-designs/index.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 8029e357..15d58341 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -5,9 +5,7 @@ path = "allocator-designs" date = 0000-01-01 +++ -This post explains how to implement heap allocators from scratch. It presents different allocator designs and explains their advantages and drawbacks. We then use this knowledge to create a kernel allocator with improved performance. - -TODO +This post explains how to implement heap allocators from scratch. It presents and discusses different allocator designs, including bump allocation, linked list allocation, and fixed-size block allocation. For each of the three designs, we will create a basic implementation that can be used for our kernel. From 6ecf2998eebd4834fe274ea7d99314a9d4bf94e0 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Mon, 20 Jan 2020 14:00:52 +0100 Subject: [PATCH 39/41] Minor improvements --- .../posts/11-allocator-designs/index.md | 66 ++++++++----------- 1 file changed, 29 insertions(+), 37 deletions(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 15d58341..9bc82c43 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -17,26 +17,24 @@ This blog is openly developed on [GitHub]. If you have any problems or questions -TODO optional - ## Introduction -In the [previous post] we added basic support for heap allocations to our kernel. For that, we [created a new memory region][map-heap] in the page tables and [used the `linked_list_allocator` crate][use-alloc-crate] to manage that memory. While we have a working heap now, we left most of the work to the allocator crate without understanding how it works. +In the [previous post] we added basic support for heap allocations to our kernel. For that, we [created a new memory region][map-heap] in the page tables and [used the `linked_list_allocator` crate][use-alloc-crate] to manage that memory. While we have a working heap now, we left most of the work to the allocator crate without trying to understand how it works. [previous post]: @/second-edition/posts/10-heap-allocation/index.md [map-heap]: @/second-edition/posts/10-heap-allocation/index.md#creating-a-kernel-heap [use-alloc-crate]: @/second-edition/posts/10-heap-allocation/index.md#using-an-allocator-crate -In this post, we will show how to create our own heap allocator from scratch instead of relying on an existing allocator crate. We will discuss different allocator designs, including a simplistic _bump allocator_ and a basic _fixed-size block allocator_, and use this knowledge to implement an allocator with improved performance. +In this post, we will show how to create our own heap allocator from scratch instead of relying on an existing allocator crate. 
We will discuss different allocator designs, including a simplistic _bump allocator_ and a basic _fixed-size block allocator_, and use this knowledge to implement an allocator with improved performance (compared to the `linked_list_allocator` crate). ### Design Goals The responsibility of an allocator is to manage the available heap memory. It needs to return unused memory on `alloc` calls and keep track of memory freed by `dealloc` so that it can be reused again. Most importantly, it must never hand out memory that is already in use somewhere else because this would cause undefined behavior. -Apart from correctness, there are many secondary design goals. For example, it should effectively utilize the available memory and keep [fragmentation] low. Furthermore, it should work well for concurrent applications and scale to any number of processors. For maximal performance, it could even optimize the memory layout with respect to the CPU caches to improve [cache locality] and avoid [false sharing]. +Apart from correctness, there are many secondary design goals. For example, the allocator should effectively utilize the available memory and keep [_fragmentation_] low. Furthermore, it should work well for concurrent applications and scale to any number of processors. For maximal performance, it could even optimize the memory layout with respect to the CPU caches to improve [cache locality] and avoid [false sharing]. [cache locality]: http://docs.cray.com/books/S-2315-50/html-S-2315-50/qmeblljm.html -[fragmentation]: https://en.wikipedia.org/wiki/Fragmentation_(computing) +[_fragmentation_]: https://en.wikipedia.org/wiki/Fragmentation_(computing) [false sharing]: http://mechanical-sympathy.blogspot.de/2011/07/false-sharing.html These requirements can make good allocators very complex. For example, [jemalloc] has over 30.000 lines of code. This complexity often undesired in kernel code where a single bug can lead to severe security vulnerabilities. Fortunately, the allocation patterns of kernel code are often much simpler compared to userspace code, so that relatively simple allocator designs often suffice. @@ -60,7 +58,7 @@ The idea behind a bump allocator is to linearly allocate memory by increasing (_ The `next` pointer only moves in a single direction and thus never hands out the same memory region twice. When it reaches the end of the heap, no more memory can be allocated, resulting in an out-of-memory error on the next allocation. -A bump allocator is often implemented with an allocation counter, which is inreased by 1 on each `alloc` call and decreased by 1 on each `dealloc` call. When the allocation counter reaches zero it means that all allocations on the heap were deallocated. In this case, the `next` pointer can be reset to the start address of the heap, so that the complete heap memory is available to allocations again. +A bump allocator is often implemented with an allocation counter, which is increased by 1 on each `alloc` call and decreased by 1 on each `dealloc` call. When the allocation counter reaches zero it means that all allocations on the heap were deallocated. In this case, the `next` pointer can be reset to the start address of the heap, so that the complete heap memory is available to allocations again. ### Implementation @@ -413,7 +411,7 @@ Line 5 shows the fundamental problem: We have five unused memory regions with di Normally when we have a potentially unbounded number of items, we can just use a heap allocated collection. 
This isn't really possible in our case, since the heap allocator can't depend on itself (it would cause endless recursion or deadlocks). So we need to find a different solution. -## LinkedList Allocator +## Linked List Allocator A common trick to keep track of an arbitrary number of free memory areas when implementing allocators is to use these areas itself as backing storage. This utilizes the fact that the regions are still mapped to a virtual address and backed by a physical frame, but the stored information is not needed anymore. By storing the information about the freed region in the region itself, we can keep track of an unbounded number of freed regions without needing additional memory. @@ -429,7 +427,7 @@ As you might guess from the name, this is the technique that the `linked_list_al ### Implementation -In the following, we will create our own simple `LinkedListAllocator` type that uses the above approach for keeping track of freed memory regions. This part of the post isn't required for future posts, so you can skip the details if you like. +In the following, we will create our own simple `LinkedListAllocator` type that uses the above approach for keeping track of freed memory regions. This part of the post isn't required for future posts, so you can skip the implementation details if you like. #### The Allocator Type @@ -477,6 +475,8 @@ impl ListNode { The type has a simple constructor function named `new` and methods to calculate the start and end addresses of the represented region. We make the `new` function a [const function], which will be required later when constructing a static linked list allocator. Note that any use of mutable references in const functions (including setting the `next` field to `None`) is still unstable. In order to get it to compile, we need to add **`#![feature(const_fn)]`** to the beginning of our `lib.rs`. +[const function]: https://doc.rust-lang.org/reference/items/functions.html#const-functions + With the `ListNode` struct as building block, we can now create the `LinkedListAllocator` struct: ```rust @@ -786,9 +786,9 @@ The `linked_list_allocator` crate implements this merging strategy in the follow As we learned above, the bump allocator is extremely fast and can be optimized to just a few assembly operations. The linked list allocator performs much worse in this category. The problem is that an allocation request might need to traverse the complete linked list until it finds a suitable block. -Since the list length depends on the number of unused memory blocks, the performance can vary extremely for different programs. A program that only creates a couple of allocations will experience a relatively fast allocation performance. For a program that fragments the heap with many allocations, however, will experience a very bad allocation performance because the linked list will be very long and mostly contain very small blocks. +Since the list length depends on the number of unused memory blocks, the performance can vary extremely for different programs. A program that only creates a couple of allocations will experience a relatively fast allocation performance. For a program that fragments the heap with many allocations, however, the allocation performance will be very bad because the linked list will be very long and mostly contain very small blocks. -It's worth noting that this performance issue isn't a problem with our implementation, but a fundamental disadvantage of the linked list approach. 
Since allocation performance can be very important for kernel-level code, we explore a third allocator design in the following that trades improved performance for reduced memory utilization. +It's worth noting that this performance issue isn't a problem caused by our basic implementation, but a fundamental problem of the linked list approach. Since allocation performance can be very important for kernel-level code, we explore a third allocator design in the following that trades improved performance for reduced memory utilization. ## Fixed-Size Block Allocator @@ -796,9 +796,9 @@ In the following, we present an allocator design that uses fixed-size memory blo ### Introduction -The idea behind a _fixed-size block allocator_ is the following: Instead of allocating exactly as much memory as requested, we define a small number of block sizes and round up each allocation to the next block size. For example, with block sizes of 16, 64, and 512, an allocation of 4 bytes would return a 16-byte block, an allocation of 48 bytes a 64-byte block, and an allocation of 128 bytes an 512-byte block. +The idea behind a _fixed-size block allocator_ is the following: Instead of allocating exactly as much memory as requested, we define a small number of block sizes and round up each allocation to the next block size. For example, with block sizes of 16, 64, and 512 bytes, an allocation of 4 bytes would return a 16-byte block, an allocation of 48 bytes a 64-byte block, and an allocation of 128 bytes a 512-byte block. -Like the linked list allocator, we keep track of the unused memory by creating a linked list in the unused memory. However, instead of using a single list with different block sizes, we create a separate list for each block size. Each list then only stores blocks of a single size. For example, with block sizes 16, 64, and 512 there would be three separate linked lists in memory: +Like the linked list allocator, we keep track of the unused memory by creating a linked list in the unused memory. However, instead of using a single list with different block sizes, we create a separate list for each size class. Each list then only stores blocks of a single size. For example, with block sizes 16, 64, and 512 there would be three separate linked lists in memory: ![](fixed-size-block-example.svg). @@ -810,7 +810,7 @@ Since each element in a list has the same size, each list element is equally sui - Retrieve the head pointer for the list, e.g. from an array. For block size 16, we need to use `head_16`. - Remove the first block from the list and return it. -Most notably, we can always return the first element of the list and no longer need to traverse the full list. Thus, allocations are much faster than on the linked list allocator. +Most notably, we can always return the first element of the list and no longer need to traverse the full list. Thus, allocations are much faster than with the linked list allocator. #### Block Sizes and Wasted Memory @@ -830,7 +830,7 @@ Most notably, no traversal of the list is required for deallocation either. This #### Fallback Allocator -Given that large allocations (>1KB) are often rare, especially in operating system kernels, it might make sense to fall back to a different allocator for these allocations. For example, we could fall back to a linked list allocator for allocations greater than 512 bytes in order to reduce memory waste.
Since only very few allocations of that size are expected, the the linked list would stay small so that (de)allocations would be still reasonably fast. +Given that large allocations (>2KB) are often rare, especially in operating system kernels, it might make sense to fall back to a different allocator for these allocations. For example, we could fall back to a linked list allocator for allocations greater than 2048 bytes in order to reduce memory waste. Since only very few allocations of that size are expected, the linked list would stay small so that (de)allocations would be still reasonably fast. #### Creating new Blocks @@ -878,10 +878,10 @@ Next, we define a constant `BLOCK_SIZES` slice with the block sizes used for our /// /// The sizes must each be power of 2 because they are also used as /// the block alignment (alignments must be always powers of 2). -const BLOCK_SIZES: &[usize] = &[8, 16, 32, 64, 128, 256, 512]; +const BLOCK_SIZES: &[usize] = &[8, 16, 32, 64, 128, 256, 512, 1024, 2048]; ``` -As block sizes, we use powers of 2 starting from 8 up to 512. We don't define any block sizes smaller than 8 because each block must be capable of storing a 64-bit pointer to the next block when freed. For allocations greater than 512 bytes we will fall back to a linked list allocator. +As block sizes, we use powers of 2 starting from 8 up to 2048. We don't define any block sizes smaller than 8 because each block must be capable of storing a 64-bit pointer to the next block when freed. For allocations greater than 2048 bytes we will fall back to a linked list allocator. To simplify the implementation, we define that the size of a block is also its required alignment in memory. So a 16 byte block is always aligned on a 16-byte boundary and a 512 byte block is aligned on a 512-byte boundary. Since alignments always need to be powers of 2, this rules out any other block sizes. If we need block sizes that are not powers of 2 in the future, we can still adjust our implementation for this (e.g. by defining a second `BLOCK_ALIGNMENTS` array). @@ -898,7 +898,7 @@ pub struct FixedSizeBlockAllocator { } ``` -The `list_heads` field is an array of `head` pointers, one for each block size. This is implemented by using the `len()` of the `BLOCK_SIZES` slice as the array length. As a backup allocator for allocations larger than the largest block size we use the allocator provided by the `linked_list_allocator` as fallback. We could also used the `LinkedListAllocator` we implemented ourselves instead, but it has the disadvantage that it does not [merge freed blocks]. +The `list_heads` field is an array of `head` pointers, one for each block size. This is implemented by using the `len()` of the `BLOCK_SIZES` slice as the array length. As a fallback allocator for allocations larger than the largest block size we use the allocator provided by the `linked_list_allocator`. We could also use the `LinkedListAllocator` we implemented ourselves instead, but it has the disadvantage that it does not [merge freed blocks]. [merge freed blocks]: #merging-freed-blocks @@ -1052,12 +1052,8 @@ If the list index is `Some`, we try to remove the first node in the correspondin [`Option::take`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.take -TODO graphic - If the list head is `None`, it indicates that the list of blocks is empty. This means that we need to construct a new block as [described above](#creating-new-blocks).
For that, we first get the current block size from the `BLOCK_SIZES` slice and use it as both the size and the alignment for the new block. Then we create a new `Layout` from it and call the `fallback_alloc` method to perform the allocation. The reason for adjusting the layout and alignment is that the block will be added to the block list on deallocation. -TODO graphic - #### `dealloc` The implementation of the `dealloc` method looks like this: @@ -1069,7 +1065,6 @@ use core::{mem, ptr::NonNull}; // inside the `unsafe impl GlobalAlloc` block -#[allow(unused_unsafe)] unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) { let mut allocator = self.lock(); match list_index(&layout) { @@ -1082,13 +1077,11 @@ unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) { assert!(mem::align_of::<ListNode>() <= BLOCK_SIZES[index]); let new_node_ptr = ptr as *mut ListNode; new_node_ptr.write(new_node); - allocator.list_heads[index] = Some(unsafe { &mut *new_node_ptr }); + allocator.list_heads[index] = Some(&mut *new_node_ptr); } None => { let ptr = NonNull::new(ptr).unwrap(); - unsafe { - allocator.fallback_allocator.deallocate(ptr, layout); - } + allocator.fallback_allocator.deallocate(ptr, layout); } } } @@ -1098,17 +1091,15 @@ Like in `alloc`, we first use the `lock` method to get a mutable allocator refer [`Heap::deallocate`]: https://docs.rs/linked_list_allocator/0.6.4/linked_list_allocator/struct.Heap.html#method.deallocate -If `list_index` returns a block index, we need to add the freed memory block to the list. For that, we first create a new `ListNode` that points to the current list head (by using [`Option::take`] again). Before we write the new node into the freed memory block, we first assert that the current block size specified by `index` has the required size and alignment for storing a `ListNode`. Then we perform the write by converting the given `*mut u8` pointer to a `*mut ListNode` pointer and then calling the [`write`][`pointer::write`] method on it. The last step is to set the head pointer of the list, which is currently `None` since we called `take` on it, to our newly written `ListNode`. +If `list_index` returns a block index, we need to add the freed memory block to the list. For that, we first create a new `ListNode` that points to the current list head (by using [`Option::take`] again). Before we write the new node into the freed memory block, we first assert that the current block size specified by `index` has the required size and alignment for storing a `ListNode`. Then we perform the write by converting the given `*mut u8` pointer to a `*mut ListNode` pointer and then calling the unsafe [`write`][`pointer::write`] method on it. The last step is to set the head pointer of the list, which is currently `None` since we called `take` on it, to our newly written `ListNode`. For that we convert the raw `new_node_ptr` to a mutable reference. [`pointer::write`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write -TODO graphic - There are a few things worth noting: - We don't differentiate between blocks allocated from a block list and blocks allocated from the fallback allocator. This means that new blocks created in `alloc` are added to the block list on `dealloc`, thereby increasing the number of blocks of that size. - The `alloc` method is the only place where new blocks are created in our implemenation. This means that we initially start with empty block lists and only fill the lists lazily when allocations for that block size are performed.
-- We use `unsafe` blocks in `dealloc`, even though they are not required in functions that are themselves declared as `unsafe`. However, using explicit `unsafe` blocks has the advantage that it's obvious which operations are unsafe and which not. There is also a [proposed RFC](https://github.com/rust-lang/rfcs/pull/2585) to change this behavior. Since the compiler normally warns on unneeded `unsafe` blocks, we add the `#[allow(unused_unsafe)]` to the method to silence that warning. +- We don't need `unsafe` blocks in `alloc` and `dealloc`, even though we perform some `unsafe` operations. The reason is that Rust currently treats the complete body of unsafe functions as one large `unsafe` block. Since using explicit `unsafe` blocks has the advantage that it's obvious which operations are unsafe and which not, there is a [proposed RFC](https://github.com/rust-lang/rfcs/pull/2585) to change this behavior. ### Using it @@ -1145,9 +1136,10 @@ While the fixed-size block approach has a much better performance than the linke On the implementation side, there are various things that we could improve in our current implementation: - Instead of only allocating blocks lazily using the fallback allocator, it might be better to pre-fill the lists to improve the performance of initial allocations. -- To simplify the implementation, we only allowed block sizes that are powers of 2 so that we could use them also as the block alignment. By storing (or calculating) the alignment in a different way, we could also allow arbitrary other block sizes. This way, we could introduce blocks for common allocation sizes to minimize the wasted memory. +- To simplify the implementation, we only allowed block sizes that are powers of 2 so that we could use them also as the block alignment. By storing (or calculating) the alignment in a different way, we could also allow arbitrary other block sizes. This way, we could add more block sizes, e.g. for common allocation sizes, in order to minimize the wasted memory. +- We currently only create new blocks, but never free them again. This results in fragmentation and might eventually result in allocation failure for large allocations. It might make sense to enforce a maximum list length for each block size. When the maximum length is reached, subsequent deallocations are freed using the fallback allocator instead of being added to the list. - Instead of falling back to a linked list allocator, we could use a special allocator for allocations greater than 4KiB. The idea is to utilize [paging], which operates on 4KiB pages, to map a continuous block of virtual memory to non-continuous physical frames. This way, fragmentation of unused memory is no longer a problem for large allocations. -- With such a page allocator, it might make sense to add block sizes up to 4KiB and drop the linked list allocator completely. The main advantage of this would be performance predictability, i.e. improved worse-case performance. +- With such a page allocator, it might make sense to add block sizes up to 4KiB and drop the linked list allocator completely. The main advantages of this would be reduced fragmentation and improved performance predictability, i.e. better worst-case performance. [paging]: @/second-edition/posts/08-paging-introduction/index.md @@ -1163,7 +1155,7 @@ The idea behind a [slab allocator] is to use block sizes that directly correspon [slab allocator]: https://en.wikipedia.org/wiki/Slab_allocation -Slab allocation is often combined with other allocators.
For example, it can be used together with a fixed-size allocator to further split an allocated block in order to reduce memory waste. It is also often used to implement an [object pool pattern] on top of a single large allocation. +Slab allocation is often combined with other allocators. For example, it can be used together with a fixed-size block allocator to further split an allocated block in order to reduce memory waste. It is also often used to implement an [object pool pattern] on top of a single large allocation. [object pool pattern]: https://en.wikipedia.org/wiki/Object_pool_pattern @@ -1181,13 +1173,13 @@ The advantage of this merge process is that [external fragmentation] is reduced ## Summary -This post gave an overview over different allocator designs. We learned how to implement a basic [bump allocator], which hands out memory linearly by increasing a single `next` pointer. While bump allocation is very fast, it can only reuse memory after all allocations have been freed. For this reason, it is seldom used as a global allocator. +This post gave an overview of different allocator designs. We learned how to implement a basic [bump allocator], which hands out memory linearly by increasing a single `next` pointer. While bump allocation is very fast, it can only reuse memory after all allocations have been freed. For this reason, it is rarely used as a global allocator. [bump allocator]: @/second-edition/posts/11-allocator-designs/index.md#bump-allocator -Next, we created a [linked list allocator] that uses the freed memory blocks itself to create a linked list, the so-called [free list]. This list makes it possible to store an arbitrary number of freed blocks of different sizes. While no [internal fragmentation] occurs, the approach suffers from poor performance because an allocation request might require a complete traversal of the list. Our implementation also suffers from [external fragmentation] because it does not merge adjacent freed blocks back together. +Next, we created a [linked list allocator] that uses the freed memory blocks themselves to create a linked list, the so-called [free list]. This list makes it possible to store an arbitrary number of freed blocks of different sizes. While no memory waste occurs, the approach suffers from poor performance because an allocation request might require a complete traversal of the list. Our implementation also suffers from [external fragmentation] because it does not merge adjacent freed blocks back together. -[linked list allocator]: @/second-edition/posts/11-allocator-designs/index.md#linkedlist-allocator +[linked list allocator]: @/second-edition/posts/11-allocator-designs/index.md#linked-list-allocator [free list]: https://en.wikipedia.org/wiki/Free_list To fix the performance problems of the linked list approach, we created a [fixed-size block allocator] that predefines a fixed set of block sizes. For each block size, a separate [free list] exists so that allocations and deallocations only need to insert/pop at the front of the list and are thus very fast. Since each allocation is rounded up to the next larger block size, some memory is wasted due to [internal fragmentation].
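The summary above leans on the fixed-size block fast path: round the requested layout up to a size class, then pop the head of that class's free list. The following rough sketch illustrates this idea under assumptions: the `BLOCK_SIZES` values match the ones introduced in the patch, but the `FreeLists` container, the `ListNode` definition shown here, and the `pop_block` helper are simplified stand-ins for the post's `FixedSizeBlockAllocator`, not its exact code.

```rust
use core::alloc::Layout;

/// Block sizes as introduced in the patch above (powers of two up to 2048).
const BLOCK_SIZES: &[usize] = &[8, 16, 32, 64, 128, 256, 512, 1024, 2048];

/// Intrusive free-list node stored inside each unused block.
struct ListNode {
    next: Option<&'static mut ListNode>,
}

/// Hypothetical container holding one free list per block size.
struct FreeLists {
    heads: [Option<&'static mut ListNode>; BLOCK_SIZES.len()],
}

/// Returns the index of the smallest block size that fits both the size and
/// the alignment of the layout (each block size doubles as its alignment).
fn list_index(layout: &Layout) -> Option<usize> {
    let required_block_size = layout.size().max(layout.align());
    BLOCK_SIZES.iter().position(|&s| s >= required_block_size)
}

impl FreeLists {
    /// Fast-path allocation: pop the first block of the matching size class,
    /// or return `None` if that class is empty or the request is too large.
    fn pop_block(&mut self, layout: &Layout) -> Option<*mut u8> {
        let index = list_index(layout)?;
        let node = self.heads[index].take()?;
        self.heads[index] = node.next.take();
        Some(node as *mut ListNode as *mut u8)
    }
}
```

Note that `list_index` takes the maximum of size and alignment, which only works because each block size doubles as its alignment, the same design decision discussed in the block-size section of the patch.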
From d6da1c8485b6adaea0cb39715882e9999ec265dd Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Mon, 20 Jan 2020 14:17:44 +0100 Subject: [PATCH 40/41] Typo fixes --- .../second-edition/posts/11-allocator-designs/index.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index 9bc82c43..ecc925f6 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -726,7 +726,7 @@ impl LinkedListAllocator { ``` First, the function uses the [`align_to`] method on the passed [`Layout`] to increase the alignment to the alignment of a `ListNode` if necessary. It then uses the [`pad_to_align`] method to round up the size to a multiple of the alignment to ensure that the start address of the next memory block will have the correct alignment for storing a `ListNode` too. -In the second step it uses the [`max`] method to enforce a minimum allocation size of `mem::size_of::<ListNode>`. This way, the `dealloc` function can safetly write a `ListNode` to the freed memory block. +In the second step it uses the [`max`] method to enforce a minimum allocation size of `mem::size_of::<ListNode>`. This way, the `dealloc` function can safely write a `ListNode` to the freed memory block. [`align_to`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.align_to [`pad_to_align`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.pad_to_align @@ -792,7 +792,7 @@ It's worth noting that this performance issue isn't a problem caused by our basi ## Fixed-Size Block Allocator -In the following, we present an allocator design that uses fixed-size memory blocks for fulfilling allocation requests. This way, the allocator often returns blocks that are larger than needed for allocations, which results in wasted memory due to [internal fragmentation]. On the other hand, it drastly reduces the time required to find a suitable block (compared to the linked list allocator), resulting in much better allocation performance. +In the following, we present an allocator design that uses fixed-size memory blocks for fulfilling allocation requests. This way, the allocator often returns blocks that are larger than needed for allocations, which results in wasted memory due to [internal fragmentation]. On the other hand, it drastically reduces the time required to find a suitable block (compared to the linked list allocator), resulting in much better allocation performance. ### Introduction @@ -834,7 +834,7 @@ Given that large allocations (>2KB) are often rare, especially in operating syst #### Creating new Blocks -Above, we always assumed that there are always enough blocks of a specific size in the list to fulfill all allocation requests. However, at some point the linked list for a block size becomes empty. At this point, there are two ways how we can create new unused blocks of a specific size to fulfil an allocation request: +Above, we always assumed that there are always enough blocks of a specific size in the list to fulfill all allocation requests. However, at some point the linked list for a block size becomes empty. At this point, there are two ways how we can create new unused blocks of a specific size to fulfill an allocation request: - Allocate a new block from the fallback allocator (if there is one). - Split a larger block from a different list.
This best works if block sizes are powers of two. For example, a 32-byte block can be split into two 16-byte blocks. @@ -1098,7 +1098,7 @@ If `list_index` returns a block index, we need to add the freed memory block to There are a few things worth noting: - We don't differentiate between blocks allocated from a block list and blocks allocated from the fallback allocator. This means that new blocks created in `alloc` are added to the block list on `dealloc`, thereby increasing the number of blocks of that size. -- The `alloc` method is the only place where new blocks are created in our implemenation. This means that we initially start with empty block lists and only fill the lists lazily when allocations for that block size are performed. +- The `alloc` method is the only place where new blocks are created in our implementation. This means that we initially start with empty block lists and only fill the lists lazily when allocations for that block size are performed. - We don't need `unsafe` blocks in `alloc` and `dealloc`, even though we perform some `unsafe` operations. The reason is that Rust currently treats the complete body of unsafe functions as one large `unsafe` block. Since using explicit `unsafe` blocks has the advantage that it's obvious which operations are unsafe and which not, there is a [proposed RFC](https://github.com/rust-lang/rfcs/pull/2585) to change this behavior. ### Using it From 2f926b601c92862fe1ce23e2bfb1ed9d604ac528 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Mon, 20 Jan 2020 14:18:10 +0100 Subject: [PATCH 41/41] Update release date --- .../second-edition/posts/11-allocator-designs/index.md | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/blog/content/second-edition/posts/11-allocator-designs/index.md b/blog/content/second-edition/posts/11-allocator-designs/index.md index ecc925f6..65bc094e 100644 --- a/blog/content/second-edition/posts/11-allocator-designs/index.md +++ b/blog/content/second-edition/posts/11-allocator-designs/index.md @@ -2,7 +2,7 @@ title = "Allocator Designs" weight = 11 path = "allocator-designs" -date = 0000-01-01 +date = 2020-01-20 +++ This post explains how to implement heap allocators from scratch. It presents and discusses different allocator designs, including bump allocation, linked list allocation, and fixed-size block allocation. For each of the three designs, we will create a basic implementation that can be used for our kernel. @@ -1201,11 +1201,3 @@ With this post, we conclude our memory management implementation for now. Next, [_processes_]: https://en.wikipedia.org/wiki/Process_(computing) [_multiprocessing_]: https://en.wikipedia.org/wiki/Multiprocessing [_async/await_]: https://rust-lang.github.io/async-book/01_getting_started/04_async_await_primer.html - - - ---- - -TODO: update date - ----
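As a closing illustration of the counter-based reset described in the bump allocator section of this series, a minimal sketch could look as follows. The struct mirrors the `BumpAllocator` fields used in the post, but the `bump_alloc` and `bump_dealloc` methods are simplified stand-ins that omit alignment handling and the `GlobalAlloc`/locking integration of the real implementation.

```rust
/// Simplified bump allocator state, mirroring the fields used in the post.
struct BumpAllocator {
    heap_start: usize,
    heap_end: usize,
    next: usize,        // start of the still-unused heap area
    allocations: usize, // number of currently live allocations
}

impl BumpAllocator {
    /// Hands out `size` bytes by bumping the `next` pointer
    /// (alignment handling omitted for brevity).
    fn bump_alloc(&mut self, size: usize) -> Option<usize> {
        let alloc_start = self.next;
        let alloc_end = alloc_start.checked_add(size)?;
        if alloc_end > self.heap_end {
            None // out of memory
        } else {
            self.next = alloc_end;
            self.allocations += 1;
            Some(alloc_start)
        }
    }

    /// Decreases the allocation counter; once it reaches zero, every
    /// allocation was freed, so the whole heap can be handed out again.
    fn bump_dealloc(&mut self) {
        self.allocations -= 1;
        if self.allocations == 0 {
            self.next = self.heap_start;
        }
    }
}
```

The important part is the final branch: only when `allocations` drops back to zero can `next` be rewound to `heap_start`, which is why a bump allocator can reuse memory only after every allocation has been freed.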