mirror of https://github.com/phil-opp/blog_os.git (synced 2025-12-16 14:27:49 +00:00)

Finish the post
@@ -808,14 +808,14 @@ pub struct Task {
 }
 ```
 
-The `Task` struct is a newtype wrapper around a pinned, heap allocated, dynamically dispatched future with the empty type `()` as output. Let's go through it in detail:
+The `Task` struct is a newtype wrapper around a pinned, heap allocated, and dynamically dispatched future with the empty type `()` as output. Let's go through it in detail:
 
 - We require that the future associated with a task returns `()`. This means that tasks don't return any result, they are just executed for their side effects. For example, the `example_task` function we defined above has no return value, but it prints something to the screen as a side effect.
-- The `dyn` keyword indicates that we store a [trait object] in the `Box`. This means that the type of the future is [dynamically dispatched], which makes it possible to store different types of futures in the `Task` type. This is important because each `async fn` has their own type and we want to be able to create different tasks later.
+- The `dyn` keyword indicates that we store a [_trait object_] in the `Box`. This means that the methods on the future are [_dynamically dispatched_], which makes it possible to store different types of futures in the `Task` type. This is important because each `async fn` has its own type and we want to be able to create multiple different tasks.
 - As we learned in the [section about pinning], the `Pin<Box>` type ensures that a value cannot be moved in memory by placing it on the heap and preventing the creation of `&mut` references to it. This is important because futures generated by async/await might be self-referential, i.e. contain pointers to themselves that would be invalidated when the future is moved.
 
-[trait object]: https://doc.rust-lang.org/book/ch17-02-trait-objects.html
-[dynamically dispatched]: https://doc.rust-lang.org/book/ch17-02-trait-objects.html#trait-objects-perform-dynamic-dispatch
+[_trait object_]: https://doc.rust-lang.org/book/ch17-02-trait-objects.html
+[_dynamically dispatched_]: https://doc.rust-lang.org/book/ch17-02-trait-objects.html#trait-objects-perform-dynamic-dispatch
 [section about pinning]: #pinning
 To allow the creation of new `Task` structs from futures, we create a `new` function:
 
@@ -832,9 +832,9 @@ impl Task {
 }
 ```
 
-The function takes an arbitrary future with output type `()` and pins it in memory through the [`Box::pin`] function. Then it wraps it in the `Task` struct and returns the new task. The `'static` lifetime is required here because the returned `Task` can live for an arbitrary time, so the future needs to be valid for that time too.
+The function takes an arbitrary future with output type `()` and pins it in memory through the [`Box::pin`] function. Then it wraps the boxed future in the `Task` struct and returns it. The `'static` lifetime is required here because the returned `Task` can live for an arbitrary time, so the future needs to be valid for that time too.
 
-We also add a `poll` method to allow the executor to poll the corresponding future:
+We also add a `poll` method to allow the executor to poll the stored future:
 
 ```rust
 // in src/task/mod.rs
@@ -883,7 +883,7 @@ impl SimpleExecutor {
 }
 ```
 
-The struct contains a single `task_queue` field of type [`VecDeque`], which is basically a vector that allows to push and pop on both ends. The idea behind using this type is that we insert new tasks through the `spawn` method at the end and pop the next task for execution from the front. This way, we get a simple [FIFO queue] (_"first in, first out"_).
+The struct contains a single `task_queue` field of type [`VecDeque`], which is basically a vector that allows push and pop operations on both ends. The idea behind using this type is that we insert new tasks through the `spawn` method at the end and pop the next task for execution from the front. This way, we get a simple [FIFO queue] (_"first in, first out"_).
 
 [`VecDeque`]: https://doc.rust-lang.org/stable/alloc/collections/vec_deque/struct.VecDeque.html
 [FIFO queue]: https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics)
@@ -972,7 +972,7 @@ impl SimpleExecutor {
|
||||
}
|
||||
```
|
||||
|
||||
The function uses a `while let` loop to handle all tasks in the `task_queue`. For each task, it first creates a `Context` type by wrapping a `Waker` instance returned by our `dummy_waker` function. Then it invokes the `Task::poll` method with this `Context`. If the `poll` method returns `Poll::Ready`, the task is finished and we can continue with the next task. If the task is still `Poll::Pending`, we add it to the back of the queue again so that it will be polled again in a subsequent loop iteration.
|
||||
The function uses a `while let` loop to handle all tasks in the `task_queue`. For each task, it first creates a `Context` type by wrapping a `Waker` instance returned by our `dummy_waker` function. Then it invokes the `Task::poll` method with this `context`. If the `poll` method returns `Poll::Ready`, the task is finished and we can continue with the next task. If the task is still `Poll::Pending`, we add it to the back of the queue again so that it will be polled again in a subsequent loop iteration.
|
||||
|
||||
 #### Trying It
 
@@ -992,6 +992,18 @@ fn kernel_main(boot_info: &'static BootInfo) -> ! {
 
     // […] test_main, "it did not crash" message, hlt_loop
 }
 
+// Below is the example_task function again so that you don't have to scroll up
+
+async fn async_number() -> u32 {
+    42
+}
+
+async fn example_task() {
+    let number = async_number().await;
+    println!("async number: {}", number);
+}
 ```
 
 When we run it, we see that the expected _"async number: 42"_ message is printed to the screen:
@@ -1119,7 +1131,7 @@ We use the [`OnceCell::try_get`] to get a reference to the initialized queue. If
 
 [`OnceCell::try_get`]: https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html#method.try_get
 
-The fact that the [`ArrayQueue::push`] method requires only a `&self` reference makes it very simple to call the method on the static queue. The `ArrayQueue` type performs all necessary synchronization itself, so we don't need a mutex wrapper here.
+The fact that the [`ArrayQueue::push`] method requires only a `&self` reference makes it very simple to call the method on the static queue. The `ArrayQueue` type performs all necessary synchronization itself, so we don't need a mutex wrapper here. In case the queue is full, we also print a warning.
 
 [`ArrayQueue::push`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.push
@@ -1163,16 +1175,14 @@ impl ScancodeStream {
     pub fn new() -> Self {
         SCANCODE_QUEUE.try_init_once(|| ArrayQueue::new(100))
             .expect("ScancodeStream::new should only be called once");
-        ScancodeStream {
-            _private: (),
-        }
+        ScancodeStream { _private: () }
     }
 }
 ```
 
-The purpose of the `_private` field is to prevent construction of the struct from outside of the module. This makes the `new` function the only way to construct the type. In the function, we first try to initialize the `SCANCODE_QUEUE` static. We panic if it is already initialized to ensure that only a single `ScancodeStream` type can be created.
+The purpose of the `_private` field is to prevent construction of the struct from outside of the module. This makes the `new` function the only way to construct the type. In the function, we first try to initialize the `SCANCODE_QUEUE` static. We panic if it is already initialized to ensure that only a single `ScancodeStream` instance can be created.
 
-To make the scancodes available to asynchronous tasks, the next step is to implement `poll`-like method that tries to pop the next scancode off the queue. While this sounds like we should implement [`Future`] trait for our type, this does not quite fit here. The problem is that the `Future` trait only abstracts over a single asynchronous value and expects that the `poll` method is not called again after it returns `Poll::Ready`. Our scancode queue, however, contains multiple asynchronous tasks so that it is ok to keep polling it.
+To make the scancodes available to asynchronous tasks, the next step is to implement a `poll`-like method that tries to pop the next scancode off the queue. While this sounds like we should implement the [`Future`] trait for our type, this does not quite fit here. The problem is that the `Future` trait only abstracts over a single asynchronous value and expects that the `poll` method is not called again after it returns `Poll::Ready`. Our scancode queue, however, contains multiple asynchronous values, so it is ok to keep polling it.
 ##### The `Stream` Trait
 
@@ -1234,13 +1244,13 @@ impl Stream for ScancodeStream {
 }
 ```
 
-We first use the [`OnceCell::try_get`] method to get a reference to the initialized scancode queue. This should never fail since we initialize the queue in the `new` function, so we can safely use the `expect` method to panic if it's not initialized. Next, we use the [`ArrayQueue::pop`] to try to get the next element from the queue. If it succeeds we return the scancode wrapped in `Poll::Ready(Some(…))`. If it fails, it means that the queue is empty. In that case, we return `Poll::Pending`.
+We first use the [`OnceCell::try_get`] method to get a reference to the initialized scancode queue. This should never fail since we initialize the queue in the `new` function, so we can safely use the `expect` method to panic if it's not initialized. Next, we use the [`ArrayQueue::pop`] method to try to get the next element from the queue. If it succeeds, we return the scancode wrapped in `Poll::Ready(Some(…))`. If it fails, it means that the queue is empty. In that case, we return `Poll::Pending`.
 
 [`ArrayQueue::pop`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.pop
 
 #### Waker Support
 
-Like the `Futures::poll` method, the `Stream::poll_next` method requires that the asynchronous task notifies the executor when it becomes ready after `Poll::Pending` is returned for the first time. This way, the executor does not need to poll the same task again until it is notified, which greatly reduces the performance overhead of waiting tasks.
+Like the `Future::poll` method, the `Stream::poll_next` method requires that the asynchronous task notifies the executor when it becomes ready after `Poll::Pending` is returned. This way, the executor does not need to poll the same task again until it is notified, which greatly reduces the performance overhead of waiting tasks.
 
 To send this notification, the task should extract the [`Waker`] from the passed [`Context`] reference and store it somewhere. When the task becomes ready, it should invoke the [`wake`] method on the stored `Waker` to notify the executor that the task should be polled again.
@@ -1287,18 +1297,18 @@ impl Stream for ScancodeStream {
             Ok(scancode) => {
                 WAKER.take();
                 Poll::Ready(Some(scancode))
             },
-            }
+            Err(crossbeam_queue::PopError) => Poll::Pending,
         }
     }
 }
 ```
 
-Like before, we first use the [`OnceCell::try_get`] function to get a reference to the initialized scancode queue. We then optimistically try to `pop` from the queue and return `Poll::Ready` when it succeeds. This exploits the fact that it's only required to register a wakeup when returning `Poll::Pending`.
+Like before, we first use the [`OnceCell::try_get`] function to get a reference to the initialized scancode queue. We then optimistically try to `pop` from the queue and return `Poll::Ready` when it succeeds. This way, we avoid the performance overhead of registering a waker when the queue is not empty.
 
-If the first call to `queue.pop()` does not succeed, the queue is potentially empty. Only potentially because the interrupt handler might have filled the queue asynchronously immediately after the check. Since this race condition can occur again on the next check, we need to register the `Waker` in the `WAKER` static before the second check. This way, a wakeup might happen before we return `Poll::Pending`, but it is guaranteed that we get a wakeup for any scancodes pushed after the check.
+If the first call to `queue.pop()` does not succeed, the queue is potentially empty. Only potentially, because the interrupt handler might have filled the queue asynchronously immediately after the check. Since this race condition can occur again for the next check, we need to register the `Waker` in the `WAKER` static before the second check. This way, a wakeup might happen before we return `Poll::Pending`, but it is guaranteed that we get a wakeup for any scancodes pushed after the check.
 
-After registering the `Waker` contained in the passed [`Context`] through the [`AtomicWaker::register`] function, we try popping from the queue a second time. If it now succeeds, we return `Poll::Ready`. We also remove the registered waker again using [`Waker::take`] because a waker notification is no longer needed. In case `queue.pop()` fails for a second time, we return `Poll::Pending` like before, but this time with a registered wakeup.
+After registering the `Waker` contained in the passed [`Context`] through the [`AtomicWaker::register`] function, we try popping from the queue a second time. If it now succeeds, we return `Poll::Ready`. We also remove the registered waker again using [`AtomicWaker::take`] because a waker notification is no longer needed. In case `queue.pop()` fails a second time, we return `Poll::Pending` like before, but this time with a registered wakeup.
 
 [`AtomicWaker::register`]: https://docs.rs/futures-util/0.3.4/futures_util/task/struct.AtomicWaker.html#method.register
 [`AtomicWaker::take`]: https://docs.rs/futures/0.3.4/futures/task/struct.AtomicWaker.html#method.take
@@ -1319,6 +1329,8 @@ pub(crate) fn add_scancode(scancode: u8) {
         } else {
+            WAKER.wake(); // new
         }
     } else {
         println!("WARNING: scancode queue uninitialized");
     }
 }
 ```
@@ -1371,8 +1383,9 @@ Let's add the `print_keypresses` task to our executor in our `main.rs` to get wo
 
 ```rust
 // in src/main.rs
 
+use blog_os::task::keyboard; // new
+
 fn kernel_main(boot_info: &'static BootInfo) -> ! {
-    use blog_os::task::keyboard;
-
     // […] initialization routines, including init_heap, test_main
@@ -1418,7 +1431,7 @@ pub struct Task {
 }
 ```
 
-The idea is to use the memory address of this future as an ID. This address is unique because because no two futures are stored at the same address. The `Pin` type ensures that they can't move in memory, so we also know that the address stays the same as long as the task exists. These properties make the address a good candidate for an ID.
+The idea is to use the memory address of this future as an ID. This address is unique because no two futures are stored at the same address. The `Pin` type ensures that they can't move in memory, so we also know that the address stays the same as long as the task exists. These properties make the address a good candidate for an ID.
 The implementation looks like this:
 
@@ -1453,7 +1466,7 @@ pub mod executor;
 
 // in src/task/executor.rs
 
 use super::{Task, TaskId};
-use alloc::collections::{BTreeMap, VecDeque};
+use alloc::{collections::{BTreeMap, VecDeque}, sync::Arc};
 use core::task::Waker;
 use crossbeam_queue::ArrayQueue;
@@ -1476,10 +1489,10 @@ impl Executor {
 }
 ```
 
-In addition to a `task_queue`, that stores the tasks that are ready to execute, the type has a `waiting_tasks` map, a `wake_queue` and a `waker_cache`. These fields have the following purpose:
+In addition to a `task_queue`, which stores the tasks that are ready to execute, the type has a `waiting_tasks` map, a `wake_queue`, and a `waker_cache`. These fields have the following purposes:
 
-- The `waiting_tasks` map stores tasks that returned `Poll::Pending`. The map is indexed by the `TaskId` to allow efficient continuation a specific task.
-- The `wake_queue` is [`ArrayQueue`] of task IDs, wrapped into the [`Arc`] type that implements _reference counting_. Reference countingmakes it possible to share ownership of the value between multiple owners. It works by allocating the value on the heap and counting the number of active references to it. When the number of active references reaches zero, the value is no longer needed and can be deallocated.
+- The `waiting_tasks` map stores tasks that returned `Poll::Pending`. The map is indexed by the `TaskId` to allow efficient continuation of a specific task.
+- The `wake_queue` is an [`ArrayQueue`] of task IDs, wrapped into the [`Arc`] type that implements _reference counting_. Reference counting makes it possible to share ownership of a value between multiple owners. It works by allocating the value on the heap and counting the number of active references to it. When the number of active references reaches zero, the value is no longer needed and can be deallocated.
 
   We use the `Arc` wrapper for the `wake_queue` because it will be shared between the executor and wakers. The idea is that the wakers push the ID of the woken task to the queue. The executor sits on the receiving end of the queue and moves all woken tasks from the `waiting_tasks` map back to the `task_queue`. The reason for using a fixed-size queue instead of an unbounded queue such as [`SegQueue`] is that interrupt handlers that should not allocate will push to this queue.
 
 - The `waker_cache` map caches the [`Waker`] of a task after its creation. This has two reasons: First, it improves performance by reusing the same waker for multiple wake-ups of the same task instead of creating a new waker each time. Second, it ensures that reference-counted wakers are not deallocated inside interrupt handlers because it could lead to deadlocks (there are more details on this below).
@@ -1512,20 +1525,24 @@ To execute all tasks in the `task_queue`, we create a private `run_ready_tasks`
 
 ```rust
 // in src/task/executor.rs
 
 use core::task::{Context, Poll};
 
 impl Executor {
     fn run_ready_tasks(&mut self) {
         while let Some(mut task) = self.task_queue.pop_front() {
-            let waker = self.waker_cache.entry(&task.id()).or_insert_with(|| {
-                self.create_waker(task.id())
-            });
+            let task_id = task.id();
+            if !self.waker_cache.contains_key(&task_id) {
+                self.waker_cache.insert(task_id, self.create_waker(task_id));
+            }
+            let waker = self.waker_cache.get(&task_id).expect("should exist");
             let mut context = Context::from_waker(waker);
             match task.poll(&mut context) {
                 Poll::Ready(()) => {
                     // task done -> remove cached waker
-                    self.waker_cache.remove(task.id());
+                    self.waker_cache.remove(&task_id);
                 }
                 Poll::Pending => {
-                    if self.waiting_tasks.insert(task.id(), task).is_some() {
+                    if self.waiting_tasks.insert(task_id, task).is_some() {
                         panic!("task with same ID already in waiting_tasks");
                     }
                 },
@@ -1537,11 +1554,15 @@ impl Executor {
 
 The basic idea of this function is similar to our `SimpleExecutor`: Loop over all tasks in the `task_queue`, create a waker for each task, and then poll it. However, instead of adding pending tasks back to the end of the `task_queue`, we store them in the `waiting_tasks` map until they are woken again. The waker creation is done by a method named `create_waker`, whose implementation will be shown in a moment.
 
-To avoid the performance overhead of creating a waker on each poll, we use the `waker_cache` map to store the waker for each task after it has been created. For this, we first use the [`BTreeMap::entry`] method to find the [`Entry`] corresponding to the task ID. We then use the [`Entry::or_insert_with`] method to optionally create a new `Waker` if not present and then get a reference to the `Waker`. Note reusing wakers like this is not possible for all waker implementations, but our implemenation will allow it. To clean up the `waker_cache` when a task is finished, we use use the [`BTreeMap::remove`] method to remove any cached waker for that task from the map.
+To avoid the performance overhead of creating a waker on each poll, we use the `waker_cache` map to store the waker for each task after it has been created. For this, we first use the [`BTreeMap::contains_key`] method to check whether a cached waker exists for the task. If not, we use the [`BTreeMap::insert`] method to create it. Afterwards, we can be sure that the waker exists, so we use the [`BTreeMap::get`] method in combination with an [`expect`] call to get a reference to it.
+
+[`BTreeMap::contains_key`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.contains_key
+[`BTreeMap::insert`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.insert
+[`BTreeMap::get`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.get
+[`expect`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.expect
+
+Note that reusing wakers like this is not possible for all waker implementations, but our implementation will allow it. To clean up the `waker_cache` when a task is finished, we use the [`BTreeMap::remove`] method to remove any cached waker for that task from the map.
 
-[`BTreeMap::entry`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.entry
-[`Entry`]: https://doc.rust-lang.org/alloc/collections/btree_map/enum.Entry.html
-[`Entry::or_insert_with`]: https://doc.rust-lang.org/alloc/collections/btree_map/enum.Entry.html#method.or_insert_with
 [`BTreeMap::remove`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.remove
 #### Waker Design
 
@@ -1577,7 +1598,7 @@ We push the `task_id` to the referenced `wake_queue`. Since modifications of the
 
 ##### The `Wake` Trait
 
-In order to use our `TaskWaker` type for polling futures, we need to convert it to a [`Waker`] instance first. This is required because the [`Future::poll`] method takes a [`Context`] instance as argument, which can only be constructed from the `Waker` type. While we could do this by providing an implementation of the [`RawWaker`] type, it's both simpler and safer to instead implement the [`Wake`] trait and then using the [`From`] implementations provided by the standard library to construct the `Waker`.
+In order to use our `TaskWaker` type for polling futures, we need to convert it to a [`Waker`] instance first. This is required because the [`Future::poll`] method takes a [`Context`] instance as argument, which can only be constructed from the `Waker` type. While we could do this by providing an implementation of the [`RawWaker`] type, it's both simpler and safer to instead implement the `Arc`-based [`Wake`] trait and then use the [`From`] implementations provided by the standard library to construct the `Waker`.
 
 The trait implementation looks like this:
@@ -1672,12 +1693,12 @@ We can now change our `kernel_main` to use our new `Executor` instead of the `Si
 
 ```rust
 // in src/main.rs
 
-fn kernel_main(boot_info: &'static BootInfo) -> ! {
-    use blog_os::task::executor::Executor;
+use blog_os::task::executor::Executor; // new
+
+fn kernel_main(boot_info: &'static BootInfo) -> ! {
     // […] initialization routines, including init_heap, test_main
 
-    let mut executor = Executor::new();
+    let mut executor = Executor::new(); // new
     executor.spawn(Task::new(example_task()));
     executor.spawn(Task::new(keyboard::print_keypresses()));
     executor.run();
@@ -1688,7 +1709,7 @@ We only need to change the import and the type name. Since our `run` function is
 
 When we run our kernel using `cargo xrun` now, we see that keyboard input still works:
 
-TODO gif
+
 
 However, the CPU utilization of QEMU did not get any better. The reason for this is that we still keep the CPU busy the whole time. We no longer poll tasks until they are woken again, but we still check the `wake_queue` and the `task_queue` in a busy loop. To fix this, we need to put the CPU to sleep if there is no more work to do.
@@ -1736,10 +1757,12 @@ if self.wake_queue.is_empty() {
 
 In case this interrupt pushes to the `wake_queue`, we put the CPU to sleep even though there is now a ready task. In the worst case, this could delay the handling of a keyboard interrupt until the next keypress or the next timer interrupt. So how do we prevent it?
 
-The answer is to disable interrupts on the CPU before the check and atomically enable them again together with the `hlt` instruction. This way, all interrupts that happen between in between are delayed after the `hlt` instruction so that no wake-ups are missed. To implement this approach, we can use the [`enable_interrupts_and_hlt`] function provided by the [`x86_64`] crate:
+The answer is to disable interrupts on the CPU before the check and atomically enable them again together with the `hlt` instruction. This way, all interrupts that happen in between are delayed until after the `hlt` instruction, so that no wake-ups are missed. To implement this approach, we can use the [`enable_interrupts_and_hlt`] function provided by the [`x86_64`] crate. This function is only available since version 0.9.6, so you might need to update your `x86_64` dependency to use it.
 
 [`enable_interrupts_and_hlt`]: https://docs.rs/x86_64/0.9.6/x86_64/instructions/interrupts/fn.enable_interrupts_and_hlt.html
 The updated implementation of our `sleep_if_idle` function looks like this:
 
 ```rust
 // in src/task/executor.rs
@@ -1764,7 +1787,7 @@ impl Executor {
 
 To avoid unnecessarily disabling interrupts, we early return if the `wake_queue` is not empty. Otherwise, we disable interrupts and check the `wake_queue` again. If it is still empty, we use the [`enable_interrupts_and_hlt`] function to enable interrupts and put the CPU to sleep as a single atomic operation. In case the queue is no longer empty, it means that an interrupt woke a task between the first and the second check. In that case, we enable interrupts again and directly continue execution without executing `hlt`.
 
-Now our executor properly puts the CPU to sleep when there is nothing to do. We can see that the QEMU process as a much lower CPU utilization when we run our kernel using `cargo xrun` now.
+Now our executor properly puts the CPU to sleep when there is nothing to do. We can see that the QEMU process has a much lower CPU utilization when we run our kernel using `cargo xrun` again.
 
 #### Possible Extensions
Binary file not shown. (After: Size 7.6 KiB)