Prevent deadlock by basing DummyWaker directly on RawWaker

Don't use Arc for DummyWaker. It causes a drop in the `add_scancode` function, which can easily lead to a deadlock because the function is called directly from the interrupt handler.
2025-12-16 14:27:49 +00:00 · 2020-03-23 11:21:24 +01:00
parent 89f5350ac4
commit c26d36ebce
1 changed files with 82 additions and 88 deletions
--- a/blog/content/second-edition/posts/12-async-await/index.md
+++ b/blog/content/second-edition/posts/12-async-await/index.md
@@ -890,53 +890,68 @@ The struct contains a single `task_queue` field of type [`VecDeque`], which is b

 #### Dummy Waker

-In order to call the `poll` method, we need to create a [`Context`] type, which wraps a [`Waker`] type. To start simple, we will first create a dummy waker that does nothing. The simplest way to do this is by implementing the unstable [`Wake`] trait for an empty `DummyWaker` struct:
+In order to call the `poll` method, we need to create a [`Context`] type, which wraps a [`Waker`] type. To start simple, we will first create a dummy waker that does nothing. For this, we create a [`RawWaker`] instance, which defines the implementation of the different `Waker` methods, and then use the [`Waker::from_raw`] function to turn it into a `Waker`:

-[`Wake`]: https://doc.rust-lang.org/nightly/alloc/task/trait.Wake.html
+[`RawWaker`]: https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html
+[`Waker::from_raw`]: https://doc.rust-lang.org/stable/core/task/struct.Waker.html#method.from_raw

 ```rust
 // in src/task/simple_executor.rs

-use alloc::{sync::Arc, task::Wake};
+use core::task::{Waker, RawWaker};

-struct DummyWaker;
+fn dummy_raw_waker() -> RawWaker {
+    todo!();
+}

-impl Wake for DummyWaker {
-    fn wake(self: Arc<Self>) {
-        // do nothing
-    }
+fn dummy_waker() -> Waker {
+    unsafe { Waker::from_raw(dummy_raw_waker()) }
 }
 ```

-The trait is still unstable, so we have to add **`#![feature(wake_trait)]`** to the top of our `lib.rs` to use it. The `wake` method of the trait is normally responsible for waking the corresponding task in the executor. However, our `SimpleExecutor` will not differentiate between ready and waiting tasks, so we don't need to do anything on `wake` calls.
+The `from_raw` function is unsafe because undefined behavior can occur if the programmer does not uphelp the documented requirements of `RawWaker`. Before we look at the implementation of the `dummy_raw_waker` function, we first try to understand how the `RawWaker` type works.

-Since wakers are normally shared between the executor and the asynchronous tasks, the `wake` method requires that the `Self` instance is wrapped in the [`Arc`] type, which implements reference-counted ownership. The basic idea is that the value is heap-allocated and the number of active references to it are counted. If the number of active references reaches zero, the value is no longer needed and can be deallocated.
+##### `RawWaker`

+The [`RawWaker`] type requires the programmer to explicitly define a [_virtual method table_] (_vtable_) that specifies the functions that should be called when the `RawWaker` is cloned, woken, or dropped. The layout of this vtable is defined by the [`RawWakerVTable`] type. Each function receives a `*const ()` argument that is basically a _type-erased_ `&self` pointer to some struct, e.g. allocated on the heap. The reason for using a `*const ()` pointer instead of a proper reference is that the `RawWaker` type should be non-generic. The pointer value that is passed to the functions is the `data` pointer given to [`RawWaker::new`].
+
+[_virtual method table_]: https://en.wikipedia.org/wiki/Virtual_method_table
+[`RawWakerVTable`]: https://doc.rust-lang.org/stable/core/task/struct.RawWakerVTable.html
+[`RawWaker::new`]: https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html#method.new
+
+Typically, the `RawWaker` is created for some heap allocated struct that is wrapped into the [`Box`] or [`Arc`] type. For such types, methods like [`Box::into_raw`] can be used to convert the `Box<T>` to a `*const T` pointer. This pointer can then be casted to an anonymous `*const ()` pointer and passed to `RawWaker::new`. Since each vtable function receives the same `*const ()` as argument, the functions can sately cast the pointer back to a `Box<T>` or a `&T` to operate on it. As you can imagine, this process is highly dangerous and can easily lead to undefined behavior on mistakes. For this reason, manually creating a `RawWaker` is not recommended unless necessary.
+
+[`Box`]: https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html
 [`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html
+[`Box::into_raw`]: https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html#method.into_raw

-To make our `DummyWaker` usable with the [`Context`] type, we need a method to convert it to the [`Waker`] defined in the core library:
+##### A Dummy `RawWaker`
+
+While manually creating a `RawWaker` is not recommended, there is currently no other way to create a dummy `Waker` that does nothing. Fortunately, the fact that we want to do nothing makes it relatively safe to implement the `dummy_raw_waker` function:

 ```rust
 // in src/task/simple_executor.rs

-use core::task::Waker;
+use core::task::RawWakerVTable;

-impl DummyWaker {
-    fn to_waker(self) -> Waker {
-        Waker::from(Arc::new(self))
+fn dummy_raw_waker() -> RawWaker {
+    fn no_op(_: *const ()) {}
+    fn clone(_: *const ()) -> RawWaker {
+        dummy_raw_waker()
    }
+
+    let vtable = &RawWakerVTable::new(clone, no_op, no_op, no_op);
+    RawWaker::new(0 as *const (), vtable)
 }
 ```

-The method first makes the `self` instance reference-counted by wrapping it in an [`Arc`]. Then it uses the [`Waker::from`] method to create the `Waker`. This method is available for all reference counted types that implement the [`Wake`] trait.
+First, we define two inner functions named `no_op` and `clone`. The `no_op` function takes a `*const ()` pointer and does nothing. The `clone` function also takes a `*const ()` pointer and returns a new `RawWaker` by calling `dummy_raw_waker` again. We use these two functions to create a minimal `RawWakerVTable`: The `clone` function is used for the cloning operations and the `no_op` function is used for all other operations. Since the `RawWaker` does nothing, it does not matter that we return a new `RawWaker` from `clone` instead of cloning it.

-[`Waker::from`]: TODO
-
-Now we have a way to create a `Waker` instance, we can use it to implement a `run` method on our executor.
+After creating the `vtable`, we use the [`RawWaker::new`] function to create the `RawWaker`. The passed `*const ()` does not matter since none of the vtable function uses it. For this reason, we simply pass a null pointer.

 #### A `run` Method

-The most simple `run` method is to repeatedly poll all queued tasks in a loop until all are done. This is not very efficient since it does not utilize the notifications of the `Waker` type, but it is an easy way to get things running:
+Now we have a way to create a `Waker` instance, we can use it to implement a `run` method on our executor. The most simple `run` method is to repeatedly poll all queued tasks in a loop until all are done. This is not very efficient since it does not utilize the notifications of the `Waker` type, but it is an easy way to get things running:

 ```rust
 // in src/task/simple_executor.rs
@@ -1366,78 +1381,57 @@ fn kernel_main(boot_info: &'static BootInfo) -> ! {
 }
 ```

-When we execute `cargo xrun` now, we see that keyboard input works again, but only for a short time:
-
-![QEMU printing output for keypresses "H", "e", and "l", then it hangs](keyboard-deadlock.gif)
-
-After pressing a few keys, the complete execution hangs. Not even the dots by the timer interrupt are printed anymore. Such bugs are typically caused by a [_deadlock_], which is a state where we endlessly wait on some lock. To find out where the program hangs, the best approach is to connect a debugger and print the backtrace. Expand the section below for the exact debugging steps:
-
-[_deadlock_]: https://en.wikipedia.org/wiki/Deadlock
-
-<details style="margin-bottom: 1rem;">
-<summary><strong>Debugging Steps</strong></summary>
-
- Make sure `gdb` or `gdb-multiarch` is installed on your system.
- Pass the `-s` flag to QEMU when running your kernel. You can do this through the command `cargo xrun -- -s`.
- Run `gdb` with the file name of your kernel as argument:
-  ```
-  gdb target/x86_64-blog_os/debug/blog_os
-  ```
- From the `gdb` console, connect to the QEMU instance by executing `target ext :1234`.
- Print the backtrace by executing `backtrace` or `bt`.
-
-
-The backtrace in this case looks like this:
-
-```
-#0  AtomicBool::load (self=0x22d250 <blog_os::allocator::ALLOCATOR>, …)
-    at libcore/sync/atomic.rs:404
-#1  spin::Mutex<T>::obtain_lock (self=0x22d250 <blog_os::allocator::ALLOCATOR>)
-    at spin-0.5.2/src/mutex.rs:134
-#2  spin::Mutex<T>::lock (self=0x22d250 <blog_os::allocator::ALLOCATOR>)
-    at spin-0.5.2/src/mutex.rs:158
-#3  blog_os::allocator::Locked<A>::lock (…)
-    at src/allocator.rs:73
-#4  Locked::dealloc (…) at src/allocator/fixed_size_block.rs:83
-#5  __rg_dealloc (…) at src/allocator.rs:19
-#6  alloc::alloc::dealloc (…) at liballoc/alloc.rs:103
-#7  alloc::alloc::Global::dealloc (…) at liballoc/alloc.rs:174
-#8  alloc::sync::Arc<T>::drop_slow (…) at liballoc/sync.rs:743
-#9  alloc::sync::Arc<T>::drop (…) at liballoc/sync.rs:1249
-#10 core::ptr::drop_in_place () at libcore/ptr/mod.rs:174
-#11 blog_os::task::simple_executor::DummyWaker::wake (…)
-    at src/task/simple_executor.rs:37
-#12 alloc::task::raw_waker::wake (waker=0x4444444400d0)
-    at liballoc/task.rs:69
-#13 core::task::wake::Waker::wake (self=...) at libcore/task/wake.rs:241
-#14 AtomicWaker::wake (self=0x22d210 <blog_os::task::keyboard::WAKER>)
-    at futures-core-0.3.4/src/task/__internal/atomic_waker.rs:355
-#15 blog_os::task::keyboard::add_scancode (scancode=31) at src/task/keyboard.rs:24
-#16 blog_os::interrupts::keyboard_interrupt_handler (…) at src/interrupts.rs:87
-```
-
-Note that I shortened the output a bit to make it more readable.
-
-</details>
-
-From the backtrace, we can deduce that the deadlock was caused by the following order of operations:
-
-```
-keyboard_interrupt_handler -> add_scancode -> AtomicWaker::wake
-> GlobalAlloc::dealloc -> allocator::Locked::lock
-```
-
-TODO
-
-
-
-#### Fixing the Deadlock
-
-
+When we execute `cargo xrun` now, we see that keyboard input works again:

+TODO image

 If you keep an eye on the CPU utilization of your computer, you will see that the `QEMU` process now keeps one CPU completely busy. This happens because our `SimpleExecutor` polls tasks over and over again in a loop. So even if we don't press any keys on the keyboard, the executor repeatedly calls `poll` on our `print_keypresses` task, even though the task cannot make any progress and will return `Poll::Pending` each time.

 To fix this, we need to create an executor that properly utilizes the `Waker` notifications. This way, the executor is notified when the next keyboard interrupt occurs, so it does not need to keep polling the `print_keypresses` task over and over again.

 ### Executor with Waker Support
+
+
+#### The `Wake` Trait
+
+The simplest way to do this is by implementing the unstable [`Wake`] trait for an empty `DummyWaker` struct:
+
+[`Wake`]: https://doc.rust-lang.org/nightly/alloc/task/trait.Wake.html
+
+```rust
+// in src/task/simple_executor.rs
+
+use alloc::{sync::Arc, task::Wake};
+
+struct DummyWaker;
+
+impl Wake for DummyWaker {
+    fn wake(self: Arc<Self>) {
+        // do nothing
+    }
+}
+```
+
+The trait is still unstable, so we have to add **`#![feature(wake_trait)]`** to the top of our `lib.rs` to use it. The `wake` method of the trait is normally responsible for waking the corresponding task in the executor. However, our `SimpleExecutor` will not differentiate between ready and waiting tasks, so we don't need to do anything on `wake` calls.
+
+Since wakers are normally shared between the executor and the asynchronous tasks, the `wake` method requires that the `Self` instance is wrapped in the [`Arc`] type, which implements reference-counted ownership. The basic idea is that the value is heap-allocated and the number of active references to it are counted. If the number of active references reaches zero, the value is no longer needed and can be deallocated.
+
+[`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html
+
+To make our `DummyWaker` usable with the [`Context`] type, we need a method to convert it to the [`Waker`] defined in the core library:
+
+```rust
+// in src/task/simple_executor.rs
+
+use core::task::Waker;
+
+impl DummyWaker {
+    fn to_waker(self) -> Waker {
+        Waker::from(Arc::new(self))
+    }
+}
+```
+
+The method first makes the `self` instance reference-counted by wrapping it in an [`Arc`]. Then it uses the [`Waker::from`] method to create the `Waker`. This method is available for all reference counted types that implement the [`Wake`] trait.
+
+[`Waker::from`]: TODO