From 6f7c5a35dddc968d6707365f89a09907c7b07c77 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Tue, 11 Feb 2020 16:44:01 +0100 Subject: [PATCH 01/51] Begin a new post about async/await --- .../posts/12-async-await/index.md | 102 ++++++++++++++++++ .../regain-control-on-interrupt.svg | 3 + 2 files changed, 105 insertions(+) create mode 100644 blog/content/second-edition/posts/12-async-await/index.md create mode 100644 blog/content/second-edition/posts/12-async-await/regain-control-on-interrupt.svg diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md new file mode 100644 index 00000000..3a2b0a03 --- /dev/null +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -0,0 +1,102 @@ ++++ +title = "Async/Await" +weight = 12 +path = "async-await" +date = 0000-01-01 + +[extra] +chapter = "Interrupts" ++++ + +In this post we explore _cooperative multitasking_ and the _async/await_ feature of Rust. This will make it possible to run multiple concurrent tasks in our kernel. TODO + + + +This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-12`][post branch] branch. + +[GitHub]: https://github.com/phil-opp/blog_os +[at the bottom]: #comments +[post branch]: https://github.com/phil-opp/blog_os/tree/post-12 + + + +## Multitasking + +One of the fundamental features of most operating systems is [_multitasking_], which is the ability to execute multiple tasks concurrently. For example, you probably have other programs open while looking at this post, such as a text editor or a terminal window. Even if you have only a single browser window open, there are probably various background tasks for managing your desktop windows, checking for updates, or indexing files. 
+ +[_multitasking_]: https://en.wikipedia.org/wiki/Computer_multitasking + +While it seems like all tasks run in parallel, only a single task can be executed on a CPU core at a time. To create the illusion that the tasks run in parallel, the operating system rapidly switches between active tasks so that each one can make a bit of progress. Since computers are fast, we don't notice these switches most of the time. + +While single-core CPUs can only execute a single task at a time, multi-core CPUs can run multiple tasks in a truly parallel way. For example, a CPU with 8 cores can run 8 tasks at the same time. We will explain how to setup multi-core CPUs in a future post. For this post, we will focus on single-core CPUs for simplicity. (It's worth noting that all multi-core CPUs start with only a single active core, so we can treat them as single-core CPUs for now.) + +There are two forms of multitasking: _Cooperative_ multitasking requires tasks to regularly give up control of the CPU so that other tasks can make progress. _Preemptive_ multitasking uses operating system capabilities to switch threads at arbitrary points in time by forcibly pausing them. In the following we will explore the two forms of multitasking in more detail and discuss their respective advantages and drawbacks. + +### Preemptive Multitasking + +The idea behind preemptive multitasking is that the operating system controls when to switch tasks. For that, it utilizes the fact that it regains control of the CPU on each interrupt. This makes it possible to switch tasks whenever new input is available to the system. For example, it would be possible to switch tasks when the mouse is moved or a network packet arrives. The operating system can also determine the exact time that a task is allowed to run by configuring a hardware timer to send an interrupt after that time. 
+ +The following graphic illustrates the task switching process on a hardware interrupt: + +![](regain-control-on-interrupt.svg) + +In the first row, the CPU is executing task `A1` of program `A`. All other tasks are paused. In the second row, a hardware interrupt arrives at the CPU. As described in the [_Hardware Interrupts_] post, the CPU immediately stops the execution of task `A1` and jumps to the interrupt handler defined in the interrupt descriptor table (IDT). Through this interrupt handler, the operating system now has control of the CPU again, which allows it to switch to task `B1` instead of continuing task `A1`. + +[_Hardware Interrupts_]: @/second-edition/posts/07-hardware-interrupts/index.md + +#### Saving State + +Since tasks are interrupted at arbitrary points in time, they might be in the middle of some calculation. In order to be able to resume them later, the operating system must backup the whole state of the task, including its [call stack] and the values of all CPU registers. This process is called a [_context switch_]. + +[call stack]: https://en.wikipedia.org/wiki/Call_stack +[_context switch_]: https://en.wikipedia.org/wiki/Context_switch + +As the call stack can be very large, the operating system typically sets up a separate call stack for each task instead of backing up the call stack content on each task switch. Such a task with a separate stack is called a [_thread of execution_] or _thread_ for short. By using a separate stack for each task, only the register contents need to be saved on a context switch (including the program counter and stack pointer). This approach minimizes the performance overhead of a context switch, which is very important since context switches often occur up to 100 times per second. + +[_thread of execution_]: https://en.wikipedia.org/wiki/Thread_(computing) + +#### Discussion + +The main advantage of preemptive multitasking is that the operating system can fully control the allowed execution time of a task. 
This way, it can guarantee that each task gets a fair share of the CPU time, without the need to trust the tasks to cooperate. This is especially important when running third-party tasks or when multiple users share a system. + +The disadvantage of preemption is that each task requires its own stack. Compared to a shared stack, this results in a higher memory usage per task and often limits the number of tasks in the system. Another disadvantage is that the operating system always has to save the complete CPU register state on each task switch, even if the task only used a small subset of the registers. + +Preemptive multitasking and threads are fundamental components of an operating system because they make it possible to run untrusted userspace programs. We will therefore discuss these concepts in full detail in future posts. For this post, however, we will focus on cooperative multitasking, which also provides useful capabilities for our kernel. + +### Cooperative Multitasking + +Instead of forcibly pausing running tasks at arbitrary points in time, cooperative multitasking lets each task run until it voluntarily gives up control of the CPU. This allows tasks to pause themselves at convenient points in time, for example when it needs to wait for an I/O operation anyway. + +Cooperative multitasking is often used at the language level, for example in form of [coroutines] or [async/await]. The idea is that either the programmer or the compiler inserts [_yield_] operations into the program, which give up control of the CPU and allow other tasks to run. For example, a yield could be inserted after each iteration of a complex loop. + +[coroutines]: https://en.wikipedia.org/wiki/Coroutine +[async/await]: https://rust-lang.github.io/async-book/01_getting_started/04_async_await_primer.html +[_yield_]: https://en.wikipedia.org/wiki/Yield_(multithreading) + +It is common to combine cooperative multitasking with [asynchronous operations]. 
Instead of [blocking] until an operation is finished and preventing other tasks from running in the meantime, asynchronous operations return a "not ready" status if the operation is not finished yet. In this case, the waiting task can execute a yield operation to let other tasks run.
+
+[asynchronous operations]: https://en.wikipedia.org/wiki/Asynchronous_I/O
+[blocking]: http://faculty.salina.k-state.edu/tim/ossg/Device/blocking.html
+
+#### Saving State
+
+Since tasks define their pause points themselves, they don't need the operating system to save their state. Instead, they can save exactly the state they need for continuation before they pause themselves, which often results in better performance. For example, a task that just finished a complex computation might only need to back up the final result of the computation since it does not need the intermediate results anymore.
+
+Language-supported implementations of cooperative tasks are often even able to back up the required parts of the call stack before pausing. As an example, Rust's async/await implementation stores all local variables that are still needed in an automatically generated struct (see below). By backing up the relevant parts of the call stack before pausing, all tasks can share the same call stack, which results in a much smaller memory consumption per task. As a result, it is possible to create an almost arbitrary number of tasks without running out of memory.
+
+#### Discussion
+
+The drawback of cooperative multitasking is that an uncooperative task can potentially run for an unlimited amount of time. Thus, a malicious or buggy task can prevent other tasks from running and slow down or even block the whole system. For this reason, cooperative multitasking should only be used when all tasks are known to cooperate. As a counterexample, it's not a good idea to make the operating system rely on the cooperation of arbitrary userlevel programs.
+ +However, the strong performance and memory benefits of cooperative multitasking make it a good approach for usage _within_ a program, especially in combination with asynchronous operations. Since an operating system kernel is a performance-critical program that interacts with asynchronous hardware, cooperative multitasking seems like a good approach for concurrency in our kernel. In the remainder of this post, we will therefore implement a basic async/await based multitasking system. + +## Async/Await in Rust + + + + + + + + + diff --git a/blog/content/second-edition/posts/12-async-await/regain-control-on-interrupt.svg b/blog/content/second-edition/posts/12-async-await/regain-control-on-interrupt.svg new file mode 100644 index 00000000..06a4e51c --- /dev/null +++ b/blog/content/second-edition/posts/12-async-await/regain-control-on-interrupt.svg @@ -0,0 +1,3 @@ + + +
[Figure residue from regain-control-on-interrupt.svg: three rows along a "Time" axis, each showing the Operating System, the CPU, the Interrupt Handler, tasks A1 and A2 of Program A, and task B1 of Program B. Row 1: "Task A1 is executing". Row 2: "Hardware Interrupt occurs", "Interrupt Handler executing". Row 3: "Interrupt Handler switched to Task B1".]
\ No newline at end of file From bdcd392dbf8f92832fbf0f3b11fa6d2e444d6bfb Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 13 Feb 2020 15:31:25 +0100 Subject: [PATCH 02/51] Start explaining futures in Rust --- .../posts/12-async-await/async-example.svg | 3 + .../posts/12-async-await/index.md | 65 +++++++++++++++++-- 2 files changed, 63 insertions(+), 5 deletions(-) create mode 100644 blog/content/second-edition/posts/12-async-await/async-example.svg diff --git a/blog/content/second-edition/posts/12-async-await/async-example.svg b/blog/content/second-edition/posts/12-async-await/async-example.svg new file mode 100644 index 00000000..601431eb --- /dev/null +++ b/blog/content/second-edition/posts/12-async-await/async-example.svg @@ -0,0 +1,3 @@ + + +
[Figure residue from async-example.svg: sequence diagram with the participants File System, main, and foo. Synchronous pass: `read_file()`, "return File", `foo()`, "return". Asynchronous pass: `async_read_file()`, "return Future", `foo()`, "File available", "return".]
\ No newline at end of file diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 3a2b0a03..9819f3d6 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -61,7 +61,7 @@ The main advantage of preemptive multitasking is that the operating system can f The disadvantage of preemption is that each task requires its own stack. Compared to a shared stack, this results in a higher memory usage per task and often limits the number of tasks in the system. Another disadvantage is that the operating system always has to save the complete CPU register state on each task switch, even if the task only used a small subset of the registers. -Preemptive multitasking and threads are fundamental components of an operating system because they make it possible to run untrusted userspace programs. We will therefore discuss these concepts in full detail in future posts. For this post, however, we will focus on cooperative multitasking, which also provides useful capabilities for our kernel. +Preemptive multitasking and threads are fundamental components of an operating system because they make it possible to run untrusted userspace programs. We will discuss these concepts in full detail in future posts. For this post, however, we will focus on cooperative multitasking, which also provides useful capabilities for our kernel. ### Cooperative Multitasking @@ -88,13 +88,68 @@ Language-supported implementations of cooperative tasks are often even able to b The drawback of cooperative multitasking is that an uncooperative task can potentially run for an unlimited amount of time. Thus, a malicious or buggy task can prevent other tasks from running and slow down or even block the whole system. For this reason, cooperative multitasking should only be used when all tasks are known to cooperate. 
As a counterexample, it's not a good idea to make the operating system rely on the cooperation of arbitrary userlevel programs. -However, the strong performance and memory benefits of cooperative multitasking make it a good approach for usage _within_ a program, especially in combination with asynchronous operations. Since an operating system kernel is a performance-critical program that interacts with asynchronous hardware, cooperative multitasking seems like a good approach for concurrency in our kernel. In the remainder of this post, we will therefore implement a basic async/await based multitasking system. +However, the strong performance and memory benefits of cooperative multitasking make it a good approach for usage _within_ a program, especially in combination with asynchronous operations. Since an operating system kernel is a performance-critical program that interacts with asynchronous hardware, cooperative multitasking seems like a good approach for concurrency in our kernel. ## Async/Await in Rust - - - +The Rust language provides first-class support for cooperative multitasking in form of async/await. Before we can explore what async/await is and how it works, we need to understand how _futures_ and asynchronous programming work in Rust. + +### Futures + +A _future_ represents a value that might not be available yet. This could be for example an integer that is computed by another task or a file that is downloaded from the network. Instead of waiting until the value is available, futures make it possible to continue execution until the value is needed. + +#### Example + +The concept of futures is best illustrated with a small example: + +![Sequence diagram: main calls `read_file` and is blocked until it returns; then it calls `foo()` and is also blocked until it returns. The same process is repeated, but this time `async_read_file` is called, which directly returns a future; then `foo()` is called again, which now runs concurrently to the file load. 
The file is available before `foo()` returns.](async-example.svg)
+
+This sequence diagram shows a `main` function that reads a file from the file system and then calls a function `foo`. This process is repeated to times: Once with a synchronous `read_file` call and once with an asynchronous `async_read_file` call.
+
+With the synchronous call, the `main` function needs to wait until the file is loaded from the file system. Only then it can call the `foo` function, which requires it to again wait for the result.
+
+With the asynchronous `async_read_file` call, the file system directly returns a future and loads the file asynchronously in the background. This allows the `main` function to call `foo` much earlier, which then runs in parallel with the file load. In this example, the file load even finishes before `foo` returns, so `main` can directly work with the file without further waiting after `foo` returns.
+
+#### Futures in Rust
+
+In Rust, futures are represented by the [`Future`] trait, which looks like this:
+
+[`Future`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html
+
+```rust
+pub trait Future {
+    type Output;
+    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output>;
+}
+```
+
+The [associated type] `Output` specifies the type of the asynchronous value. For example, the `async_read_file` function in the diagram above would return a `Future` instance with `Output` set to `File`.
+
+[associated type]: https://doc.rust-lang.org/book/ch19-03-advanced-traits.html#specifying-placeholder-types-in-trait-definitions-with-associated-types
+
+The [`poll`] method allows checking whether the value is already available. It returns a [`Poll`] enum, which looks like this:
+
+[`poll`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll
+[`Poll`]: https://doc.rust-lang.org/nightly/core/task/enum.Poll.html
+
+```rust
+pub enum Poll<T> {
+    Ready(T),
+    Pending,
+}
+```
+
+When the value is already available (e.g.
the file was fully read from disk), it is returned wrapped in the `Ready` variant. Otherwise, the `Pending` variant is returned, which signals the caller that the value is not yet available. + +The `poll` method takes two arguments: `self: Pin<&mut Self>` and `cx: &mut Context`. The former behaves like a normal `&mut self` reference, with the difference that the `Self` value is [_pinned_] to its memory location. Understanding `Pin` and why it is needed is difficult without understanding how async/await works first. We will therefore explain it later in this post. + +[_pinned_]: https://doc.rust-lang.org/nightly/core/pin/index.html + +The purpose of the `cx: &mut Context` parameter is … + +### Async/Await + +### Generators From 752accdd337b4368d1a64653f99f7a1d641c96b7 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Fri, 14 Feb 2020 14:55:55 +0100 Subject: [PATCH 03/51] Explain how to work with futures and introduce async/await --- .../posts/12-async-await/index.md | 256 +++++++++++++++++- blog/static/css/main.css | 4 + 2 files changed, 258 insertions(+), 2 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 9819f3d6..971b274d 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -145,9 +145,261 @@ The `poll` method takes two arguments: `self: Pin<&mut Self>` and `cx: &mut Cont [_pinned_]: https://doc.rust-lang.org/nightly/core/pin/index.html -The purpose of the `cx: &mut Context` parameter is … +The purpose of the `cx: &mut Context` parameter is to pass a [`Waker`] instance to the asynchronous task, e.g. the file system load. This `Waker` allows the asynchronous task to signal that it (or a part of it) is finished, e.g. that the file was loaded from disk. 
Since the main task knows that it will be notified when the `Future` is ready, it does not need to call `poll` over and over again. We will explain this process in more detail later when we implement an own `Waker` type.
 
-### Async/Await
+[`Waker`]: https://doc.rust-lang.org/nightly/core/task/struct.Waker.html
+
+### Working with Futures
+
+We now know how futures are defined and the rough idea behind the `poll` method. However, we still don't know how to effectively work with futures. The problem is that futures represent results of asynchronous tasks, which might not be available yet. In practice, however, we often need these values directly for further calculations. So the question is: How can we efficiently retrieve the value of a future when we need it?
+
+#### Waiting on Futures
+
+One possible answer is to wait until a future becomes ready. This could look something like this:
+
+```rust
+let future = async_read_file("foo.txt");
+let file_content = loop {
+    match future.poll(…) {
+        Poll::Ready(value) => break value,
+        Poll::Pending => {}, // do nothing
+    }
+};
+```
+
+Here we _actively_ wait for the future by calling `poll` over and over again in a loop. The arguments to `poll` don't matter here, so we omitted them. While this solution works, it is very inefficient because we keep the CPU busy until the value becomes available.
+
+A more effective approach could be to _block_ the current thread until the future becomes available. This is of course only possible if you have threads, so this solution does not work for our kernel, at least not yet. Even on systems where blocking is supported, it is often not desired because it turns an asynchronous task into a synchronous task again, thereby inhibiting the potential performance benefits.
+
+#### Future Combinators
+
+An alternative to waiting is to use future combinators. Future combinators are functions like `map` that allow chaining and combining futures together, similar to the functions on [`Iterator`].
Instead of waiting on the future, these combinators return a future themselves, which applies the mapping operation on `poll`.
+
+[`Iterator`]: https://doc.rust-lang.org/stable/core/iter/trait.Iterator.html
+
+As an example, a simple `string_len` combinator for converting `Future<Output = String>` to a `Future<Output = usize>` could look like this:
+
+```rust
+struct StringLen<F> {
+    inner_future: F,
+}
+
+impl<F> Future for StringLen<F> where F: Future<Output = String> {
+    type Output = usize;
+
+    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
+        match self.inner_future.poll(cx) {
+            Poll::Ready(s) => Poll::Ready(s.len()),
+            Poll::Pending => Poll::Pending,
+        }
+    }
+}
+
+fn string_len(string: impl Future<Output = String>)
+    -> impl Future<Output = usize>
+{
+    StringLen {
+        inner_future: string,
+    }
+}
+
+// Usage
+fn file_len() -> impl Future<Output = usize> {
+    let file_content_future = async_read_file("foo.txt");
+    string_len(file_content_future)
+}
+```
+
+This code does not quite work because it does not handle [_pinning_], but it suffices as an example. The basic idea is that the `string_len` function wraps a given `Future` instance into a new `StringLen` struct, which also implements `Future`. When the wrapped future is polled, it polls the inner future. If the value is not ready yet, `Poll::Pending` is returned from the wrapped future too. If the value is ready, the string is extracted from the `Poll::Ready` variant and its length is calculated. Afterwards, it is wrapped in `Poll::Ready` again and returned.
+
+[_pinning_]: https://doc.rust-lang.org/stable/core/pin/index.html
+
+Manually writing correct combinator methods is difficult, therefore they are often provided by libraries. While the Rust standard library itself provides no combinator methods yet, the semi-official (and `no_std` compatible) [`futures`] crate does. Its [`FutureExt`] trait provides high-level combinator methods such as [`map`] or [`then`], which can be used to manipulate the result with arbitrary closures.
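To make the pinning caveat concrete, here is a minimal hosted sketch (using `std`, so not kernel code) of the same `string_len` idea in a form that should compile. It assumes the inner future is `Unpin`, which allows re-pinning the field with the safe `Pin::new`; a general combinator would need pin projection instead. The `NoopWake` type and the `poll_once` helper are hypothetical names introduced only for this illustration.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Future adapter that maps a `String` output to its length.
struct StringLen<F> {
    inner_future: F,
}

// Assumption: the `Unpin` bound sidesteps pinning, so the inner future
// can be re-pinned with the safe `Pin::new` on every poll.
impl<F> Future for StringLen<F>
where
    F: Future<Output = String> + Unpin,
{
    type Output = usize;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        match Pin::new(&mut self.inner_future).poll(cx) {
            Poll::Ready(s) => Poll::Ready(s.len()),
            Poll::Pending => Poll::Pending,
        }
    }
}

fn string_len(
    string: impl Future<Output = String> + Unpin,
) -> impl Future<Output = usize> + Unpin {
    StringLen { inner_future: string }
}

/// Hypothetical waker that ignores all notifications.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical helper that polls a future exactly once.
fn poll_once<F: Future + Unpin>(future: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    Pin::new(future).poll(&mut cx)
}
```

Polling `string_len(std::future::ready(String::from("hello")))` once with `poll_once` is expected to yield the length of the string, because the inner future is ready on its first poll.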
+
+[`futures`]: https://docs.rs/futures/0.3.4/futures/
+[`FutureExt`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html
+[`map`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.map
+[`then`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.then
+
+##### Advantages
+
+The big advantage of future combinators is that they keep the operations asynchronous. In combination with asynchronous I/O interfaces, this approach can lead to very high performance. The fact that future combinators are implemented as normal structs with trait implementations allows the compiler to optimize them extensively into an efficient state machine. For more details, see the [_Zero-cost futures in Rust_] post, which announced the addition of futures to the Rust ecosystem.
+
+[_Zero-cost futures in Rust_]: https://aturon.github.io/blog/2016/08/11/futures/
+
+##### Drawbacks
+
+While future combinators make it possible to write very efficient code, they can be difficult to use in some situations because of the type system and the closure based interface. For example, consider code like this:
+
+```rust
+async_read_file("foo.txt").then(|content| {
+    if content.len() > 100 {
+        Either::Left(async_read_file("bar.txt"))
+    } else {
+        Either::Right(future::ready(content))
+    }
+})
+```
+
+([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=97a2d231584113452ff9e67d1b34604c))
+
+Here we read the file `foo.txt` and then use the [`then`] combinator to chain a second future based on the file content. If the content length is greater than 100, we read a different `bar.txt` file and return its content, otherwise we return the content of `foo.txt`.
+
+The reason for the [`Either`] wrapper is that if and else blocks must always have the same type. Since we return different future types in the blocks, we must use the wrapper type to unify them into a single type.
The [`ready`] function wraps a value into a future, which is immediately ready. The function is required here because the `Either` wrapper expects that the wrapped value implements `Future`.
+
+[`Either`]: https://docs.rs/futures/0.3.4/futures/future/enum.Either.html
+[`ready`]: https://docs.rs/futures/0.3.4/futures/future/fn.ready.html
+
+As you can imagine, this can quickly lead to very complex code for larger projects. It gets especially complicated if borrowing and different lifetimes are involved. For this reason, a lot of work was invested to add support for async/await to Rust, with the goal of making asynchronous code radically simpler to write.
+
+### The Async/Await Pattern
+
+The idea behind async/await is to let the programmer write code that _looks_ like normal synchronous code, but is turned into asynchronous code by the compiler. It works based on the two keywords `async` and `await`. The `async` keyword can be used in a function signature to turn a synchronous function into an asynchronous function that returns a future:
+
+```rust
+async fn foo() -> u32 {
+    0
+}
+
+// the above is roughly translated by the compiler to:
+fn foo() -> impl Future<Output = u32> {
+    future::ready(0)
+}
+```
+
+This keyword alone wouldn't be that useful. However, inside `async` functions, the `await` keyword can be used to retrieve the asynchronous value of a future:
+
+```rust
+async fn foo() -> String {
+    let content = async_read_file("foo.txt").await;
+    if content.len() > 100 {
+        async_read_file("bar.txt").await
+    } else {
+        content
+    }
+}
+```
+
+([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=9f94ac348c2b7f5421a50e2a02f33b1d))
+
+This function is a direct translation of the future combinator code example, which required the `Either` wrapper type. Using the `.await` operator, we can retrieve the value of a future without needing any closures.
As a result, we can write our code like we write normal synchronous code, with the difference that _this is still asynchronous code_.
+
+#### State Machine Transformation
+
+What the compiler does behind the scenes is to transform the body of the `async` function into a [_state machine_], with each `.await` call representing a different state. For the above `foo` function, the compiler creates a state machine with the following four states:
+
+[_state machine_]: https://en.wikipedia.org/wiki/Finite-state_machine
+
+```
+start   waiting on 1st future   waiting on 2nd future   end
+```
+
+This state machine implements the `Future` trait by making each `poll` call a possible state switch event:
+
+```
+start   waiting on 1st future   waiting on 2nd future   end
+|                 ^                       ^              ^
+|                 |                       |              |
+------------------------------------------------------------
+```
+
+The first `poll` call starts the function and lets it run until it reaches a future that is not ready yet. If all futures are ready, the function can run till its end and return its return value wrapped in `Poll::Ready`. Otherwise, `Poll::Pending` is returned. Internally, the state machine keeps track of the active state, so that it can continue there on the next `poll` call.
+
+On subsequent calls to `poll`, the state machine continues from the current state and polls the future it currently waits on again. In case it is ready now, it continues execution until it reaches the next future that is not ready. If it is still not ready, it stays in the state and returns `Poll::Pending` again.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+### The Async Keyword
+
+The purpose of the async/await pattern is to make working with futures easier. Rust has language-level support for this pattern built on the two keywords `async` and `await`. We will explain them individually, starting with `async`.
+
+The purpose of the `async` keyword is to turn a synchronous function into an asynchronous function that returns a `Future`:
+
+```rust
+fn synchronous() -> u32 {
+    42
+}
+
+async fn asynchronous() -> u32 {
+    42
+}
+```
+
+While both functions specify a return type of `u32`, the `async` keyword turns the return type of the second function into `impl Future<Output = u32>`. So instead of returning a `u32` directly, the `asynchronous` function returns a type that implements the `Future` trait with output type `u32`. We can see this when we try to assign the result to a variable of type `u32`:
+
+```rust
+let val: u32 = asynchronous();
+```
+
+The compiler responds with the following error ([try it on the playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=590273d2f4ef75eb890c5354f788e29c)):
+
+```
+error[E0308]: mismatched types
+ --> src/main.rs:3:23
+  |
+3 |     let val: u32 = asynchronous();
+  |              ---   ^^^^^^^^^^^^^^ expected `u32`, found opaque type
+  |              |
+  |              expected due to this
+...
+10 | async fn asynchronous() -> u32 {
+   |                            --- the `Output` of this `async fn`'s found opaque type
+   |
+   = note:     expected type `u32`
+           found opaque type `impl std::future::Future`
+```
+
+The relevant parts of that error message are the last two lines: It expects a `u32` because of the type annotation, but the function returned an implementation of the `Future` trait instead.
+
+Of course, changing the return type alone would not work. Instead, the compiler also needs to convert the function body, which is `42` in our case, into a future. Since `42` is not asynchronous, the compiler just generates a future that returns the result on the first `poll`.
The generated code _could_ look something like this:
+
+```rust
+struct GeneratedFuture;
+
+impl Future for GeneratedFuture {
+    type Output = u32;
+
+    fn poll(self: Pin<&mut Self>, _cx: &mut Context) -> Poll<Self::Output> {
+        Poll::Ready(42)
+    }
+}
+
+fn asynchronous() -> impl Future<Output = u32> {
+    GeneratedFuture
+}
+```
+
+Instead of returning `u32`, the `asynchronous` function now returns an instance of a new `GeneratedFuture` struct. This struct implements the `Future` trait by returning `Poll::Ready(42)` on `poll`. The `42` is the body of `asynchronous` in this case.
+
+Note that this is just an example implementation. The actual code generated by the compiler uses a much more powerful approach, which we will explain in a moment.
+
+In addition to `async` functions, Rust also supports `async` blocks:
+
+```rust
+let future = async {
+    42
+};
+```
+
+The `future` variable also has the type `impl Future<Output = u32>` in this case. The generated code is very similar to the `async fn`, only without a function call: `let future = GeneratedFuture;`.
+
+We now know roughly what the `async` keyword does, but we still don't know why it's useful yet. After all, there is no advantage in returning an `impl Future<Output = u32>` instead of returning the `u32` directly. To answer this question, we have to explore different ways to work with futures.
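The claim that such a future returns its result on the first `poll` can be checked on a hosted system with a small sketch. It compares the compiler-generated future of an `async fn` with the hand-written `GeneratedFuture` from above. The `NoopWake` type and the `poll_once` helper are hypothetical names for this illustration, and `Box::pin` is an assumption that only holds where heap allocation is available.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

async fn asynchronous() -> u32 {
    42
}

/// Hand-written stand-in for what the compiler might generate.
struct GeneratedFuture;

impl Future for GeneratedFuture {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        Poll::Ready(42)
    }
}

/// Hypothetical waker that ignores all notifications.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical helper: pins the future on the heap and polls it once.
fn poll_once<F: Future>(future: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut pinned = Box::pin(future);
    pinned.as_mut().poll(&mut cx)
}
```

Both `poll_once(asynchronous())` and `poll_once(GeneratedFuture)` should produce `Poll::Ready(42)`. Note that calling `asynchronous()` alone only creates the future; the body does not run until the future is polled.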
+ + + + +#### Await ### Generators diff --git a/blog/static/css/main.css b/blog/static/css/main.css index dd835c03..f35814d9 100644 --- a/blog/static/css/main.css +++ b/blog/static/css/main.css @@ -430,6 +430,10 @@ details summary h3, details summary h4, details summary h5, details summary h6 { margin-top: .5rem; } +h5 { + font-style: italic; + font-size: 0.9rem; +} .gray { color: gray; } From 51a02a40647634e87ddbb5f68a23254f0c594b91 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 20 Feb 2020 10:43:53 +0100 Subject: [PATCH 04/51] Typo fix --- blog/content/second-edition/posts/12-async-await/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 971b274d..968720df 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -104,7 +104,7 @@ The concept of futures is best illustrated with a small example: ![Sequence diagram: main calls `read_file` and is blocked until it returns; then it calls `foo()` and is also blocked until it returns. The same process is repeated, but this time `async_read_file` is called, which directly returns a future; then `foo()` is called again, which now runs concurrently to the file load. The file is available before `foo()` returns.](async-example.svg) -This sequence diagram shows a `main` function that reads a file from the file system and then calls a function `foo`. This process is repeated to times: Once with a synchronous `read_file` call and once with an asynchronous `async_read_file` call. +This sequence diagram shows a `main` function that reads a file from the file system and then calls a function `foo`. This process is repeated two times: Once with a synchronous `read_file` call and once with an asynchronous `async_read_file` call. 
With the synchronous call, the `main` function needs to wait until the file is loaded from the file system. Only then it can call the `foo` function, which requires it to again wait for the result. From 3cff5d09617c0de74a77df49ee94f90b1d58a1cc Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 20 Feb 2020 11:48:03 +0100 Subject: [PATCH 05/51] Small improvements --- .../second-edition/posts/12-async-await/index.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 968720df..be42893f 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -145,13 +145,13 @@ The `poll` method takes two arguments: `self: Pin<&mut Self>` and `cx: &mut Cont [_pinned_]: https://doc.rust-lang.org/nightly/core/pin/index.html -The purpose of the `cx: &mut Context` parameter is to pass a [`Waker`] instance to the asynchronous task, e.g. the file system load. This `Waker` allows the asynchronous task to signal that it (or a part of it) is finished, e.g. that the file was loaded from disk. Since the main task knows that it will be notified when the `Future` is ready, it does not need to call `poll` over and over again. We will explain this process in more detail later when we implement an own `Waker` type. +The purpose of the `cx: &mut Context` parameter is to pass a [`Waker`] instance to the asynchronous task, e.g. the file system load. This `Waker` allows the asynchronous task to signal that it (or a part of it) is finished, e.g. that the file was loaded from disk. Since the main task knows that it will be notified when the `Future` is ready, it does not need to call `poll` over and over again. We will explain this process in more detail later in this post when we implement an own `Waker` type. 
[`Waker`]: https://doc.rust-lang.org/nightly/core/task/struct.Waker.html ### Working with Futures -We now know how futures are defined and the rough idea behind the `poll` method. However, we still don't know how to effectively work with futures. The problem is that futures represent results of asynchronous tasks, which might be not available yet. In practice, however, we often need these values directly for further calculations. So the question is: How can we efficiently retrieve the value of a future when we need it? +We now know how futures are defined and understand the basic idea behind the `poll` method. However, we still don't know how to effectively work with futures. The problem is that futures represent results of asynchronous tasks, which might be not available yet. In practice, however, we often need these values directly for further calculations. So the question is: How can we efficiently retrieve the value of a future when we need it? #### Waiting on Futures @@ -169,7 +169,7 @@ let file_content = loop { Here we _actively_ wait for the future by calling `poll` over and over again in a loop. The arguments to `poll` don't matter here, so we omitted them. While this solution works, it is very inefficient because we keep the CPU busy until the value becomes available. -A more effective approach could be to _block_ the current thread until the future becomes available. This is of course only possible if you have threads, so this solution does not work for kernel, at least not yet. Even on systems where blocking is supported, it is often not desired because it turns an asynchronous task into a synchronous task again, thereby inhibiting the potential performance benefits. +A more efficient approach could be to _block_ the current thread until the future becomes available. This is of course only possible if you have threads, so this solution does not work for kernel, at least not yet. 
Even on systems where blocking is supported, it is often not desired because it turns an asynchronous task into a synchronous task again, thereby inhibiting the potential performance benefits of parallel tasks. #### Future Combinators @@ -214,7 +214,9 @@ This code does not quite work because it does not handle [_pinning_], but it suf [_pinning_]: https://doc.rust-lang.org/stable/core/pin/index.html -Manually writing correct combinator methods is difficult, therefore they are often provided by libraries. While the Rust standard library itself provides no combinator methods yet, the semi-official (and `no_std` compatible) [`futures`] crate does. Its [`FutureExt`] trait provides high-level combinator methods such as [`map`] or [`then`], which can be used to manipulate the result with arbitrary closures. +With this `string_len` function, we can calculate the length of an asynchronous string without waiting for it. Since the function returns a `Future` again, the caller can't work directly on the returned value, but needs to use combinator functions again. This way, the whole call graph becomes asynchronous and we can efficiently wait for multiple futures at once at some point, e.g. in the main function. + +Manually writing combinator functions is difficult, therefore they are often provided by libraries. While the Rust standard library itself provides no combinator methods yet, the semi-official (and `no_std` compatible) [`futures`] crate does. Its [`FutureExt`] trait provides high-level combinator methods such as [`map`] or [`then`], which can be used to manipulate the result with arbitrary closures. [`futures`]: https://docs.rs/futures/0.3.4/futures/ [`FutureExt`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html @@ -223,7 +225,7 @@ Manually writing correct combinator methods is difficult, therefore they are oft ##### Advantages -The big advantage of future combinators is that they keep the operations asynchronous. 
In combination with asynchronous I/O interfaces, this approach can lead to very high performance. The fact that future combinators are implemented as normal structs with trait implementations allows the compiler to excessively optimizing them to a efficient state machine. For more details, see the [_Zero-cost futures in Rust_] post, which announced the addition of futures to the Rust ecosystem. +The big advantage of future combinators is that they keep the operations asynchronous. In combination with asynchronous I/O interfaces, this approach can lead to very high performance. The fact that future combinators are implemented as normal structs with trait implementations allows the compiler to excessively optimizing them. For more details, see the [_Zero-cost futures in Rust_] post, which announced the addition of futures to the Rust ecosystem. [_Zero-cost futures in Rust_]: https://aturon.github.io/blog/2016/08/11/futures/ From 3d89841a51cbf6a0b9d13706500f93bedad6e538 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 20 Feb 2020 14:01:35 +0100 Subject: [PATCH 06/51] Update async/await sections and 'saving state' section --- .../async-state-machine-basic.svg | 3 + .../async-state-machine-states.svg | 3 + .../posts/12-async-await/index.md | 97 +++++++++++++------ 3 files changed, 72 insertions(+), 31 deletions(-) create mode 100644 blog/content/second-edition/posts/12-async-await/async-state-machine-basic.svg create mode 100644 blog/content/second-edition/posts/12-async-await/async-state-machine-states.svg diff --git a/blog/content/second-edition/posts/12-async-await/async-state-machine-basic.svg b/blog/content/second-edition/posts/12-async-await/async-state-machine-basic.svg new file mode 100644 index 00000000..e310f0bd --- /dev/null +++ b/blog/content/second-edition/posts/12-async-await/async-state-machine-basic.svg @@ -0,0 +1,3 @@ + + +
[SVG text content omitted: state diagram with the states "Start", "Waiting on foo.txt", "Waiting on bar.txt", and "End"; transitions are triggered by poll(), with "foo.txt ready?" and "bar.txt ready?" decisions labeled yes/no]
\ No newline at end of file diff --git a/blog/content/second-edition/posts/12-async-await/async-state-machine-states.svg b/blog/content/second-edition/posts/12-async-await/async-state-machine-states.svg new file mode 100644 index 00000000..fa095e1e --- /dev/null +++ b/blog/content/second-edition/posts/12-async-await/async-state-machine-states.svg @@ -0,0 +1,3 @@ + + +
[SVG text content omitted: the four states "Start", "Waiting on foo.txt", "Waiting on bar.txt", and "End"]
\ No newline at end of file diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index be42893f..dbaa7380 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -234,21 +234,24 @@ The big advantage of future combinators is that they keep the operations asynchr While future combinators make it possible to write very efficient code, they can be difficult to use in some situations because of the type system and the closure based interface. For example, consider code like this: ```rust -async_read_file("foo.txt").then(|content| { - if content.len() > 100 { - Either::Left(async_read_file("bar.txt")) - } else { - Either::Right(future::ready(content)) - } -}) +fn example(min_len: usize) -> impl Future { + async_read_file("foo.txt").then(move |content| { + if content.len() < min_len { + Either::Left(async_read_file("bar.txt").map(|s| content + &s)) + } else { + Either::Right(future::ready(content)) + } + }) +} ``` -([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=97a2d231584113452ff9e67d1b34604c)) +([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=91fc09024eecb2448a85a7ef6a97b8d8)) -Here we read the file `foo.txt` and then use the [`then`] combinator to chain a second future based on the file content. If the content length is greater than 100, we read a different `bar.txt` file and return its content, otherwise we return the content of `foo.txt`. +Here we read the file `foo.txt` and then use the [`then`] combinator to chain a second future based on the file content. If the content length is smaller than the given `min_len`, we read a different `bar.txt` file and append it to `content` using the [`map`] combinator. Otherwise we return only the content of `foo.txt`. 
-The reason for the [`Either`] wrapper is that if and else blocks must always have the same type. Since we return different future types in the blocks, we must use the wrapper type to unify them into a single type. The [`ready`] function wraps a value into a future, which is immediately ready. The function is required here because the `Either` wrapper expects that the wrapped value implements `Future`. +We need to use the [`move` keyword] for the closure passed to `then` because otherwise there would be a lifetime error for `min_len`. The reason for the [`Either`] wrapper is that if and else blocks must always have the same type. Since we return different future types in the blocks, we must use the wrapper type to unify them into a single type. The [`ready`] function wraps a value into a future, which is immediately ready. The function is required here because the `Either` wrapper expects that the wrapped value implements `Future`. +[`move` keyword]: https://doc.rust-lang.org/std/keyword.move.html [`Either`]: https://docs.rs/futures/0.3.4/futures/future/enum.Either.html [`ready`]: https://docs.rs/futures/0.3.4/futures/future/fn.ready.html @@ -272,45 +275,77 @@ fn foo() -> impl Future { This keyword alone wouldn't be that useful. 
However, inside `async` functions, the `await` keyword can be used to retrieve the asynchronous value of a future: ```rust -async fn foo() -> String { +async fn example(min_len: usize) -> String { let content = async_read_file("foo.txt").await; - if content.len() > 100 { - async_read_file("bar.txt").await + if content.len() < min_len { + content + &async_read_file("bar.txt").await } else { content } } ``` -([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=9f94ac348c2b7f5421a50e2a02f33b1d)) +([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=d93c28509a1c67661f31ff820281d434)) -This function is a direct translation of the future combinator code example, which required the `Either` wrapper type. Using the `.await` operator, we can retrieve the value of a future without needing any closures. As a result, we can write our code like we write normal synchronous code, with the difference that _this is still asynchronous code_. +This function is a direct translation of the `example` function above, which used combinator functions. Using the `.await` operator, we can retrieve the value of a future without needing any closures or `Either` types. As a result, we can write our code like we write normal synchronous code, with the difference that _this is still asynchronous code_. #### State Machine Transformation -What the compiler does behind this scenes is to transform the body of the `async` function into a [_state machine_], with each `.await` call representing a different state. For the above `foo` function, the compiler creates a state machine with the following four states: +What the compiler does behind this scenes is to transform the body of the `async` function into a [_state machine_], with each `.await` call representing a different state. 
For the above `example` function, the compiler creates a state machine with the following four states: [_state machine_]: https://en.wikipedia.org/wiki/Finite-state_machine -``` -start waiting on 1st future waiting on 2nd future end +![Four states: start, waiting on foo.txt, waiting on bar.txt, end](async-state-machine-states.svg) + +Each state represents a different pause point of the function. The _"Start"_ and _"End"_ states represent the function at the beginning and end of its execution. The _"Waiting on foo.txt"_ state represents that the function is currently waiting for the first `async_read_file` result. Similarly, the _"Waiting on bar.txt"_ state represents the pause point where the function is waiting on the second `async_read_file` result. + +The state machine implements the `Future` trait by making each `poll` call a possible state transition: + +![Four states: start, waiting on foo.txt, waiting on bar.txt, end](async-state-machine-basic.svg) + +The diagram uses arrows to represent state switches and diamond shapes to represent alternative paths. For example, if the `foo.txt` file is not ready, the path marked with _"no"_ is taken and the _"Waiting on foo.txt"_ state is reached. Otherwise, the _"yes"_ path is taken. The small red diamond without caption represents the `if content.len() < min_len` branch of the `example` function. + +We see that the first `poll` call starts the function and lets it run until it reaches a future that is not ready yet. If all futures on the path are ready, the function can run till the _"End"_ state, where it returns its result wrapped in `Poll::Ready`. Otherwise, the state machine enters a waiting state and returns `Poll::Pending`. On the next `poll` call, the state machine then starts from the last waiting state and retries the last operation. + +#### Saving State + +In order to be able to continue from the last waiting state, the state machine must save it internally.
In addition, it must save all the variables that it needs to continue execution on the next `poll` call. This is where the compiler can really shine: Since it knows which variables are used when, it can automatically generate structs with exactly the variables that are needed. + +As an example, the compiler generates the following structs for the above `example` function: + +```rust +// The `example` function again so that you don't have to scroll up +async fn example(min_len: usize) -> String { + let content = async_read_file("foo.txt").await; + if content.len() < min_len { + content + &async_read_file("bar.txt").await + } else { + content + } +} + +// The compiler-generated state structs: + +struct StartState { + min_len: usize, +} + +struct WaitingOnFooTxtState { + min_len: usize, +} + +struct WaitingOnBarTxtState { + content: String, +} + +struct EndState {} ``` -This state machine implements the `Future` trait by making each `poll` call a possible state switch event: - -``` -start waiting on 1st future waiting on 2nd future end -| ^ ^ ^ -| | | | ------------------------------------------------------------- -``` - -The first `poll` call starts the function and lets it run until it reaches a future that is not ready yet. If all futures are ready, the function can run till its end and return its return value wrapped in `Poll::Ready`. Otherwise, `Poll::Pending` is returned. Internally, the stack machine keeps track of the active state, so that it can continue there on the next `poll` call. - -On subsequent calls to `poll`, the state machine continues from the current state and polls the future it currently waits on again. In case it is ready now, it continues execution until it reaches the next future that is not ready. If it is still not ready, it stays in the state and returns `Poll::Pending` again. - +In the "start" and _"Waiting on foo.txt"_ states, the `min_len` parameter needs to be stored because it is required for the comparison with `content.len()` later. 
It is no longer stored in the _"Waiting on bar.txt"_ state because `min_len` is no longer needed after the comparison. In the _"end"_ state, no variables are stored because the function did already run to completion. +Keep in mind that this is only an example for the code that the compiler could generate. The struct names and the field layout are an implementation detail and might be different. +#### The Full State Machine Type From 7ce491df535f5de5099577aab981dc06acf44d5f Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 20 Feb 2020 14:38:30 +0100 Subject: [PATCH 07/51] Start creating full state machine for example --- .../posts/12-async-await/index.md | 98 ++++++++++++++++++- 1 file changed, 97 insertions(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index dbaa7380..37733985 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -332,23 +332,119 @@ struct StartState { struct WaitingOnFooTxtState { min_len: usize, + foo_txt_future: impl Future, } struct WaitingOnBarTxtState { content: String, + bar_txt_future: impl Future, } struct EndState {} ``` -In the "start" and _"Waiting on foo.txt"_ states, the `min_len` parameter needs to be stored because it is required for the comparison with `content.len()` later. It is no longer stored in the _"Waiting on bar.txt"_ state because `min_len` is no longer needed after the comparison. In the _"end"_ state, no variables are stored because the function did already run to completion. +In the "start" and _"Waiting on foo.txt"_ states, the `min_len` parameter needs to be stored because it is required for the comparison with `content.len()` later. The _"Waiting on foo.txt"_ state additionally stores a `foo_txt_future`, which represents the future returned by the `async_read_file` call. 
This future needs to be polled again when the state machine continues, so it needs to be saved. + +The _"Waiting on bar.txt"_ state contains the `content` variable because it is needed for the string concatenation after `bar.txt` is ready. It also stores a `bar_txt_future` that represents the in-progress load of `bar.txt`. The struct does not contain the `min_len` variable because it is no longer needed after the `content.len()` comparison. In the _"end"_ state, no variables are stored because the function did already run to completion. Keep in mind that this is only an example for the code that the compiler could generate. The struct names and the field layout are an implementation detail and might be different. #### The Full State Machine Type +While the exact compiler-generated code is an implementation detail, it helps in understanding to imagine how the generated state machine _could_ look for the `example` function. We already defined the structs representing the different states and containing the required variables. To create a state machine on top of them, we can combine them into an [`enum`]: + +[`enum`]: https://doc.rust-lang.org/book/ch06-01-defining-an-enum.html + +```rust +enum ExampleStateMachine { + Start(StartState), + WaitingOnFooTxt(WaitingOnFooTxtState), + WaitingOnBarTxt(WaitingOnBarTxtState), + End(EndState), +} +``` + +We define a separate enum variant for each state and add the corresponding state struct to each variant as a field. 
To implement the state transitions, the compiler generates an implementation of the `Future` trait based on the `example` function: + +```rust +impl Future for ExampleStateMachine { + type Output = String; // return type of `example` + + fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll { + loop { + match self { // TODO: handle pinning + ExampleStateMachine::Start(state) => {…} + ExampleStateMachine::WaitingOnFooTxt(state) => {…} + ExampleStateMachine::WaitingOnFooTxt(state) => {…} + ExampleStateMachine::End(state) => {…} + } + } + } +} +``` + +TODO +```rust +impl Future for ExampleStateMachine { + type Output = String; // return type of `example` + + fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll { + loop { + match self { // TODO: handle pinning + ExampleStateMachine::Start(state) => { + // from body of `example` + let foo_txt_future = async_read_file("foo.txt"); + // `.await` operation + let state = WaitingOnFooTxtState { + min_len: state.min_len, + foo_txt_future, + }; + *self = ExampleStateMachine::WaitingOnFooTxt(state); + } + ExampleStateMachine::WaitingOnFooTxt(state) => { + match state.foo_txt_future.poll(cx) { + Poll::Pending => return Poll::Pending, + Poll::Ready(content) => { + // from body of `example` + if content.len() < state.min_len { + let bar_txt_future = async_read_file("bar.txt"); + // `.await` operation + let state = WaitingOnBarTxtState { + content, + bar_txt_future, + }; + *self = ExampleStateMachine::WaitingOnBarTxt(state); + } else { + *self = ExampleStateMachine::End(EndState)); + return Poll::Ready(content); + } + } + } + } + ExampleStateMachine::WaitingOnFooTxt(state) => { + match state.bar_txt_future.poll(cx) { + match state.bar_txt_future.poll(cx) { + Poll::Pending => return Poll::Pending, + Poll::Ready(bar_txt) => { + *self = ExampleStateMachine::End(EndState)); + // from body of `example` + return Poll::Ready(state.content + &bar_txt); + } + } + } + } + ExampleStateMachine::End(_) => { + panic!("poll called after 
Poll::Ready was returned"); + } + } + } + } +} +``` + +### Pinning From 2ff011ffba95018013ad753e87116baf11fefb61 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 20 Feb 2020 14:40:47 +0100 Subject: [PATCH 08/51] Split code example into individual match cases; add code for `example` --- .../posts/12-async-await/index.md | 107 +++++++++--------- 1 file changed, 56 insertions(+), 51 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 37733985..e0ddc86a 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -387,62 +387,67 @@ TODO ```rust -impl Future for ExampleStateMachine { - type Output = String; // return type of `example` - - fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll { - loop { - match self { // TODO: handle pinning - ExampleStateMachine::Start(state) => { - // from body of `example` - let foo_txt_future = async_read_file("foo.txt"); - // `.await` operation - let state = WaitingOnFooTxtState { - min_len: state.min_len, - foo_txt_future, - }; - *self = ExampleStateMachine::WaitingOnFooTxt(state); - } - ExampleStateMachine::WaitingOnFooTxt(state) => { - match state.foo_txt_future.poll(cx) { - Poll::Pending => return Poll::Pending, - Poll::Ready(content) => { - // from body of `example` - if content.len() < state.min_len { - let bar_txt_future = async_read_file("bar.txt"); - // `.await` operation - let state = WaitingOnBarTxtState { - content, - bar_txt_future, - }; - *self = ExampleStateMachine::WaitingOnBarTxt(state); - } else { - *self = ExampleStateMachine::End(EndState)); - return Poll::Ready(content); - } - } - } - } - ExampleStateMachine::WaitingOnFooTxt(state) => { - match state.bar_txt_future.poll(cx) { - match state.bar_txt_future.poll(cx) { - Poll::Pending => return Poll::Pending, - Poll::Ready(bar_txt) => { - *self = 
ExampleStateMachine::End(EndState)); - // from body of `example` - return Poll::Ready(state.content + &bar_txt); - } - } - } - } - ExampleStateMachine::End(_) => { - panic!("poll called after Poll::Ready was returned"); - } +ExampleStateMachine::Start(state) => { + // from body of `example` + let foo_txt_future = async_read_file("foo.txt"); + // `.await` operation + let state = WaitingOnFooTxtState { + min_len: state.min_len, + foo_txt_future, + }; + *self = ExampleStateMachine::WaitingOnFooTxt(state); +} +``` +```rust +ExampleStateMachine::WaitingOnFooTxt(state) => { + match state.foo_txt_future.poll(cx) { + Poll::Pending => return Poll::Pending, + Poll::Ready(content) => { + // from body of `example` + if content.len() < state.min_len { + let bar_txt_future = async_read_file("bar.txt"); + // `.await` operation + let state = WaitingOnBarTxtState { + content, + bar_txt_future, + }; + *self = ExampleStateMachine::WaitingOnBarTxt(state); + } else { + *self = ExampleStateMachine::End(EndState)); + return Poll::Ready(content); } } } } ``` +```rust +ExampleStateMachine::WaitingOnFooTxt(state) => { + match state.bar_txt_future.poll(cx) { + match state.bar_txt_future.poll(cx) { + Poll::Pending => return Poll::Pending, + Poll::Ready(bar_txt) => { + *self = ExampleStateMachine::End(EndState)); + // from body of `example` + return Poll::Ready(state.content + &bar_txt); + } + } + } +} +``` +```rust +ExampleStateMachine::End(_) => { + panic!("poll called after Poll::Ready was returned"); +} +``` + + +```rust +fn example(min_len: usize) -> ExampleStateMachine { + ExampleStateMachine::Start(StartState { + min_len, + }) +} +``` ### Pinning From 868a6f03ec33e5fd4c64a1dd5a14244834cd222c Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 20 Feb 2020 16:21:32 +0100 Subject: [PATCH 09/51] Add explantion for state machine code --- .../posts/12-async-await/index.md | 55 ++++++++++++++++--- 1 file changed, 46 insertions(+), 9 deletions(-) diff --git 
a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index e0ddc86a..b97e0c5e 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -383,8 +383,11 @@ impl Future for ExampleStateMachine { } ``` -TODO +The `Output` type of the future is `String` because it's the return type of the `example` function. To implement the `poll` function, we use a match statement on the current state inside a `loop`. The idea is that we switch to the next state as long as possible and use an explicit `return Poll::Pending` when we can't continue. +For simplicity, we only show simplified code and don't handle [pinning][_pinning_], lifetimes, etc. So this and the following code should be treated as pseudo-code and not used directly. Of course, the real compiler-generated code handles everything correctly. + +To keep the code excerpts small, we present the code for each match arm separately. Let's begin with the `Start` state: ```rust ExampleStateMachine::Start(state) => { @@ -398,6 +401,11 @@ ExampleStateMachine::Start(state) => { *self = ExampleStateMachine::WaitingOnFooTxt(state); } ``` + +The state machine is in the `Start` state when it is right at the beginning of the function. In this case, we execute all the code from the body of the `example` function until the first `.await`. To make the code more readable, we introduce a new `foo_txt_future` to represent the future returned by `async_read_file` right before the `.await`. To handle the `.await` operation, we change the state of `self` to `WaitingOnFooTxt`, which includes the construction of the `WaitingOnFooTxtState` struct.
+ +Since the `match self {…}` statement is executed in a loop, the execution jumps to the `WaitingOnFooTxt` arm next: + ```rust ExampleStateMachine::WaitingOnFooTxt(state) => { match state.foo_txt_future.poll(cx) { @@ -420,26 +428,51 @@ ExampleStateMachine::WaitingOnFooTxt(state) => { } } ``` + +In this match arm we first call the `poll` function of the `foo_txt_future`. If it is not ready, we exit the loop and return `Poll::Pending` too. Since `self` stays in the `WaitingOnFooTxt` state in this case, the next `poll` call on the state machine will enter the same match arm and retry polling the `foo_txt_future`. + +When the `foo_txt_future` is ready, we assign the result to the `content` variable and continue to execute the code of the `example` function: If `content.len()` is smaller than the `min_len` saved in the state struct, the `bar.txt` file is read asynchronously. We again translate the `.await` operation into a state change, this time into the `WaitingOnBarTxt` state. Since we're executing the `match` inside a loop, the execution directly jumps to the match arm for the new state afterwards, where the `bar_txt_future` is polled. + +In case we enter the `else` branch, no further `.await` operation occurs. We reach the end of the function and return `content` wrapped in `Poll::Ready`. We also change the current state to the `End` state. 
+ +The code for the `WaitingOnBarTxt` state looks like this: + ```rust -ExampleStateMachine::WaitingOnFooTxt(state) => { +ExampleStateMachine::WaitingOnBarTxt(state) => { match state.bar_txt_future.poll(cx) { - match state.bar_txt_future.poll(cx) { - Poll::Pending => return Poll::Pending, - Poll::Ready(bar_txt) => { - *self = ExampleStateMachine::End(EndState)); - // from body of `example` - return Poll::Ready(state.content + &bar_txt); - } + Poll::Pending => return Poll::Pending, + Poll::Ready(bar_txt) => { + *self = ExampleStateMachine::End(EndState)); + // from body of `example` + return Poll::Ready(state.content + &bar_txt); } } } ``` + +Similar to the `WaitingOnFooTxt` state, we start by polling the `bar_txt_future`. If it is still pending, we exit the loop and return `Poll::Pending` too. Otherwise, we can perform the last operation of the `example` function: Concatenating the `content` variable with the result from the future. We update the state machine to the `End` state and then return the result wrapped in `Poll::Ready`. + +Finally, the code for the `End` state looks like this: + ```rust ExampleStateMachine::End(_) => { panic!("poll called after Poll::Ready was returned"); } ``` +Futures should not be polled again after they returned `Poll::Ready`, therefore we panic if `poll` is called when we are already in the `End` state. + +We now know how the compiler-generated state machine and its implementation of the `Future` trait _could_ look like. In practice, the compiler generates code in different way. (In case you're interested, the implementation is currently based on [_generators_], but this is only an implementation detail.) + +[_generators_]: https://doc.rust-lang.org/nightly/unstable-book/language-features/generators.html + +The last piece of the puzzle is the generated code for the `example` function itself. 
Remember, the function header was defined like this: + +```rust +async fn example(min_len: usize) -> String +``` + +Since the complete function body is now implemented by the state machine, the only thing that the function needs to do is to initialize the state machine. The generated code for this could look like this: ```rust fn example(min_len: usize) -> ExampleStateMachine { @@ -449,6 +482,10 @@ fn example(min_len: usize) -> ExampleStateMachine { } ``` +The function no longer has an `async` modifier since it now explicitly returns a `ExampleStateMachine` type, which implements the `Future` trait. As expected, the state machine is constructed in the `Start` state and the corresponding state struct is initialized with the `min_len` parameter. + +Note that this function does not start the execution of the state machine. This is a fundamental design decision of Rust's futures: They do nothing until they are polled for the first time. + ### Pinning From 58faf5adf0620c63d154d7c9a3cb9a6c82d563cd Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Tue, 25 Feb 2020 14:20:26 +0100 Subject: [PATCH 10/51] Remove old section --- .../posts/12-async-await/index.md | 94 +------------------ 1 file changed, 3 insertions(+), 91 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index b97e0c5e..fd0e1474 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -488,97 +488,9 @@ Note that this function does not start the execution of the state machine. This ### Pinning - - - - - - - -### The Async Keyword - -The purpose of the async/await pattern is to make working with futures easier. Rust has language-level support for this pattern built on the two keywords `async` and `await`. We will explain them individually, starting with `async`. 
- -The purpose of the `async` keyword is to turn a synchronous function into an asynchronous function that returns a `Future`: - -```rust -fn synchronous() -> u32 { - 42 -} - -async fn asynchronous() -> u32 { - 42 -} -``` - -While both functions specify a return type of `u32`, the `async` keyword turns the return type of the second function into `impl Future`. So instead of returning an `u32` directly, the `asynchronous` function returns a type that implements the `Future` trait with output type `u32`. We can see this when we try to assign the result to a variable of type `u32`: - -```rust -let val: u32 = asynchronous(); -``` - -The compiler responds with the following error ([try it on the playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=590273d2f4ef75eb890c5354f788e29c)): - -``` -error[E0308]: mismatched types - --> src/main.rs:3:23 - | -3 | let val: u32 = asynchronous(); - | --- ^^^^^^^^^^^^^^ expected `u32`, found opaque type - | | - | expected due to this -... -10 | async fn asynchronous() -> u32 { - | --- the `Output` of this `async fn`'s found opaque type - | - = note: expected type `u32` - found opaque type `impl std::future::Future` -``` - -The relevant part of that error message are the last two lines: It expects an `u32` because of the type annotation, but the function returned an implementation of the `Future` trait instead. - -Of course, changing the return type alone would not work. Instead, the compiler also needs to convert the function body, which is `42` in our case, into a future. Since `42` is not asynchronous, the compiler just generates a future that returns the result on the first `poll`. 
The generated code _could_ look something like this: - -```rust -struct GeneratedFuture; - -impl Future for GeneratedFuture { - type Output = u32; - - fn poll(self: Pin<&mut Self>, _cx: &mut Context) -> Poll { - Poll::Ready(42) - } -} - -fn asynchronous() -> impl Future { - GeneratedFuture -} -``` - -Instead of returning `u32`, the `asynchronous` function now returns an instance of a new `GeneratedFuture` struct. This struct implements the `Future` trait by returning `Poll::Ready(42)` on `poll`. The `42` is the body of `asynchronous` in this case. - -Note that this is just an example implementation. The actual code generated by the compiler uses a much more powerful approach, which we will explain in a moment. - -In addition to `async` futures, Rust also supports `async` blocks: - -```rust -let future = async { - 42 -}; -``` - -The `future` variable also has the type `impl Future` in this case. The generated code is very similar to the `async fn`, only without a function call: `let future = GeneratedFuture;`. - -We now know roughly what the `async` keyword does, but we still don't know why it's useful yet. After all, there is no advantage of returning a `impl Future` instead of returning the `u32` directly. To answer this question, we have to explore different ways to work with futures. 
- - - - -#### Await - -### Generators - - +### Executors + +## Implementation From 642ff0f27f697ac0c5b0684dcf578b22074d618c Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Tue, 25 Feb 2020 16:11:09 +0100 Subject: [PATCH 11/51] Minor improvement --- blog/content/second-edition/posts/12-async-await/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index fd0e1474..3ade5482 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -385,7 +385,7 @@ impl Future for ExampleStateMachine { The `Output` type of the future is `String` because it's the return type of the `example` function. To implement the `poll` function, we use a match statement on the current state inside a `loop`. The idea is that we switch to the next state as long as possible and use an explicit `return Poll::Pending` when we can't continue. -For simplicitly, we only show simplified code and don't handle [pinning][_pinning_], lifetimes, etc. So this and the following code should be treated as pseudo-code and not used directly. Of course, the real compiler-generated code handles everything correctly. +For simplicitly, we only show simplified code and don't handle [pinning][_pinning_], ownership, lifetimes, etc. So this and the following code should be treated as pseudo-code and not used directly. Of course, the real compiler-generated code handles everything correctly, albeit possibly in a different way. To keep the code excerpts small, we present the code for each match arm separately. 
Let's begin with the `Start` state: From bf07f26e73f3ee36e4f3a60a977aec47258988d3 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Tue, 25 Feb 2020 16:11:24 +0100 Subject: [PATCH 12/51] Begin section about pinning --- .../posts/12-async-await/index.md | 54 +++++++++++++++++++ 1 file changed, 54 insertions(+) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 3ade5482..1f3845f2 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -488,6 +488,60 @@ Note that this function does not start the execution of the state machine. This ### Pinning +We already stumbled across _pinning_ multiple times in this post. Now is finally the time to explore what pinning is and why it is needed. + +#### Self-Referential Structs + +As explained above, the state machine transformation stores the local variables of each pause point in a struct. For small examples like our `example` function, this was straightforward and did not lead to any problems. However, things become more difficult when variables reference each other. For example, consider this function: + +```rust +async fn pin_example() -> i32 { + let array = [1, 2, 3]; + let element = &array[2]; + async_write_file("foo.txt", element.to_string()).await; + *element +} +``` + +This function creates a small `array` with the contents `1`, `2`, and `3`. It then creates a reference to the last array element and stores it in an `element` variable. Next, it asynchronously writes the number converted to a string to a `foo.txt` file. Finally, it returns the number referenced by `element`. + +Since the function uses a single `await` operation, the resulting state machine has three states: start, end, and "waiting on write". The function takes no arguments, so the struct for the start state is empty. 
Like before, the struct for the end state is empty too because the function is finished at this point. The struct for the "waiting on write" state is more interesting: + +```rust +struct WaitingOnWriteState { + array: [1, 2, 3], + element: 0x1001a, // address of the last array element +} +``` + +We need to store both the `array` and `element` variables because `element` is required for the return type and `array` is referenced by `element`. Since `element` is a reference, it stores a _pointer_ (i.e. a memory address) to the referenced element. We used `0x1001a` as an example memory address here. In reality it needs to be the address of the last element of the `array` field, so it depends on where the struct lives in memory. Structs with such internal pointers are called _self-referential_ structs because they reference themselves from one of their fields. + +#### The Problem with Self-Referential Structs + +The internal pointer of our self-referential struct leads to a fundamental problem, which becomes apparent when we look at its memory layout: + +![array at 0x10014 with fields 1, 2, and 3; element at address 0x10020, pointing to the last array element at 0x1001a](self-referential-struct.svg) + +The `array` field starts at address 0x10014 and the `element` field at address 0x10020. It points to address 0x1001a because the last array element lives at this address. At this point, everything is still fine. However, an issue occurs when we move this struct to a different memory address: + +![array at 0x10024 with fields 1, 2, and 3; element at address 0x10030, still pointing to 0x1001a, even though the last array element now lives at 0x1002a](self-referential-struct-moved.svg) + +We moved the struct a bit so that it starts at address `0x10024` now. The problem is that the `element` field still points to address `0x1001a` even though the last `array` element now lives at address `0x1002a`. 
Thus, the pointer is dangling with the result that undefined behavior occurs on the next `poll` call. + +#### Possible Solutions + +There are two fundamental approaches to solve the dangling pointer problem: + +- **Update the pointer on move:** The idea is to update the internal pointer whenever the struct is moved in memory so that it is still valid after the move. Unfortunately, this approach would require extensive changes to Rust that would result in potentially huge performance losses. The reason is that some kind of runtime would need to keep track of the type of all struct fields and check on every move operation whether a pointer update is required. +- **Forbid moving the struct:** As we saw above, the dangling pointer only occurs when we move the struct in memory. By completely forbidding move operations on self-referential structs, the problem can be also avoided. The big advantage of this approach is that it can be implemented at the type system level without additional runtime costs. The drawback is that it puts the burden of dealing with move operations on possibly self-referential structs on the programmer. + +Rust understandably decided for the second solution. The required type system additions were proposed in [RFC 2349](https://github.com/rust-lang/rfcs/blob/master/text/2349-pin.md). The result was the [_pinning_] API, which we already encountered a few times in this post. In the following, we will give a short overview of this API and explain how it works with async/await and futures. 
+ +#### The `Pin` Type + + + + ### Executors ## Implementation From 81f71982f4b46c35887b6b476a9af83a84469478 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Wed, 26 Feb 2020 16:34:04 +0100 Subject: [PATCH 13/51] Finish first draft of pinning section --- .../posts/12-async-await/index.md | 166 +++++++++++++++++- 1 file changed, 164 insertions(+), 2 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 1f3845f2..1bebc11d 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -535,12 +535,174 @@ There are two fundamental approaches to solve the dangling pointer problem: - **Update the pointer on move:** The idea is to update the internal pointer whenever the struct is moved in memory so that it is still valid after the move. Unfortunately, this approach would require extensive changes to Rust that would result in potentially huge performance losses. The reason is that some kind of runtime would need to keep track of the type of all struct fields and check on every move operation whether a pointer update is required. - **Forbid moving the struct:** As we saw above, the dangling pointer only occurs when we move the struct in memory. By completely forbidding move operations on self-referential structs, the problem can be also avoided. The big advantage of this approach is that it can be implemented at the type system level without additional runtime costs. The drawback is that it puts the burden of dealing with move operations on possibly self-referential structs on the programmer. -Rust understandably decided for the second solution. The required type system additions were proposed in [RFC 2349](https://github.com/rust-lang/rfcs/blob/master/text/2349-pin.md). The result was the [_pinning_] API, which we already encountered a few times in this post. 
In the following, we will give a short overview of this API and explain how it works with async/await and futures. +Rust understandably decided for the second solution. For this, the [_pinning_] API was proposed in [RFC 2349](https://github.com/rust-lang/rfcs/blob/master/text/2349-pin.md). In the following, we will give a short overview of this API and explain how it works with async/await and futures. -#### The `Pin` Type +#### Heap Values +The first observation is that [heap allocated] values already have a fixed memory address most of the time. They are created using a call to `allocate` and are not moved in memory until they are freed through a `deallocate` call again. This is required because a `Box` is essentially a pointer to the heap memory, so that an address change would make the pointer invalid. +[heap allocated]: @/second-edition/posts/10-heap-allocation/index.md +Using heap allocation, we can try to create a self-referential struct: + +```rust +fn main() { + let mut heap_value = Box::new(SelfReferential { + self_ptr: 0 as *const _, + }); + let ptr = &*heap_value as *const SelfReferential; + heap_value.self_ptr = ptr; + println!("heap value at: {:p}", heap_value); + println!("internal reference: {:p}", heap_value.self_ptr); +} + +struct SelfReferential { + self_ptr: *const Self, +} +``` + +([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=ce1aff3a37fcc1c8188eeaf0f39c97e8)) + +When we execute this, we see that the address of heap value and its internal pointer are equal, which means that the `self_ptr` field is valid. Since the `heap_value` variable is only a pointer, moving it (e.g. by passing it to a function) does not change the address, so that the `self_ptr` stays valid. 
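The claim that moving the `Box` pointer leaves the heap value in place can be checked with a short sketch. The helper names `addresses_after_move` and `self_ptr_survives_box_move` are hypothetical and exist only for this illustration. The struct has the same shape as `SelfReferential` above.

```rust
/// Same shape as the `SelfReferential` struct in the post.
struct SelfReferential {
    self_ptr: *const SelfReferential,
}

/// Hypothetical helper: takes the `Box` by value (which moves the `Box`
/// pointer itself) and returns the heap address together with `self_ptr`.
fn addresses_after_move(moved: Box<SelfReferential>) -> (usize, usize) {
    let heap_address = &*moved as *const SelfReferential as usize;
    (heap_address, moved.self_ptr as usize)
}

/// Returns `true` if the self-pointer still matches the heap address
/// after the `Box` was moved into another function.
fn self_ptr_survives_box_move() -> bool {
    let mut heap_value = Box::new(SelfReferential {
        self_ptr: std::ptr::null(),
    });
    let ptr = &*heap_value as *const SelfReferential;
    heap_value.self_ptr = ptr;
    let address_before = ptr as usize;

    // Passing `heap_value` by value moves only the pointer, not the heap data.
    let (address_after, internal) = addresses_after_move(heap_value);
    address_before == address_after && address_after == internal
}
```

The function returns `true` because only the pointer stored in the `Box` is copied to the callee, while the allocation it points to stays at the same address.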
+ +However, there is still a way to break this: We can move out of a `Box<T>` or replace its content: + +```rust +let stack_value = mem::replace(&mut *heap_value, SelfReferential { + self_ptr: 0 as *const _, +}); +println!("value at: {:p}", &stack_value); +println!("internal reference: {:p}", stack_value.self_ptr); +``` + +([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e160ee8a64cba4cebc1c0473dcecb7c8)) + +Here we use the [`mem::replace`] function to replace the heap allocated value with a new struct instance. This allows us to move the original `heap_value` to the stack, while the `self_ptr` field of the struct still points to the heap. When you try to run the example on the playground, you see that the printed _"value at:"_ and _"internal reference:"_ lines indeed show different pointers. + +[`mem::replace`]: https://doc.rust-lang.org/nightly/core/mem/fn.replace.html + +The fundamental problem is that `Box<T>` allows us to get a `&mut T` reference to the heap allocated value. This `&mut` reference allows us to use methods like [`mem::replace`] or [`mem::swap`] to invalidate the heap allocated value. To resolve this problem, we must prevent `&mut` references to self-referential structs from being created. + +[`mem::swap`]: https://doc.rust-lang.org/nightly/core/mem/fn.swap.html + +#### `Pin<Box<T>>` and `Unpin` + +The pinning API provides a solution to the `&mut T` problem in the form of the [`Pin`] wrapper type and the [`Unpin`] marker trait. The idea behind these types is to gate all methods of `Pin` that can be used to get `&mut` references (e.g. [`get_mut`][pin-get-mut] or [`deref_mut`][pin-deref-mut]) on the `Unpin` trait. The `Unpin` trait is an [_auto trait_], which is automatically implemented for all types except types that explicitly opt out. By making self-referential structs opt out of `Unpin`, there is no (safe) way to get a `&mut T` from a `Pin<Box<T>>` type for them. 
As a result, their internal self-references are guaranteed to stay valid. + +[`Pin`]: https://doc.rust-lang.org/stable/core/pin/struct.Pin.html +[`Unpin`]: https://doc.rust-lang.org/nightly/std/marker/trait.Unpin.html +[pin-get-mut]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.get_mut +[pin-deref-mut]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#impl-DerefMut +[_auto trait_]: https://doc.rust-lang.org/reference/special-types-and-traits.html#auto-traits + +As an example, let's update the `SelfReferential` type from the above example to opt out of `Unpin`: + +```rust +use core::marker::PhantomPinned; + +struct SelfReferential { + self_ptr: *const Self, + _pin: PhantomPinned, +} +``` + +We opt out by adding a second `_pin` field of type [`PhantomPinned`]. This type is a zero-sized marker type whose only purpose is to _not_ implement the `Unpin` trait. Because of the way [auto traits][_auto trait_] work, a single field that is not `Unpin` suffices to make the complete struct opt out of `Unpin`. + +[`PhantomPinned`]: https://doc.rust-lang.org/nightly/core/marker/struct.PhantomPinned.html + +The second step is to change the `Box<SelfReferential>` type in the example to a `Pin<Box<SelfReferential>>` type. The easiest way to do this is to use the [`Box::pin`] function instead of [`Box::new`] for creating the heap allocated value: + +[`Box::pin`]: https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.pin +[`Box::new`]: https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.new + +```rust +let mut heap_value = Box::pin(SelfReferential { + self_ptr: 0 as *const _, + _pin: PhantomPinned, +}); +``` + +In addition to changing `Box::new` to `Box::pin`, we also need to add the new `_pin` field in the struct initializer. Since `PhantomPinned` is a zero-sized type, we only need its type name to initialize it. 
+ +When we [try to run our adjusted example](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=961b0db194bbe851ff4d0ed08d3bd98a) now, we see that it no longer works: + +``` +error[E0594]: cannot assign to data in a dereference of `std::pin::Pin<std::boxed::Box<SelfReferential>>` + --> src/main.rs:10:5 + | +10 | heap_value.self_ptr = ptr; + | ^^^^^^^^^^^^^^^^^^^^^^^^^ cannot assign + | + = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `std::pin::Pin<std::boxed::Box<SelfReferential>>` + +error[E0596]: cannot borrow data in a dereference of `std::pin::Pin<std::boxed::Box<SelfReferential>>` as mutable + --> src/main.rs:16:36 + | +16 | let stack_value = mem::replace(&mut *heap_value, SelfReferential { + | ^^^^^^^^^^^^^^^^ cannot borrow as mutable + | + = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `std::pin::Pin<std::boxed::Box<SelfReferential>>` +``` + +Both errors occur because the `Pin<Box<SelfReferential>>` type no longer implements the `DerefMut` trait. This is exactly what we wanted because the `DerefMut` trait would return a `&mut` reference, which we want to prevent. This only works because we both opted out of `Unpin` and changed `Box::new` to `Box::pin`. + +The problem now is that the compiler not only prevents moving the type in line 16, but also forbids initializing the `self_ptr` field in line 10. This happens because the compiler can't differentiate between valid and invalid uses of `&mut` references. 
To get the initialization working again, we have to use the unsafe [`get_unchecked_mut`] method: + +[`get_unchecked_mut`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.get_unchecked_mut + +```rust +// safe because modifying a field doesn't move the whole struct +unsafe { + let mut_ref = Pin::as_mut(&mut heap_value); + Pin::get_unchecked_mut(mut_ref).self_ptr = ptr; +} +``` + +([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=b9ebbb11429d9d79b3f9fffe819e2018)) + +The [`get_unchecked_mut`] function works on a `Pin<&mut T>` instead of a `Pin<Box<T>>`, so we have to use [`Pin::as_mut`] to convert the value first. Then we can set the `self_ptr` field using the `&mut` reference returned by `get_unchecked_mut`. + +[`Pin::as_mut`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.as_mut + +Now the only error left is the desired error on `mem::replace`. Remember, this operation tries to move the heap allocated value to the stack, which would break the self-reference stored in the `self_ptr` field. By opting out of `Unpin` and using `Pin<Box<T>>`, we can prevent this operation at compile time and thus safely work with self-referential structs. Note that the compiler is not able to prove that the creation of the self-reference is safe (yet), so we need to use an unsafe block and verify the correctness ourselves. + +#### Stack Pinning and `Pin<&mut T>` + +In the previous section we learned how to use `Pin<Box<T>>` to safely create a heap allocated self-referential value. While this approach works fine and is relatively safe (apart from the unsafe construction), the required heap allocation comes with a performance cost. Since Rust always wants to provide _zero-cost abstractions_ when possible, the pinning API also allows creating `Pin<&mut T>` instances that point to stack allocated values. + +Unlike `Pin<Box<T>>` instances, which have _ownership_ of the wrapped value, `Pin<&mut T>` instances only temporarily borrow the wrapped value. 
This makes things more complicated, as it requires the programmer to ensure additional guarantees themselves. Most importantly, a `Pin<&mut T>` must stay pinned for the whole lifetime of the referenced `T`, which can be difficult to verify for stack-based variables. To help with this, crates like [`pin-utils`] exist, but I still wouldn't recommend pinning to the stack unless you really know what you're doing. + +[`pin-utils`]: https://docs.rs/pin-utils/0.1.0-alpha.4/pin_utils/ + +For further reading, check out the documentation of the [`pin` module] and the [`Pin::new_unchecked`] method. + +[`pin` module]: https://doc.rust-lang.org/nightly/core/pin/index.html +[`Pin::new_unchecked`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.new_unchecked + +#### Pinning and Futures + +As we already saw in this post, the [`Future::poll`] method uses pinning in the form of a `Pin<&mut Self>` parameter: + +[`Future::poll`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll + +```rust +fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> +``` + +The reason that this method takes `self: Pin<&mut Self>` instead of the normal `&mut self` is that future instances created from async/await are often self-referential, as we saw [above][self-ref-async-await]. By wrapping `Self` into `Pin` and letting the compiler opt out of `Unpin` for self-referential futures generated from async/await, it is guaranteed that the futures are not moved in memory between `poll` calls. This ensures that all internal references are still valid. + +[self-ref-async-await]: @/second-edition/posts/12-async-await/index.md#self-referential-structs + +It is worth noting that moving futures before the first `poll` call is fine. This is a result of the fact that futures are lazy and do nothing until they're polled for the first time. The `start` state of the generated state machines therefore only contains the function arguments, but no internal references. 
In order to call `poll`, the caller must wrap the future into `Pin` first, which ensures that the future cannot be moved in memory anymore. + +Since the `Pin<&mut Self>` interface is predefined by the `Future` trait, there is no way to use the safer `Pin<Box<T>>` instead. This can make it quite challenging to safely implement `Future` yourself. For this reason I recommend against implementing `Future` manually and instead sticking to using async/await and the combinator methods of the [`futures`] crate. + +[`futures`]: https://docs.rs/futures/0.3.4/futures/ + +In case you're interested in understanding how to safely implement `Future` yourself, take a look at the relatively short [source of the `map` combinator method][map-src] of the `futures` crate and the section about [projections and structural pinning] of the pin documentation. + +[map-src]: https://docs.rs/futures-util/0.3.4/src/futures_util/future/future/map.rs.html +[projections and structural pinning]: https://doc.rust-lang.org/stable/std/pin/index.html#projections-and-structural-pinning ### Executors From 817c0c56abd46acbb741263a03ea55336ea81cc9 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Wed, 26 Feb 2020 16:34:15 +0100 Subject: [PATCH 14/51] Fix typo --- blog/content/second-edition/posts/12-async-await/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 1bebc11d..3d75af88 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -385,7 +385,7 @@ impl Future for ExampleStateMachine { The `Output` type of the future is `String` because it's the return type of the `example` function. To implement the `poll` function, we use a match statement on the current state inside a `loop`. 
The idea is that we switch to the next state as long as possible and use an explicit `return Poll::Pending` when we can't continue. -For simplicitly, we only show simplified code and don't handle [pinning][_pinning_], ownership, lifetimes, etc. So this and the following code should be treated as pseudo-code and not used directly. Of course, the real compiler-generated code handles everything correctly, albeit possibly in a different way. +For simplicity, we only show simplified code and don't handle [pinning][_pinning_], ownership, lifetimes, etc. So this and the following code should be treated as pseudo-code and not used directly. Of course, the real compiler-generated code handles everything correctly, albeit possibly in a different way. To keep the code excerpts small, we present the code for each match arm separately. Let's begin with the `Start` state: From ba6452c5b0841b1363ca58bca10e538f4b6cd838 Mon Sep 17 00:00:00 2001 From: Rob Gries Date: Wed, 26 Feb 2020 16:08:00 -0500 Subject: [PATCH 15/51] Fix typos (#759) --- blog/content/second-edition/posts/12-async-await/index.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 3d75af88..ab0124be 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -169,7 +169,7 @@ let file_content = loop { Here we _actively_ wait for the future by calling `poll` over and over again in a loop. The arguments to `poll` don't matter here, so we omitted them. While this solution works, it is very inefficient because we keep the CPU busy until the value becomes available. -A more efficient approach could be to _block_ the current thread until the future becomes available. This is of course only possible if you have threads, so this solution does not work for kernel, at least not yet. 
Even on systems where blocking is supported, it is often not desired because it turns an asynchronous task into a synchronous task again, thereby inhibiting the potential performance benefits of parallel tasks. +A more efficient approach could be to _block_ the current thread until the future becomes available. This is of course only possible if you have threads, so this solution does not work for our kernel, at least not yet. Even on systems where blocking is supported, it is often not desired because it turns an asynchronous task into a synchronous task again, thereby inhibiting the potential performance benefits of parallel tasks. #### Future Combinators @@ -225,7 +225,7 @@ Manually writing combinator functions is difficult, therefore they are often pro ##### Advantages -The big advantage of future combinators is that they keep the operations asynchronous. In combination with asynchronous I/O interfaces, this approach can lead to very high performance. The fact that future combinators are implemented as normal structs with trait implementations allows the compiler to excessively optimizing them. For more details, see the [_Zero-cost futures in Rust_] post, which announced the addition of futures to the Rust ecosystem. +The big advantage of future combinators is that they keep the operations asynchronous. In combination with asynchronous I/O interfaces, this approach can lead to very high performance. The fact that future combinators are implemented as normal structs with trait implementations allows the compiler to excessively optimize them. For more details, see the [_Zero-cost futures in Rust_] post, which announced the addition of futures to the Rust ecosystem. 
[_Zero-cost futures in Rust_]: https://aturon.github.io/blog/2016/08/11/futures/ From def0e6762d54366a6778e9f8a8ee8347270ceec2 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 27 Feb 2020 17:24:08 +0100 Subject: [PATCH 16/51] Add images for pinning section --- .../posts/12-async-await/self-referential-struct-moved.svg | 3 +++ .../posts/12-async-await/self-referential-struct.svg | 3 +++ 2 files changed, 6 insertions(+) create mode 100644 blog/content/second-edition/posts/12-async-await/self-referential-struct-moved.svg create mode 100644 blog/content/second-edition/posts/12-async-await/self-referential-struct.svg diff --git a/blog/content/second-edition/posts/12-async-await/self-referential-struct-moved.svg b/blog/content/second-edition/posts/12-async-await/self-referential-struct-moved.svg new file mode 100644 index 00000000..c0b364b2 --- /dev/null +++ b/blog/content/second-edition/posts/12-async-await/self-referential-struct-moved.svg @@ -0,0 +1,3 @@ + + +
+[SVG markup not recoverable; text labels in the figure: array elements 1, 2, 3; field names `array` and `element`; addresses 0x1001a, 0x10020, 0x10030]
\ No newline at end of file diff --git a/blog/content/second-edition/posts/12-async-await/self-referential-struct.svg b/blog/content/second-edition/posts/12-async-await/self-referential-struct.svg new file mode 100644 index 00000000..4732cb7e --- /dev/null +++ b/blog/content/second-edition/posts/12-async-await/self-referential-struct.svg @@ -0,0 +1,3 @@ + + +
[SVG figure `self-referential-struct.svg`: `array` with elements 1, 2, 3 and the `element` pointer field; address labels 0x1001a, 0x10014, 0x10020]
\ No newline at end of file From 75e2626dc0d53d201abbabc513bcf53d4235fd6b Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Mon, 9 Mar 2020 15:12:01 +0100 Subject: [PATCH 17/51] Some minor improvements --- .../posts/12-async-await/index.md | 59 ++++++++++--------- 1 file changed, 30 insertions(+), 29 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index ab0124be..f303d624 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -30,7 +30,7 @@ While it seems like all tasks run in parallel, only a single task can be execute While single-core CPUs can only execute a single task at a time, multi-core CPUs can run multiple tasks in a truly parallel way. For example, a CPU with 8 cores can run 8 tasks at the same time. We will explain how to setup multi-core CPUs in a future post. For this post, we will focus on single-core CPUs for simplicity. (It's worth noting that all multi-core CPUs start with only a single active core, so we can treat them as single-core CPUs for now.) -There are two forms of multitasking: _Cooperative_ multitasking requires tasks to regularly give up control of the CPU so that other tasks can make progress. _Preemptive_ multitasking uses operating system capabilities to switch threads at arbitrary points in time by forcibly pausing them. In the following we will explore the two forms of multitasking in more detail and discuss their respective advantages and drawbacks. +There are two forms of multitasking: _Cooperative_ multitasking requires tasks to regularly give up control of the CPU so that other tasks can make progress. _Preemptive_ multitasking uses operating system functionality to switch threads at arbitrary points in time by forcibly pausing them. 
In the following we will explore the two forms of multitasking in more detail and discuss their respective advantages and drawbacks. ### Preemptive Multitasking @@ -46,7 +46,7 @@ In the first row, the CPU is executing task `A1` of program `A`. All other tasks #### Saving State -Since tasks are interrupted at arbitrary points in time, they might be in the middle of some calculation. In order to be able to resume them later, the operating system must backup the whole state of the task, including its [call stack] and the values of all CPU registers. This process is called a [_context switch_]. +Since tasks are interrupted at arbitrary points in time, they might be in the middle of some calculations. In order to be able to resume them later, the operating system must back up the whole state of the task, including its [call stack] and the values of all CPU registers. This process is called a [_context switch_]. [call stack]: https://en.wikipedia.org/wiki/Call_stack [_context switch_]: https://en.wikipedia.org/wiki/Context_switch @@ -73,16 +73,15 @@ Cooperative multitasking is often used at the language level, for example in for [async/await]: https://rust-lang.github.io/async-book/01_getting_started/04_async_await_primer.html [_yield_]: https://en.wikipedia.org/wiki/Yield_(multithreading) -It is common to combine cooperative multitasking with [asynchronous operations]. Instead of [blocking] until an operation is finished and preventing other tasks to run in this time, asynchronous operations return a "not ready" status if the operation is not finished yet. In this case, the waiting task can execute a yield operation to let other tasks run. +It is common to combine cooperative multitasking with [asynchronous operations]. Instead of waiting until an operation is finished and preventing other tasks from running in the meantime, asynchronous operations return a "not ready" status if the operation is not finished yet. 
In this case, the waiting task can execute a yield operation to let other tasks run. [asynchronous operations]: https://en.wikipedia.org/wiki/Asynchronous_I/O -[blocking]: http://faculty.salina.k-state.edu/tim/ossg/Device/blocking.html #### Saving State Since tasks define their pause points themselves, they don't need the operating system to save their state. Instead, they can save exactly the state they need for continuation before they pause themselves, which often results in better performance. For example, a task that just finished a complex computation might only need to back up the final result of the computation since it does not need the intermediate results anymore. -Language-supported implementations of cooperative tasks are often even able to backup up the required parts of the call stack before pausing. As an example, Rust's async/await implementation stores all local variables that are still needed in an automatically generated struct (see below). By backing up the relevant parts of the call stack before pausing, all tasks can share the same call stack, which results in a much smaller memory consumption per task. As a result, it is possible to create an almost arbitrary number of tasks without running out of memory. +Language-supported implementations of cooperative tasks are often even able to back up the required parts of the call stack before pausing. As an example, Rust's async/await implementation stores all local variables that are still needed in an automatically generated struct (see below). By backing up the relevant parts of the call stack before pausing, all tasks can share a single call stack, which results in much lower memory consumption per task. This makes it possible to create an almost arbitrary number of cooperative tasks without running out of memory. 
#### Discussion @@ -145,7 +144,7 @@ The `poll` method takes two arguments: `self: Pin<&mut Self>` and `cx: &mut Cont [_pinned_]: https://doc.rust-lang.org/nightly/core/pin/index.html -The purpose of the `cx: &mut Context` parameter is to pass a [`Waker`] instance to the asynchronous task, e.g. the file system load. This `Waker` allows the asynchronous task to signal that it (or a part of it) is finished, e.g. that the file was loaded from disk. Since the main task knows that it will be notified when the `Future` is ready, it does not need to call `poll` over and over again. We will explain this process in more detail later in this post when we implement an own `Waker` type. +The purpose of the `cx: &mut Context` parameter is to pass a [`Waker`] instance to the asynchronous task, e.g. the file system load. This `Waker` allows the asynchronous task to signal that it (or a part of it) is finished, e.g. that the file was loaded from disk. Since the main task knows that it will be notified when the `Future` is ready, it does not need to call `poll` over and over again. We will explain this process in more detail later in this post when we implement an own waker type. [`Waker`]: https://doc.rust-lang.org/nightly/core/task/struct.Waker.html @@ -177,7 +176,7 @@ An alternative to waiting is to use future combinators. Future combinators are f [`Iterator`]: https://doc.rust-lang.org/stable/core/iter/trait.Iterator.html -As an example, a simple `string_len` combinator for converting `Future` to a `Future` to a `Future` could look like this: ```rust struct StringLen { @@ -287,7 +286,7 @@ async fn example(min_len: usize) -> String { ([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=d93c28509a1c67661f31ff820281d434)) -This function is a direct translation of the `example` function above, which used combinator functions. 
Using the `.await` operator, we can retrieve the value of a future without needing any closures or `Either` types. As a result, we can write our code like we write normal synchronous code, with the difference that _this is still asynchronous code_. +This function is a direct translation of the `example` function that used combinator functions from [above](#drawbacks). Using the `.await` operator, we can retrieve the value of a future without needing any closures or `Either` types. As a result, we can write our code like we write normal synchronous code, with the difference that _this is still asynchronous code_. #### State Machine Transformation @@ -303,7 +302,7 @@ The state machine implements the `Future` trait by making each `poll` call a pos ![Four states: start, waiting on foo.txt, waiting on bar.txt, end](async-state-machine-basic.svg) -The diagram uses arrows to represent state switches and diamond shapes to represent alternative ways. For example, if the `foo.txt` file is not ready, the path marked with _"no"_ is takes and the _"Waiting on foo.txt"_ state is reached. Otherwise, the _"yes"_ path is taken. The small red diamond without caption represents the `if content.len() < 100` branch of the `example` function. +The diagram uses arrows to represent state switches and diamond shapes to represent alternative ways. For example, if the `foo.txt` file is not ready, the path marked with _"no"_ is taken and the _"Waiting on foo.txt"_ state is reached. Otherwise, the _"yes"_ path is taken. The small red diamond without caption represents the `if content.len() < 100` branch of the `example` function. We see that the first `poll` call starts the function and lets it run until it reaches a future that is not ready yet. If all futures on the path are ready, the function can run till the _"End"_ state, where it returns its result wrapped in `Poll::Ready`. Otherwise, the state machine enters a waiting state and returns `Poll::Pending`. 
On the next `poll` call, the state machine then starts from the last waiting state and retries the last operation. @@ -311,7 +310,7 @@ We see that the first `poll` call starts the function and lets it run until it r In order to be able to continue from the last waiting state, the state machine must save it internally. In addition, it must save all the variables that it needs to continue execution on the next `poll` call. This is where the compiler can really shine: Since it knows which variables are used when, it can automatically generate structs with exactly the variables that are needed. -As an example, the compiler generates the following structs for the above `example` function: +As an example, the compiler generates structs like the following for the above `example` function: ```rust // The `example` function again so that you don't have to scroll up @@ -402,7 +401,7 @@ ExampleStateMachine::Start(state) => { } ``` -The state machine is in the `Start` state when it is right at the beginning of the function. In this case, we execute all the code from the body of the `example` function until the first `.await`. To make the code more readable, we introduce a new `foo_txt_future` to represent the future returned by `async_read_file` right before the `.await`. To handle the `.await` operation, we change the state of `self` to `WaitingOnFooTxt`, which includes the construction of the `WaitingOnFooTxtState` struct. +The state machine is in the `Start` state when it is right at the beginning of the function. In this case, we execute all the code from the body of the `example` function until the first `.await`. To handle the `.await` operation, we change the state of the `self` state machine to `WaitingOnFooTxt`, which includes the construction of the `WaitingOnFooTxtState` struct. 
Since the `match self {…}` statement is executed in a loop, the execution jumps to the `WaitingOnFooTxt` arm next: @@ -429,7 +428,7 @@ ExampleStateMachine::WaitingOnFooTxt(state) => { } ``` -In this match arm we first call the `poll` function of the `foo_txt_future`. If it is not ready, we exit the loop and return `Poll::Pending` too. Since `self` stays in the `WaitingOnFooTxt` state in this case, the next `poll` call on the state machine will enter the same match arm and retry polling the `foo_txt_future`. +In this match arm we first call the `poll` function of the `foo_txt_future`. If it is not ready, we exit the loop and return `Poll::Pending`. Since `self` stays in the `WaitingOnFooTxt` state in this case, the next `poll` call on the state machine will enter the same match arm and retry polling the `foo_txt_future`. When the `foo_txt_future` is ready, we assign the result to the `content` variable and continue to execute the code of the `example` function: If `content.len()` is smaller than the `min_len` saved in the state struct, the `bar.txt` file is read asynchronously. We again translate the `.await` operation into a state change, this time into the `WaitingOnBarTxt` state. Since we're executing the `match` inside a loop, the execution directly jumps to the match arm for the new state afterwards, where the `bar_txt_future` is polled. @@ -450,7 +449,7 @@ ExampleStateMachine::WaitingOnBarTxt(state) => { } ``` -Similar to the `WaitingOnFooTxt` state, we start by polling the `bar_txt_future`. If it is still pending, we exit the loop and return `Poll::Pending` too. Otherwise, we can perform the last operation of the `example` function: Concatenating the `content` variable with the result from the future. We update the state machine to the `End` state and then return the result wrapped in `Poll::Ready`. +Similar to the `WaitingOnFooTxt` state, we start by polling the `bar_txt_future`. If it is still pending, we exit the loop and return `Poll::Pending`. 
Otherwise, we can perform the last operation of the `example` function: Concatenating the `content` variable with the result from the future. We update the state machine to the `End` state and then return the result wrapped in `Poll::Ready`. Finally, the code for the `End` state looks like this: @@ -472,7 +471,7 @@ The last piece of the puzzle is the generated code for the `example` function it async fn example(min_len: usize) -> String ``` -Since the complete function body is now implemented by the state machine, the only thing that the function needs to do is to initialize the state machine. The generated code for this could look like this: +Since the complete function body is now implemented by the state machine, the only thing that the function needs to do is to initialize the state machine and return it. The generated code for this could look like this: ```rust fn example(min_len: usize) -> ExampleStateMachine { @@ -484,7 +483,7 @@ fn example(min_len: usize) -> ExampleStateMachine { The function no longer has an `async` modifier since it now explicitly returns a `ExampleStateMachine` type, which implements the `Future` trait. As expected, the state machine is constructed in the `Start` state and the corresponding state struct is initialized with the `min_len` parameter. -Note that this function does not start the execution of the state machine. This is a fundamental design decision of Rust's futures: They do nothing until they are polled for the first time. +Note that this function does not start the execution of the state machine. This is a fundamental design decision of futures in Rust: They do nothing until they are polled for the first time. 
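The laziness of futures can be observed directly. The following is a minimal sketch for a hosted (`std`) environment such as the playground, not kernel code: `NoopWake`, `set_flag`, and `observe_laziness` are hypothetical names introduced only for this illustration. It creates a future from an `async fn` with a visible side effect, checks that the side effect has not happened yet, and then polls the future once by hand.

```rust
use std::future::Future;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical helper: a waker that ignores all notifications.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical async function with an observable side effect.
async fn set_flag(flag: Arc<AtomicBool>) -> u32 {
    flag.store(true, Ordering::SeqCst);
    42
}

/// Returns the flag value before the first poll, the flag value after it,
/// and the result of that first poll.
fn observe_laziness() -> (bool, bool, Poll<u32>) {
    let flag = Arc::new(AtomicBool::new(false));

    // Calling the async fn only constructs the state machine; the body has not run.
    let mut future = Box::pin(set_flag(flag.clone()));
    let before_poll = flag.load(Ordering::SeqCst);

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    // The first poll runs the body. It contains no `.await`, so it finishes right away.
    let result = future.as_mut().poll(&mut cx);
    let after_poll = flag.load(Ordering::SeqCst);

    (before_poll, after_poll, result)
}
```

Calling `observe_laziness()` should yield `(false, true, Poll::Ready(42))`: the flag is only set once the future is polled, not when `set_flag` is called.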
### Pinning @@ -526,7 +525,7 @@ The `array` field starts at address 0x10014 and the `element` field at address 0 ![array at 0x10024 with fields 1, 2, and 3; element at address 0x10030, still pointing to 0x1001a, even though the last array element now lives at 0x1002a](self-referential-struct-moved.svg) -We moved the struct a bit so that it starts at address `0x10024` now. The problem is that the `element` field still points to address `0x1001a` even though the last `array` element now lives at address `0x1002a`. Thus, the pointer is dangling with the result that undefined behavior occurs on the next `poll` call. +We moved the struct a bit so that it starts at address `0x10024` now. This could for example happen when we pass the struct as a function argument or assign it to a different stack variable. The problem is that the `element` field still points to address `0x1001a` even though the last `array` element now lives at address `0x1002a`. Thus, the pointer is dangling with the result that undefined behavior occurs on the next `poll` call. #### Possible Solutions @@ -539,7 +538,7 @@ Rust understandably decided for the second solution. For this, the [_pinning_] A #### Heap Values -The first observation is that [heap allocated] values already have a fixed memory address most of the time. They are created using a call to `allocate` and are not moved in memory until they are freed through a `deallocate` call again. This is required because a `Box` is essentially a pointer to the heap memory, so that an address change would make the pointer invalid. +The first observation is that [heap allocated] values already have a fixed memory address most of the time. They are created using a call to `allocate` and then referenced by a pointer type such as `Box`. While moving the pointer type is possible, the heap value that the pointer points to stays at the same memory address until it is freed through a `deallocate` call again. 
[heap allocated]: @/second-edition/posts/10-heap-allocation/index.md @@ -561,11 +560,15 @@ struct SelfReferential { } ``` -([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=ce1aff3a37fcc1c8188eeaf0f39c97e8)) +([Try it on the playground][playground-self-ref]) -When we execute this, we see that the address of heap value and its internal pointer are equal, which means that the `self_ptr` field is valid. Since the `heap_value` variable is only a pointer, moving it (e.g. by passing it to a function) does not change the address, so that the `self_ptr` stays valid. +[playground-self-ref]: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=ce1aff3a37fcc1c8188eeaf0f39c97e8 -However, there is still a way to break this: We can move out of a `Box` or replace its content: +We create a simple struct named `SelfReferential` that contains a single pointer field. First, we initialize this struct with a null pointer and then allocate it on the heap using `Box::new`. We then determine the memory address of the heap allocated struct and store it in a `ptr` variable. Finally, we make the struct self-referential by assigning the `ptr` variable to the `self_ptr` field. + +When we execute this code [on the playground][playground-self-ref], we see that the address of heap value and its internal pointer are equal, which means that the `self_ptr` field is a valid self-reference. Since the `heap_value` variable is only a pointer, moving it (e.g. by passing it to a function) does not change the address of the struct itself, so the `self_ptr` stays valid even if the pointer is moved. 
+ +However, there is still a way to break this example: We can move out of a `Box` or replace its content: ```rust let stack_value = mem::replace(&mut *heap_value, SelfReferential { @@ -577,17 +580,17 @@ println!("internal reference: {:p}", stack_value.self_ptr); ([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e160ee8a64cba4cebc1c0473dcecb7c8)) -Here we use the [`mem::replace`] function to replace the heap allocated value with a new struct instance. This allows us to move the original `heap_value` to the stack, while the `self_ptr` field of of the struct still points to the heap. When you try to run the example on the playground, you see that the printed _"value at:"_ and _"internal reference:"_ lines show indeed different pointers. +Here we use the [`mem::replace`] function to replace the heap allocated value with a new struct instance. This allows us to move the original `heap_value` to the stack, while the `self_ptr` field of the struct is now a dangling pointer that still points to the old heap address. When you try to run the example on the playground, you see that the printed _"value at:"_ and _"internal reference:"_ lines indeed show different pointers. So heap allocating a value is not enough to make self-references safe. [`mem::replace`]: https://doc.rust-lang.org/nightly/core/mem/fn.replace.html -The fundamental problem is that `Box<T>` allows us to get a `&mut T` reference to the heap allocated value. This `&mut` reference allows us to to use methods like [`mem::replace`] or [`mem::swap`] to invalidate the heap allocated value. To resolve this problem, we must prevent that `&mut` references to self-referential structs can be created. +The fundamental problem that allowed the above breakage is that `Box<T>` allows us to get a `&mut T` reference to the heap allocated value. This `&mut` reference makes it possible to use methods like [`mem::replace`] or [`mem::swap`] to invalidate the heap allocated value. 
To resolve this problem, we must prevent `&mut` references to self-referential structs from being created. [`mem::swap`]: https://doc.rust-lang.org/nightly/core/mem/fn.swap.html #### `Pin<Box<T>>` and `Unpin` -The pinning API provides a solution to the `&mut T` problem in form of the [`Pin`] wrapper type and the [`Unpin`] marker trait. The idea behind these types is to gate all methods of `Pin` that can be used to get `&mut` references (e.g. [`get_mut`][pin-get-mut] or [`deref_mut`][pin-deref-mut]) on the `Unpin` trait. The `Unpin` trait is an [_auto trait_], which is automatically implemented for all types except types that explicitly opt-out. By making self-referential structs opt-out of `Unpin`, there is no (safe) way to get a `&mut T` from a `Pin<Box<T>>` type for them. As a result, their internal self-references are guaranteed to stay valid. +The pinning API provides a solution to the `&mut T` problem in the form of the [`Pin`] wrapper type and the [`Unpin`] marker trait. The idea behind these types is to gate all methods of `Pin` that can be used to get `&mut` references to the wrapped value (e.g. [`get_mut`][pin-get-mut] or [`deref_mut`][pin-deref-mut]) on the `Unpin` trait. The `Unpin` trait is an [_auto trait_], which is automatically implemented for all types except types that explicitly opt out. By making self-referential structs opt out of `Unpin`, there is no (safe) way to get a `&mut T` from a `Pin<Box<T>>` type for them. As a result, their internal self-references are guaranteed to stay valid. 
[`Pin`]: https://doc.rust-lang.org/stable/core/pin/struct.Pin.html [`Unpin`]: https://doc.rust-lang.org/nightly/std/marker/trait.Unpin.html @@ -595,7 +598,7 @@ The pinning API provides a solution to the `&mut T` problem in form of the [`Pin [pin-deref-mut]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#impl-DerefMut [_auto trait_]: https://doc.rust-lang.org/reference/special-types-and-traits.html#auto-traits -As an example, let's update the `SelfReferential` type from the above example to opt-out of `Unpin`: +As an example, let's update the `SelfReferential` type from above to opt out of `Unpin`: ```rust use core::marker::PhantomPinned; @@ -644,7 +647,7 @@ error[E0596]: cannot borrow data in a dereference of `std::pin::Pin<std::boxed::Box<SelfReferential>>` ``` -Both errors occur because the `Pin<Box<SelfReferential>>` type no longer implements the `DerefMut` trait. This exactly what we wanted because the `DerefMut` trait would return a `&mut` reference, which we want to prevent. This only works because we both opted-out of `Unpin` and changed `Box::new` to `Box::pin`. +Both errors occur because the `Pin<Box<SelfReferential>>` type no longer implements the `DerefMut` trait. This is exactly what we wanted because the `DerefMut` trait would return a `&mut` reference, which we want to prevent. This only happens because we both opted out of `Unpin` and changed `Box::new` to `Box::pin`. The problem now is that the compiler does not only prevent moving the type in line 16, but also forbids initializing the `self_ptr` field in line 10. This happens because the compiler can't differentiate between valid and invalid uses of `&mut` references. To get the initialization working again, we have to use the unsafe [`get_unchecked_mut`] method: @@ -693,13 +696,11 @@ The reason that this method takes `self: Pin<&mut Self>` instead of the normal ` [self-ref-async-await]: @/second-edition/posts/12-async-await/index.md#self-referential-structs -It is worth noting that moving futures before the first `poll` call is fine. 
This is a result of the fact that futures are lazy and do nothing until they're polled for the first time. The `start` state of the generated state machines therefore only contains the function arguments, but no internal references. In order to call `poll`, the caller must wrap the future into `Pin` first, which ensures that the future cannot moved in memory anymore. - -Since the `Pin<&mut Self>` interface is predefined by the `Future` trait, there is no way to use the safer `Pin>` instead. This can make it quite challenging to safely implement `Future` yourself. For this reason I recommend against implementing `Future` manually and instead sticking to using async/await and the combinator methods of the [`futures`] crate. +It is worth noting that moving futures before the first `poll` call is fine. This is a result of the fact that futures are lazy and do nothing until they're polled for the first time. The `start` state of the generated state machines therefore only contains the function arguments, but no internal references. In order to call `poll`, the caller must wrap the future into `Pin` first, which ensures that the future cannot be moved in memory anymore. Since stack pinning is more difficult to get right, I recommend to always use [`Box::pin`] combined with [`Pin::as_mut`] for this. [`futures`]: https://docs.rs/futures/0.3.4/futures/ -In case you're interested in understanding how to safely implement `Future` yourself, take a look at the relatively short [source of the `map` combinator method][map-src] of the `futures` crate and the section about [projections and structural pinning] of the pin documentation. +In case you're interested in understanding how to safely implement a future combinator function using stack pinning yourself, take a look at the relatively short [source of the `map` combinator method][map-src] of the `futures` crate and the section about [projections and structural pinning] of the pin documentation. 
[map-src]: https://docs.rs/futures-util/0.3.4/src/futures_util/future/future/map.rs.html [projections and structural pinning]: file:///home/philipp/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/share/doc/rust/html/std/pin/index.html#projections-and-structural-pinning From fb0f30b9f06416cb47e33f8752b5e2760bc2ec83 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Mon, 9 Mar 2020 17:57:40 +0100 Subject: [PATCH 18/51] Write section about executors and wakers --- .../posts/12-async-await/index.md | 43 ++++++++++++++++++- 1 file changed, 42 insertions(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index f303d624..ecbca386 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -705,7 +705,48 @@ In case you're interested in understanding how to safely implement a future comb [map-src]: https://docs.rs/futures-util/0.3.4/src/futures_util/future/future/map.rs.html [projections and structural pinning]: file:///home/philipp/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/share/doc/rust/html/std/pin/index.html#projections-and-structural-pinning -### Executors +### Executors and Wakers + +Using async/await, it is possible to ergonomically work with futures in a completely asynchronous way. However, as we learned above, futures do nothing until they are polled. This means we have to call `poll` on them at some point, otherwise the asynchronous code is never executed. + +With a single future, we can always wait for the future using a loop [as described above](#waiting-on-futures). However, this approach is very inefficient, especially for programs that create a large number of futures. 
An example for such a program could be a web server that handles each request using an asynchronous function: + +```rust +async fn handle_request(request: Request) {…} +``` + +The function is invoked for each request the webserver receives. It has no return type, so it results in a future with the empty type `()` as output. When the web server receives many concurrent requests, this can easily result in hundreds or thousands of futures in the system. While these futures have no return value that we need for future computations, we still want them to be polled to completion because otherwise the requests would not be handled. + +The most common approach for this is to define a global _executor_ that is responsible for polling all futures in the system until they are finished. + +#### Executors + +The purpose of an executor is to allow spawning futures as independent tasks, typically through some sort of `spawn` method. The executor is then responsible for polling all futures until they are completed. The big advantage of managing all futures in a central place is that the executor can switch to a different future whenever a future returns `Poll::Pending`. Thus, asynchronous operations are run in parallel and the CPU is kept busy. + +Many executor implementations can also take advantage of systems with multiple CPU cores. They create a [thread pool] that is able to utilize all cores if there is enough work available and use techniques such as [work stealing] to balance the load between cores. There are also special executor implementations for embedded systems that optimize for low latency and memory overhead. + +[thread pool]: https://en.wikipedia.org/wiki/Thread_pool +[work stealing]: https://en.wikipedia.org/wiki/Work_stealing + +To avoid the overhead of polling futures over and over again, executors typically also take advantage of the _waker_ API supported by Rust's futures. 
+ +#### Wakers + +The idea behind the waker API is that a special [`Waker`] type is passed to each invocation of `poll`, wrapped in a [`Context`] type for future extensibility. This `Waker` type is created by the executor and can be used by the asynchronous task to signal its (partial) completion. As a result, the executor does not need to call `poll` on a future that previously returned `Poll::Pending` again until it is notified by the corresponding waker. + +[`Context`]: https://doc.rust-lang.org/nightly/core/task/struct.Context.html + +This is best illustrated by a small example: + +```rust +async fn write_file() { + async_write_file("foo.txt", "Hello").await; +} +``` + +This function asynchronously writes the string "Hello" to a `foo.txt` file. Since hard disk writes take some time, the first `poll` call on this future will likely return `Poll::Pending`. However, the hard disk driver will internally store the `Waker` passed in the `poll` call and signal it as soon as the file was written to disk. This way, the executor does not need to waste any time trying to `poll` the future again before it receives the waker notification. + +We will see how the `Waker` type works in detail when we implement our own executor with waker support in the following section. 
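To make the waker handshake more concrete, here is a small sketch for a hosted (`std`) environment. It is not the executor we build later in this post, and `WakeCounter`, `ReadyOnSecondPoll`, and `run_to_completion` are hypothetical names chosen for illustration. The future signals its waker right away where a real driver would store it and call it from an interrupt handler, and the loop stands in for an executor that only polls again after it was notified.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker backend that counts how often it was signalled.
struct WakeCounter {
    wakes: AtomicUsize,
}

impl Wake for WakeCounter {
    fn wake(self: Arc<Self>) {
        self.wakes.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical future: pending on the first poll, ready on the second.
struct ReadyOnSecondPoll {
    polled_once: bool,
}

impl Future for ReadyOnSecondPoll {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.polled_once {
            Poll::Ready("done")
        } else {
            self.polled_once = true;
            // Stand-in for "the operation finished": notify the executor.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Returns the output, the number of `poll` calls, and the number of wake-ups.
fn run_to_completion() -> (&'static str, usize, usize) {
    let counter = Arc::new(WakeCounter { wakes: AtomicUsize::new(0) });
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let mut future = Box::pin(ReadyOnSecondPoll { polled_once: false });

    let mut polls = 0;
    let mut seen_wakes = 0;
    loop {
        polls += 1;
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return (output, polls, counter.wakes.load(Ordering::SeqCst));
        }
        // Do not poll again until a wake-up arrived. A real executor would
        // run other tasks or put the CPU to sleep instead of spinning.
        while counter.wakes.load(Ordering::SeqCst) == seen_wakes {
            std::hint::spin_loop();
        }
        seen_wakes = counter.wakes.load(Ordering::SeqCst);
    }
}
```

In this sketch the future is polled exactly twice and the waker is signalled exactly once, so the polling loop never calls `poll` without a preceding notification.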
## Implementation From ae167faee58de168229c8b6a2688bad5a6e1ac30 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Tue, 10 Mar 2020 13:59:13 +0100 Subject: [PATCH 19/51] Explain how async/await implements cooperative multitasking --- .../posts/12-async-await/index.md | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index ecbca386..482b2ee7 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -746,7 +746,23 @@ async fn write_file() { This function asynchronously writes the string "Hello" to a `foo.txt` file. Since hard disk writes take some time, the first `poll` call on this future will likely return `Poll::Pending`. However, the hard disk driver will internally store the `Waker` passed in the `poll` call and signal it as soon as the file was written to disk. This way, the executor does not need to waste any time trying to `poll` the future again before it receives the waker notification. -We will see how the `Waker` type works in detail when we implement our own executor with waker support in the following section. +We will see how the `Waker` type works in detail when we create our own executor with waker support in the implementation section of this post. + +### Cooperative Multitasking? + +At the beginning of this post we talked about preemptive and cooperative multitasking. While preemptive multitasking relies on the operating system to forcibly switch between running tasks, cooperative multitasking requires that the tasks voluntarily give up control of the CPU through a _yield_ operation on a regular basis. The big advantage of the cooperative approach is that tasks can save their state themselves, which results in more efficient context switches and makes it possible to share the same call stack between tasks. 
+ +It might not be immediately apparent, but futures and async/await are an implementation of the cooperative multitasking pattern: + +- Each future that is added to the executor is basically a cooperative task. +- Instead of using an explicit yield operation, futures give up control of the CPU core by returning `Poll::Pending` (or `Poll::Ready` at the end). + - There is nothing that forces futures to give up the CPU. If they want, they can never return from `poll`, e.g. by spinning endlessly in a loop. + - Since each future can block the execution of the other futures in the executor, we need to trust they are not malicious. +- Futures internally store all the state they need to continue execution on the next `poll` call. With async/await, the compiler automatically detects all variables that are needed and stores them inside the generated state machine. + - Only the minimum state required for continuation is saved. + - Since the `poll` method gives up the call stack when it returns, the same stack can be used for polling other futures. + +We see that futures and async/await fit the cooperative multitasking pattern perfectly; they just use some different terminology. In the following, we will therefore use the terms "task" and "future" interchangeably. 
## Implementation From 326a35939ac2ce0e80b7cbbd35bc6837e3ab5076 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Tue, 10 Mar 2020 15:43:13 +0100 Subject: [PATCH 20/51] Start implementation section --- .../posts/12-async-await/index.md | 185 ++++++++++++++++++ 1 file changed, 185 insertions(+) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 482b2ee7..6de2d9e4 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -766,5 +766,190 @@ We see that futures and async/await fit the cooperative multitasking pattern per ## Implementation +Now that we understand how cooperative multitasking based on futures and async/await works in Rust, it's time to add support for it to our kernel. Since the [`Future`] trait is part of the `core` library and async/await is a feature of the language itself, there is nothing special we need to do to use it in our `#![no_std]` kernel. The only requirement is that we use at least nightly-TODO of Rust because async/await was based on parts of the standard library before. +With a recent-enough nightly, we can start using async/await in our `main.rs`: + +```rust +// in src/main.rs + +async fn async_number() -> u32 { + 42 +} + +async fn example_task() { + let number = async_number().await; + println!("async number: {}", number); +} +``` + +The `async_number` function is an `async fn`, so the compiler transforms it into a state machine that implements `Future`. Since the function only returns `42`, the resulting future will directly return `Poll::Ready(42)` on the first `poll` call. Like `async_number`, the `example_task` function is also an `async fn`. It awaits the number returned by `async_number` and then prints it using the `println` macro. 
+ +To run the future returned by `example_task`, we need to call `poll` on it until it signals its completion by returning `Poll::Ready`. To do this, we need to create a simple executor type. + +### Task + +Before we start the executor implementation, we create a new `task` module with a `Task` type: + +```rust +// in src/lib.rs + +pub mod task; +``` + +```rust +// in src/task/mod.rs + +pub struct Task { + future: Pin<Box<dyn Future<Output = ()>>>, +} +``` + +The `Task` struct is a newtype wrapper around a pinned, heap allocated, dynamically dispatched future with the empty type `()` as output. Let's go through it in detail: + +- We require that the future associated with a task returns `()`. So tasks don't return any result, they are just executed for its side effects. For example, the `example_task` function we defined above has no return value, but it prints something to the screen as a side effect. +- The `dyn` keyword indicates that we store a [trait object] in the `Box`. This means that the type of the future is [dynamically dispatched], which makes it possible to store different types of futures in the task. This is important because each `async fn` has their own type and we want to be able to create different tasks later. +- As we learned in the [section about pinning], the `Pin` type ensures that a value cannot be moved in memory by placing it on the heap and preventing the creation of `&mut` references to it. This is important because futures generated by async/await might be self-referential, i.e. contain pointers to itself that would be invalidated when the future is moved. 
+ +[trait object]: https://doc.rust-lang.org/book/ch17-02-trait-objects.html +[dynamically dispatched]: https://doc.rust-lang.org/book/ch17-02-trait-objects.html#trait-objects-perform-dynamic-dispatch +[section about pinning]: #pinning + +To allow the creation of new `Task` structs from futures, we create a `new` function: + +```rust +// in src/task/mod.rs + +impl Task { + pub fn new(future: impl Future<Output = ()>) -> Task { + Task { + future: Box::pin(future), + } + } +} +``` + +The function takes an arbitrary future with output type `()` and pins it in memory through the [`Box::pin`] function. Then it wraps it in the `Task` struct and returns the new task. + +We also add a `poll` method to allow the executor to poll the corresponding future: + +```rust +// in src/task/mod.rs + +impl Task { + fn poll(&mut self, context: &mut Context) -> Poll { + self.future.as_mut().poll(context) + } +} +``` + +Since the [`poll`] method of the `Future` trait expects to be called on a `Pin<&mut T>` type, we use the [`Pin::as_mut`] method to convert the `self.future` field of type `Pin<Box<T>>` first. Then we call `poll` on the converted `self.future` field and return the result. Since the `Task::poll` method should be only called by the executor that we create in a moment, we keep the function private to the `task` module. + +### Simple Executor + +Since executors can be quite complex, we deliberately start with creating a very basic executor before we implement a more featureful executor later. 
For this, we first create a new `task::simple_executor` submodule: + +```rust +// in src/task/mod.rs + +pub mod simple_executor; +``` + +```rust +// in src/task/simple_executor.rs + +use super::Task; +use alloc::collections::VecDeque; + +pub struct SimpleExecutor { + task_queue: VecDeque<Task>, +} + +impl SimpleExecutor { + pub fn new() -> SimpleExecutor { + SimpleExecutor { + task_queue:: VecDeque::new(), + } + } + + pub fn spawn(&mut self, task: Task) { + self.task_queue.push_back(task) + } +} +``` + +The struct contains a single `task_queue` field of type [`VecDeque`], which is basically a vector that allows push and pop operations on both ends. The idea behind using this type is that we insert new tasks through the `spawn` method at the end and pop the next task for execution from the front. This way, we get a simple [FIFO queue] (_"first in, first out"_). + +[`VecDeque`]: https://doc.rust-lang.org/stable/alloc/collections/vec_deque/struct.VecDeque.html +[FIFO queue]: https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics) + +#### Dummy Waker + +In order to call the `poll` method, we need to create a [`Context`] type, which wraps a [`Waker`] type. To start simple, we will first create a dummy waker that does nothing. The simplest way to do this is by implementing the [`Wake`] trait: + +[`Wake`]: https://doc.rust-lang.org/nightly/alloc/task/trait.Wake.html + +```rust +// in src/task/simple_executor.rs + +use alloc::task::Wake; + +struct DummyWaker; + +impl Wake for DummyWaker { + fn wake(self: Arc<Self>) { + // do nothing + } +} +``` + +Since the [`Waker`] type implements the [`From<Arc<W>>`] trait for all types `W` that implement the [`Wake`] trait, we can easily create a `Waker` through `Waker::from(DummyWaker)`. We will utilize this in the following to create a simple `Executor::run` method. + +[`From<Arc<W>>`]: TODO + +#### A `run` Method + +The most simple `run` method is to repeatedly poll all queued tasks in a loop until all are done. 
This is not very efficient since it does not utilize the notifications of the `Waker` type, but it is an easy way to get things running: + +```rust +// in src/task/simple_executor.rs + +impl SimpleExecutor { + pub fn run(&mut self) { + while let Some(mut task) = self.task_queue.pop_front() { + let mut context = Context::from_waker(Waker::from(DummyWaker)); + match task.poll(&mut context) { + Poll::Ready(()) => {} // task done + Poll::Pending => self.task_queue.push_back(task), + } + } + } +} +``` + +The function uses a `while let` loop to handle all tasks in the `task_queue`. For each task, it first creates a `Context` type by wrapping a `Waker` instance created from our `DummyWaker` type. Then it invokes the `Task::poll` method with this `Context`. If the `poll` method returns `Poll::Ready`, the task is finished and we can continue with the next task. If the task is still `Poll::Pending`, we add it to the back of the queue again so that it will be polled again in a subsequent loop iteration. + +#### Trying It + +With our `SimpleExecutor` type, we can now try running the task returned by the `example_task` function in our `main.rs`: + +```rust +// in src/main.rs + +use blog_os::task::{Task, simple_executor::SimpleExecutor}; + +fn kernel_main(boot_info: &'static BootInfo) -> ! 
{ + // […] initialization routines, including `init_heap` + + let mut executor = SimpleExecutor::new(); + executor.spawn(Task::new(example_task())): + executor.run(); + + // […] test_main, "it did not crash" message, hlt_loop +} +``` + +When we run it, we see that the expected _"async number: 42"_ message is printed to the screen: + +TODO image From 50db561774baa31438f503ad352a1a3840bb0608 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 19 Mar 2020 16:58:04 +0100 Subject: [PATCH 21/51] Update implementation section --- .../posts/12-async-await/index.md | 59 +++++++++++++----- .../12-async-await/qemu-simple-executor.png | Bin 0 -> 6995 bytes 2 files changed, 44 insertions(+), 15 deletions(-) create mode 100644 blog/content/second-edition/posts/12-async-await/qemu-simple-executor.png diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 6de2d9e4..2cb4b9ac 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -766,7 +766,7 @@ We see that futures and async/await fit the cooperative multitasking pattern per ## Implementation -Now that we understand how cooperative multitasking based on futures and async/await works in Rust, it's time to add support for it to our kernel. Since the [`Future`] trait is part of the `core` library and async/await is a feature of the language itself, there is nothing special we need to do to use it in our `#![no_std]` kernel. The only requirement is that we use at least nightly-TODO of Rust because async/await was based on parts of the standard library before. +Now that we understand how cooperative multitasking based on futures and async/await works in Rust, it's time to add support for it to our kernel. 
Since the [`Future`] trait is part of the `core` library and async/await is a feature of the language itself, there is nothing special we need to do to use it in our `#![no_std]` kernel. The only requirement is that we use at least nightly-TODO of Rust because async/await was not `no_std` compatible before. With a recent-enough nightly, we can start using async/await in our `main.rs`: @@ -800,6 +800,9 @@ pub mod task; ```rust // in src/task/mod.rs +use core::{future::Future, pin::Pin}; +use alloc::boxed::Box; + pub struct Task { future: Pin<Box<dyn Future<Output = ()>>>, } @@ -807,8 +810,8 @@ pub struct Task { The `Task` struct is a newtype wrapper around a pinned, heap allocated, dynamically dispatched future with the empty type `()` as output. Let's go through it in detail: -- We require that the future associated with a task returns `()`. So tasks don't return any result, they are just executed for its side effects. For example, the `example_task` function we defined above has no return value, but it prints something to the screen as a side effect. -- The `dyn` keyword indicates that we store a [trait object] in the `Box`. This means that the type of the future is [dynamically dispatched], which makes it possible to store different types of futures in the task. This is important because each `async fn` has their own type and we want to be able to create different tasks later. +- We require that the future associated with a task returns `()`. This means that tasks don't return any result, they are just executed for its side effects. For example, the `example_task` function we defined above has no return value, but it prints something to the screen as a side effect. +- The `dyn` keyword indicates that we store a [trait object] in the `Box`. This means that the type of the future is [dynamically dispatched], which makes it possible to store different types of futures in the `Task` type. 
This is important because each `async fn` has their own type and we want to be able to create different tasks later. - As we learned in the [section about pinning], the `Pin` type ensures that a value cannot be moved in memory by placing it on the heap and preventing the creation of `&mut` references to it. This is important because futures generated by async/await might be self-referential, i.e. contain pointers to itself that would be invalidated when the future is moved. [trait object]: https://doc.rust-lang.org/book/ch17-02-trait-objects.html @@ -821,7 +824,7 @@ To allow the creation of new `Task` structs from futures, we create a `new` func // in src/task/mod.rs impl Task { - pub fn new(future: impl Future<Output = ()>) -> Task { + pub fn new(future: impl Future<Output = ()> + 'static) -> Task { Task { future: Box::pin(future), } @@ -829,15 +832,17 @@ impl Task { } ``` -The function takes an arbitrary future with output type `()` and pins it in memory through the [`Box::pin`] function. Then it wraps it in the `Task` struct and returns the new task. +The function takes an arbitrary future with output type `()` and pins it in memory through the [`Box::pin`] function. Then it wraps it in the `Task` struct and returns the new task. The `'static` lifetime is required here because the returned `Task` can live for an arbitrary time, so the future needs to be valid for that time too. 
We also add a `poll` method to allow the executor to poll the corresponding future: ```rust // in src/task/mod.rs +use core::task::{Context, Poll}; + impl Task { - fn poll(&mut self, context: &mut Context) -> Poll { + fn poll(&mut self, context: &mut Context) -> Poll<()> { self.future.as_mut().poll(context) } } @@ -868,7 +873,7 @@ pub struct SimpleExecutor { impl SimpleExecutor { pub fn new() -> SimpleExecutor { SimpleExecutor { - task_queue:: VecDeque::new(), + task_queue: VecDeque::new(), } } @@ -885,14 +890,14 @@ The struct contains a single `task_queue` field of type [`VecDeque`], which is b #### Dummy Waker -In order to call the `poll` method, we need to create a [`Context`] type, which wraps a [`Waker`] type. To start simple, we will first create a dummy waker that does nothing. The simplest way to do this is by implementing the [`Wake`] trait: +In order to call the `poll` method, we need to create a [`Context`] type, which wraps a [`Waker`] type. To start simple, we will first create a dummy waker that does nothing. The simplest way to do this is by implementing the unstable [`Wake`] trait for an empty `DummyWaker` struct: [`Wake`]: https://doc.rust-lang.org/nightly/alloc/task/trait.Wake.html ```rust // in src/task/simple_executor.rs -use alloc::task::Wake; +use alloc::{sync::Arc, task::Wake}; struct DummyWaker; @@ -903,9 +908,31 @@ impl Wake for DummyWaker { } } ``` -Since the [`Waker`] type implements the [`From<Arc<W>>`] trait for all types `W` that implement the [`Wake`] trait, we can easily create a `Waker` through `Waker::from(DummyWaker)`. We will utilize this in the following to create a simple `Executor::run` method. +The trait is still unstable, so we have to add **`#![feature(wake_trait)]`** to the top of our `lib.rs` to use it. The `wake` method of the trait is normally responsible for waking the corresponding task in the executor. 
However, our `SimpleExecutor` will not differentiate between ready and waiting tasks, so we don't need to do anything on `wake` calls. -[`From>`]: TODO +Since wakers are normally shared between the executor and the asynchronous tasks, the `wake` method requires that the `Self` instance is wrapped in the [`Arc`] type, which implements reference-counted ownership. The basic idea is that the value is heap-allocated and the number of active references to it are counted. If the number of active references reaches zero, the value is no longer needed and can be deallocated. + +[`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html + +To make our `DummyWaker` usable with the [`Context`] type, we need a method to convert it to the [`Waker`] defined in the core library: + +```rust +// in src/task/simple_executor.rs + +use core::task::Waker; + +impl DummyWaker { + fn to_waker(self) -> Waker { + Waker::from(Arc::new(self)) + } +} +``` + +The method first makes the `self` instance reference-counted by wrapping it in an [`Arc`]. Then it uses the [`Waker::from`] method to create the `Waker`. This method is available for all reference counted types that implement the [`Wake`] trait. + +[`Waker::from`]: TODO + +Now we have a way to create a `Waker` instance, we can use it to implement a `run` method on our executor. #### A `run` Method @@ -914,10 +941,13 @@ The most simple `run` method is to repeatedly poll all queued tasks in a loop un ```rust // in src/task/simple_executor.rs +use core::task::{Context, Poll}; + impl SimpleExecutor { pub fn run(&mut self) { while let Some(mut task) = self.task_queue.pop_front() { - let mut context = Context::from_waker(Waker::from(DummyWaker)); + let waker = DummyWaker.to_waker(); + let mut context = Context::from_waker(&waker); match task.poll(&mut context) { Poll::Ready(()) => {} // task done Poll::Pending => self.task_queue.push_back(task), @@ -942,7 +972,7 @@ fn kernel_main(boot_info: &'static BootInfo) -> ! 
{ // […] initialization routines, including `init_heap` let mut executor = SimpleExecutor::new(); - executor.spawn(Task::new(example_task())): + executor.spawn(Task::new(example_task())); executor.run(); // […] test_main, "it did not crash" message, hlt_loop @@ -951,5 +981,4 @@ fn kernel_main(boot_info: &'static BootInfo) -> ! { When we run it, we see that the expected _"async number: 42"_ message is printed to the screen: -TODO image - +![QEMU printing "Hello World", "async number: 42", and "It did not crash!"](qemu-simple-executor.png) diff --git a/blog/content/second-edition/posts/12-async-await/qemu-simple-executor.png b/blog/content/second-edition/posts/12-async-await/qemu-simple-executor.png new file mode 100644 index 0000000000000000000000000000000000000000..7f9dc3f0f33f562305168a87d0a9d98f5f0a6e7f GIT binary patch literal 6995 [binary PNG data omitted]
From 1907e5d3ce93c1c60cfe6694cefabb4b71ce308a Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 19 Mar 2020 17:13:50 +0100 Subject: [PATCH 22/51] Summarize execution steps for the simple executor example --- .../second-edition/posts/12-async-await/index.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/blog/content/second-edition/posts/12-async-await/index.md
b/blog/content/second-edition/posts/12-async-await/index.md index 2cb4b9ac..6a0312aa 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -982,3 +982,15 @@ fn kernel_main(boot_info: &'static BootInfo) -> ! { When we run it, we see that the expected _"async number: 42"_ message is printed to the screen: ![QEMU printing "Hello World", "async number: 42", and "It did not crash!"](qemu-simple-executor.png) + +Let's summarize the various steps that happen for this example: + +- First, a new instance of our `SimpleExecutor` type is created with an empty `task_queue`. +- Next, we call the asynchronous `example_task` function, which returns a future. We wrap this future in the `Task` type, which moves it to the heap and pins it, and then add the task to the `task_queue` of the executor through the `spawn` method. +- We then call the `run` method to start the execution of the single task in the queue. This involves: + - Popping the task from the front of the `task_queue`. + - Creating a `DummyWaker` for the task, converting it to a [`Waker`] instance, and then creating a [`Context`] instance from it. + - Calling the [`poll`] method on the future of the task, using the `Context` we just created. + - Since the `example_task` does not wait for anything, it can directly run til its end on the first `poll` call. This is where the _"async number: 42"_ line is printed. + - Since the `example_task` directly returns `Poll::Ready`, it is not added back to the task queue. +- The `run` method returns after the `task_queue` becomes empty. The execution of our `kernel_main` function continues and the _"It did not crash!"_ message is printed. 
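
The steps above can also be tried outside the kernel. The following is a rough sketch, assuming a hosted `std` environment and a stable toolchain (so `std::task::Wake` needs no feature flag). It mirrors the `Task`, `DummyWaker`, and `SimpleExecutor` types from the post, but `YieldOnce`, `executor_demo`, and the returned poll counter are hypothetical additions that make the re-queuing of a `Poll::Pending` task observable.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct Task {
    future: Pin<Box<dyn Future<Output = ()>>>,
}

impl Task {
    fn new(future: impl Future<Output = ()> + 'static) -> Task {
        Task { future: Box::pin(future) }
    }

    fn poll(&mut self, context: &mut Context) -> Poll<()> {
        self.future.as_mut().poll(context)
    }
}

/// A waker that ignores all notifications.
struct DummyWaker;

impl Wake for DummyWaker {
    fn wake(self: Arc<Self>) {}
}

struct SimpleExecutor {
    task_queue: VecDeque<Task>,
}

impl SimpleExecutor {
    fn new() -> SimpleExecutor {
        SimpleExecutor { task_queue: VecDeque::new() }
    }

    fn spawn(&mut self, task: Task) {
        self.task_queue.push_back(task)
    }

    /// Polls tasks in FIFO order until the queue is empty.
    /// Returns the number of `poll` calls (an addition for this sketch).
    fn run(&mut self) -> usize {
        let waker = Waker::from(Arc::new(DummyWaker));
        let mut polls = 0;
        while let Some(mut task) = self.task_queue.pop_front() {
            let mut context = Context::from_waker(&waker);
            polls += 1;
            match task.poll(&mut context) {
                Poll::Ready(()) => {} // task done
                Poll::Pending => self.task_queue.push_back(task),
            }
        }
        polls
    }
}

/// Hypothetical helper: pending on the first poll, ready on the second.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Runs two tasks and returns the order of their log entries and the poll count.
fn executor_demo() -> (Vec<&'static str>, usize) {
    let log = Rc::new(RefCell::new(Vec::new()));
    let mut executor = SimpleExecutor::new();

    let task_log = log.clone();
    executor.spawn(Task::new(async move {
        task_log.borrow_mut().push("a: start");
        YieldOnce(false).await; // gives up the CPU once
        task_log.borrow_mut().push("a: end");
    }));

    let task_log = log.clone();
    executor.spawn(Task::new(async move {
        task_log.borrow_mut().push("b: done");
    }));

    let polls = executor.run();
    let entries = log.borrow().clone();
    (entries, polls)
}
```

Because the waker does nothing, a pending task is simply polled again on the next pass over the queue, which is the busy-polling behavior that an executor with real `Waker` support avoids.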
From 744314cb3a09b36515980babc3fcaba7539cc6c6 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 19 Mar 2020 17:58:24 +0100 Subject: [PATCH 23/51] Begin section about async keyboard interrupt --- .../posts/12-async-await/index.md | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 6a0312aa..7b58667b 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -994,3 +994,21 @@ Let's summarize the various steps that happen for this example: - Since the `example_task` does not wait for anything, it can directly run til its end on the first `poll` call. This is where the _"async number: 42"_ line is printed. - Since the `example_task` directly returns `Poll::Ready`, it is not added back to the task queue. - The `run` method returns after the `task_queue` becomes empty. The execution of our `kernel_main` function continues and the _"It did not crash!"_ message is printed. + +### Async Keyboard Input + +Our simple executor does not utilize the `Waker` notifications and simply loops over all tasks until they are done. This wasn't a problem for our example since our `example_task` can directly run to finish on the first `poll` call. To see the performance advantages of a proper `Waker` implementation, we first need to create a task that is truly asynchronous, i.e. a task that will probably return `Poll::Pending` on the first `poll` call. + +We already have some kind of asynchronicity in our system that we can use for this: hardware interrupts. As we learned in the [_Interrupts_] post, hardware interrupts can occur at arbitrary points in time, determined by some external device. For example, a hardware timer sends an interrupt to the CPU after some predefined time elapsed. 
When the CPU receives an interrupt, it immediately transfers control to the corresponding handler function defined in the interrupt descriptor table (IDT). + +[_Interrupts_]: @/second-edition/posts/07-hardware-interrupts/index.md + +In the following, we will create an asynchronous task based on the keyboard interrupt. The keyboard interrupt is a good candidate for this because it is both non-deterministic and latency-critical. Non-deterministic means that there is no way to predict when the next key press will occur because it is entirely dependent on the user. Latency-critical means that we want to handle the keyboard input in a timely manner, otherwise the user will feel a lag. To support such a task in an efficient way, it will be essential that the executor has proper support for `Waker` notifications. + +#### Moving the Keyboard Code + +#### Scancode Queue + +#### AtomicWaker + +### Executor with Waker Support From 45afd2032b5332603ef5ac5ad69e8d5233636a91 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Fri, 20 Mar 2020 13:00:47 +0100 Subject: [PATCH 24/51] Add section about scancode queue --- .../posts/12-async-await/index.md | 74 ++++++++++++++++++- .../12-async-await/scancode-queue.drawio | 1 + .../posts/12-async-await/scancode-queue.svg | 3 + 3 files changed, 76 insertions(+), 2 deletions(-) create mode 100644 blog/content/second-edition/posts/12-async-await/scancode-queue.drawio create mode 100644 blog/content/second-edition/posts/12-async-await/scancode-queue.svg diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 7b58667b..2c658419 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -1005,10 +1005,80 @@ We already have some kind of asynchronicity in our system that we can use for th In the following, we will create an asynchronous task based on the keyboard interrupt. 
The keyboard interrupt is a good candidate for this because it is both non-deterministic and latency-critical. Non-deterministic means that there is no way to predict when the next key press will occur because it is entirely dependent on the user. Latency-critical means that we want to handle the keyboard input in a timely manner, otherwise the user will feel a lag. To support such a task in an efficient way, it will be essential that the executor has proper support for `Waker` notifications. -#### Moving the Keyboard Code - #### Scancode Queue +Currently, we handle the keyboard input directly in the interrupt handler. This is not a good idea for the long term because interrupt handlers should stay as short as possible as they might interrupt important work. Instead, interrupt handlers should only perform the minimal amount of work necessary (e.g. reading the keyboard scancode) and leave the rest of the work (e.g. interpreting the scancode) to a background task. + +A common pattern for delegating work to a background task is to create some sort of queue. The interrupt handler pushes units of work to the queue and the background task handles the work in the queue. Applied to our keyboard interrupt, this means that the interrupt handler only reads the scancode from the keyboard, pushes it to the queue, and then returns. The keyboard task sits on the other end of the queue and interprets and handles each scancode that is pushed to it: + +![Scancode queue with 8 slots on the top. Keyboard interrupt handler on the bottom left with a "push scancode" arrow to the left of the queue. Keyboard task on the bottom right with a "pop scancode" arrow coming from the right side of the queue.](scancode-queue.svg) + +A simple implementation of that queue could be a mutex-protected [`VecDeque`]. However, using mutexes in interrupt handlers is not a good idea since it can easily lead to deadlocks. 
For example, when the user presses a key while the keyboard task has locked the queue, the interrupt handler tries to acquire the lock again and hangs indefinitely. Another problem with this approach is that `VecDeque` automatically increases its capacity by performing a new heap allocation when it becomes full. This can lead to deadlocks again because our allocator also uses a mutex internally. Further problems are that heap allocations can fail or take a considerable amount of time when the heap is fragmented. + +To prevent these problems, we need a queue implementation that does not require mutexes or allocations for its `push` operation. Such queues can be implemented by using lock-free [atomic operations] for pushing and popping elements. This way, it is possible to create `push` and `pop` operations that only require a `&self` reference and are thus usable without a mutex. To avoid allocations on `push`, the queue can be backed by a pre-allocated fixed-size buffer. While this makes the queue _bounded_ (i.e. it has a maximum length), it is often possible to define reasonable upper bounds for the queue length in practice so that this isn't a big problem. + +[atomic operations]: https://doc.rust-lang.org/core/sync/atomic/index.html + +##### Crossbeam + +Implementing such a queue in a correct and efficient way is very difficult, so I recommend to stick to existing, well-tested implementations. One popular Rust project that implements various mutex-free types for concurrent programming is [`crossbeam`]. It provides a type named [`ArrayQueue`] that is exactly what we need in this case. And we're lucky: The type is fully compatible to `no_std` crates with allocation support. 
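
To make the idea of a bounded queue whose `push` and `pop` only need `&self` more concrete, here is a small sketch. It is not the algorithm used by crossbeam's `ArrayQueue` (which supports multiple producers and consumers). It is a simplified ring buffer that assumes exactly one producer (e.g. the interrupt handler) and one consumer (e.g. the keyboard task), and the `ScancodeRing` name is hypothetical. It uses `std` paths for testing on a host; the same atomic types also exist in `core`.

```rust
use std::sync::atomic::{AtomicU8, AtomicUsize, Ordering};

/// Fixed-capacity queue for one producer and one consumer.
/// No mutex and no heap allocation is needed for `push` or `pop`.
pub struct ScancodeRing<const N: usize> {
    slots: [AtomicU8; N],
    head: AtomicUsize, // index of the next pop, written only by the consumer
    tail: AtomicUsize, // index of the next push, written only by the producer
}

impl<const N: usize> ScancodeRing<N> {
    pub fn new() -> Self {
        ScancodeRing {
            slots: std::array::from_fn(|_| AtomicU8::new(0)),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Appends a value, or hands it back as `Err` if the queue is full.
    pub fn push(&self, value: u8) -> Result<(), u8> {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail.wrapping_sub(head) == N {
            return Err(value); // bounded: no reallocation when full
        }
        self.slots[tail % N].store(value, Ordering::Relaxed);
        // Publish the slot to the consumer only after it was written.
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        Ok(())
    }

    /// Removes the oldest value, or returns `None` if the queue is empty.
    pub fn pop(&self) -> Option<u8> {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        let value = self.slots[head % N].load(Ordering::Relaxed);
        // Hand the slot back to the producer only after it was read.
        self.head.store(head.wrapping_add(1), Ordering::Release);
        Some(value)
    }
}
```

The sketch ignores counter overflow and is only meant to show why such a queue needs neither a lock nor an allocation; for real use, a well-tested implementation remains the better choice.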
+ +[`crossbeam`]: https://github.com/crossbeam-rs/crossbeam +[`ArrayQueue`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html + +To use the type, we need to add a dependency on the `crossbeam-queue` crate: + +```toml +# in Cargo.toml + +[dependencies.crossbeam-queue] +version = "0.2.1" +default-features = false +features = ["alloc"] +``` + +By default, the crate depends on the standard library. To make it `no_std` compatible, we need to disable its default features and instead enable the `alloc` feature. (Note that depending on the main `crossbeam` crate does not work here because it is missing an export of the `queue` module for `no_std`. I filed a [pull request](https://github.com/crossbeam-rs/crossbeam/pull/480) to fix this.) + +##### Implementation + +Using the `ArrayQueue` type, we can now create a global scancode queue in a new `task::keyboard` module: + +```rust +// in src/task/mod.rs + +pub mod keyboard; +``` + +```rust +// in src/task/keyboard.rs + +use conquer_once::spin::OnceCell; +use crossbeam_queue::ArrayQueue; + +static SCANCODE_QUEUE: OnceCell> = OnceCell::uninit(); +``` + +Since the [`ArrayQueue::new`] performs a heap allocation, which are not possible at compile time ([yet][const-heap-alloc]), we can't initialize the static variable directly. Instead, we use the [`OnceCell`] type of the [`conquer_once`] crate, which makes it possible to perform safe one-time initialization of static values. 
To include the crate, we need to add it as a dependency in our `Cargo.toml`: + +[`ArrayQueue::new`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.new +[const-heap-alloc]: https://github.com/rust-lang/const-eval/issues/20 +[`OnceCell`]: https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html +[`conquer_once`]: https://docs.rs/conquer-once/0.2.0/conquer_once/index.html + +```toml +# in Cargo.toml + +[dependencies.conquer-once] +version = "0.2.0" +default-features = false +``` + +Instead of the [`OnceCell`] primitive, we could also use the [`lazy_static`] macro here. However, the `OnceCell` type has the advantage that we can ensure that the initialization does not happen in the interrupt handler, thus preventing that the interrupt handler performs a heap allocation. + +[`lazy_static`]: https://docs.rs/lazy_static/1.4.0/lazy_static/index.html + +#### Scancode Stream + #### AtomicWaker ### Executor with Waker Support diff --git a/blog/content/second-edition/posts/12-async-await/scancode-queue.drawio b/blog/content/second-edition/posts/12-async-await/scancode-queue.drawio new file mode 100644 index 00000000..b09530b9 --- /dev/null +++ b/blog/content/second-edition/posts/12-async-await/scancode-queue.drawio @@ -0,0 +1 @@ 
+7Vldb9sgFP01fmzk7ySPbdKt0zqpayate6Q2sVGJsTBukv76QQw22Plam6ZWNVeq4HC5wLk395jE8iaL1VcK8vQHiSG2XDteWd7Ucl1v7PP/AlhXgBu6FZBQFFeQ0wAz9AIlaEu0RDEsDENGCGYoN8GIZBmMmIEBSsnSNJsTbK6agwR2gFkEcBf9jWKWVugosBv8BqIkVSs7thxZAGUsgSIFMVlqkHdteRNKCKtai9UEYsGd4qWa92XHaL0xCjN2zITZC/aTcgZ+4zF5CFfpzfDevZBengEu5YHlZtlaMUBJmcVQOLEt72qZIgZnOYjE6JKHnGMpW2Dec3hTuoOUwdXOfTr16XnWQLKAjK65iZwwknzJhAlld9mwrxhNNeIVBmS8k9pvQwlvSFb+gSG3fwzVOdYTirweUuT2iyK/hxT5/aIo6CFFYb8oCk9M0RxhPCGY0M1cbz6HYRRxvGCUPEFtxN48JyK1ZwV++BlIdXsmCaNPQWrPRGTcIZW/qGYRf+Xm6M8ScrRNMj88M5k0GctIBlv0SghglGS8G3GyIMevBJWIvxhfyoEFimOxzNbQmcE9QSxaRcPvhqIu1meJhbqfaMH4DtePBNCYo98EZ7TMGW/fgIzzRPfkv3M4/981z4cmt/W7k0bucAu34btxe8StBGbxpbjeiQzFoChQZBIGV4g9yPQT7T+iPQhkb7rShqZr1cn47h/0jjZLdJtpm56aV20Oxp2bZCsA/ACkpBE8Iq0YoAlk+wy3R1QLWbAlZAqjEAOGns39boujXOGOoIzpamMkTO1WeaiOKSfpV9KWn1bi1SKm/FQsdPzwwIO1ZpYLg2L3doNWfgf23l21zIeGNW9UyzfpXdP/hozv3jLzskhFxtQV/mBlVxUbwznbV68pLNALeNw4Eukr6eNegysrmHIEg0eIr0D0lGyKlCG84lEmd6RADBHhnVZJVq962xqvVy94jUNZcrvZ5NQT4kMypi3hT8Xfvsomv+uRR7DqdNQ/cHsKy846eGEPvDAcmslykg+MY/i8cEeDkTfWntB0SObzArJW2p0m0bp3dU22foHiqcc61b45fLxQHXGtP1aonNcIlfNhQjU6Vqe8/0J1tFC11xnuF6rOi1twDqnqfk+Tk/x1StUWjfNIVUsfDynVvczKM0uVf0iq3MAUFVWM3ipVvplTzsAdv12deLf5paMyb34u8q7/Ag== \ No newline at end of file diff --git a/blog/content/second-edition/posts/12-async-await/scancode-queue.svg b/blog/content/second-edition/posts/12-async-await/scancode-queue.svg new file mode 100644 index 00000000..e4fe8010 --- /dev/null +++ b/blog/content/second-edition/posts/12-async-await/scancode-queue.svg @@ -0,0 +1,3 @@ + + +
Scancode Queue
Scancode Queue
Keyboard Interrupt Handler
Keyboard In...
push scancode
p...
Keyboard Task
Keyboard Ta...
pop scancode
p...
Viewer does not support full SVG 1.1
\ No newline at end of file From dd83feec2dcded63c364b6efc12ed56e55d451e3 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Fri, 20 Mar 2020 15:27:03 +0100 Subject: [PATCH 25/51] Add section about filling the scancode queue --- .../posts/12-async-await/index.md | 52 ++++++++++++++++++- 1 file changed, 50 insertions(+), 2 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 2c658419..8105de97 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -1019,7 +1019,7 @@ To prevent these problems, we need a queue implementation that does not require [atomic operations]: https://doc.rust-lang.org/core/sync/atomic/index.html -##### Crossbeam +##### The `crossbeam` Crate Implementing such a queue in a correct and efficient way is very difficult, so I recommend to stick to existing, well-tested implementations. One popular Rust project that implements various mutex-free types for concurrent programming is [`crossbeam`]. It provides a type named [`ArrayQueue`] that is exactly what we need in this case. And we're lucky: The type is fully compatible to `no_std` crates with allocation support. @@ -1039,7 +1039,7 @@ features = ["alloc"] By default, the crate depends on the standard library. To make it `no_std` compatible, we need to disable its default features and instead enable the `alloc` feature. (Note that depending on the main `crossbeam` crate does not work here because it is missing an export of the `queue` module for `no_std`. I filed a [pull request](https://github.com/crossbeam-rs/crossbeam/pull/480) to fix this.) 
-##### Implementation +##### Queue Implementation Using the `ArrayQueue` type, we can now create a global scancode queue in a new `task::keyboard` module: @@ -1077,6 +1077,54 @@ Instead of the [`OnceCell`] primitive, we could also use the [`lazy_static`] mac [`lazy_static`]: https://docs.rs/lazy_static/1.4.0/lazy_static/index.html +#### Filling the Queue + +To fill the scancode queue, we create a new `add_scancode` function that we will call from the interrupt handler: + +```rust +// in src/task/keyboard.rs + +/// Called by the keyboard interrupt handler +/// +/// Must not block or allocate. +pub(crate) add_scancode(scancode: u8) { + if let Ok(queue) = SCANCODE_QUEUE.try_get() { + if let Err(_) = scancode_queue.push(scancode) { + println!("WARNING: scancode queue full; dropping keyboard input"); + } + } +} +``` + +We use the [`OnceCell::try_get`] to get a reference to the initialized queue. If the queue is not initialized yet, we do nothing and ignore the keyboard scancode. It's important that we don't try to initialize the queue in this function because it will be called by the interrupt handler, which should not perform heap allocations. Since this function should not be callable from our `main.rs`, we use the `pub(crate)` visibility to make it only available to our `lib.rs`. 
+ +[`OnceCell::try_get`]: https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html#method.try_get + +To call the function on keyboard interrupts, we update our `keyboard_interrupt_handler` function in the `interrupts` module: + +```rust +// in src/interrupts.rs + +extern "x86-interrupt" fn keyboard_interrupt_handler( + _stack_frame: &mut InterruptStackFrame +) { + use x86_64::instructions::port::Port; + + let mut port = Port::new(0x60); + let scancode: u8 = unsafe { port.read() }; + crate::task::keyboard::add_scancode(scancode); // new + + unsafe { + PICS.lock() + .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8()); + } +} +``` + +We removed all the keyboard handling code from this function and instead added a call to the `add_scancode` function. The rest of the function stays the same as before. + +As expected, keypresses are no longer printed to the screen when we run our project using `cargo xrun` now. Instead, the scancodes are added to the `SCANCODE_QUEUE`. After 100 keystrokes, the queue becomes full and we see the warning about dropped keyboard input on the screen. + #### Scancode Stream #### AtomicWaker From c3648e4b20ec967dc083f6c6f8785e05f1fec9f0 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Fri, 20 Mar 2020 17:23:22 +0100 Subject: [PATCH 26/51] Implement Stream for ScancodeStream --- .../posts/12-async-await/index.md | 96 ++++++++++++++++++- 1 file changed, 94 insertions(+), 2 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 8105de97..02f7e6b4 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -1100,7 +1100,11 @@ We use the [`OnceCell::try_get`] to get a reference to the initialized queue. 
If [`OnceCell::try_get`]: https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html#method.try_get -To call the function on keyboard interrupts, we update our `keyboard_interrupt_handler` function in the `interrupts` module: +The fact that the [`ArrayQueue::push`] method requires only a `&self` reference makes it very simple to call the method on the static queue. The `ArrayQueue` type performs all necessary synchronization itself, so we don't need a mutex wrapper here. + +[`ArrayQueue::push`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.push + +To call the `add_scancode` function on keyboard interrupts, we update our `keyboard_interrupt_handler` function in the `interrupts` module: ```rust // in src/interrupts.rs @@ -1127,6 +1131,94 @@ As expected, keypresses are no longer printed to the screen when we run our proj #### Scancode Stream -#### AtomicWaker +To read the scancodes from the queue in an asynchronous way, we create a new `ScancodeStream` type: + +```rust +// in src/task/keyboard.rs + +pub struct ScancodeStream { + _private: (), +} + +impl ScancodeStream { + pub fn new() -> Self { + SCANCODE_QUEUE.try_init_once(|| ArrayQueue::new(100)) + .expect("ScancodeStream::new should only be called once"); + ScancodeStream { + _private: (), + } + } +``` + +The purpose of the `_private` field is to prevent construction of the struct from outside of the module. This makes the `new` function the only way to construct the type. In the function, we first try to initialize the `SCANCODE_QUEUE` static. We panic if it is already initialized to ensure that only a single `ScancodeStream` type can be created. + +To make the scancodes available to asynchronous tasks, the next step is to implement `poll`-like method that tries to pop the next scancode off the queue. While this sounds like we should implement [`Future`] trait for our type, this does not quite fit here. 
The problem is that the `Future` trait only abstracts over a single asynchronous value and expects that the `poll` method is not called again after it returns `Poll::Ready`. Our scancode queue, however, contains multiple asynchronous tasks so that it is ok to keep polling it. + +##### The `Stream` Trait + +Since types that yield multiple asynchronous values are common, the [`futures`] crate provides a useful abstraction for such types: the [`Stream`] trait. The trait is defined like this: + +[`Stream`]: https://rust-lang.github.io/async-book/05_streams/01_chapter.html + +```rust +pub trait Stream { + type Item; + + fn poll_next(self: Pin<&mut Self>, cx: &mut Context) + -> Poll>; +} +``` + +This definition is quite similar to the [`Future`] trait, with the following differences: + +- The associated type is named `Item` instead of `Output`. +- Instead of a `poll` method that returns `Poll`, the `Stream` trait defines a `poll_next` method that returns a `Poll>` (note the additional `Option`). + +There is also a semantic difference: The `poll_next` can be called repeatedly, until it returns `Poll::Ready(None)` to signal that the stream is finished. In this regard, the method is similar to the [`Iterator::next`] method, which also returns `None` after the last value. + +[`Iterator::next`]: https://doc.rust-lang.org/stable/core/iter/trait.Iterator.html#tymethod.next + +##### Implementing `Stream` + +Let's implement the `Stream` trait for our `ScancodeStream` to provide the values of the `SCANCODE_QUEUE` in an asynchronous way. For this, we first need to add a dependency on the `futures-util` crate, which contains the `Stream` type: + +```toml +# in Cargo.toml + +[dependencies.futures-util] +version = "0.3.4" +default-features = false +features = ["alloc"] +``` + +We disable the default features to make the crate `no_std` compatible and enable the `alloc` feature to make its allocation-based types available (we will need this later). 
(Note that we could also add a dependency on the main `futures` crate, which re-exports the `futures-util` crate, but this would result in a larger number of dependencies and longer compile times.) + +Now we can import and implement the `Stream` trait: + +```rust +// in src/task/keyboard.rs + +use futures_util::stream::Stream; + +impl Stream for ScancodeStream { + type Item = u8; + + fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll> { + let queue = SCANCODE_QUEUE.try_get().expect("not initialized"); + match queue.pop() { + Ok(scancode) => Poll::Ready(Some(scancode)), + Err(crossbeam_queue::PopError) => Poll::Pending, + } + } +} +``` + +We first use the [`OnceCell::try_get`] method to get a reference to the initialized scancode queue. This should never fail since we initialize the queue in the `new` function, so we can safely use the `expect` method to panic if it's not initalized. Next, we use the [`ArrayQueue::pop`] to try to get the next element from the queue. If it succeeds we return the scancode wrapped in `Poll::Ready(Some(…))`. If it fails, it means that the queue is empty. In that case, we return `Poll::Pending`. 
+ +[`ArrayQueue::pop`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.pop + +#### Waker Support + + ### Executor with Waker Support From 4f29fdea7220f8ab71081c0ccce532a66a75d50c Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Fri, 20 Mar 2020 19:01:20 +0100 Subject: [PATCH 27/51] Add Waker support to the poll_next implementation on ScancodeStream --- .../posts/12-async-await/index.md | 83 +++++++++++++++++++ 1 file changed, 83 insertions(+) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 02f7e6b4..c5ca2591 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -1219,6 +1219,89 @@ We first use the [`OnceCell::try_get`] method to get a reference to the initiali #### Waker Support +Like the `Futures::poll` method, the `Stream::poll_next` method requires that the asynchronous task notifies the executor when it becomes ready after `Poll::Pending` is returned for the first time. This way, the executor does not need to poll the same task again until it is notified, which greatly reduces the performance overhead of waiting tasks. + +To send this notification, the task should extract the [`Waker`] from the passed [`Context`] reference and store it somewhere. When the task becomes ready, it should invoke the [`wake`] method on the stored `Waker` to notify the executor that the task should be polled again. + +##### AtomicWaker + +To implement the `Waker` notification for our `ScancodeStream`, we need a place where we can store the `Waker` between poll calls. We can't store it as a field in the `ScancodeStream` itself because it needs to be accessible from the `add_scancode` function. The solution for this is to use a static variable of the [`AtomicWaker`] type provided by the `futures-util` crate. 
Like the `ArrayQueue` type, this type is based on atomic instructions and can be safely stored in a static and modified concurrently. + +[`AtomicWaker`]: https://docs.rs/futures-util/0.3.4/futures_util/task/struct.AtomicWaker.html + +Let's use the [`AtomicWaker`] type to define a static `WAKER`: + +```rust +// in src/task/keyboard.rs + +static WAKER: AtomicWaker = AtomicWaker::new(); +``` + +The idea is that the `poll_next` implementation stores the current waker in this static and the `add_scancode` function calls the `wake` function on it when a new scancode is added to the queue. + +##### Storing a Waker + +The contract defined by `poll`/`poll_next` requires that the task registers a wakeup for the passed `Waker` when it returns `Poll::Pending`. Let's modify our `poll_next` implementation to satisfy these requirement: + +```rust +// in src/task/keyboard.rs + +impl Stream for ScancodeStream { + type Item = u8; + + fn poll_next(self: Pin<&mut Self>, context: &mut Context) -> Poll> { + let queue = SCANCODE_QUEUE + .try_get() + .expect("scancode queue not initialized"); + + // fast path + if let Ok(scancode) = queue.pop() { + return Poll::Ready(Some(scancode)); + } + + WAKER.register(&cx.waker()); + match scancodes.pop() { + Ok(scancode) => Poll::Ready(scancode), + Err(crossbeam_queue::PopError) => Poll::Pending, + } + } +} +``` + +Like before, we first use the [`OnceCell::try_get`] function to get a reference to the initialized scancode queue. We then optimistically try to `pop` from the queue and return `Poll::Ready` when it succeeds. This exploits the fact that it's only required to register a wakeup when returning `Poll::Pending`. + +If the first call to `queue.pop()` does not succeed, the queue is potentially empty. Only potentially because the interrupt handler might have filled the queue asynchronously immediately after the check. 
Since this race condition can occur again on the next check, we need to register the `Waker` in the `WAKER` static before the second check. This way, a wakeup might happen before we return `Poll::Pending`, but it is guaranteed that we get a wakeup for any scancodes pushed after the check. + +After registering the `Waker` contained in the passed [`Context`] through the [`AtomicWaker::register`] function, we try popping from the queue a second time. If it now succeeds, we return `Poll::Ready`. Otherwise, we return `Poll::Pending` like before, but this time with a registered wakeup. + +[`AtomicWaker::register`]: https://docs.rs/futures-util/0.3.4/futures_util/task/struct.AtomicWaker.html#method.register + +Note that there are two ways that a wakeup can happen for a task that did not return `Poll::Pending` (yet). One way is the mentioned race condition when the wakeup happens immediately before returning `Poll::Pending`. The other way is when the queue is no longer empty after registering the waker so that `Poll::Ready` is returned. Since these spurious wakeups are not preventable, the executor needs to be able to handle them correctly. + +##### Waking the Stored Waker + +To wake the stored `Waker`, we add a call to `WAKER.wake()` in the `add_scancode` function: + +```rust +// in src/task/keyboard.rs + +pub(crate) add_scancode(scancode: u8) { + if let Ok(queue) = SCANCODE_QUEUE.try_get() { + if let Err(_) = scancode_queue.push(scancode) { + println!("WARNING: scancode queue full; dropping keyboard input"); + } else { + WAKER.wake(); // new + } + } +} +``` + +The only change that we performed is to add a call to `WAKER.wake()` if the push to the scancode queue succeeds. If a waker is registered in the `WAKER` static, this method will call the equally-named [`wake`] method on it, which notifies the executor. Otherwise, the operation is a no-op, i.e. nothing happens. 
+ +[`wake`]: https://doc.rust-lang.org/stable/core/task/struct.Waker.html#method.wake + +It is important that we call `wake` only after pushing to the queue because otherwise the task might be woken too early when the queue is still empty. This can for example happen when using a multi-threaded executor that starts the woken task concurrently on a different CPU core. While we don't have thread support yet, we will add it soon and we don't want things to break then. + ### Executor with Waker Support From 000adfb2bedf1939a2b63f2627ad6b07a5824fa7 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Sat, 21 Mar 2020 11:05:19 +0100 Subject: [PATCH 28/51] Create a keyboard task and use it with our SimpleExecutor --- .../posts/12-async-await/index.md | 62 +++++++++++++++++++ 1 file changed, 62 insertions(+) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index c5ca2591..c2b16504 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -1302,6 +1302,68 @@ The only change that we performed is to add a call to `WAKER.wake()` if the push It is important that we call `wake` only after pushing to the queue because otherwise the task might be woken too early when the queue is still empty. This can for example happen when using a multi-threaded executor that starts the woken task concurrently on a different CPU core. While we don't have thread support yet, we will add it soon and we don't want things to break then. 
+#### Keyboard Task +Now that we implemented the `Stream` trait for our `ScancodeStream`, we can use it to create an asynchronous keyboard task: + +```rust +// in src/task/keyboard.rs + +use futures_util::stream::StreamExt; +use pc_keyboard::{layouts, DecodedKey, HandleControl, Keyboard, ScancodeSet1}; +use crate::print; + +pub async fn print_keypresses() { + let mut scancodes = ScancodeStream::new(); + let mut keyboard = Keyboard::new(layouts::Us104Key, ScancodeSet1, + HandleControl::Ignore); + + while let Some(scancode) = scancodes.next().await { + if let Ok(Some(key_event)) = keyboard.add_byte(scancode) { + if let Some(key) = keyboard.process_keyevent(key_event) { + match key { + DecodedKey::Unicode(character) => print!("{}", character), + DecodedKey::RawKey(key) => print!("{:?}", key), + } + } + } + } +} +``` + +The code is very similar to the code we had in our [keyboard interrupt handler] before we modified it in this post. The only difference is that, instead of reading the scancode from an I/O port, we take it from the `ScancodeStream`. For this, we first create a new `Scancode` stream and then repeatedly use the [`next`] method provided by the [`StreamExt`] trait to get a `Future` that resolves to the next element in the stream. By using the `await` operator on it, we asynchronously wait for the result of the future. + +[keyboard interrupt handler]: TODO +[`next`]: TODO +[`StreamExt`]: TODO + +We use `while let` to loop until the stream returns `None` to signal its end. Since our `poll_next` method never returns `None`, this is effectively and endless loop, so the `print_keypresses` task never finishes. + +Let's add the `print_keypresses` task to our executor in our `main.rs` to get working keyboard input again: + +```rust +// in src/main.rs + +use blog_os::task::keyboard; + +fn kernel_main(boot_info: &'static BootInfo) -> ! 
{ + // […] initialization routines, including `init_heap` + + let mut executor = SimpleExecutor::new(); + executor.spawn(Task::new(example_task())); + executor.spawn(Task::new(keyboard::print_keypresses())); + executor.run(); + + // […] test_main, "it did not crash" message, hlt_loop +} +``` + +When we execute `cargo xrun` now, we see that keyboard input works again: + +TODO image + +If you keep an eye on the CPU utilization of your computer, you will see that the `QEMU` process now keeps one CPU completely busy. This happens because our `SimpleExecutor` polls tasks over and over again in a loop. So even if we don't press any keys on the keyboard, the executor repeatedly calls `poll` on our `print_keypresses` task, even though the task cannot make any progress and will return `Poll::Pending` each time. + +To fix this, we need to create an executor that properly utilizes the `Waker` notifications. This way, the executor is notified when the next keyboard interrupt occurs, so it does not need to keep polling the `print_keypresses` task over and over again. ### Executor with Waker Support From c423363266681bd3d4b3f12b27e076f65ea500ae Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Sun, 22 Mar 2020 12:46:31 +0100 Subject: [PATCH 29/51] Small fixes to text and code examples --- .../posts/12-async-await/index.md | 34 ++++++++++++------- 1 file changed, 21 insertions(+), 13 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index c2b16504..7d09ce97 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -997,7 +997,7 @@ Let's summarize the various steps that happen for this example: ### Async Keyboard Input -Our simple executor does not utilize the `Waker` notifications and simply loops over all tasks until they are done. 
This wasn't a problem for our example since our `example_task` can directly run to finish on the first `poll` call. To see the performance advantages of a proper `Waker` implementation, we first need to create a task that is truly asynchronous, i.e. a task that will probably return `Poll::Pending` on the first `poll` call. +Our simple executor does not utilize the `Waker` notifications and simply loops over all tasks until they are done. This wasn't a problem for our example since our `example_task` can directly run to finish on the first `poll` call. To see the performance advantages of a proper `Waker` implementation, we first need to create a task that is truly asynchronous, i.e. a task that will probably return `Poll::Pending` on the first `poll` call. We already have some kind of asynchronicity in our system that we can use for this: hardware interrupts. As we learned in the [_Interrupts_] post, hardware interrupts can occur at arbitrary points in time, determined by some external device. For example, a hardware timer sends an interrupt to the CPU after some predefined time elapsed. When the CPU receives an interrupt, it immediately transfers control to the corresponding handler function defined in the interrupt descriptor table (IDT). @@ -1037,7 +1037,7 @@ default-features = false features = ["alloc"] ``` -By default, the crate depends on the standard library. To make it `no_std` compatible, we need to disable its default features and instead enable the `alloc` feature. (Note that depending on the main `crossbeam` crate does not work here because it is missing an export of the `queue` module for `no_std`. I filed a [pull request](https://github.com/crossbeam-rs/crossbeam/pull/480) to fix this.) +By default, the crate depends on the standard library. To make it `no_std` compatible, we need to disable its default features and instead enable the `alloc` feature. 
(Note that depending on the main `crossbeam` crate does not work here because it is missing an export of the `queue` module for `no_std`. I filed a [pull request](https://github.com/crossbeam-rs/crossbeam/pull/480) to fix this, but it wasn't released on crates.io yet.) ##### Queue Implementation @@ -1084,19 +1084,23 @@ To fill the scancode queue, we create a new `add_scancode` function that we will ```rust // in src/task/keyboard.rs +use crate::println; + /// Called by the keyboard interrupt handler /// /// Must not block or allocate. -pub(crate) add_scancode(scancode: u8) { +pub(crate) fn add_scancode(scancode: u8) { if let Ok(queue) = SCANCODE_QUEUE.try_get() { - if let Err(_) = scancode_queue.push(scancode) { + if let Err(_) = queue.push(scancode) { println!("WARNING: scancode queue full; dropping keyboard input"); } + } else { + println!("WARNING: scancode queue uninitialized"); } } ``` -We use the [`OnceCell::try_get`] to get a reference to the initialized queue. If the queue is not initialized yet, we do nothing and ignore the keyboard scancode. It's important that we don't try to initialize the queue in this function because it will be called by the interrupt handler, which should not perform heap allocations. Since this function should not be callable from our `main.rs`, we use the `pub(crate)` visibility to make it only available to our `lib.rs`. +We use the [`OnceCell::try_get`] to get a reference to the initialized queue. If the queue is not initialized yet, we ignore the keyboard scancode and print a warning. It's important that we don't try to initialize the queue in this function because it will be called by the interrupt handler, which should not perform heap allocations. Since this function should not be callable from our `main.rs`, we use the `pub(crate)` visibility to make it only available to our `lib.rs`. 
[`OnceCell::try_get`]: https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html#method.try_get @@ -1127,11 +1131,11 @@ extern "x86-interrupt" fn keyboard_interrupt_handler( We removed all the keyboard handling code from this function and instead added a call to the `add_scancode` function. The rest of the function stays the same as before. -As expected, keypresses are no longer printed to the screen when we run our project using `cargo xrun` now. Instead, the scancodes are added to the `SCANCODE_QUEUE`. After 100 keystrokes, the queue becomes full and we see the warning about dropped keyboard input on the screen. +As expected, keypresses are no longer printed to the screen when we run our project using `cargo xrun` now. Instead, we see the warning that the scancode queue is uninitialized for every keystroke. #### Scancode Stream -To read the scancodes from the queue in an asynchronous way, we create a new `ScancodeStream` type: +To initialize the `SCANCODE_QUEUE` and read the scancodes from the queue in an asynchronous way, we create a new `ScancodeStream` type: ```rust // in src/task/keyboard.rs @@ -1148,6 +1152,7 @@ impl ScancodeStream { _private: (), } } +} ``` The purpose of the `_private` field is to prevent construction of the struct from outside of the module. This makes the `new` function the only way to construct the type. In the function, we first try to initialize the `SCANCODE_QUEUE` static. We panic if it is already initialized to ensure that only a single `ScancodeStream` type can be created. 
@@ -1198,6 +1203,7 @@ Now we can import and implement the `Stream` trait: ```rust // in src/task/keyboard.rs +use core::{pin::Pin, task::{Poll, Context}}; use futures_util::stream::Stream; impl Stream for ScancodeStream { @@ -1234,6 +1240,8 @@ Let's use the [`AtomicWaker`] type to define a static `WAKER`: ```rust // in src/task/keyboard.rs +use futures_util::task::AtomicWaker; + static WAKER: AtomicWaker = AtomicWaker::new(); ``` @@ -1249,7 +1257,7 @@ The contract defined by `poll`/`poll_next` requires that the task registers a wa impl Stream for ScancodeStream { type Item = u8; - fn poll_next(self: Pin<&mut Self>, context: &mut Context) -> Poll> { + fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll> { let queue = SCANCODE_QUEUE .try_get() .expect("scancode queue not initialized"); @@ -1260,8 +1268,8 @@ impl Stream for ScancodeStream { } WAKER.register(&cx.waker()); - match scancodes.pop() { - Ok(scancode) => Poll::Ready(scancode), + match queue.pop() { + Ok(scancode) => Poll::Ready(Some(scancode)), Err(crossbeam_queue::PopError) => Poll::Pending, } } @@ -1344,14 +1352,14 @@ Let's add the `print_keypresses` task to our executor in our `main.rs` to get wo ```rust // in src/main.rs -use blog_os::task::keyboard; - fn kernel_main(boot_info: &'static BootInfo) -> ! 
{ + use blog_os::task::keyboard; + // […] initialization routines, including `init_heap` let mut executor = SimpleExecutor::new(); executor.spawn(Task::new(example_task())); - executor.spawn(Task::new(keyboard::print_keypresses())); + executor.spawn(Task::new(keyboard::print_keypresses())); // new executor.run(); // […] test_main, "it did not crash" message, hlt_loop From 89f5350ac4bd1ae7e05549d0a5551d23e053b663 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Sun, 22 Mar 2020 17:29:27 +0100 Subject: [PATCH 30/51] Start explaining the deadlock problem caused by Arc dealloc --- .../posts/12-async-await/index.md | 70 ++++++++++++++++++- 1 file changed, 68 insertions(+), 2 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 7d09ce97..f85ef248 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -1366,9 +1366,75 @@ fn kernel_main(boot_info: &'static BootInfo) -> ! { } ``` -When we execute `cargo xrun` now, we see that keyboard input works again: +When we execute `cargo xrun` now, we see that keyboard input works again, but only for a short time: + +![QEMU printing output for keypresses "H", "e", and "l", then it hangs](keyboard-deadlock.gif) + +After pressing a few keys, the complete execution hangs. Not even the dots by the timer interrupt are printed anymore. Such bugs are typically caused by a [_deadlock_], which is a state where we endlessly wait on some lock. To find out where the program hangs, the best approach is to connect a debugger and print the backtrace. Expand the section below for the exact debugging steps: + +[_deadlock_]: https://en.wikipedia.org/wiki/Deadlock + +
+Debugging Steps + +- Make sure `gdb` or `gdb-multiarch` is installed on your system. +- Pass the `-s` flag to QEMU when running your kernel. You can do this through the command `cargo xrun -- -s`. +- Run `gdb` with the file name of your kernel as argument: + ``` + gdb target/x86_64-blog_os/debug/blog_os + ``` +- From the `gdb` console, connect to the QEMU instance by executing `target ext :1234`. +- Print the backtrace by executing `backtrace` or `bt`. + + +The backtrace in this case looks like this: + +``` +#0 AtomicBool::load (self=0x22d250 , …) + at libcore/sync/atomic.rs:404 +#1 spin::Mutex::obtain_lock (self=0x22d250 ) + at spin-0.5.2/src/mutex.rs:134 +#2 spin::Mutex::lock (self=0x22d250 ) + at spin-0.5.2/src/mutex.rs:158 +#3 blog_os::allocator::Locked::lock (…) + at src/allocator.rs:73 +#4 Locked::dealloc (…) at src/allocator/fixed_size_block.rs:83 +#5 __rg_dealloc (…) at src/allocator.rs:19 +#6 alloc::alloc::dealloc (…) at liballoc/alloc.rs:103 +#7 alloc::alloc::Global::dealloc (…) at liballoc/alloc.rs:174 +#8 alloc::sync::Arc::drop_slow (…) at liballoc/sync.rs:743 +#9 alloc::sync::Arc::drop (…) at liballoc/sync.rs:1249 +#10 core::ptr::drop_in_place () at libcore/ptr/mod.rs:174 +#11 blog_os::task::simple_executor::DummyWaker::wake (…) + at src/task/simple_executor.rs:37 +#12 alloc::task::raw_waker::wake (waker=0x4444444400d0) + at liballoc/task.rs:69 +#13 core::task::wake::Waker::wake (self=...) at libcore/task/wake.rs:241 +#14 AtomicWaker::wake (self=0x22d210 ) + at futures-core-0.3.4/src/task/__internal/atomic_waker.rs:355 +#15 blog_os::task::keyboard::add_scancode (scancode=31) at src/task/keyboard.rs:24 +#16 blog_os::interrupts::keyboard_interrupt_handler (…) at src/interrupts.rs:87 +``` + +Note that I shortened the output a bit to make it more readable. + +
+ +From the backtrace, we can deduce that the deadlock was caused by the following order of operations: + +``` +keyboard_interrupt_handler -> add_scancode -> AtomicWaker::wake +-> GlobalAlloc::dealloc -> allocator::Locked::lock +``` + +TODO + + + +#### Fixing the Deadlock + + -TODO image If you keep an eye on the CPU utilization of your computer, you will see that the `QEMU` process now keeps one CPU completely busy. This happens because our `SimpleExecutor` polls tasks over and over again in a loop. So even if we don't press any keys on the keyboard, the executor repeatedly calls `poll` on our `print_keypresses` task, even though the task cannot make any progress and will return `Poll::Pending` each time. From c26d36ebce4530e9be6368ea6c9d14588a03d61f Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Mon, 23 Mar 2020 11:21:24 +0100 Subject: [PATCH 31/51] Prevent deadlock by basing DummyWaker directly on RawWaker Don't use Arc for DummyWaker. It causes a drop in the `add_scancode` function, which can easily lead to a deadlock because the function is called directly from the interrupt handler. --- .../posts/12-async-await/index.md | 170 +++++++++--------- 1 file changed, 82 insertions(+), 88 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index f85ef248..5e552430 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -890,53 +890,68 @@ The struct contains a single `task_queue` field of type [`VecDeque`], which is b #### Dummy Waker -In order to call the `poll` method, we need to create a [`Context`] type, which wraps a [`Waker`] type. To start simple, we will first create a dummy waker that does nothing. 
The simplest way to do this is by implementing the unstable [`Wake`] trait for an empty `DummyWaker` struct: +In order to call the `poll` method, we need to create a [`Context`] type, which wraps a [`Waker`] type. To start simple, we will first create a dummy waker that does nothing. For this, we create a [`RawWaker`] instance, which defines the implementation of the different `Waker` methods, and then use the [`Waker::from_raw`] function to turn it into a `Waker`: -[`Wake`]: https://doc.rust-lang.org/nightly/alloc/task/trait.Wake.html +[`RawWaker`]: https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html +[`Waker::from_raw`]: https://doc.rust-lang.org/stable/core/task/struct.Waker.html#method.from_raw ```rust // in src/task/simple_executor.rs -use alloc::{sync::Arc, task::Wake}; +use core::task::{Waker, RawWaker}; -struct DummyWaker; +fn dummy_raw_waker() -> RawWaker { + todo!(); +} -impl Wake for DummyWaker { - fn wake(self: Arc<Self>) { - // do nothing - } +fn dummy_waker() -> Waker { + unsafe { Waker::from_raw(dummy_raw_waker()) } } ``` -The trait is still unstable, so we have to add **`#![feature(wake_trait)]`** to the top of our `lib.rs` to use it. The `wake` method of the trait is normally responsible for waking the corresponding task in the executor. However, our `SimpleExecutor` will not differentiate between ready and waiting tasks, so we don't need to do anything on `wake` calls. +The `from_raw` function is unsafe because undefined behavior can occur if the programmer does not uphold the documented requirements of `RawWaker`. Before we look at the implementation of the `dummy_raw_waker` function, we first try to understand how the `RawWaker` type works. -Since wakers are normally shared between the executor and the asynchronous tasks, the `wake` method requires that the `Self` instance is wrapped in the [`Arc`] type, which implements reference-counted ownership.
The basic idea is that the value is heap-allocated and the number of active references to it are counted. If the number of active references reaches zero, the value is no longer needed and can be deallocated. +##### `RawWaker` +The [`RawWaker`] type requires the programmer to explicitly define a [_virtual method table_] (_vtable_) that specifies the functions that should be called when the `RawWaker` is cloned, woken, or dropped. The layout of this vtable is defined by the [`RawWakerVTable`] type. Each function receives a `*const ()` argument that is basically a _type-erased_ `&self` pointer to some struct, e.g. allocated on the heap. The reason for using a `*const ()` pointer instead of a proper reference is that the `RawWaker` type should be non-generic. The pointer value that is passed to the functions is the `data` pointer given to [`RawWaker::new`]. + +[_virtual method table_]: https://en.wikipedia.org/wiki/Virtual_method_table +[`RawWakerVTable`]: https://doc.rust-lang.org/stable/core/task/struct.RawWakerVTable.html +[`RawWaker::new`]: https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html#method.new + +Typically, the `RawWaker` is created for some heap-allocated struct that is wrapped into the [`Box`] or [`Arc`] type. For such types, methods like [`Box::into_raw`] can be used to convert the `Box<T>` to a `*const T` pointer. This pointer can then be cast to an anonymous `*const ()` pointer and passed to `RawWaker::new`. Since each vtable function receives the same `*const ()` as argument, the functions can safely cast the pointer back to a `Box<T>` or a `&T` to operate on it. As you can imagine, this process is highly dangerous and can easily lead to undefined behavior on mistakes. For this reason, manually creating a `RawWaker` is not recommended unless necessary.
+ +[`Box`]: https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html [`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html +[`Box::into_raw`]: https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html#method.into_raw -To make our `DummyWaker` usable with the [`Context`] type, we need a method to convert it to the [`Waker`] defined in the core library: +##### A Dummy `RawWaker` + +While manually creating a `RawWaker` is not recommended, there is currently no other way to create a dummy `Waker` that does nothing. Fortunately, the fact that we want to do nothing makes it relatively safe to implement the `dummy_raw_waker` function: ```rust // in src/task/simple_executor.rs -use core::task::Waker; +use core::task::RawWakerVTable; -impl DummyWaker { - fn to_waker(self) -> Waker { - Waker::from(Arc::new(self)) +fn dummy_raw_waker() -> RawWaker { + fn no_op(_: *const ()) {} + fn clone(_: *const ()) -> RawWaker { + dummy_raw_waker() } + + let vtable = &RawWakerVTable::new(clone, no_op, no_op, no_op); + RawWaker::new(0 as *const (), vtable) } ``` -The method first makes the `self` instance reference-counted by wrapping it in an [`Arc`]. Then it uses the [`Waker::from`] method to create the `Waker`. This method is available for all reference counted types that implement the [`Wake`] trait. +First, we define two inner functions named `no_op` and `clone`. The `no_op` function takes a `*const ()` pointer and does nothing. The `clone` function also takes a `*const ()` pointer and returns a new `RawWaker` by calling `dummy_raw_waker` again. We use these two functions to create a minimal `RawWakerVTable`: The `clone` function is used for the cloning operations and the `no_op` function is used for all other operations. Since the `RawWaker` does nothing, it does not matter that we return a new `RawWaker` from `clone` instead of cloning it. 
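To see the no-op vtable in action outside the kernel, here is a minimal hosted sketch. It follows the same idea as `dummy_raw_waker`, with two labeled differences: the vtable lives in a named `static` (`VTABLE`) rather than a promoted temporary, and `poll_once` is a hypothetical helper that is not part of the post.

```rust
use core::future::Future;
use core::pin::Pin;
use core::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Wake, wake-by-ref, and drop all do nothing.
fn no_op(_: *const ()) {}

// Cloning hands out a fresh dummy waker, since there is no state to share.
fn clone(_: *const ()) -> RawWaker {
    dummy_raw_waker()
}

// Argument order of `RawWakerVTable::new`: clone, wake, wake_by_ref, drop.
static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);

fn dummy_raw_waker() -> RawWaker {
    // The data pointer is never dereferenced, so a null pointer is fine.
    RawWaker::new(core::ptr::null(), &VTABLE)
}

pub fn dummy_waker() -> Waker {
    // SAFETY: no vtable function touches the data pointer, and all of them
    // are trivially thread-safe, so the `RawWaker` contract is upheld.
    unsafe { Waker::from_raw(dummy_raw_waker()) }
}

// Hypothetical helper: poll a pinned future exactly once with the dummy waker.
pub fn poll_once<F: Future>(future: Pin<&mut F>) -> Poll<F::Output> {
    let waker = dummy_waker();
    let mut context = Context::from_waker(&waker);
    future.poll(&mut context)
}
```

A future that never waits, such as `async { 42 }`, completes on the first `poll_once` call. Cloning, waking, and dropping the waker have no observable effect.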
-[`Waker::from`]: TODO - -Now we have a way to create a `Waker` instance, we can use it to implement a `run` method on our executor. +After creating the `vtable`, we use the [`RawWaker::new`] function to create the `RawWaker`. The passed `*const ()` does not matter since none of the vtable functions use it. For this reason, we simply pass a null pointer. #### A `run` Method -The most simple `run` method is to repeatedly poll all queued tasks in a loop until all are done. This is not very efficient since it does not utilize the notifications of the `Waker` type, but it is an easy way to get things running: +Now that we have a way to create a `Waker` instance, we can use it to implement a `run` method on our executor. The simplest `run` method is to repeatedly poll all queued tasks in a loop until all are done. This is not very efficient since it does not utilize the notifications of the `Waker` type, but it is an easy way to get things running: ```rust // in src/task/simple_executor.rs @@ -1366,78 +1381,57 @@ fn kernel_main(boot_info: &'static BootInfo) -> ! { } ``` -When we execute `cargo xrun` now, we see that keyboard input works again, but only for a short time: - -![QEMU printing output for keypresses "H", "e", and "l", then it hangs](keyboard-deadlock.gif) - -After pressing a few keys, the complete execution hangs. Not even the dots by the timer interrupt are printed anymore. Such bugs are typically caused by a [_deadlock_], which is a state where we endlessly wait on some lock. To find out where the program hangs, the best approach is to connect a debugger and print the backtrace. Expand the section below for the exact debugging steps: - -[_deadlock_]: https://en.wikipedia.org/wiki/Deadlock - -
-Debugging Steps - -- Make sure `gdb` or `gdb-multiarch` is installed on your system. -- Pass the `-s` flag to QEMU when running your kernel. You can do this through the command `cargo xrun -- -s`. -- Run `gdb` with the file name of your kernel as argument: - ``` - gdb target/x86_64-blog_os/debug/blog_os - ``` -- From the `gdb` console, connect to the QEMU instance by executing `target ext :1234`. -- Print the backtrace by executing `backtrace` or `bt`. - - -The backtrace in this case looks like this: - -``` -#0 AtomicBool::load (self=0x22d250 , …) - at libcore/sync/atomic.rs:404 -#1 spin::Mutex::obtain_lock (self=0x22d250 ) - at spin-0.5.2/src/mutex.rs:134 -#2 spin::Mutex::lock (self=0x22d250 ) - at spin-0.5.2/src/mutex.rs:158 -#3 blog_os::allocator::Locked::lock (…) - at src/allocator.rs:73 -#4 Locked::dealloc (…) at src/allocator/fixed_size_block.rs:83 -#5 __rg_dealloc (…) at src/allocator.rs:19 -#6 alloc::alloc::dealloc (…) at liballoc/alloc.rs:103 -#7 alloc::alloc::Global::dealloc (…) at liballoc/alloc.rs:174 -#8 alloc::sync::Arc::drop_slow (…) at liballoc/sync.rs:743 -#9 alloc::sync::Arc::drop (…) at liballoc/sync.rs:1249 -#10 core::ptr::drop_in_place () at libcore/ptr/mod.rs:174 -#11 blog_os::task::simple_executor::DummyWaker::wake (…) - at src/task/simple_executor.rs:37 -#12 alloc::task::raw_waker::wake (waker=0x4444444400d0) - at liballoc/task.rs:69 -#13 core::task::wake::Waker::wake (self=...) at libcore/task/wake.rs:241 -#14 AtomicWaker::wake (self=0x22d210 ) - at futures-core-0.3.4/src/task/__internal/atomic_waker.rs:355 -#15 blog_os::task::keyboard::add_scancode (scancode=31) at src/task/keyboard.rs:24 -#16 blog_os::interrupts::keyboard_interrupt_handler (…) at src/interrupts.rs:87 -``` - -Note that I shortened the output a bit to make it more readable. - -
- -From the backtrace, we can deduce that the deadlock was caused by the following order of operations: - -``` -keyboard_interrupt_handler -> add_scancode -> AtomicWaker::wake --> GlobalAlloc::dealloc -> allocator::Locked::lock -``` - -TODO - - - -#### Fixing the Deadlock - - +When we execute `cargo xrun` now, we see that keyboard input works again: +TODO image If you keep an eye on the CPU utilization of your computer, you will see that the `QEMU` process now keeps one CPU completely busy. This happens because our `SimpleExecutor` polls tasks over and over again in a loop. So even if we don't press any keys on the keyboard, the executor repeatedly calls `poll` on our `print_keypresses` task, even though the task cannot make any progress and will return `Poll::Pending` each time. To fix this, we need to create an executor that properly utilizes the `Waker` notifications. This way, the executor is notified when the next keyboard interrupt occurs, so it does not need to keep polling the `print_keypresses` task over and over again. ### Executor with Waker Support + + +#### The `Wake` Trait + +The simplest way to do this is by implementing the unstable [`Wake`] trait for an empty `DummyWaker` struct: + +[`Wake`]: https://doc.rust-lang.org/nightly/alloc/task/trait.Wake.html + +```rust +// in src/task/simple_executor.rs + +use alloc::{sync::Arc, task::Wake}; + +struct DummyWaker; + +impl Wake for DummyWaker { + fn wake(self: Arc<Self>) { + // do nothing + } +} +``` + +The trait is still unstable, so we have to add **`#![feature(wake_trait)]`** to the top of our `lib.rs` to use it. The `wake` method of the trait is normally responsible for waking the corresponding task in the executor. However, our `SimpleExecutor` will not differentiate between ready and waiting tasks, so we don't need to do anything on `wake` calls.
+ +Since wakers are normally shared between the executor and the asynchronous tasks, the `wake` method requires that the `Self` instance is wrapped in the [`Arc`] type, which implements reference-counted ownership. The basic idea is that the value is heap-allocated and the number of active references to it is counted. If the number of active references reaches zero, the value is no longer needed and can be deallocated. + +[`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html + +To make our `DummyWaker` usable with the [`Context`] type, we need a method to convert it to the [`Waker`] defined in the core library: + +```rust +// in src/task/simple_executor.rs + +use core::task::Waker; + +impl DummyWaker { + fn to_waker(self) -> Waker { + Waker::from(Arc::new(self)) + } +} +``` + +The method first makes the `self` instance reference-counted by wrapping it in an [`Arc`]. Then it uses the [`Waker::from`] method to create the `Waker`. This method is available for all reference counted types that implement the [`Wake`] trait. + +[`Waker::from`]: TODO From 5a879058c989bad08a5012113b13678ce6f7816f Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Mon, 23 Mar 2020 15:45:58 +0100 Subject: [PATCH 32/51] Minor fixes --- .../second-edition/posts/12-async-await/index.md | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 5e552430..cc2daf72 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -913,7 +913,7 @@ The `from_raw` function is unsafe because undefined behavior can occur if the pr ##### `RawWaker` -The [`RawWaker`] type requires the programmer to explicitly define a [_virtual method table_] (_vtable_) that specifies the functions that should be called when the `RawWaker` is cloned, woken, or dropped.
The layout of this vtable is defined by the [`RawWakerVTable`] type. Each function receives a `*const ()` argument that is basically a _type-erased_ `&self` pointer to some struct, e.g. allocated on the heap. The reason for using a `*const ()` pointer instead of a proper reference is that the `RawWaker` type should be non-generic. The pointer value that is passed to the functions is the `data` pointer given to [`RawWaker::new`]. +The [`RawWaker`] type requires the programmer to explicitly define a [_virtual method table_] (_vtable_) that specifies the functions that should be called when the `RawWaker` is cloned, woken, or dropped. The layout of this vtable is defined by the [`RawWakerVTable`] type. Each function receives a `*const ()` argument that is basically a _type-erased_ `&self` pointer to some struct, e.g. allocated on the heap. The reason for using a `*const ()` pointer instead of a proper reference is that the `RawWaker` type should be non-generic but still support arbitrary types. The pointer value that is passed to the functions is the `data` pointer given to [`RawWaker::new`]. [_virtual method table_]: https://en.wikipedia.org/wiki/Virtual_method_table [`RawWakerVTable`]: https://doc.rust-lang.org/stable/core/task/struct.RawWakerVTable.html @@ -961,7 +961,7 @@ use core::task::{Context, Poll}; impl SimpleExecutor { pub fn run(&mut self) { while let Some(mut task) = self.task_queue.pop_front() { - let waker = DummyWaker.to_waker(); + let waker = dummy_waker(); let mut context = Context::from_waker(&waker); match task.poll(&mut context) { Poll::Ready(()) => {} // task done @@ -972,7 +972,7 @@ impl SimpleExecutor { } ``` -The function uses a `while let` loop to handle all tasks in the `task_queue`. For each task, it first creates a `Context` type by wrapping a `Waker` instance created from our `DummyWaker` type. Then it invokes the `Task::poll` method with this `Context`. 
If the `poll` method returns `Poll::Ready`, the task is finished and we can continue with the next task. If the task is still `Poll::Pending`, we add it to the back of the queue again so that it will be polled again in a subsequent loop iteration. +The function uses a `while let` loop to handle all tasks in the `task_queue`. For each task, it first creates a `Context` type by wrapping a `Waker` instance returned by our `dummy_waker` function. Then it invokes the `Task::poll` method with this `Context`. If the `poll` method returns `Poll::Ready`, the task is finished and we can continue with the next task. If the task is still `Poll::Pending`, we add it to the back of the queue again so that it will be polled again in a subsequent loop iteration. #### Trying It @@ -1004,7 +1004,7 @@ Let's summarize the various steps that happen for this example: - Next, we call the asynchronous `example_task` function, which returns a future. We wrap this future in the `Task` type, which moves it to the heap and pins it, and then add the task to the `task_queue` of the executor through the `spawn` method. - We then wall the `run` method to start the execution of the single task in the queue. This involves: - Popping the task from the front of the `task_queue`. - - Creating a `DummyWaker` for the task, converting it to a [`Waker`] instance, and then creating a [`Context`] instance from it. + - Creating a `RawWaker` for the task, converting it to a [`Waker`] instance, and then creating a [`Context`] instance from it. - Calling the [`poll`] method on the future of the task, using the `Context` we just created. - Since the `example_task` does not wait for anything, it can directly run til its end on the first `poll` call. This is where the _"async number: 42"_ line is printed. - Since the `example_task` directly returns `Poll::Ready`, it is not added back to the task queue. 
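The steps summarized above can be reproduced on a hosted target with the following sketch. It is a simplified stand-in rather than the kernel code. Assumptions: it uses `std` instead of `alloc`, `run` returns the number of `poll` calls purely so the busy polling can be observed, and `YieldOnce` is a hypothetical future that returns `Poll::Pending` exactly once.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// A pinned, heap-allocated future with `()` output.
pub struct Task {
    future: Pin<Box<dyn Future<Output = ()>>>,
}

impl Task {
    pub fn new(future: impl Future<Output = ()> + 'static) -> Task {
        Task { future: Box::pin(future) }
    }

    fn poll(&mut self, context: &mut Context<'_>) -> Poll<()> {
        self.future.as_mut().poll(context)
    }
}

// No-op waker: none of the vtable functions use the data pointer.
fn no_op(_: *const ()) {}
fn clone_raw(_: *const ()) -> RawWaker {
    RawWaker::new(std::ptr::null(), &VTABLE)
}
static VTABLE: RawWakerVTable = RawWakerVTable::new(clone_raw, no_op, no_op, no_op);

fn dummy_waker() -> Waker {
    // SAFETY: the vtable functions never dereference the null data pointer.
    unsafe { Waker::from_raw(clone_raw(std::ptr::null())) }
}

pub struct SimpleExecutor {
    task_queue: VecDeque<Task>,
}

impl SimpleExecutor {
    pub fn new() -> Self {
        SimpleExecutor { task_queue: VecDeque::new() }
    }

    pub fn spawn(&mut self, task: Task) {
        self.task_queue.push_back(task)
    }

    /// Polls queued tasks round-robin until all are done.
    /// Returns the number of `poll` calls (an addition for illustration).
    pub fn run(&mut self) -> usize {
        let mut polls = 0;
        while let Some(mut task) = self.task_queue.pop_front() {
            let waker = dummy_waker();
            let mut context = Context::from_waker(&waker);
            polls += 1;
            match task.poll(&mut context) {
                Poll::Ready(()) => {} // task done, drop it
                Poll::Pending => self.task_queue.push_back(task), // retry later
            }
        }
        polls
    }
}

/// Hypothetical future that is pending on the first poll and ready afterwards.
pub struct YieldOnce {
    yielded: bool,
}

impl YieldOnce {
    pub fn new() -> Self {
        YieldOnce { yielded: false }
    }
}

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _context: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            Poll::Pending
        }
    }
}
```

A task that finishes immediately is polled once, while a `YieldOnce` task is pushed back and polled a second time. Because the waker does nothing, the executor has no way to know when a pending task became ready, so it simply polls again.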
@@ -1392,6 +1392,11 @@ To fix this, we need to create an executor that properly utilizes the `Waker` no ### Executor with Waker Support + + + +### Old + #### The `Wake` Trait The simplest way to do this is by implementing the unstable [`Wake`] trait for an empty `DummyWaker` struct: From 9e5e993a1b0a54969ea058d6e1c475b006e3337b Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Mon, 23 Mar 2020 15:54:24 +0100 Subject: [PATCH 33/51] Fix typo --- blog/content/second-edition/posts/12-async-await/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index cc2daf72..805140e0 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -1002,7 +1002,7 @@ Let's summarize the various steps that happen for this example: - First, a new instance of our `SimpleExecutor` type is created with an empty `task_queue`. - Next, we call the asynchronous `example_task` function, which returns a future. We wrap this future in the `Task` type, which moves it to the heap and pins it, and then add the task to the `task_queue` of the executor through the `spawn` method. -- We then wall the `run` method to start the execution of the single task in the queue. This involves: +- We then call the `run` method to start the execution of the single task in the queue. This involves: - Popping the task from the front of the `task_queue`. - Creating a `RawWaker` for the task, converting it to a [`Waker`] instance, and then creating a [`Context`] instance from it. - Calling the [`poll`] method on the future of the task, using the `Context` we just created. 
From 8d3cdf62e399ba1706be640c1af2538203965f42 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Mon, 23 Mar 2020 16:07:47 +0100 Subject: [PATCH 34/51] Add keyboard output gif --- .../second-edition/posts/12-async-await/index.md | 4 ++-- .../12-async-await/qemu-keyboard-output.gif | Bin 0 -> 9758 bytes 2 files changed, 2 insertions(+), 2 deletions(-) create mode 100644 blog/content/second-edition/posts/12-async-await/qemu-keyboard-output.gif diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 805140e0..18508677 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -1383,9 +1383,9 @@ fn kernel_main(boot_info: &'static BootInfo) -> ! { When we execute `cargo xrun` now, we see that keyboard input works again: -TODO image +![QEMU printing ".....H...e...l...l..o..... ...W..o..r....l...d...!"](qemu-keyboard-output.gif) -If you keep an eye on the CPU utilization of your computer, you will see that the `QEMU` process now keeps one CPU completely busy. This happens because our `SimpleExecutor` polls tasks over and over again in a loop. So even if we don't press any keys on the keyboard, the executor repeatedly calls `poll` on our `print_keypresses` task, even though the task cannot make any progress and will return `Poll::Pending` each time. +If you keep an eye on the CPU utilization of your computer, you will see that the `QEMU` process now continuously keeps the CPU busy. This happens because our `SimpleExecutor` polls tasks over and over again in a loop. So even if we don't press any keys on the keyboard, the executor repeatedly calls `poll` on our `print_keypresses` task, even though the task cannot make any progress and will return `Poll::Pending` each time. To fix this, we need to create an executor that properly utilizes the `Waker` notifications. 
This way, the executor is notified when the next keyboard interrupt occurs, so it does not need to keep polling the `print_keypresses` task over and over again. diff --git a/blog/content/second-edition/posts/12-async-await/qemu-keyboard-output.gif b/blog/content/second-edition/posts/12-async-await/qemu-keyboard-output.gif new file mode 100644 index 0000000000000000000000000000000000000000..992fa5fcd373005f8a5c4bce2618239f449d10ab GIT binary patch literal 9758 [binary GIF data omitted] From 7600a763a24f298bd013fd3e00494416b6b76eff Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Tue, 24 Mar 2020 15:47:38
+0100 Subject: [PATCH 35/51] Unregister keyboard waker when we can continue Avoids that the waker stored in the waker_cache in the executor is dropped first. Thus, it avoids that the waker is deallocated inside interrupt handlers. --- blog/content/second-edition/posts/12-async-await/index.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 18508677..1f0d4d21 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -1284,7 +1284,10 @@ impl Stream for ScancodeStream { WAKER.register(&cx.waker()); match queue.pop() { - Ok(scancode) => Poll::Ready(Some(scancode)), + Ok(scancode) => { + WAKER.take(); + Poll::Ready(Some(scancode)) + }, Err(crossbeam_queue::PopError) => Poll::Pending, } } @@ -1295,9 +1298,10 @@ Like before, we first use the [`OnceCell::try_get`] function to get a reference If the first call to `queue.pop()` does not succeed, the queue is potentially empty. Only potentially because the interrupt handler might have filled the queue asynchronously immediately after the check. Since this race condition can occur again on the next check, we need to register the `Waker` in the `WAKER` static before the second check. This way, a wakeup might happen before we return `Poll::Pending`, but it is guaranteed that we get a wakeup for any scancodes pushed after the check. -After registering the `Waker` contained in the passed [`Context`] through the [`AtomicWaker::register`] function, we try popping from the queue a second time. If it now succeeds, we return `Poll::Ready`. Otherwise, we return `Poll::Pending` like before, but this time with a registered wakeup. +After registering the `Waker` contained in the passed [`Context`] through the [`AtomicWaker::register`] function, we try popping from the queue a second time. 
If it now succeeds, we return `Poll::Ready`. We also remove the registered waker again using [`AtomicWaker::take`] because a waker notification is no longer needed. In case `queue.pop()` fails for a second time, we return `Poll::Pending` like before, but this time with a registered wakeup. [`AtomicWaker::register`]: https://docs.rs/futures-util/0.3.4/futures_util/task/struct.AtomicWaker.html#method.register +[`AtomicWaker::take`]: https://docs.rs/futures/0.3.4/futures/task/struct.AtomicWaker.html#method.take Note that there are two ways that a wakeup can happen for a task that did not return `Poll::Pending` (yet). One way is the mentioned race condition when the wakeup happens immediately before returning `Poll::Pending`. The other way is when the queue is no longer empty after registering the waker so that `Poll::Ready` is returned. Since these spurious wakeups are not preventable, the executor needs to be able to handle them correctly. From 886d7411ae288f7aba5c3887ecbd5269e47ab4f4 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Tue, 24 Mar 2020 15:48:07 +0100 Subject: [PATCH 36/51] Begin working on executor with waker support --- .../posts/12-async-await/index.md | 158 +++++++++++++++++- 1 file changed, 156 insertions(+), 2 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 1f0d4d21..5e2122e7 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -1391,11 +1391,165 @@ When we execute `cargo xrun` now, we see that keyboard input works again: If you keep an eye on the CPU utilization of your computer, you will see that the `QEMU` process now continuously keeps the CPU busy. This happens because our `SimpleExecutor` polls tasks over and over again in a loop.
So even if we don't press any keys on the keyboard, the executor repeatedly calls `poll` on our `print_keypresses` task, even though the task cannot make any progress and will return `Poll::Pending` each time. -To fix this, we need to create an executor that properly utilizes the `Waker` notifications. This way, the executor is notified when the next keyboard interrupt occurs, so it does not need to keep polling the `print_keypresses` task over and over again. - ### Executor with Waker Support +To fix the performance problem, we need to create an executor that properly utilizes the `Waker` notifications. This way, the executor is notified when the next keyboard interrupt occurs, so it does not need to keep polling the `print_keypresses` task over and over again. +#### Task Id + +The first step in creating an executor with proper support for waker notifications is to give each task a unique ID. This is required because we need a way to specify which task should be woken. We start by creating a new `TaskId` wrapper type: + +```rust +// in src/task/mod.rs + +#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)] +struct TaskId(usize); +``` + +The `TaskId` struct is a simple wrapper type around `usize`. We derive a number of traits for it to make it printable, copyable, comparable, and sortable. The latter is important because we want to use `TaskId` as the key type of a [`BTreeMap`] in a moment. + +[`BTreeMap`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html + +To assign each task a unique ID, we utilize the fact that each task stores a pinned, heap-allocated future: + +```rust +pub struct Task { + future: Pin<Box<dyn Future<Output = ()>>>, +} +``` + +The idea is to use the memory address of this future as an ID. This address is unique because no two futures are stored at the same address. The `Pin` type ensures that they can't move in memory, so we also know that the address stays the same as long as the task exists.
These properties make the address a good candidate for an ID. + +The implementation looks like this: + +```rust +// in src/task/mod.rs + +impl Task { + fn id(&self) -> TaskId { + use core::ops::Deref; + + let addr = Pin::deref(&self.future) as *const _ as *const () as usize; + TaskId(addr) + } +} +``` + +We use the `deref` method of the [`Deref`] trait to get a reference to the heap-allocated future. To get the corresponding memory address, we convert this reference to a raw pointer and then to a `usize`. Finally, we return the address wrapped in the `TaskId` struct. + +[`Deref`]: https://doc.rust-lang.org/core/ops/trait.Deref.html + +#### The `Executor` Type + +We create our new `Executor` type in a `task::executor` module: + +```rust +// in src/task/mod.rs + +pub mod executor; +``` + +```rust +// in src/task/executor.rs + +use super::{Task, TaskId}; +use alloc::collections::{BTreeMap, VecDeque}; +use core::task::Waker; +use crossbeam_queue::ArrayQueue; + +pub struct Executor { + task_queue: VecDeque<Task>, + waiting_tasks: BTreeMap<TaskId, Task>, + wake_queue: Arc<ArrayQueue<TaskId>>, + waker_cache: BTreeMap<TaskId, Waker>, +} + +impl Executor { + pub fn new() -> Self { + Executor { + task_queue: VecDeque::new(), + waiting_tasks: BTreeMap::new(), + wake_queue: Arc::new(ArrayQueue::new(100)), + waker_cache: BTreeMap::new(), + } + } +} +``` + +In addition to a `task_queue`, that stores the tasks that are ready to execute, the type has a `waiting_tasks` map, a `wake_queue` and a `waker_cache`. These fields have the following purpose: + +- The `waiting_tasks` map stores tasks that returned `Poll::Pending`. The map is indexed by the `TaskId` to allow efficient continuation of a specific task. +- The `wake_queue` is a reference-counted [`ArrayQueue`] of task IDs. It will be shared between the executor and wakers. The idea is that the wakers push the ID of the woken task to the queue. The executor sits on the receiving end of the queue and moves all woken tasks from the `waiting_tasks` map back to the `task_queue`.
The reason for using a fixed-size queue instead of an unbounded queue such as [`SegQueue`] is that interrupt handlers that should not allocate will push to this queue. +- The `waker_cache` map caches the [`Waker`] of a task after its creation. This has two reasons: First, it improves performance by reusing the same waker for multiple wake-ups of the same task instead of creating a new waker each time. Second, it ensures that reference-counted wakers are not deallocated inside interrupt handlers because it could lead to deadlocks (there are more details on this below). + +[`SegQueue`]: https://docs.rs/crossbeam-queue/0.2.1/crossbeam_queue/struct.SegQueue.html + +To create an `Executor`, we provide a simple `new` function. We choose a capacity of 100 for the `wake_queue`, which should be more than enough for the foreseeable future. In case our system will have more than 100 concurrent tasks at some point, we can easily increase this size. + +#### Spawning Tasks + +As for the `SimpleExecutor`, we provide a `spawn` method on our `Executor` type that adds a given task to the `task_queue`: + +```rust +// in src/task/executor.rs + +impl Executor { + pub fn spawn(&mut self, task: Task) { + self.task_queue.push_back(task) + } +} +``` + +Since this method requires a `&mut` reference to the executor, it is no longer callable after the executor has been started. If it should be possible to let tasks themselves spawn additional tasks at some point, we could change the type of the task queue to a concurrent queue such as [`SegQueue`] and share a reference to this queue with tasks.
+ +#### Running Tasks + +To execute all tasks in the `task_queue`, we create a private `run_ready_tasks` method: + +```rust +// in src/task/executor.rs + +impl Executor { + fn run_ready_tasks(&mut self) { + while let Some(mut task) = self.task_queue.pop_front() { + let waker = self.waker_cache.entry(task.id()).or_insert_with(|| { + self.create_waker(task.id()) + }); + let mut context = Context::from_waker(waker); + match task.poll(&mut context) { + Poll::Ready(()) => { + // task done -> remove cached waker + self.waker_cache.remove(&task.id()); + } + Poll::Pending => { + if self.waiting_tasks.insert(task.id(), task).is_some() { + panic!("task with same ID already in waiting_tasks"); + } + }, + } + } + } +} +``` + +The basic idea of this function is similar to our `SimpleExecutor`: Loop over all tasks in the `task_queue`, create a waker for each task, and then poll it. However, instead of adding pending tasks back to the end of the `task_queue`, we store them in the `waiting_tasks` map until they are woken again. The waker creation is done by a method named `create_waker`, whose implementation will be shown in a moment. + +To avoid the performance overhead of creating a waker on each poll, we use the `waker_cache` map to store the waker for each task after it has been created. For this, we first use the [`BTreeMap::entry`] method to find the [`Entry`] corresponding to the task ID. We then use the [`Entry::or_insert_with`] method to optionally create a new `Waker` if not present and then get a reference to the `Waker`. Note that reusing wakers like this is not possible for all waker implementations, but our implementation will allow it. To clean up the `waker_cache` when a task is finished, we use the [`BTreeMap::remove`] method to remove any cached waker for that task from the map.
+ +[`BTreeMap::entry`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.entry +[`Entry`]: https://doc.rust-lang.org/alloc/collections/btree_map/enum.Entry.html +[`Entry::or_insert_with`]: https://doc.rust-lang.org/alloc/collections/btree_map/enum.Entry.html#method.or_insert_with +[`BTreeMap::remove`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.remove + +#### Waker Design + +- Waker +- Executor::create_waker +- Executor::wake_tasks + +#### A `run` Method + +#### Sleep If Idle From 8e758c383cf8d197ba844168b67687eec62acaa7 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Wed, 25 Mar 2020 17:58:14 +0100 Subject: [PATCH 37/51] Finish implementation of executor with waker support --- .../posts/12-async-await/index.md | 241 ++++++++++++++++-- 1 file changed, 213 insertions(+), 28 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 5e2122e7..d3b0f9a8 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -1374,14 +1374,14 @@ Let's add the `print_keypresses` task to our executor in our `main.rs` to get wo fn kernel_main(boot_info: &'static BootInfo) -> ! { use blog_os::task::keyboard; - // […] initialization routines, including `init_heap` + // […] initialization routines, including init_heap, test_main let mut executor = SimpleExecutor::new(); executor.spawn(Task::new(example_task())); executor.spawn(Task::new(keyboard::print_keypresses())); // new executor.run(); - // […] test_main, "it did not crash" message, hlt_loop + // […] "it did not crash" message, hlt_loop } ``` @@ -1479,9 +1479,12 @@ impl Executor { In addition to a `task_queue`, that stores the tasks that are ready to execute, the type has a `waiting_tasks` map, a `wake_queue` and a `waker_cache`. 
These fields have the following purpose: - The `waiting_tasks` map stores tasks that returned `Poll::Pending`. The map is indexed by the `TaskId` to allow efficient continuation of a specific task. -- The `wake_queue` is a reference-counted [`ArrayQueue`] of task IDs. It will be shared between the executor and wakers. The idea is that the wakers push the ID of the woken task to the queue. The executor sits on the receiving end of the queue and moves all woken tasks from the `waiting_tasks` map back to the `task_queue`. The reason for using a fixed-size queue instead of an unbounded queue such as [`SegQueue`] is that interrupt handlers that should not allocate will push to this queue. +- The `wake_queue` is an [`ArrayQueue`] of task IDs, wrapped into the [`Arc`] type that implements _reference counting_. Reference counting makes it possible to share ownership of the value between multiple owners. It works by allocating the value on the heap and counting the number of active references to it. When the number of active references reaches zero, the value is no longer needed and can be deallocated. + + We use the `Arc` wrapper for the `wake_queue` because it will be shared between the executor and wakers. The idea is that the wakers push the ID of the woken task to the queue. The executor sits on the receiving end of the queue and moves all woken tasks from the `waiting_tasks` map back to the `task_queue`. The reason for using a fixed-size queue instead of an unbounded queue such as [`SegQueue`] is that interrupt handlers that should not allocate will push to this queue. - The `waker_cache` map caches the [`Waker`] of a task after its creation. This has two reasons: First, it improves performance by reusing the same waker for multiple wake-ups of the same task instead of creating a new waker each time. Second, it ensures that reference-counted wakers are not deallocated inside interrupt handlers because it could lead to deadlocks (there are more details on this below).
+[`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html [`SegQueue`]: https://docs.rs/crossbeam-queue/0.2.1/crossbeam_queue/struct.SegQueue.html To create an `Executor`, we provide a simple `new` function. We choose a capacity of 100 for the `wake_queue`, which should be more than enough for the foreseeable future. In case our system will have more than 100 concurrent tasks at some point, we can easily increase this size. @@ -1543,58 +1546,240 @@ To avoid the performance overhead of creating a waker on each poll, we use the ` #### Waker Design -- Waker -- Executor::create_waker -- Executor::wake_tasks +The job of the waker is to push the ID of the woken task to the `wake_queue` of the executor. We implement this by creating a new `TaskWaker` struct that stores the task ID and a reference to the `wake_queue`: -#### A `run` Method +```rust +// in src/task/executor.rs -#### Sleep If Idle +struct TaskWaker { + task_id: TaskId, + wake_queue: Arc<ArrayQueue<TaskId>>, +} +``` +Since the ownership of the `wake_queue` is shared between the executor and wakers, we use the [`Arc`] wrapper type to implement shared reference-counted ownership. +[`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html -### Old +The implementation of the wake operation is quite simple: -#### The `Wake` Trait +```rust +// in src/task/executor.rs -The simplest way to do this is by implementing the unstable [`Wake`] trait for an empty `DummyWaker` struct: +impl TaskWaker { + fn wake_task(&self) { + self.wake_queue.push(self.task_id).expect("wake_queue full"); + } +} +``` + +We push the `task_id` to the referenced `wake_queue`. Since modifications of the [`ArrayQueue`] type only require a shared reference, we can implement this method on `&self` instead of `&mut self`. + +##### The `Wake` Trait + +In order to use our `TaskWaker` type for polling futures, we need to convert it to a [`Waker`] instance first.
This is required because the [`Future::poll`] method takes a [`Context`] instance as argument, which can only be constructed from the `Waker` type. While we could do this by providing an implementation of the [`RawWaker`] type, it's both simpler and safer to instead implement the [`Wake`] trait and then use the [`From`] implementations provided by the standard library to construct the `Waker`. + +The trait implementation looks like this: [`Wake`]: https://doc.rust-lang.org/nightly/alloc/task/trait.Wake.html ```rust // in src/task/simple_executor.rs -use alloc::{sync::Arc, task::Wake}; +use alloc::task::Wake; -struct DummyWaker; - -impl Wake for DummyWaker { +impl Wake for TaskWaker { fn wake(self: Arc<Self>) { - // do nothing + self.wake_task(); + } + + fn wake_by_ref(self: &Arc<Self>) { + self.wake_task(); } } ``` -The trait is still unstable, so we have to add **`#![feature(wake_trait)]`** to the top of our `lib.rs` to use it. The `wake` method of the trait is normally responsible for waking the corresponding task in the executor. However, our `SimpleExecutor` will not differentiate between ready and waiting tasks, so we don't need to do anything on `wake` calls. +The trait is still unstable, so we have to add **`#![feature(wake_trait)]`** to the top of our `lib.rs` to use it. Since wakers are commonly shared between the executor and the asynchronous tasks, the trait methods require that the `Self` instance is wrapped in the [`Arc`] type, which implements reference-counted ownership. This means that we have to move our `TaskWaker` to an `Arc` in order to call them. -Since wakers are normally shared between the executor and the asynchronous tasks, the `wake` method requires that the `Self` instance is wrapped in the [`Arc`] type, which implements reference-counted ownership. The basic idea is that the value is heap-allocated and the number of active references to it are counted.
If the number of active references reaches zero, the value is no longer needed and can be deallocated. +The difference between the `wake` and `wake_by_ref` methods is that the latter only requires a reference to the `Arc`, while the former takes ownership of the `Arc` and thus often requires an increase of the reference count. Not all types support waking by reference, so implementing the `wake_by_ref` method is optional. However, it can lead to better performance because it avoids unnecessary reference count modifications. In our case, we can simply forward both trait methods to our `wake_task` function, which requires only a shared `&self` reference. -[`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html +##### Creating Wakers -To make our `DummyWaker` usable with the [`Context`] type, we need a method to convert it to the [`Waker`] defined in the core library: +Since the `Waker` type supports [`From`] conversions for all `Arc`-wrapped values that implement the `Wake` trait, we can now implement the `Executor::create_waker` method using our `TaskWaker`: + +[`From`]: https://doc.rust-lang.org/nightly/core/convert/trait.From.html ```rust -// in src/task/simple_executor.rs +// in src/task/executor.rs -use core::task::Waker; - -impl DummyWaker { - fn to_waker(self) -> Waker { - Waker::from(Arc::new(self)) +impl Executor { + fn create_waker(&self, task_id: TaskId) -> Waker { + Waker::from(Arc::new(TaskWaker { + task_id, + wake_queue: self.wake_queue.clone(), + })) } } ``` -The method first makes the `self` instance reference-counted by wrapping it in an [`Arc`]. Then it uses the [`Waker::from`] method to create the `Waker`. This method is available for all reference counted types that implement the [`Wake`] trait. +We create the `TaskWaker` using the passed `task_id` and a clone of the `wake_queue`.
Since the `wake_queue` is wrapped into `Arc`, the `clone` only increases the reference count of the value, but still points to the same heap allocated queue. We store the `TaskWaker` in an `Arc` too because the `Waker::from` implementation requires it. This function then takes care of constructing a [`RawWakerVTable`] and a [`RawWaker`] instance for our `TaskWaker` type. In case you're interested in how it works in detail, check out the [implementation in the `alloc` crate][waker-from-impl]. -[`Waker::from`]: TODO +[waker-from-impl]: https://github.com/rust-lang/rust/blob/cdb50c6f2507319f29104a25765bfb79ad53395c/src/liballoc/task.rs#L58-L87 + +##### Handling Wake-Ups + +To handle wake-ups in our executor, we add a `wake_tasks` method: + +```rust +// in src/task/executor.rs + +impl Executor { + fn wake_tasks(&mut self) { + while let Ok(task_id) = self.wake_queue.pop() { + if let Some(task) = self.waiting_tasks.remove(&task_id) { + self.task_queue.push_back(task); + } + } + } +} +``` + +We use a `while let` loop to pop all items from the `wake_queue`. For each popped task ID, we remove the corresponding task from the `waiting_tasks` map and add it to the back of the `task_queue`. Since we register wakers before checking whether a task needs to be put to sleep, it might happen that a wake-up occurs for tasks even though they are not in the `waiting_tasks` map. In this case, we simply ignore the wake-up. + +#### A `run` Method + +With our waker implementation in place, we can finally construct a `run` method for our executor: + +```rust +// in src/task/executor.rs + +impl Executor { + pub fn run(&mut self) -> ! { + loop { + self.wake_tasks(); + self.run_ready_tasks(); + } + } +} +``` + +This method just calls the `wake_tasks` and `run_ready_tasks` functions in a loop. 
While we could theoretically return from the function when both the `task_queue` and the `waiting_tasks` map become empty, this would never happen since our `keyboard_task` never finishes, so a simple `loop` should suffice. Since the function never returns, we use the `!` return type to mark the function as [diverging] to the compiler. + +[diverging]: https://doc.rust-lang.org/stable/rust-by-example/fn/diverging.html + +We can now change our `kernel_main` to use our new `Executor` instead of the `SimpleExecutor`: + +```rust +// in src/main.rs + +fn kernel_main(boot_info: &'static BootInfo) -> ! { + use blog_os::task::executor::Executor; + + // […] initialization routines, including init_heap, test_main + + let mut executor = Executor::new(); + executor.spawn(Task::new(example_task())); + executor.spawn(Task::new(keyboard::print_keypresses())); + executor.run(); +} +``` + +We only need to change the import and the type name. Since our `run` function is marked as diverging, the compiler knows that it never returns so that we no longer need a call to `hlt_loop` at the end of our `kernel_main` function. + +When we run our kernel using `cargo xrun` now, we see that keyboard input still works: + +TODO gif + +However, the CPU utilization of QEMU did not get any better. The reason for this is that we still keep the CPU busy for the whole time. We no longer poll tasks until they are woken again, but we still check the `wake_queue` and the `task_queue` in a busy loop. To fix this, we need to put the CPU to sleep if there is no more work to do. + +#### Sleep If Idle + +The basic idea is to execute the [`hlt` instruction] when both the `task_queue` and the `wake_queue` are empty. This instruction puts the CPU to sleep until the next interrupt arrives. The fact that the CPU immediately becomes active again on interrupts ensures that we can still directly react when an interrupt handler pushes to the `wake_queue`.
+ +[`hlt` instruction]: https://en.wikipedia.org/wiki/HLT_(x86_instruction) + +To implement this, we add a new `sleep_if_idle` method to our executor and call it from our `run` method: + +```rust +// in src/task/executor.rs + +impl Executor { + pub fn run(&mut self) -> ! { + loop { + self.wake_tasks(); + self.run_ready_tasks(); + self.sleep_if_idle(); // new + } + } + + fn sleep_if_idle(&self) { + if self.wake_queue.is_empty() { + x86_64::instructions::hlt(); + } + } +} +``` + +Since we call `sleep_if_idle` directly after `run_ready_tasks`, which loops until the `task_queue` becomes empty, we only need to check the `wake_queue`. If it is empty too, there is no task that is ready to run, so we execute the `hlt` instruction through the [`instructions::hlt`] wrapper function provided by the [`x86_64`] crate. + +[`instructions::hlt`]: https://docs.rs/x86_64/0.9.6/x86_64/instructions/fn.hlt.html +[`x86_64`]: https://docs.rs/x86_64/0.9.6/x86_64/index.html + +Unfortunately, there is a subtle race condition in this implementation. Since interrupts are asynchronous and can happen at any time, it is possible that an interrupt happens between the `is_empty` check and the call to `hlt`: + +```rust +if self.wake_queue.is_empty() { + // <--- interrupt can happen here + x86_64::instructions::hlt(); +} +``` + +In case this interrupt pushes to the `wake_queue`, we put the CPU to sleep even though there is now a ready task. In the worst case, this could delay the handling of a keyboard interrupt until the next keypress or the next timer interrupt. So how do we prevent it? + +The answer is to disable interrupts on the CPU before the check and atomically enable them again together with the `hlt` instruction. This way, all interrupts that happen in between are delayed until after the `hlt` instruction so that no wake-ups are missed.
To implement this approach, we can use the [`enable_interrupts_and_hlt`] function provided by the [`x86_64`] crate: + +[`enable_interrupts_and_hlt`]: https://docs.rs/x86_64/0.9.6/x86_64/instructions/interrupts/fn.enable_interrupts_and_hlt.html + +```rust +// in src/task/executor.rs + +impl Executor { + fn sleep_if_idle(&self) { + use x86_64::instructions::interrupts::{self, enable_interrupts_and_hlt}; + + // fast path + if !self.wake_queue.is_empty() { + return; + } + + interrupts::disable(); + if self.wake_queue.is_empty() { + enable_interrupts_and_hlt(); + } else { + interrupts::enable(); + } + } +} +``` + +To avoid unnecessarily disabling interrupts, we return early if the `wake_queue` is not empty. Otherwise, we disable interrupts and check the `wake_queue` again. If it is still empty, we use the [`enable_interrupts_and_hlt`] function to enable interrupts and put the CPU to sleep as a single atomic operation. In case the queue is no longer empty, it means that an interrupt woke a task between the first and the second check. In that case, we enable interrupts again and directly continue execution without executing `hlt`. + +Now our executor properly puts the CPU to sleep when there is nothing to do. We can see that the QEMU process has a much lower CPU utilization when we run our kernel using `cargo xrun` now. + +#### Possible Extensions + +Our executor is now able to run tasks in an efficient way. It utilizes waker notifications to avoid polling waiting tasks and puts the CPU to sleep when there is currently no work to do. However, our executor is still quite basic and there are many possible ways to extend its functionality: + +- **Scheduling:** We currently use the [`VecDeque`] type to implement a _first in first out_ (FIFO) strategy for our `task_queue`, which is often also called _round robin_ scheduling. This strategy might not be the most efficient for all workloads.
For example, it might make sense to prioritize latency-critical tasks or tasks that do a lot of I/O. See the [scheduling chapter] of the [_Operating Systems: Three Easy Pieces_] book or the [Wikipedia article on scheduling][scheduling-wiki] for more information. +- **Task Spawning**: Our `Executor::spawn` method currently requires a `&mut self` reference and is thus no longer available after starting the `run` method. To fix this, we could create an additional `Spawner` type that shares some kind of queue with the executor and allows task creation from within tasks themselves. The queue could be, for example, the `task_queue` directly or a separate queue that the executor checks in its run loop. +- **Utilizing Threads**: We don't have support for threads yet, but we will add it in the next post. This will make it possible to launch multiple instances of the executor in different threads. The advantage of this approach is that the delay imposed by long-running tasks can be reduced because other tasks can run concurrently. This approach also makes it possible to utilize multiple CPU cores. +- **Load Balancing**: When adding threading support, it becomes important to know how to distribute the tasks between the executors to ensure that all CPU cores are utilized. A common technique for this is [_work stealing_]. + +[scheduling chapter]: http://pages.cs.wisc.edu/~remzi/OSTEP/cpu-sched.pdf +[_Operating Systems: Three Easy Pieces_]: http://pages.cs.wisc.edu/~remzi/OSTEP/ +[scheduling-wiki]: https://en.wikipedia.org/wiki/Scheduling_(computing) +[_work stealing_]: https://en.wikipedia.org/wiki/Work_stealing + +## Summary + +## What's Next?
From 9c96651e7032fc8733b5151fb50d01480ad97481 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 26 Mar 2020 12:46:59 +0100 Subject: [PATCH 38/51] Write summary and what's next sections --- .../second-edition/posts/12-async-await/index.md | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index d3b0f9a8..7a45f8c7 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -1782,4 +1782,18 @@ Our executor is now able to run tasks in an efficient way. It utilizes waker not ## Summary +We started this post by introducing **multitasking** and differentiating between _preemptive_ multitasking, which forcibly interrupts running tasks regularly, and _cooperative_ multitasking, which lets tasks run until they voluntarily give up control of the CPU. + +We then explored how Rust's support of **async/await** provides a language-level implementation of cooperative multitasking. Rust bases its implementation on top of the polling-based `Future` trait, which abstracts asynchronous tasks. Using async/await, it is possible to work with futures almost like with normal synchronous code. The difference is that asynchronous functions return a `Future` again, which needs to be added to an executor at some point in order to run it. + +Behind the scenes, the compiler transforms async/await code to _state machines_, with each `.await` operation corresponding to a possible pause point. By utilizing its knowledge about the program, the compiler is able to save only the minimal state for each pause point, resulting in a very small memory consumption per task. One challenge is that the generated state machines might contain _self-referential_ structs, for example when local variables of the asynchronous function reference each other.
To prevent pointer invalidation, Rust uses the `Pin` type to ensure that futures cannot be moved in memory anymore after they have been polled for the first time. + +For our **implementation**, we first created a very basic executor that polls all spawned tasks in a busy loop without using the `Waker` type at all. We then showed the advantage of waker notifications by implementing an asynchronous keyboard task. The task defines a static `SCANCODE_QUEUE` using the mutex-free `ArrayQueue` type provided by the `crossbeam` crate. Instead of handling keypresses directly, the keyboard interrupt handler now puts all received scancodes in the queue and then wakes the registered `Waker` to signal that new input is available. On the receiving end, we created a `ScancodeStream` type to provide a `Future` resolving to the next scancode in the queue. This made it possible to create an asynchronous `print_keypresses` task that uses async/await to interpret and print the scancodes in the queue. + +To utilize the waker notifications of the keyboard task, we created a new `Executor` type that differentiates between ready and waiting tasks. Using an `Arc`-shared `wake_queue`, we implemented a `TaskWaker` type that sends wake-up notifications directly to the executor, which can then mark the corresponding task as ready again. To save power when no tasks are runnable, we added support for putting the CPU to sleep using the `hlt` instruction. Finally, we discussed some potential extensions of our executor, for example for providing multi-core support. + ## What's Next? + +Using async/await, we now have basic support for cooperative multitasking in our kernel. While cooperative multitasking is very efficient, it leads to latency problems when individual tasks keep running for too long and thus prevent other tasks from running. For this reason, it makes sense to also add support for preemptive multitasking to our kernel.
+ +In the next post, we will introduce _threads_ as the most common form of preemptive multitasking. In addition to resolving the problem of long running tasks, threads will also prepare us for utilizing multiple CPU cores and running untrusted user programs in the future. From 1264a44aa0c852e4cf99e7e7c6e30e764591860b Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 26 Mar 2020 13:39:03 +0100 Subject: [PATCH 39/51] Don't check links to github.com because of rate limiting --- blog/config.toml | 1 + 1 file changed, 1 insertion(+) diff --git a/blog/config.toml b/blog/config.toml index fa70baf3..93d9a2f7 100644 --- a/blog/config.toml +++ b/blog/config.toml @@ -18,6 +18,7 @@ skip_prefixes = [ "https://crates.io/crates", # see https://github.com/rust-lang/crates.io/issues/788 "https://www.amd.com/system/files/TechDocs/", # seems to have problems with PDFs "https://developer.apple.com/library/archive/qa/qa1118/_index.html", # results in a 401 (I don't know why) + "https://github.com", # rate limiting often leads to "Error 429 Too Many Requests" ] skip_anchor_prefixes = [ "https://github.com/", # see https://github.com/getzola/zola/issues/805 From 358a05c0fa7a1117c71b8501c1fd3b6fcf7b9264 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 26 Mar 2020 13:39:11 +0100 Subject: [PATCH 40/51] Fix some typos --- blog/content/second-edition/posts/12-async-await/index.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 7a45f8c7..844c97f3 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -1026,7 +1026,7 @@ Currently, we handle the keyboard input directly in the interrupt handler. This A common pattern for delegating work to a background task is to create some sort of queue. 
The interrupt handler pushes work units of work to the queue and the background task handles the work in the queue. Applied to our keyboard interrupt, this means that the interrupt handler only reads the scancode from the keyboard, pushes it to the queue, and then returns. The keyboard task sits on the other end of the queue and interprets and handles each scancode that is pushed to it: -![Scancode queue with 8 slots on the top. Keyboard interupt handler on the bottom left with a "push scancode" arrow to the left of the queue. Keyboard task on the bottom right with a "pop scancode" queue coming from the right side of the queue.](scancode-queue.svg) +![Scancode queue with 8 slots on the top. Keyboard interrupt handler on the bottom left with a "push scancode" arrow to the left of the queue. Keyboard task on the bottom right with a "pop scancode" queue coming from the right side of the queue.](scancode-queue.svg) A simple implementation of that queue could be a mutex-protected [`VecDeque`]. However, using mutexes in interrupt handlers is not a good idea since it can easily lead to deadlocks. For example, when the user presses a key while the keyboard task has locked the queue, the interrupt handler tries to acquire the lock again and hangs indefinitely. Another problem with this approach is that `VecDeque` automatically increases its capacity by performing a new heap allocation when it becomes full. This can lead to deadlocks again because our allocator also uses a mutex internally. Further problems are that heap allocations can fail or take a considerable amount of time when the heap is fragmented. @@ -1234,7 +1234,7 @@ impl Stream for ScancodeStream { } ``` -We first use the [`OnceCell::try_get`] method to get a reference to the initialized scancode queue. This should never fail since we initialize the queue in the `new` function, so we can safely use the `expect` method to panic if it's not initalized. 
Next, we use the [`ArrayQueue::pop`] to try to get the next element from the queue. If it succeeds we return the scancode wrapped in `Poll::Ready(Some(…))`. If it fails, it means that the queue is empty. In that case, we return `Poll::Pending`. +We first use the [`OnceCell::try_get`] method to get a reference to the initialized scancode queue. This should never fail since we initialize the queue in the `new` function, so we can safely use the `expect` method to panic if it's not initialized. Next, we use the [`ArrayQueue::pop`] to try to get the next element from the queue. If it succeeds we return the scancode wrapped in `Poll::Ready(Some(…))`. If it fails, it means that the queue is empty. In that case, we return `Poll::Pending`. [`ArrayQueue::pop`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.pop @@ -1786,7 +1786,7 @@ We started this post by introducing **multitasking** and differentiating between We then explored how Rust's support of **async/await** provides a language-level implementation of cooperative multitasking. Rust bases its implementation on top of the polling-based `Future` trait, which abstracts asynchronous tasks. Using async/await, it is possible to work with futures almost like with normal synchronous code. The difference is that asynchronous functions return a `Future` again, which needs to be added to an executor at some point in order to run it. -Behind the scenes, the compiler transforms async/await code to _state machines_, with each `.await` operation corresponding to a possible pause point. By utilizing its knowledge about the program, the compiler is able to save only the minimal state for each pause point, resulting in a very small memory consumption per task. One challange is that the generated state machines might contain _self-referential_ structs, for example when local variables of the asynchronous function reference each other. 
To prevent pointer invalidation, Rust uses the `Pin` type to ensure that futures cannot be moved in memory anymore after they have been polled for the first time. +Behind the scenes, the compiler transforms async/await code to _state machines_, with each `.await` operation corresponding to a possible pause point. By utilizing its knowledge about the program, the compiler is able to save only the minimal state for each pause point, resulting in a very small memory consumption per task. One challenge is that the generated state machines might contain _self-referential_ structs, for example when local variables of the asynchronous function reference each other. To prevent pointer invalidation, Rust uses the `Pin` type to ensure that futures cannot be moved in memory anymore after they have been polled for the first time. For our **implementation**, we first created a very basic executor that polls all spawned tasks in a busy loop without using the `Waker` type at all. We then showed the advantage of waker notifications by implementing an asynchronous keyboard task. The task defines a static `SCANCODE_QUEUE` using the mutex-free `ArrayQueue` type provided by the `crossbeam` crate. Instead of handling keypresses directly, the keyboard interrupt handler now puts all received scancodes in the queue and then wakes the registered `Waker` to signal that new input is available. On the receiving end, we created a `ScancodeStream` type to provide a `Future` resolving to the next scancode in the queue. This made it possible to create an asynchronous `print_keypresses` task that uses async/await to interpret and print the scancodes in the queue. 
From e76e71f2854132d3e80cb5268b0c6c48560a33be Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 26 Mar 2020 17:01:39 +0100 Subject: [PATCH 41/51] Write introduction --- blog/content/second-edition/posts/12-async-await/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 844c97f3..efd0ab40 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -8,7 +8,7 @@ date = 0000-01-01 chapter = "Interrupts" +++ -In this post we explore _cooperative multitasking_ and the _async/await_ feature of Rust. This will make it possible to run multiple concurrent tasks in our kernel. TODO +In this post we explore _cooperative multitasking_ and the _async/await_ feature of Rust. We take a detailed look at how async/await works in Rust, including the design of the `Future` trait, the state machine transformation, and _pinning_. We then add basic support for async/await to our kernel by creating an asynchronous keyboard task and a basic executor. From fe0c8ccb0c7eb9ee32332f774a6b112280fea537 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 26 Mar 2020 17:02:01 +0100 Subject: [PATCH 42/51] Add job seeking note --- blog/content/second-edition/posts/12-async-await/index.md | 8 ++++++++ blog/static/css/main.css | 8 ++++++-- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index efd0ab40..b3e0117e 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -18,6 +18,14 @@ This blog is openly developed on [GitHub]. If you have any problems or questions [at the bottom]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-12 +
+ +As a personal side note, I'm currently looking for a job in Karlsruhe (Germany) or remote. I would love to do systems programming using Rust, but I'm also open to other opportuni­ties. For more information, see my [_LinkedIn_ profile] or contact me at . + +[_LinkedIn_ profile]: https://www.linkedin.com/in/phil-opp/ + +
+ ## Multitasking diff --git a/blog/static/css/main.css b/blog/static/css/main.css index f35814d9..2e2df2c2 100644 --- a/blog/static/css/main.css +++ b/blog/static/css/main.css @@ -315,9 +315,13 @@ a.zola-anchor:hover { div.note { padding: .7rem 1rem; margin: 1rem .2rem; - border: 2px solid #99ff99; + border: 2px solid #6ad46a; border-radius: 5px; - background-color: #99ff0022; + background-color: #99ff991f; +} + +div.note p:last-child { + margin-bottom: 0; } div.warning { From 46fbd2454c52d9aacf817d1126aff5b648f4ef58 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 26 Mar 2020 17:02:20 +0100 Subject: [PATCH 43/51] Add TODO for updating release date before publishing --- blog/content/second-edition/posts/12-async-await/index.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index b3e0117e..08c0ef60 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -1805,3 +1805,6 @@ To utilize the waker notifications of the keyboard task, we created a new `Execu Using async/await, we now have basic support for cooperative multitasking in our kernel. While cooperative multitasking is very efficient, it leads to latency problems when individual tasks keep running for too long and thus prevent other tasks from running. For this reason, it makes sense to also add support for preemptive multitasking to our kernel. In the next post, we will introduce _threads_ as the most common form of preemptive multitasking. In addition to resolving the problem of long-running tasks, threads will also prepare us for utilizing multiple CPU cores and running untrusted user programs in the future. 
+ + +TODO: update date \ No newline at end of file From 117fcbddd4b1ae1f1be5d7b9bb7f2ed8a6855f87 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 26 Mar 2020 17:18:15 +0100 Subject: [PATCH 44/51] Resolve remaining TODO-links --- blog/content/second-edition/posts/12-async-await/index.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 08c0ef60..11f9a1d4 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -1368,9 +1368,9 @@ pub async fn print_keypresses() { The code is very similar to the code we had in our [keyboard interrupt handler] before we modified it in this post. The only difference is that, instead of reading the scancode from an I/O port, we take it from the `ScancodeStream`. For this, we first create a new `Scancode` stream and then repeatedly use the [`next`] method provided by the [`StreamExt`] trait to get a `Future` that resolves to the next element in the stream. By using the `await` operator on it, we asynchronously wait for the result of the future. -[keyboard interrupt handler]: TODO -[`next`]: TODO -[`StreamExt`]: TODO +[keyboard interrupt handler]: @/second-edition/posts/07-hardware-interrupts/index.md#interpreting-the-scancodes +[`next`]: https://docs.rs/futures-util/0.3.4/futures_util/stream/trait.StreamExt.html#method.next +[`StreamExt`]: https://docs.rs/futures-util/0.3.4/futures_util/stream/trait.StreamExt.html We use `while let` to loop until the stream returns `None` to signal its end. Since our `poll_next` method never returns `None`, this is effectively an endless loop, so the `print_keypresses` task never finishes. 
From 4d326ef806a4f105fbf3995682fc0fcba42aa7ef Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 26 Mar 2020 17:24:40 +0100 Subject: [PATCH 45/51] Ignore linkedin.com in link checking --- blog/config.toml | 1 + 1 file changed, 1 insertion(+) diff --git a/blog/config.toml b/blog/config.toml index 93d9a2f7..14b440de 100644 --- a/blog/config.toml +++ b/blog/config.toml @@ -19,6 +19,7 @@ skip_prefixes = [ "https://www.amd.com/system/files/TechDocs/", # seems to have problems with PDFs "https://developer.apple.com/library/archive/qa/qa1118/_index.html", # results in a 401 (I don't know why) "https://github.com", # rate limiting often leads to "Error 429 Too Many Requests" + "https://www.linkedin.com/", # seems to send invalid HTTP status codes ] skip_anchor_prefixes = [ "https://github.com/", # see https://github.com/getzola/zola/issues/805 From 55bfb1d5503bf021aecb5b6e10bf70e7a9cfe763 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 26 Mar 2020 18:12:25 +0100 Subject: [PATCH 46/51] Minor improvements --- .../posts/12-async-await/index.md | 30 +++++++------------ 1 file changed, 11 insertions(+), 19 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 11f9a1d4..78494997 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -95,7 +95,7 @@ Language-supported implementations of cooperative tasks are often even able to b The drawback of cooperative multitasking is that an uncooperative task can potentially run for an unlimited amount of time. Thus, a malicious or buggy task can prevent other tasks from running and slow down or even block the whole system. For this reason, cooperative multitasking should only be used when all tasks are known to cooperate. 
As a counterexample, it's not a good idea to make the operating system rely on the cooperation of arbitrary userlevel programs. -However, the strong performance and memory benefits of cooperative multitasking make it a good approach for usage _within_ a program, especially in combination with asynchronous operations. Since an operating system kernel is a performance-critical program that interacts with asynchronous hardware, cooperative multitasking seems like a good approach for concurrency in our kernel. +However, the strong performance and memory benefits of cooperative multitasking make it a good approach for usage _within_ a program, especially in combination with asynchronous operations. Since an operating system kernel is a performance-critical program that interacts with asynchronous hardware, cooperative multitasking seems like a good approach for implementing concurrency. ## Async/Await in Rust @@ -180,11 +180,11 @@ A more efficient approach could be to _block_ the current thread until the futur #### Future Combinators -An alternative to waiting is to use future combinators. Future combinators are functions like `map` that allow chaining and combining futures together, similar to the functions on [`Iterator`]. Instead of waiting on the future, these combinators return a future themselves, which applies the mapping operation on `poll`. +An alternative to waiting is to use future combinators. Future combinators are methods like `map` that allow chaining and combining futures together, similar to the methods on [`Iterator`]. Instead of waiting on the future, these combinators return a future themselves, which applies the mapping operation on `poll`. 
[`Iterator`]: https://doc.rust-lang.org/stable/core/iter/trait.Iterator.html -As an example, a simple `string_len` combinator for converting `Future` to a `Future` could look like this: +As an example, a simple `string_len` combinator for converting a `Future` to a `Future` could look like this: ```rust struct StringLen { @@ -316,7 +316,7 @@ We see that the first `poll` call starts the function and lets it run until it r #### Saving State -In order to be able to continue from the last waiting state, the state machine must save it internally. In addition, it must save all the variables that it needs to continue execution on the next `poll` call. This is where the compiler can really shine: Since it knows which variables are used when, it can automatically generate structs with exactly the variables that are needed. +In order to be able to continue from the last waiting state, the state machine must keep track of the current state internally. In addition, it must save all the variables that it needs to continue execution on the next `poll` call. This is where the compiler can really shine: Since it knows which variables are used when, it can automatically generate structs with exactly the variables that are needed. As an example, the compiler generates structs like the following for the above `example` function: @@ -521,7 +521,7 @@ struct WaitingOnWriteState { } ``` -We need to store both the `array` and `element` variables because `element` is required for the return type and `array` is referenced by `element`. Since `element` is a reference, it stores a _pointer_ (i.e. a memory address) to the referenced element. We used `0x1001a` as an example memory address here. In reality it needs to be the address of the last element of the `array` field, so it depends on where the struct lives in memory. Structs with such internal pointers are called _self-referential_ structs because they reference themselves from one of their fields. 
+We need to store both the `array` and `element` variables because `element` is required for the return value and `array` is referenced by `element`. Since `element` is a reference, it stores a _pointer_ (i.e. a memory address) to the referenced element. We used `0x1001a` as an example memory address here. In reality it needs to be the address of the last element of the `array` field, so it depends on where the struct lives in memory. Structs with such internal pointers are called _self-referential_ structs because they reference themselves from one of their fields. #### The Problem with Self-Referential Structs @@ -588,7 +588,7 @@ println!("internal reference: {:p}", stack_value.self_ptr); ([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e160ee8a64cba4cebc1c0473dcecb7c8)) -Here we use the [`mem::replace`] function to replace the heap allocated value with a new struct instance. This allows us to move the original `heap_value` to the stack, while the `self_ptr` field of the struct is now a dangling pointer that still points to the old heap address. When you try to run the example on the playground, you see that the printed _"value at:"_ and _"internal reference:"_ lines show indeed different pointers. So heap allpcating a value is not enough to make self-references safe. +Here we use the [`mem::replace`] function to replace the heap allocated value with a new struct instance. This allows us to move the original `heap_value` to the stack, while the `self_ptr` field of the struct is now a dangling pointer that still points to the old heap address. When you try to run the example on the playground, you see that the printed _"value at:"_ and _"internal reference:"_ lines show indeed different pointers. So heap allocating a value is not enough to make self-references safe. 
[`mem::replace`]: https://doc.rust-lang.org/nightly/core/mem/fn.replace.html @@ -675,7 +675,7 @@ The [`get_unchecked_mut`] function works on a `Pin<&mut T>` instead of a `Pin>`, we can prevent this error and safely work with self-referential structs. Note that the compiler is not able to prove that the creation of the self-reference is safe (yet), so we need to use an unsafe block and verify the correctness ourselves. +Now the only error left is the desired error on `mem::replace`. Remember, this operation tries to move the heap allocated value to stack, which would break the self-reference stored in the `self_ptr` field. By opting out of `Unpin` and using `Pin>`, we can prevent this operation at compile time and thus safely work with self-referential structs. As we saw, the compiler is not able to prove that the creation of the self-reference is safe (yet), so we need to use an unsafe block and verify the correctness ourselves. #### Stack Pinning and `Pin<&mut T>` @@ -717,15 +717,7 @@ In case you're interested in understanding how to safely implement a future comb Using async/await, it is possible to ergonomically work with futures in a completely asynchronous way. However, as we learned above, futures do nothing until they are polled. This means we have to have to call `poll` on them at some point, otherwise the asynchronous code is never executed. -With a single future, we can always wait for the future using a loop [as described above](#waiting-on-futures). However, this approach is very inefficient, especially for programs that create a large number of futures. An example for such a program could be a web server that handles each request using an asynchronous function: - -```rust -async fn handle_request(request: Request) {…} -``` - -The function is invoked for each request the webserver receives. It has no return type, so it results in a future with the empty type `()` as output. 
When the web server receives many concurrent requests, this can easily result in hundreds or thousands of futures in the system. While these futures have no return value that we need for future computations, we still want them to be polled to completion because otherwise the requests would not be handled. - -The most common approach for this is to define a global _executor_ that is responsible for polling all futures in the system until they are finished. +With a single future, we can always wait for each future manually using a loop [as described above](#waiting-on-futures). However, this approach is very inefficient and not practical for programs that create a large number of futures. The most common solution for this problem is to define a global _executor_ that is responsible for polling all futures in the system until they are finished. #### Executors @@ -740,7 +732,7 @@ To avoid the overhead of polling futures over and over again, executors typicall #### Wakers -The idea behind the waker API is that a special [`Waker`] type is passed to each invocation of `poll`, wrapped in a [`Context`] type for future extensibility. This `Waker` type is created by the executor and can be used by the asynchronous task to signal its (partial) completion. As a result, the executor does not need to call `poll` on a future that previously returned `Poll::Pending` again until it is notified by the corresponding waker. +The idea behind the waker API is that a special [`Waker`] type is passed to each invocation of `poll`, wrapped in the [`Context`] type. This `Waker` type is created by the executor and can be used by the asynchronous task to signal its (partial) completion. As a result, the executor does not need to call `poll` on a future that previously returned `Poll::Pending` until it is notified by the corresponding waker. 
[`Context`]: https://doc.rust-lang.org/nightly/core/task/struct.Context.html @@ -752,7 +744,7 @@ async fn write_file() { } ``` -This function asynchronously writes the string "Hello" to a `foo.txt` file. Since hard disk writes take some time, the first `poll` call on this future will likely return `Poll::Pending`. However, the hard disk driver will internally store the `Waker` passed in the `poll` call and signal it as soon as the file was written to disk. This way, the executor does not need to waste any time trying to `poll` the future again before it receives the waker notification. +This function asynchronously writes the string "Hello" to a `foo.txt` file. Since hard disk writes take some time, the first `poll` call on this future will likely return `Poll::Pending`. However, the hard disk driver will internally store the `Waker` passed to the `poll` call and use it to notify the executor once the file has been written to disk. This way, the executor does not need to waste any time trying to `poll` the future again before it receives the waker notification. We will see how the `Waker` type works in detail when we create our own executor with waker support in the implementation section of this post. @@ -765,7 +757,7 @@ It might not be immediately apparent, but futures and async/await are an impleme - Each future that is added to the executor is basically a cooperative task. - Instead of using an explicit yield operation, futures give up control of the CPU core by returning `Poll::Pending` (or `Poll::Ready` at the end). - There is nothing that forces futures to give up the CPU. If they want, they can never return from `poll`, e.g. by spinning endlessly in a loop. - - Since each future can block the execution of the other futures in the executor, we need to trust they are not malicious. + - Since each future can block the execution of the other futures in the executor, we need to trust them not to be malicious.
- Futures internally store all the state they need to continue execution on the next `poll` call. With async/await, the compiler automatically detects all variables that are needed and stores them inside the generated state machine. - Only the minimum state required for continuation is saved. - Since the `poll` method gives up the call stack when it returns, the same stack can be used for polling other futures. From da58c31ed44afa3e0375675fc833462157f9b628 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Thu, 26 Mar 2020 18:23:21 +0100 Subject: [PATCH 47/51] Fill in required nightly version and note missing rustfmt --- blog/content/second-edition/posts/12-async-await/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 78494997..ab71cbb3 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -766,7 +766,7 @@ We see that futures and async/await fit the cooperative multitasking pattern per ## Implementation -Now that we understand how cooperative multitasking based on futures and async/await works in Rust, it's time to add support for it to our kernel. Since the [`Future`] trait is part of the `core` library and async/await is a feature of the language itself, there is nothing special we need to do to use it in our `#![no_std]` kernel. The only requirement is that we use at least nightly-TODO of Rust because async/await was not `no_std` compatible before. +Now that we understand how cooperative multitasking based on futures and async/await works in Rust, it's time to add support for it to our kernel. Since the [`Future`] trait is part of the `core` library and async/await is a feature of the language itself, there is nothing special we need to do to use it in our `#![no_std]` kernel. 
The only requirement is that we use at least nightly `2020-03-25` of Rust because async/await was not `no_std` compatible before. (There is no nightly with the rustfmt and clippy components since then, so you might have to pass the `--force` flag to `rustup update`, which performs the update even if it removes some installed components.) With a recent-enough nightly, we can start using async/await in our `main.rs`: From d29a28591ecc913c31f5b014a2e9f5567aa0e7ce Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Fri, 27 Mar 2020 17:23:29 +0100 Subject: [PATCH 48/51] Finish the post --- .../posts/12-async-await/index.md | 107 +++++++++++------- .../qemu-keyboard-output-again.gif | Bin 0 -> 7732 bytes 2 files changed, 65 insertions(+), 42 deletions(-) create mode 100644 blog/content/second-edition/posts/12-async-await/qemu-keyboard-output-again.gif diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index ab71cbb3..911d93e2 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -808,14 +808,14 @@ pub struct Task { } ``` -The `Task` struct is a newtype wrapper around a pinned, heap allocated, dynamically dispatched future with the empty type `()` as output. Let's go through it in detail: +The `Task` struct is a newtype wrapper around a pinned, heap allocated, and dynamically dispatched future with the empty type `()` as output. Let's go through it in detail: - We require that the future associated with a task returns `()`. This means that tasks don't return any result, they are just executed for their side effects. For example, the `example_task` function we defined above has no return value, but it prints something to the screen as a side effect. -- The `dyn` keyword indicates that we store a [trait object] in the `Box`.
This means that the type of the future is [dynamically dispatched], which makes it possible to store different types of futures in the `Task` type. This is important because each `async fn` has their own type and we want to be able to create different tasks later. +- The `dyn` keyword indicates that we store a [_trait object_] in the `Box`. This means that the methods on the future are [_dynamically dispatched_], which makes it possible to store different types of futures in the `Task` type. This is important because each `async fn` has its own type and we want to be able to create multiple different tasks. - As we learned in the [section about pinning], the `Pin` type ensures that a value cannot be moved in memory by placing it on the heap and preventing the creation of `&mut` references to it. This is important because futures generated by async/await might be self-referential, i.e. contain pointers to itself that would be invalidated when the future is moved. -[trait object]: https://doc.rust-lang.org/book/ch17-02-trait-objects.html -[dynamically dispatched]: https://doc.rust-lang.org/book/ch17-02-trait-objects.html#trait-objects-perform-dynamic-dispatch +[_trait object_]: https://doc.rust-lang.org/book/ch17-02-trait-objects.html +[_dynamically dispatched_]: https://doc.rust-lang.org/book/ch17-02-trait-objects.html#trait-objects-perform-dynamic-dispatch [section about pinning]: #pinning To allow the creation of new `Task` structs from futures, we create a `new` function: @@ -832,9 +832,9 @@ impl Task { } ``` -The function takes an arbitrary future with output type `()` and pins it in memory through the [`Box::pin`] function. Then it wraps it in the `Task` struct and returns the new task. The `'static` lifetime is required here because the returned `Task` can live for an arbitrary time, so the future needs to be valid for that time too. +The function takes an arbitrary future with output type `()` and pins it in memory through the [`Box::pin`] function. 
Then it wraps the boxed future in the `Task` struct and returns it. The `'static` lifetime is required here because the returned `Task` can live for an arbitrary time, so the future needs to be valid for that time too. -We also add a `poll` method to allow the executor to poll the corresponding future: +We also add a `poll` method to allow the executor to poll the stored future: ```rust // in src/task/mod.rs @@ -883,7 +883,7 @@ impl SimpleExecutor { } ``` -The struct contains a single `task_queue` field of type [`VecDeque`], which is basically a vector that allows to push and pop on both ends. The idea behind using this type is that we insert new tasks through the `spawn` method at the end and pop the next task for execution from the front. This way, we get a simple [FIFO queue] (_"first in, first out"_). +The struct contains a single `task_queue` field of type [`VecDeque`], which is basically a vector that allows push and pop operations on both ends. The idea behind using this type is that we insert new tasks through the `spawn` method at the end and pop the next task for execution from the front. This way, we get a simple [FIFO queue] (_"first in, first out"_). [`VecDeque`]: https://doc.rust-lang.org/stable/alloc/collections/vec_deque/struct.VecDeque.html [FIFO queue]: https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics) @@ -972,7 +972,7 @@ impl SimpleExecutor { } ``` -The function uses a `while let` loop to handle all tasks in the `task_queue`. For each task, it first creates a `Context` type by wrapping a `Waker` instance returned by our `dummy_waker` function. Then it invokes the `Task::poll` method with this `Context`. If the `poll` method returns `Poll::Ready`, the task is finished and we can continue with the next task. If the task is still `Poll::Pending`, we add it to the back of the queue again so that it will be polled again in a subsequent loop iteration. +The function uses a `while let` loop to handle all tasks in the `task_queue`.
For each task, it first creates a `Context` type by wrapping a `Waker` instance returned by our `dummy_waker` function. Then it invokes the `Task::poll` method with this `context`. If the `poll` method returns `Poll::Ready`, the task is finished and we can continue with the next task. If the task is still `Poll::Pending`, we add it to the back of the queue again so that it will be polled again in a subsequent loop iteration. #### Trying It @@ -992,6 +992,18 @@ fn kernel_main(boot_info: &'static BootInfo) -> ! { // […] test_main, "it did not crash" message, hlt_loop } + + +// Below is the example_task function again so that you don't have to scroll up + +async fn async_number() -> u32 { + 42 +} + +async fn example_task() { + let number = async_number().await; + println!("async number: {}", number); +} ``` When we run it, we see that the expected _"async number: 42"_ message is printed to the screen: @@ -1119,7 +1131,7 @@ We use the [`OnceCell::try_get`] to get a reference to the initialized queue. If [`OnceCell::try_get`]: https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html#method.try_get -The fact that the [`ArrayQueue::push`] method requires only a `&self` reference makes it very simple to call the method on the static queue. The `ArrayQueue` type performs all necessary synchronization itself, so we don't need a mutex wrapper here. +The fact that the [`ArrayQueue::push`] method requires only a `&self` reference makes it very simple to call the method on the static queue. The `ArrayQueue` type performs all necessary synchronization itself, so we don't need a mutex wrapper here. In case the queue is full, we print a warning too. 
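The "push through a shared reference, warn when full" behaviour described above can be tried out on a hosted system. The following is only a minimal sketch under stated assumptions: it uses the standard library's bounded channel (`std::sync::mpsc::sync_channel`) as a stand-in for the `ArrayQueue`, because `SyncSender::try_send` also takes `&self` and reports a full buffer. The names `add_scancode_sketch`, `bounded_queue`, and `PushOutcome` and the warning text are hypothetical and not part of the kernel code.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Outcome of trying to enqueue one scancode (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub enum PushOutcome {
    Queued,
    DroppedFull,
    DroppedNoReceiver,
}

/// Creates a bounded queue with room for `capacity` scancodes.
/// Assumption: a std channel stands in for the lock-free `ArrayQueue`.
pub fn bounded_queue(capacity: usize) -> (SyncSender<u8>, Receiver<u8>) {
    sync_channel(capacity)
}

/// Pushes through a shared reference, so no mutex wrapper is needed.
/// `try_send` never blocks; when the buffer is full we only print a warning.
pub fn add_scancode_sketch(queue: &SyncSender<u8>, scancode: u8) -> PushOutcome {
    match queue.try_send(scancode) {
        Ok(()) => PushOutcome::Queued,
        Err(TrySendError::Full(_)) => {
            println!("WARNING: queue full, dropping scancode {:#x}", scancode);
            PushOutcome::DroppedFull
        }
        Err(TrySendError::Disconnected(_)) => PushOutcome::DroppedNoReceiver,
    }
}
```

Dropping input instead of blocking mirrors the constraint of an interrupt handler, which must never wait for the consumer. The kernel itself cannot use `std`, which is one reason the post relies on a `no_std`-compatible queue instead.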
[`ArrayQueue::push`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.push @@ -1163,16 +1175,14 @@ impl ScancodeStream { pub fn new() -> Self { SCANCODE_QUEUE.try_init_once(|| ArrayQueue::new(100)) .expect("ScancodeStream::new should only be called once"); - ScancodeStream { - _private: (), - } + ScancodeStream { _private: () } } } ``` -The purpose of the `_private` field is to prevent construction of the struct from outside of the module. This makes the `new` function the only way to construct the type. In the function, we first try to initialize the `SCANCODE_QUEUE` static. We panic if it is already initialized to ensure that only a single `ScancodeStream` type can be created. +The purpose of the `_private` field is to prevent construction of the struct from outside of the module. This makes the `new` function the only way to construct the type. In the function, we first try to initialize the `SCANCODE_QUEUE` static. We panic if it is already initialized to ensure that only a single `ScancodeStream` instance can be created. -To make the scancodes available to asynchronous tasks, the next step is to implement `poll`-like method that tries to pop the next scancode off the queue. While this sounds like we should implement [`Future`] trait for our type, this does not quite fit here. The problem is that the `Future` trait only abstracts over a single asynchronous value and expects that the `poll` method is not called again after it returns `Poll::Ready`. Our scancode queue, however, contains multiple asynchronous tasks so that it is ok to keep polling it. +To make the scancodes available to asynchronous tasks, the next step is to implement a `poll`-like method that tries to pop the next scancode off the queue. While this sounds like we should implement the [`Future`] trait for our type, this does not quite fit here.
The problem is that the `Future` trait only abstracts over a single asynchronous value and expects that the `poll` method is not called again after it returns `Poll::Ready`. Our scancode queue, however, contains multiple asynchronous values so that it is ok to keep polling it. ##### The `Stream` Trait @@ -1234,13 +1244,13 @@ impl Stream for ScancodeStream { } ``` -We first use the [`OnceCell::try_get`] method to get a reference to the initialized scancode queue. This should never fail since we initialize the queue in the `new` function, so we can safely use the `expect` method to panic if it's not initialized. Next, we use the [`ArrayQueue::pop`] to try to get the next element from the queue. If it succeeds we return the scancode wrapped in `Poll::Ready(Some(…))`. If it fails, it means that the queue is empty. In that case, we return `Poll::Pending`. +We first use the [`OnceCell::try_get`] method to get a reference to the initialized scancode queue. This should never fail since we initialize the queue in the `new` function, so we can safely use the `expect` method to panic if it's not initialized. Next, we use the [`ArrayQueue::pop`] method to try to get the next element from the queue. If it succeeds we return the scancode wrapped in `Poll::Ready(Some(…))`. If it fails, it means that the queue is empty. In that case, we return `Poll::Pending`. [`ArrayQueue::pop`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.pop #### Waker Support -Like the `Futures::poll` method, the `Stream::poll_next` method requires that the asynchronous task notifies the executor when it becomes ready after `Poll::Pending` is returned for the first time. This way, the executor does not need to poll the same task again until it is notified, which greatly reduces the performance overhead of waiting tasks. 
+Like the `Future::poll` method, the `Stream::poll_next` method requires that the asynchronous task notifies the executor when it becomes ready after `Poll::Pending` is returned. This way, the executor does not need to poll the same task again until it is notified, which greatly reduces the performance overhead of waiting tasks. To send this notification, the task should extract the [`Waker`] from the passed [`Context`] reference and store it somewhere. When the task becomes ready, it should invoke the [`wake`] method on the stored `Waker` to notify the executor that the task should be polled again. @@ -1287,18 +1297,18 @@ impl Stream for ScancodeStream { Ok(scancode) => { WAKER.take(); Poll::Ready(Some(scancode)) - }, + } Err(crossbeam_queue::PopError) => Poll::Pending, } } } ``` -Like before, we first use the [`OnceCell::try_get`] function to get a reference to the initialized scancode queue. We then optimistically try to `pop` from the queue and return `Poll::Ready` when it succeeds. This exploits the fact that it's only required to register a wakeup when returning `Poll::Pending`. +Like before, we first use the [`OnceCell::try_get`] function to get a reference to the initialized scancode queue. We then optimistically try to `pop` from the queue and return `Poll::Ready` when it succeeds. This way, we can avoid the performance overhead of registering a waker when the queue is not empty. -If the first call to `queue.pop()` does not succeed, the queue is potentially empty. Only potentially because the interrupt handler might have filled the queue asynchronously immediately after the check. Since this race condition can occur again on the next check, we need to register the `Waker` in the `WAKER` static before the second check. This way, a wakeup might happen before we return `Poll::Pending`, but it is guaranteed that we get a wakeup for any scancodes pushed after the check. +If the first call to `queue.pop()` does not succeed, the queue is potentially empty.
Only potentially because the interrupt handler might have filled the queue asynchronously immediately after the check. Since this race condition can occur again for the next check, we need to register the `Waker` in the `WAKER` static before the second check. This way, a wakeup might happen before we return `Poll::Pending`, but it is guaranteed that we get a wakeup for any scancodes pushed after the check. -After registering the `Waker` contained in the passed [`Context`] through the [`AtomicWaker::register`] function, we try popping from the queue a second time. If it now succeeds, we return `Poll::Ready`. We also remove the registered waker again using [`Waker::take`] because a waker notification is no longer needed. In case `queue.pop()` fails for a second time, we return `Poll::Pending` like before, but this time with a registered wakeup. +After registering the `Waker` contained in the passed [`Context`] through the [`AtomicWaker::register`] function, we try popping from the queue a second time. If it now succeeds, we return `Poll::Ready`. We also remove the registered waker again using [`AtomicWaker::take`] because a waker notification is no longer needed. In case `queue.pop()` fails for a second time, we return `Poll::Pending` like before, but this time with a registered wakeup. [`AtomicWaker::register`]: https://docs.rs/futures-util/0.3.4/futures_util/task/struct.AtomicWaker.html#method.register [`AtomicWaker::take`]: https://docs.rs/futures/0.3.4/futures/task/struct.AtomicWaker.html#method.take @@ -1319,6 +1329,8 @@ pub(crate) add_scancode(scancode: u8) { } else { WAKER.wake(); // new } + } else { + println!("WARNING: scancode queue uninitialized"); } } ``` @@ -1371,8 +1383,9 @@ Let's add the `print_keypresses` task to our executor in our `main.rs` to get wo ```rust // in src/main.rs +use blog_os::task::keyboard; // new + fn kernel_main(boot_info: &'static BootInfo) -> ! 
{ - use blog_os::task::keyboard; // […] initialization routines, including init_heap, test_main @@ -1418,7 +1431,7 @@ pub struct Task { } ``` -The idea is to use the memory address of this future as an ID. This address is unique because because no two futures are stored at the same address. The `Pin` type ensures that they can't move in memory, so we also know that the address stays the same as long as the task exists. These properties make the address a good candidate for an ID. +The idea is to use the memory address of this future as an ID. This address is unique because no two futures are stored at the same address. The `Pin` type ensures that they can't move in memory, so we also know that the address stays the same as long as the task exists. These properties make the address a good candidate for an ID. The implementation looks like this: @@ -1453,7 +1466,7 @@ pub mod executor; // in src/task/executor.rs use super::{Task, TaskId}; -use alloc::collections::{BTreeMap, VecDeque}; +use alloc::{collections::{BTreeMap, VecDeque}, sync::Arc}; use core::task::Waker; use crossbeam_queue::ArrayQueue; @@ -1476,10 +1489,10 @@ impl Executor { } ``` -In addition to a `task_queue`, that stores the tasks that are ready to execute, the type has a `waiting_tasks` map, a `wake_queue` and a `waker_cache`. These fields have the following purpose: +In addition to a `task_queue`, which stores the tasks that are ready to execute, the type has a `waiting_tasks` map, a `wake_queue` and a `waker_cache`. These fields have the following purposes: -- The `waiting_tasks` map stores tasks that returned `Poll::Pending`. The map is indexed by the `TaskId` to allow efficient continuation a specific task. -- The `wake_queue` is [`ArrayQueue`] of task IDs, wrapped into the [`Arc`] type that implements _reference counting_. Reference countingmakes it possible to share ownership of the value between multiple owners. It works by allocating the value on the heap and counting the number of active references to it. When the number of active references reaches zero, the value is no longer needed and can be deallocated. +- The `waiting_tasks` map stores tasks that returned `Poll::Pending`. The map is indexed by the `TaskId` to allow efficient continuation of a specific task. +- The `wake_queue` is an [`ArrayQueue`] of task IDs, wrapped in the [`Arc`] type that implements _reference counting_. Reference counting makes it possible to share ownership of the value between multiple owners.
It works by allocating the value on the heap and counting the number of active references to it. When the number of active references reaches zero, the value is no longer needed and can be deallocated. +- The `waiting_tasks` map stores tasks that returned `Poll::Pending`. The map is indexed by the `TaskId` to allow efficient continuation of a specific task. +- The `wake_queue` is [`ArrayQueue`] of task IDs, wrapped into the [`Arc`] type that implements _reference counting_. Reference counting makes it possible to share ownership of the value between multiple owners. It works by allocating the value on the heap and counting the number of active references to it. When the number of active references reaches zero, the value is no longer needed and can be deallocated. We use the `Arc` wrapper for the `wake_queue` because it will be shared between the executor and wakers. The idea is that the wakers push the ID of the woken task to the queue. The executor sits on the receiving end of the queue and moves all woken tasks from the `waiting_tasks` map back to the `task_queue`. The reason for using a fixed-size queue instead of an unbounded queue such as [`SegQueue`] is that interrupt handlers that should not allocate will push to this queue. - The `waker_cache` map caches the [`Waker`] of a task after its creation. This has two reasons: First, it improves performance by reusing the same waker for multiple wake-ups of the same task instead of creating a new waker each time. Second, it ensures that reference-counted wakers are not deallocated inside interrupt handlers because it could lead to deadlocks (there are more details on this below). 
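The interplay of `task_queue`, `waiting_tasks`, and the shared `wake_queue` described in the list above can be sketched on a hosted system as follows. This is an illustrative sketch only, not the post's implementation: `DemoExecutor`, `DemoTask`, `WakeQueue`, `park`, and the free function `wake` are hypothetical names, the `waker_cache` is left out, and an `Arc<Mutex<VecDeque<_>>>` stands in for the `Arc<ArrayQueue<_>>`. A mutex would not be suitable inside interrupt handlers because of the deadlock risk, so it serves purely as a demonstration vehicle here.

```rust
use std::collections::{BTreeMap, VecDeque};
use std::sync::{Arc, Mutex};

/// Stand-in for the post's `TaskId`: a plain integer wrapper.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct TaskId(pub u64);

/// Stand-in for a real task; it only carries a name for the demo.
#[derive(Debug, PartialEq)]
pub struct DemoTask {
    pub id: TaskId,
    pub name: &'static str,
}

/// Assumption: a mutex-protected deque replaces the lock-free `ArrayQueue`.
pub type WakeQueue = Arc<Mutex<VecDeque<TaskId>>>;

pub struct DemoExecutor {
    pub task_queue: VecDeque<DemoTask>,
    pub waiting_tasks: BTreeMap<TaskId, DemoTask>,
    pub wake_queue: WakeQueue,
}

impl DemoExecutor {
    pub fn new() -> Self {
        DemoExecutor {
            task_queue: VecDeque::new(),
            waiting_tasks: BTreeMap::new(),
            wake_queue: Arc::new(Mutex::new(VecDeque::new())),
        }
    }

    /// Parks a task that reported "pending" until it is woken.
    pub fn park(&mut self, task: DemoTask) {
        self.waiting_tasks.insert(task.id, task);
    }

    /// Moves every woken task from `waiting_tasks` back to `task_queue`.
    pub fn wake_tasks(&mut self) {
        loop {
            // The lock guard is a temporary, so it is released at the end of this statement.
            let next = self.wake_queue.lock().unwrap().pop_front();
            match next {
                Some(id) => {
                    // Unknown IDs are ignored, e.g. for tasks that already finished.
                    if let Some(task) = self.waiting_tasks.remove(&id) {
                        self.task_queue.push_back(task);
                    }
                }
                None => break,
            }
        }
    }
}

/// What a waker would do: push the task ID to the shared queue.
pub fn wake(queue: &WakeQueue, id: TaskId) {
    queue.lock().unwrap().push_back(id);
}
```

Cloning the `Arc` hands a waker its own handle to the same queue, which is the shared-ownership property that the reference-counting explanation above refers to.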
@@ -1512,20 +1525,24 @@ To execute all tasks in the `task_queue`, we create a private `run_ready_tasks` ```rust // in src/task/executor.rs +use core::task::{Context, Poll}; + impl Executor { fn run_ready_tasks(&mut self) { while let Some(mut task) = self.task_queue.pop_front() { - let waker = self.waker_cache.entry(&task.id()).or_insert_with(|| { - self.create_waker(task.id()) - }); + let task_id = task.id(); + if !self.waker_cache.contains_key(&task_id) { + self.waker_cache.insert(task_id, self.create_waker(task_id)); + } + let waker = self.waker_cache.get(&task_id).expect("should exist"); let mut context = Context::from_waker(waker); match task.poll(&mut context) { Poll::Ready(()) => { // task done -> remove cached waker - self.waker_cache.remove(task.id()); + self.waker_cache.remove(&task_id); } Poll::Pending => { - if self.waiting_tasks.insert(task.id(), task).is_some() { + if self.waiting_tasks.insert(task_id, task).is_some() { panic!("task with same ID already in waiting_tasks"); } }, @@ -1537,11 +1554,15 @@ impl Executor { The basic idea of this function is similar to our `SimpleExecutor`: Loop over all tasks in the `task_queue`, create a waker for each task, and then poll it. However, instead of adding pending tasks back to the end of the `task_queue`, we store them in the `waiting_tasks` map until they are woken again. The waker creation is done by a method named `create_waker`, whose implemenation will be shown in a moment. -To avoid the performance overhead of creating a waker on each poll, we use the `waker_cache` map to store the waker for each task after it has been created. For this, we first use the [`BTreeMap::entry`] method to find the [`Entry`] corresponding to the task ID. We then use the [`Entry::or_insert_with`] method to optionally create a new `Waker` if not present and then get a reference to the `Waker`. Note reusing wakers like this is not possible for all waker implementations, but our implemenation will allow it. 
To clean up the `waker_cache` when a task is finished, we use use the [`BTreeMap::remove`] method to remove any cached waker for that task from the map. +To avoid the performance overhead of creating a waker on each poll, we use the `waker_cache` map to store the waker for each task after it has been created. For this, we first use the [`BTreeMap::contains_key`] method to check whether a cached waker exists for the task. If not, we use the [`BTreeMap::insert`] method to create it. Afterwards, we can be sure that the waker exists, so we use the [`BTreeMap::get`] method in combination with an [`expect`] call to get a reference to it. + +[`BTreeMap::contains_key`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.contains_key +[`BTreeMap::insert`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.insert +[`BTreeMap::get`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.get +[`expect`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.expect + +Note that reusing wakers like this is not possible for all waker implementations, but our implementation will allow it. To clean up the `waker_cache` when a task is finished, we use the [`BTreeMap::remove`] method to remove any cached waker for that task from the map. -[`BTreeMap::entry`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.entry -[`Entry`]: https://doc.rust-lang.org/alloc/collections/btree_map/enum.Entry.html -[`Entry::or_insert_with`]: https://doc.rust-lang.org/alloc/collections/btree_map/enum.Entry.html#method.or_insert_with [`BTreeMap::remove`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.remove #### Waker Design @@ -1577,7 +1598,7 @@ We push the `task_id` to the referenced `wake_queue`.
Since modifications of the ##### The `Wake` Trait -In order to use our `TaskWaker` type for polling futures, we need to convert it to a [`Waker`] instance first. This is required because the [`Future::poll`] method takes a [`Context`] instance as argument, which can only be constructed from the `Waker` type. While we could do this by providing an implementation of the [`RawWaker`] type, it's both simpler and safer to instead implement the [`Wake`] trait and then using the [`From`] implementations provided by the standard library to construct the `Waker`. +In order to use our `TaskWaker` type for polling futures, we need to convert it to a [`Waker`] instance first. This is required because the [`Future::poll`] method takes a [`Context`] instance as an argument, which can only be constructed from the `Waker` type. While we could do this by providing an implementation of the [`RawWaker`] type, it's both simpler and safer to instead implement the `Arc`-based [`Wake`] trait and then use the [`From`] implementations provided by the standard library to construct the `Waker`. The trait implementation looks like this: @@ -1672,12 +1693,12 @@ We can now change our `kernel_main` to use our new `Executor` instead of the `Si ```rust // in src/main.rs -fn kernel_main(boot_info: &'static BootInfo) -> ! { - use blog_os::task::executor::Executor; +use blog_os::task::executor::Executor; // new +fn kernel_main(boot_info: &'static BootInfo) -> ! { // […] initialization routines, including init_heap, test_main - let mut executor = Executor::new(); + let mut executor = Executor::new(); // new executor.spawn(Task::new(example_task())); executor.spawn(Task::new(keyboard::print_keypresses())); executor.run(); @@ -1688,7 +1709,7 @@ We only need to change the import and the type name. Since our `run` function is When we run our kernel using `cargo xrun` now, we see that keyboard input still works: -TODO gif +![QEMU printing ".....H...e...l...l..o.....
...a..g..a....i...n...!"](qemu-keyboard-output-again.gif) However, the CPU utilization of QEMU did not get any better. The reason for this is that we still keep the CPU busy for the whole time. We no longer poll tasks until they are woken again, but we still check the `wake_queue` and the `task_queue` in a busy loop. To fix this, we need to put the CPU to sleep if there is no more work to do. @@ -1736,10 +1757,12 @@ if self.wake_queue.is_empty() { In case this interrupt pushes to the `wake_queue`, we put the CPU to sleep even though there is now a ready task. In the worst case, this could delay the handling of a keyboard interrupt until the next keypress or the next timer interrupt. So how do we prevent it? -The answer is to disable interrupts on the CPU before the check and atomically enable them again together with the `hlt` instruction. This way, all interrupts that happen between in between are delayed after the `hlt` instruction so that no wake-ups are missed. To implement this approach, we can use the [`enable_interrupts_and_hlt`] function provided by the [`x86_64`] crate: +The answer is to disable interrupts on the CPU before the check and atomically enable them again together with the `hlt` instruction. This way, all interrupts that happen in between are delayed until after the `hlt` instruction so that no wake-ups are missed. To implement this approach, we can use the [`enable_interrupts_and_hlt`] function provided by the [`x86_64`] crate. This function is only available since version 0.9.6, so you might need to update your `x86_64` dependency to use it. [`enable_interrupts_and_hlt`]: https://docs.rs/x86_64/0.9.6/x86_64/instructions/interrupts/fn.enable_interrupts_and_hlt.html +The updated implementation of our `sleep_if_idle` function looks like this: + ```rust // in src/task/executor.rs @@ -1764,7 +1787,7 @@ impl Executor { To avoid unnecessarily disabling interrupts, we return early if the `wake_queue` is not empty.
Otherwise, we disable interrupts and check the `wake_queue` again. If it is still empty, we use the [`enable_interrupts_and_hlt`] function to enable interrupts and put the CPU to sleep as a single atomic operation. In case the queue is no longer empty, it means that an interrupt woke a task between the first and the second check. In that case, we enable interrupts again and directly continue execution without executing `hlt`. -Now our executor properly puts the CPU to sleep when there is nothing to do. We can see that the QEMU process as a much lower CPU utilization when we run our kernel using `cargo xrun` now. +Now our executor properly puts the CPU to sleep when there is nothing to do. We can see that the QEMU process has a much lower CPU utilization when we run our kernel using `cargo xrun` again. #### Possible Extensions diff --git a/blog/content/second-edition/posts/12-async-await/qemu-keyboard-output-again.gif b/blog/content/second-edition/posts/12-async-await/qemu-keyboard-output-again.gif new file mode 100644 index 0000000000000000000000000000000000000000..990ab9599c5d1e7018a651ea2ea1ab7f65dc20e1 GIT binary patch literal 7732 [binary GIF data omitted]
From 5286828cb854f3428db39c3ee9d2d44a683e54a6 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Fri, 27 Mar 2020 17:24:02 +0100 Subject: [PATCH 49/51] Set release date for post --- blog/content/second-edition/posts/12-async-await/index.md | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 911d93e2..01150719 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -2,7 +2,7 @@ title = "Async/Await" weight = 12 path = "async-await" -date = 0000-01-01 +date = 2020-03-27 [extra] chapter = "Interrupts" @@ -1820,6 +1820,3 @@ To utilize the waker notifications of the keyboard task, we created a new `Execu Using async/await, we now have basic support for cooperative multitasking in our kernel. While cooperative multitasking is very efficient, it leads to latency problems when individual tasks keep running for too long and thus prevent other tasks from running. For this reason, it makes sense to also add support for preemptive multitasking to our kernel. In the next post, we will introduce _threads_ as the most common form of preemptive multitasking. In addition to resolving the problem of long-running tasks, threads will also prepare us for utilizing multiple CPU cores and running untrusted user programs in the future.
- - -TODO: update date \ No newline at end of file From fb2b6f3685365336438b2fb352409d9b63079e47 Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Fri, 27 Mar 2020 17:30:37 +0100 Subject: [PATCH 50/51] Update chapter name of post --- blog/content/second-edition/posts/12-async-await/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/content/second-edition/posts/12-async-await/index.md b/blog/content/second-edition/posts/12-async-await/index.md index 01150719..cf7ab83f 100644 --- a/blog/content/second-edition/posts/12-async-await/index.md +++ b/blog/content/second-edition/posts/12-async-await/index.md @@ -5,7 +5,7 @@ path = "async-await" date = 2020-03-27 [extra] -chapter = "Interrupts" +chapter = "Multitasking" +++ In this post we explore _cooperative multitasking_ and the _async/await_ feature of Rust. We take a detailed look how async/await works in Rust, including the design of the `Future` trait, the state machine transformation, and _pinning_. We then add basic support for async/await to our kernel by creating an asynchronous keyboard task and a basic executor. From 4f8858f75dc3a00265b1b2aac03b35db90f0b8dd Mon Sep 17 00:00:00 2001 From: Philipp Oppermann Date: Fri, 27 Mar 2020 17:31:19 +0100 Subject: [PATCH 51/51] Update Readme for new async/await post --- README.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index b4dcd9d4..5afce02a 100644 --- a/README.md +++ b/README.md @@ -10,7 +10,7 @@ The code for each post lives in a separate git branch. This makes it possible to **The code for the latest post is available [here][latest-post].** -[latest-post]: https://github.com/phil-opp/blog_os/tree/post-11 +[latest-post]: https://github.com/phil-opp/blog_os/tree/post-12 You can find the branch for each post by following the `(source code)` link in the [post list](#posts) below. 
The branches are named `post-XX` where `XX` is the post number, for example `post-03` for the _VGA Text Mode_ post or `post-07` for the _Hardware Interrupts_ post. For build instructions, see the Readme of the respective branch. @@ -59,6 +59,11 @@ The goal of this project is to provide step-by-step tutorials in individual blog - [Allocator Designs](https://os.phil-opp.com/allocator-designs/) ([source code](https://github.com/phil-opp/blog_os/tree/post-11)) +**Multitasking**: + +- [Async/Await](https://os.phil-opp.com/async-await/) + ([source code](https://github.com/phil-opp/blog_os/tree/post-12)) + ## First Edition Posts The current version of the blog is already the second edition. The first edition is outdated and no longer maintained, but might still be useful. The posts of the first edition are: