blog_os/blog/content/second-edition/posts/12-async-await/index.md
2020-03-26 13:41:25 +01:00


+++
title = "Async/Await"
weight = 12
path = "async-await"
date = 0000-01-01

[extra]
chapter = "Interrupts"
+++

In this post we explore cooperative multitasking and the async/await feature of Rust. This will make it possible to run multiple concurrent tasks in our kernel. TODO

This blog is openly developed on GitHub. If you have any problems or questions, please open an issue there. You can also leave comments at the bottom. The complete source code for this post can be found in the post-12 branch.

Multitasking

One of the fundamental features of most operating systems is multitasking, which is the ability to execute multiple tasks concurrently. For example, you probably have other programs open while looking at this post, such as a text editor or a terminal window. Even if you have only a single browser window open, there are probably various background tasks for managing your desktop windows, checking for updates, or indexing files.

While it seems like all tasks run in parallel, only a single task can be executed on a CPU core at a time. To create the illusion that the tasks run in parallel, the operating system rapidly switches between active tasks so that each one can make a bit of progress. Since computers are fast, we don't notice these switches most of the time.

While single-core CPUs can only execute a single task at a time, multi-core CPUs can run multiple tasks in a truly parallel way. For example, a CPU with 8 cores can run 8 tasks at the same time. We will explain how to set up multi-core CPUs in a future post. For this post, we will focus on single-core CPUs for simplicity. (It's worth noting that all multi-core CPUs start with only a single active core, so we can treat them as single-core CPUs for now.)

There are two forms of multitasking: Cooperative multitasking requires tasks to regularly give up control of the CPU so that other tasks can make progress. Preemptive multitasking uses operating system capabilities to switch threads at arbitrary points in time by forcibly pausing them. In the following we will explore the two forms of multitasking in more detail and discuss their respective advantages and drawbacks.

Preemptive Multitasking

The idea behind preemptive multitasking is that the operating system controls when to switch tasks. For that, it utilizes the fact that it regains control of the CPU on each interrupt. This makes it possible to switch tasks whenever new input is available to the system. For example, it would be possible to switch tasks when the mouse is moved or a network packet arrives. The operating system can also determine the exact time that a task is allowed to run by configuring a hardware timer to send an interrupt after that time.

The following graphic illustrates the task switching process on a hardware interrupt:

In the first row, the CPU is executing task A1 of program A. All other tasks are paused. In the second row, a hardware interrupt arrives at the CPU. As described in the Hardware Interrupts post, the CPU immediately stops the execution of task A1 and jumps to the interrupt handler defined in the interrupt descriptor table (IDT). Through this interrupt handler, the operating system now has control of the CPU again, which allows it to switch to task B1 instead of continuing task A1.

Saving State

Since tasks are interrupted at arbitrary points in time, they might be in the middle of some calculation. In order to be able to resume them later, the operating system must back up the whole state of the task, including its call stack and the values of all CPU registers. This process is called a context switch.

As the call stack can be very large, the operating system typically sets up a separate call stack for each task instead of backing up the call stack content on each task switch. Such a task with a separate stack is called a thread of execution or thread for short. By using a separate stack for each task, only the register contents need to be saved on a context switch (including the program counter and stack pointer). This approach minimizes the performance overhead of a context switch, which is very important since context switches can occur up to 100 times per second.

Discussion

The main advantage of preemptive multitasking is that the operating system can fully control the allowed execution time of a task. This way, it can guarantee that each task gets a fair share of the CPU time, without the need to trust the tasks to cooperate. This is especially important when running third-party tasks or when multiple users share a system.

The disadvantage of preemption is that each task requires its own stack. Compared to a shared stack, this results in higher memory usage per task and often limits the number of tasks in the system. Another disadvantage is that the operating system always has to save the complete CPU register state on each task switch, even if the task only used a small subset of the registers.

Preemptive multitasking and threads are fundamental components of an operating system because they make it possible to run untrusted userspace programs. We will discuss these concepts in full detail in future posts. For this post, however, we will focus on cooperative multitasking, which also provides useful capabilities for our kernel.

Cooperative Multitasking

Instead of forcibly pausing running tasks at arbitrary points in time, cooperative multitasking lets each task run until it voluntarily gives up control of the CPU. This allows tasks to pause themselves at convenient points in time, for example when they need to wait for an I/O operation anyway.

Cooperative multitasking is often used at the language level, for example in the form of coroutines or async/await. The idea is that either the programmer or the compiler inserts yield operations into the program, which give up control of the CPU and allow other tasks to run. For example, a yield could be inserted after each iteration of a complex loop.

It is common to combine cooperative multitasking with asynchronous operations. Instead of blocking until an operation is finished and preventing other tasks from running in the meantime, asynchronous operations return a "not ready" status if the operation is not finished yet. In this case, the waiting task can execute a yield operation to let other tasks run.
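The yield mechanism can be sketched without any async machinery. The following is a minimal, hypothetical model (the names `Step`, `Task`, `counting_task`, and `run_round_robin` are made up for illustration and are not part of the kernel): each task is a closure that does one unit of work per call and reports whether it yielded or finished, and a round-robin loop moves yielded tasks to the back of the queue.

```rust
use std::collections::VecDeque;

/// What a task reports after running for one step.
enum Step {
    /// The task gave up the CPU voluntarily but has more work to do.
    Yielded,
    /// The task is finished.
    Done,
}

/// A hypothetical cooperative task: each call performs one unit of work.
type Task = Box<dyn FnMut() -> Step>;

/// Creates a task that yields `steps` times before finishing.
fn counting_task(steps: u32) -> Task {
    let mut remaining = steps;
    Box::new(move || {
        if remaining == 0 {
            Step::Done
        } else {
            remaining -= 1;
            Step::Yielded
        }
    })
}

/// Runs all tasks in round-robin order until every task is done.
/// Returns a log of what happened, in order.
fn run_round_robin(mut tasks: VecDeque<(&'static str, Task)>) -> Vec<String> {
    let mut log = Vec::new();
    while let Some((name, mut task)) = tasks.pop_front() {
        match task() {
            Step::Yielded => {
                log.push(format!("{name} yielded"));
                // Re-queue the task so that the others get a turn first.
                tasks.push_back((name, task));
            }
            Step::Done => log.push(format!("{name} done")),
        }
    }
    log
}

fn demo_round_robin() -> Vec<String> {
    let mut tasks: VecDeque<(&'static str, Task)> = VecDeque::new();
    tasks.push_back(("A", counting_task(2)));
    tasks.push_back(("B", counting_task(1)));
    run_round_robin(tasks)
}
```

Note that a task which never returns from its closure would stall this loop forever, which is exactly the weakness of the cooperative approach.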

Saving State

Since tasks define their pause points themselves, they don't need the operating system to save their state. Instead, they can save exactly the state they need for continuation before they pause themselves, which often results in better performance. For example, a task that just finished a complex computation might only need to back up the final result of the computation since it does not need the intermediate results anymore.

Language-supported implementations of cooperative tasks are often even able to back up the required parts of the call stack before pausing. As an example, Rust's async/await implementation stores all local variables that are still needed in an automatically generated struct (see below). By backing up the relevant parts of the call stack before pausing, all tasks can share the same call stack, which results in a much smaller memory consumption per task. As a result, it is possible to create an almost arbitrary number of tasks without running out of memory.
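One way to observe this on a hosted system is to look at the size of the value that an async function returns. The sketch below assumes `std` is available (unlike in the kernel) and uses two made-up functions, `small_task` and `large_task`. Because `large_task` needs its 1 KiB buffer after the pause point, the buffer has to be stored inside the returned value rather than on a call stack.

```rust
use std::future::ready;
use std::mem::size_of_val;

/// Keeps no local state alive across its pause point.
async fn small_task() -> u8 {
    ready(()).await;
    1
}

/// Keeps a 1 KiB buffer alive across its pause point, so the buffer
/// must be saved as part of the task's state.
async fn large_task(seed: u8) -> u8 {
    let buffer = [seed; 1024];
    ready(()).await;
    buffer[buffer.len() - 1]
}

/// Returns the sizes in bytes of the two (not yet started) tasks.
fn future_sizes() -> (usize, usize) {
    (size_of_val(&small_task()), size_of_val(&large_task(7)))
}
```

The exact sizes depend on the compiler version, but the second value has to be at least 1024 bytes, while the first needs only a few bytes.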

Discussion

The drawback of cooperative multitasking is that an uncooperative task can potentially run for an unlimited amount of time. Thus, a malicious or buggy task can prevent other tasks from running and slow down or even block the whole system. For this reason, cooperative multitasking should only be used when all tasks are known to cooperate. As a counterexample, it's not a good idea to make the operating system rely on the cooperation of arbitrary user-level programs.

However, the strong performance and memory benefits of cooperative multitasking make it a good approach for usage within a program, especially in combination with asynchronous operations. Since an operating system kernel is a performance-critical program that interacts with asynchronous hardware, cooperative multitasking seems like a good approach for concurrency in our kernel.

Async/Await in Rust

The Rust language provides first-class support for cooperative multitasking in the form of async/await. Before we can explore what async/await is and how it works, we need to understand how futures and asynchronous programming work in Rust.

Futures

A future represents a value that might not be available yet. This could be, for example, an integer that is computed by another task or a file that is downloaded from the network. Instead of waiting until the value is available, futures make it possible to continue execution until the value is needed.

Example

The concept of futures is best illustrated with a small example:

Sequence diagram: main calls read_file and is blocked until it returns; then it calls foo() and is also blocked until it returns. The same process is repeated, but this time async_read_file is called, which directly returns a future; then foo() is called again, which now runs concurrently to the file load. The file is available before foo() returns.

This sequence diagram shows a main function that reads a file from the file system and then calls a function foo. This process is repeated two times: once with a synchronous read_file call and once with an asynchronous async_read_file call.

With the synchronous call, the main function needs to wait until the file is loaded from the file system. Only then can it call the foo function, which requires it to again wait for the result.

With the asynchronous async_read_file call, the file system directly returns a future and loads the file asynchronously in the background. This allows the main function to call foo much earlier, which then runs in parallel with the file load. In this example, the file load even finishes before foo returns, so main can directly work with the file without further waiting after foo returns.

Futures in Rust

In Rust, futures are represented by the Future trait, which looks like this:

pub trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output>;
}

The associated type Output specifies the type of the asynchronous value. For example, the async_read_file function in the diagram above would return a Future instance with Output set to File.

The poll method allows checking whether the value is already available. It returns a Poll enum, which looks like this:

pub enum Poll<T> {
    Ready(T),
    Pending,
}

When the value is already available (e.g. the file was fully read from disk), it is returned wrapped in the Ready variant. Otherwise, the Pending variant is returned, which signals the caller that the value is not yet available.
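To see both variants in action, here is a small sketch that runs on a hosted system with `std`. The `Countdown` future and the `NoopWaker` helper are hypothetical names introduced only for this example. The future reports `Pending` for a fixed number of polls and then returns its value in `Ready`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical future that becomes ready after `remaining` polls.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            Poll::Ready("done")
        } else {
            self.remaining -= 1;
            // Signal that polling again makes sense.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// A waker that ignores all wake-ups; enough for polling by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a `Countdown` three times and collects the results.
fn poll_three_times() -> Vec<Poll<&'static str>> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut future = Countdown { remaining: 2 };
    let mut results = Vec::new();
    for _ in 0..3 {
        results.push(Pin::new(&mut future).poll(&mut cx));
    }
    results
}
```

The first two polls return `Poll::Pending` and the third returns `Poll::Ready("done")`.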

The poll method takes two arguments: self: Pin<&mut Self> and cx: &mut Context. The former behaves like a normal &mut self reference, with the difference that the Self value is pinned to its memory location. Understanding Pin and why it is needed is difficult without understanding how async/await works first. We will therefore explain it later in this post.

The purpose of the cx: &mut Context parameter is to pass a Waker instance to the asynchronous task, e.g. the file system load. This Waker allows the asynchronous task to signal that it (or a part of it) is finished, e.g. that the file was loaded from disk. Since the main task knows that it will be notified when the Future is ready, it does not need to call poll over and over again. We will explain this process in more detail later when we implement our own Waker type.
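As a rough illustration of the notification idea, the following sketch builds a `Waker` from a hypothetical `CountingWaker` type using the `Wake` trait (from `std::task` here; the same trait is also exported by `alloc::task`). It only counts notifications. This is an assumption-laden toy for a hosted system, not the waker design used later in the kernel.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Wake, Waker};

/// Hypothetical waker state that records how often it was notified.
struct CountingWaker {
    wake_count: AtomicUsize,
}

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.wake_count.fetch_add(1, Ordering::SeqCst);
    }
}

/// Simulates an asynchronous operation that notifies the waiting task.
/// Returns the number of notifications that were received.
fn simulate_notifications() -> usize {
    let state = Arc::new(CountingWaker {
        wake_count: AtomicUsize::new(0),
    });
    let waker = Waker::from(state.clone());

    // An asynchronous operation would typically keep a clone of the waker
    // that it received through `cx.waker()`.
    let stored_by_operation = waker.clone();

    // Later, when the operation completes, it notifies the waiting task.
    stored_by_operation.wake(); // consumes the clone
    waker.wake_by_ref(); // notifies without consuming

    state.wake_count.load(Ordering::SeqCst)
}
```

A real executor would react to `wake` by scheduling the task to be polled again instead of just incrementing a counter.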

Working with Futures

We now know how futures are defined and the rough idea behind the poll method. However, we still don't know how to effectively work with futures. The problem is that futures represent results of asynchronous tasks, which might not be available yet. In practice, however, we often need these values directly for further calculations. So the question is: How can we efficiently retrieve the value of a future when we need it?

Waiting on Futures

One possible answer is to wait until a future becomes ready. This could look something like this:

let future = async_read_file("foo.txt");
let file_content = loop {
    match future.poll() {
        Poll::Ready(value) => break value,
        Poll::Pending => {}, // do nothing
    }
};

Here we actively wait for the future by calling poll over and over again in a loop. The arguments to poll don't matter here, so we omitted them. While this solution works, it is very inefficient because we keep the CPU busy until the value becomes available.
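For completeness, a version of this busy-wait loop that actually compiles on a hosted system could look like the sketch below. The `SlowRead` future is a made-up stand-in for `async_read_file`, and `NoopWaker` is a helper that ignores notifications, since a busy loop never relies on them.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a file read that needs a few polls to finish.
struct SlowRead {
    polls_left: u32,
    content: Option<String>,
}

impl Future for SlowRead {
    type Output = String;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<String> {
        if self.polls_left > 0 {
            self.polls_left -= 1;
            Poll::Pending
        } else {
            Poll::Ready(self.content.take().expect("polled after completion"))
        }
    }
}

/// A waker that does nothing; the busy loop polls regardless.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls `future` in a loop until it is ready.
/// Returns the value together with the number of `poll` calls.
fn busy_wait<F: Future>(future: F) -> (F::Output, u32) {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(value) => break (value, polls),
            // Keeps the CPU busy without doing useful work.
            Poll::Pending => std::hint::spin_loop(),
        }
    }
}

fn busy_wait_demo() -> (String, u32) {
    busy_wait(SlowRead {
        polls_left: 3,
        content: Some(String::from("file content")),
    })
}
```

With `polls_left: 3`, the loop calls `poll` four times before it gets the value, and every one of those calls except the last is wasted CPU time.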

A more effective approach could be to block the current thread until the future becomes available. This is of course only possible if you have threads, so this solution does not work for our kernel, at least not yet. Even on systems where blocking is supported, it is often not desired because it turns an asynchronous task into a synchronous task again, thereby inhibiting the potential performance benefits.

Future Combinators

An alternative to waiting is to use future combinators. Future combinators are functions like map that allow chaining and combining futures together, similar to the functions on Iterator. Instead of waiting on the future, these combinators return a future themselves, which applies the mapping operation on poll.

As an example, a simple string_len combinator for converting Future<Output = String> to a Future<Output = usize> could look like this:

struct StringLen<F> {
    inner_future: F,
}

impl<F> Future for StringLen<F> where F: Future<Output = String> {
    type Output = usize;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        match self.inner_future.poll(cx) {
            Poll::Ready(s) => Poll::Ready(s.len()),
            Poll::Pending => Poll::Pending,
        }
    }
}

fn string_len(string: impl Future<Output = String>)
    -> impl Future<Output = usize>
{
    StringLen {
        inner_future: string,
    }
}

// Usage
fn file_len() -> impl Future<Output = usize> {
    let file_content_future = async_read_file("foo.txt");
    string_len(file_content_future)
}

This code does not quite work because it does not handle pinning, but it suffices as an example. The basic idea is that the string_len function wraps a given Future instance into a new StringLen struct, which also implements Future. When the wrapped future is polled, it polls the inner future. If the value is not ready yet, Poll::Pending is returned from the wrapped future too. If the value is ready, the string is extracted from the Poll::Ready variant and its length is calculated. Afterwards, it is wrapped in Poll::Ready again and returned.
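One simple way to make the example compile is to sidestep pinning by additionally requiring the inner future to be `Unpin`, so that it can be re-pinned with `Pin::new`. This is a deliberate simplification for illustration (it rules out most futures created by `async` blocks unless they are boxed) and not how library combinators are usually written. The `poll_once` and `NoopWaker` helpers are hypothetical and only exist to run the example on a hosted system.

```rust
use std::future::{ready, Future};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct StringLen<F> {
    inner_future: F,
}

// Assumption: `F: Unpin`, which lets us create a new `Pin` for the field.
impl<F> Future for StringLen<F>
where
    F: Future<Output = String> + Unpin,
{
    type Output = usize;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        match Pin::new(&mut self.inner_future).poll(cx) {
            Poll::Ready(s) => Poll::Ready(s.len()),
            Poll::Pending => Poll::Pending,
        }
    }
}

fn string_len(string: impl Future<Output = String> + Unpin) -> impl Future<Output = usize> {
    StringLen {
        inner_future: string,
    }
}

/// A waker that ignores all wake-ups.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future exactly once.
fn poll_once<F: Future>(future: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut future = Box::pin(future);
    let result = future.as_mut().poll(&mut cx);
    result
}

fn string_len_demo() -> Poll<usize> {
    // `ready` creates a future that is immediately ready (and `Unpin`).
    poll_once(string_len(ready(String::from("hello"))))
}
```

Since the inner future is ready immediately, the first poll of the combinator already yields `Poll::Ready(5)`.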

Manually writing correct combinator methods is difficult, therefore they are often provided by libraries. While the Rust standard library itself provides no combinator methods yet, the semi-official (and no_std compatible) futures crate does. Its FutureExt trait provides high-level combinator methods such as map or then, which can be used to manipulate the result with arbitrary closures.

Advantages

The big advantage of future combinators is that they keep the operations asynchronous. In combination with asynchronous I/O interfaces, this approach can lead to very high performance. The fact that future combinators are implemented as normal structs with trait implementations allows the compiler to optimize them aggressively into an efficient state machine. For more details, see the Zero-cost futures in Rust post, which announced the addition of futures to the Rust ecosystem.

Drawbacks

While future combinators make it possible to write very efficient code, they can be difficult to use in some situations because of the type system and the closure-based interface. For example, consider code like this:

async_read_file("foo.txt").then(|content| {
    if content.len() > 100 {
        Either::Left(async_read_file("bar.txt"))
    } else {
        Either::Right(future::ready(content))
    }
})

(Try it on the playground)

Here we read the file foo.txt and then use the then combinator to chain a second future based on the file content. If the content length is greater than 100, we read a different bar.txt file and return its content, otherwise we return the content of foo.txt.

The reason for the Either wrapper is that if and else blocks must always have the same type. Since we return different future types in the blocks, we must use the wrapper type to unify them into a single type. The ready function wraps a value into a future, which is immediately ready. The function is required here because the Either wrapper expects that the wrapped value implements Future.
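To make the role of the wrapper more concrete, here is a simplified, hand-written `Either` that forwards `poll` to whichever variant is active. It is only a sketch: it requires both futures to be `Unpin` to avoid pin projection, which the real type in the futures crate does not need. `FileRead`, this `async_read_file`, `choose`, and `poll_once` are made-up stand-ins for illustration.

```rust
use std::future::{ready, Future, Ready};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Simplified stand-in for the `Either` type of the futures crate.
enum Either<A, B> {
    Left(A),
    Right(B),
}

impl<A, B> Future for Either<A, B>
where
    A: Future + Unpin,
    B: Future<Output = A::Output> + Unpin,
{
    type Output = A::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Forward the poll to whichever future we hold.
        match self.get_mut() {
            Either::Left(a) => Pin::new(a).poll(cx),
            Either::Right(b) => Pin::new(b).poll(cx),
        }
    }
}

/// Hypothetical file read that is ready immediately.
struct FileRead(Option<String>);

impl Future for FileRead {
    type Output = String;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<String> {
        Poll::Ready(self.0.take().expect("polled after completion"))
    }
}

fn async_read_file(name: &str) -> FileRead {
    FileRead(Some(format!("content of {name}")))
}

/// Both branches return the same type even though they wrap different futures.
fn choose(content: String) -> Either<FileRead, Ready<String>> {
    if content.len() > 100 {
        Either::Left(async_read_file("bar.txt"))
    } else {
        Either::Right(ready(content))
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn poll_once<F: Future + Unpin>(mut future: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let result = Pin::new(&mut future).poll(&mut cx);
    result
}
```

Without the wrapper, the two branches of `choose` would have the types `FileRead` and `Ready<String>`, and the compiler would reject the `if`/`else` expression.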

As you can imagine, this can quickly lead to very complex code for larger projects. It gets especially complicated if borrowing and different lifetimes are involved. For this reason, a lot of work was invested to add support for async/await to Rust, with the goal of making asynchronous code radically simpler to write.

The Async/Await Pattern

The idea behind async/await is to let the programmer write code that looks like normal synchronous code, but is turned into asynchronous code by the compiler. It works based on the two keywords async and await. The async keyword can be used in a function signature to turn a synchronous function into an asynchronous function that returns a future:

async fn foo() -> u32 {
    0
}

// the above is roughly translated by the compiler to:
fn foo() -> impl Future<Output = u32> {
    future::ready(0)
}

This keyword alone wouldn't be that useful. However, inside async functions, the await keyword can be used to retrieve the asynchronous value of a future:

async fn foo() -> String {
    let content = async_read_file("foo.txt").await;
    if content.len() > 100 {
        async_read_file("bar.txt").await
    } else {
        content
    }
}

(Try it on the playground)

This function is a direct translation of the future combinator code example, which required the Either wrapper type. Using the .await operator, we can retrieve the value of a future without needing any closures. As a result, we can write our code like we write normal synchronous code, with the difference that this is still asynchronous code.

State Machine Transformation

What the compiler does behind the scenes is transform the body of the async function into a state machine, with each .await call representing a different state. For the above foo function, the compiler creates a state machine with the following four states:

start   waiting on 1st future    waiting on 2nd future   end

This state machine implements the Future trait by making each poll call a possible state switch event:

start   waiting on 1st future    waiting on 2nd future    end
|                 ^                         ^              ^
|                 |                         |              |
------------------------------------------------------------

The first poll call starts the function and lets it run until it reaches a future that is not ready yet. If all futures are ready, the function can run until its end and return its return value wrapped in Poll::Ready. Otherwise, Poll::Pending is returned. Internally, the state machine keeps track of the active state, so that it can continue there on the next poll call.

On subsequent calls to poll, the state machine continues from the current state and polls the future it currently waits on again. In case it is ready now, it continues execution until it reaches the next future that is not ready. If it is still not ready, it stays in the state and returns Poll::Pending again.
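A hand-written approximation of such a state machine for the foo function might look as follows. This is a sketch for a hosted system and not what the compiler literally generates: the names `FooStateMachine`, `FileRead`, and `NoopWaker` are invented, the file contents are made up, and all states are `Unpin` so that pinning can be ignored.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for the future returned by `async_read_file`:
/// pending on the first poll, ready on the second.
struct FileRead {
    content: Option<String>,
    polled_before: bool,
}

impl Future for FileRead {
    type Output = String;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<String> {
        if self.polled_before {
            Poll::Ready(self.content.take().expect("polled after completion"))
        } else {
            self.polled_before = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

fn async_read_file(name: &str) -> FileRead {
    // Made-up contents: foo.txt is longer than 100 bytes.
    let content = if name == "foo.txt" {
        "x".repeat(150)
    } else {
        format!("content of {name}")
    };
    FileRead { content: Some(content), polled_before: false }
}

/// The four states from the text.
enum FooStateMachine {
    Start,
    WaitingOnFooTxt(FileRead),
    WaitingOnBarTxt(FileRead),
    End,
}

impl Future for FooStateMachine {
    type Output = String;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<String> {
        let this = self.get_mut(); // allowed because every state is `Unpin`
        loop {
            match *this {
                FooStateMachine::Start => {
                    *this = FooStateMachine::WaitingOnFooTxt(async_read_file("foo.txt"));
                }
                FooStateMachine::WaitingOnFooTxt(ref mut read) => {
                    match Pin::new(read).poll(cx) {
                        Poll::Pending => return Poll::Pending,
                        Poll::Ready(content) => {
                            if content.len() > 100 {
                                let next = async_read_file("bar.txt");
                                *this = FooStateMachine::WaitingOnBarTxt(next);
                            } else {
                                *this = FooStateMachine::End;
                                return Poll::Ready(content);
                            }
                        }
                    }
                }
                FooStateMachine::WaitingOnBarTxt(ref mut read) => {
                    match Pin::new(read).poll(cx) {
                        Poll::Pending => return Poll::Pending,
                        Poll::Ready(content) => {
                            *this = FooStateMachine::End;
                            return Poll::Ready(content);
                        }
                    }
                }
                FooStateMachine::End => panic!("polled after completion"),
            }
        }
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls the state machine until it finishes; returns (poll count, result).
fn run_foo() -> (u32, String) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut machine = FooStateMachine::Start;
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(content) = Pin::new(&mut machine).poll(&mut cx) {
            return (polls, content);
        }
    }
}
```

With these stand-ins, the first poll stops in the "waiting on foo.txt" state, the second poll advances to "waiting on bar.txt", and the third poll reaches the end state and returns the content of bar.txt.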

The Async Keyword

The purpose of the async/await pattern is to make working with futures easier. Rust has language-level support for this pattern built on the two keywords async and await. We will explain them individually, starting with async.

The purpose of the async keyword is to turn a synchronous function into an asynchronous function that returns a Future:

fn synchronous() -> u32 {
    42
}

async fn asynchronous() -> u32 {
    42
}

While both functions specify a return type of u32, the async keyword turns the return type of the second function into impl Future<Output = u32>. So instead of returning a u32 directly, the asynchronous function returns a type that implements the Future trait with output type u32. We can see this when we try to assign the result to a variable of type u32:

let val: u32 = asynchronous();

The compiler responds with the following error (try it on the playground):

error[E0308]: mismatched types
  --> src/main.rs:3:23
   |
3  |     let val: u32 = asynchronous();
   |              ---   ^^^^^^^^^^^^^^ expected `u32`, found opaque type
   |              |
   |              expected due to this
...
10 | async fn asynchronous() -> u32 {
   |                            --- the `Output` of this `async fn`'s found opaque type
   |
   = note:     expected type `u32`
           found opaque type `impl std::future::Future`

The relevant part of that error message is the last two lines: it expects a u32 because of the type annotation, but the function returned an implementation of the Future trait instead.

Of course, changing the return type alone would not work. Instead, the compiler also needs to convert the function body, which is 42 in our case, into a future. Since 42 is not asynchronous, the compiler just generates a future that returns the result on the first poll. The generated code could look something like this:

struct GeneratedFuture;

impl Future for GeneratedFuture {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context) -> Poll<Self::Output> {
        Poll::Ready(42)
    }
}

fn asynchronous() -> impl Future<Output = u32> {
    GeneratedFuture
}

Instead of returning u32, the asynchronous function now returns an instance of a new GeneratedFuture struct. This struct implements the Future trait by returning Poll::Ready(42) on poll. The 42 is the body of asynchronous in this case.
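A quick way to check that the hand-written version behaves like the async fn is to poll both once. The sketch below assumes a hosted system with `std`; `poll_once` and `NoopWaker` are hypothetical helpers written just for this comparison.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

async fn asynchronous() -> u32 {
    42
}

struct GeneratedFuture;

impl Future for GeneratedFuture {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        Poll::Ready(42)
    }
}

/// Hand-written counterpart of `asynchronous`.
fn hand_written() -> impl Future<Output = u32> {
    GeneratedFuture
}

/// A waker that ignores all wake-ups.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future exactly once and returns the result.
fn poll_once<F: Future>(future: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    // Boxing pins the future on the heap, which works for any future type.
    let mut future = Box::pin(future);
    let result = future.as_mut().poll(&mut cx);
    result
}
```

Both `poll_once(asynchronous())` and `poll_once(hand_written())` return `Poll::Ready(42)` on the first poll, because neither body contains a point at which it could pause.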

Note that this is just an example implementation. The actual code generated by the compiler uses a much more powerful approach, which we will explain in a moment.

In addition to async functions, Rust also supports async blocks:

let future = async {
    42
};

The future variable also has the type impl Future<Output = u32> in this case. The generated code is very similar to the async fn, only without a function call: let future = GeneratedFuture;.

We now know roughly what the async keyword does, but we still don't know why it's useful. After all, there is no advantage in returning an impl Future<Output = u32> instead of returning the u32 directly. To answer this question, we have to explore different ways to work with futures.

Await

Generators