Yay ! New post !
I'm so impatient to read the next one ^^
Thank you a lot for what you're doing :)
You're welcome :). I plan to start writing the next one in the next few days, but I can't promise anything.
Looking forward to the next post! Thanks for all your hard work on this, this tutorial series is incredibly enlightening.
Thanks a lot! The next post is nearly done: https://github.com/phil-opp...
Hey everybody!
I am stuck with #reproducing-the-bug-in-qemu. I cannot reproduce the alignment issue, either in QEMU (with -enable-kvm) or on real hardware.
Does anyone have an idea how to force the misalignment?
I think that this could be a side effect of the recent update of our VGA driver code, which introduces volatile writes.
With volatile, the compiler can no longer use SSE instructions to combine multiple VGA buffer writes. The SSE instructions are the ones that require the 16-byte alignment, so without them the error no longer occurs. (Of course the issue is still there. We just need a different code sample to trigger it.)
I'll try to take a closer look at this issue in the next few days. Thanks a lot for reporting this!
I've updated the post: https://github.com/phil-opp...
We now add some garbage code to the `divide_by_zero_handler`, which should compile to a `movaps` instruction again. This should lead to a bootloop on real hardware. Does it work for you?
Phil these tutorials are awesome, keep going!
What is your view on Redox ? https://github.com/redox-os...
Awesome tutorials! Please please please keep writing them :)
Thanks so much! Sure I will :)
I'm a computer science student and I've taken some great OS courses. It's also a hobby of mine and I've experimented a lot with toy x86 kernels and Rust. Most of the x86 information is from the OSDev wiki and the Intel/AMD manuals.
I've also had a great research assistant job since November, where I try to bring Rust to an ARM Cortex-M7 board.
This is one of my favourite series. You're really doing such a great job of explaining bare-metal stuff to us all, thank you!
I actually encountered the println deadlock earlier while debugging something and solved it in a slightly different way. The problem generally occurs when a second println is encountered while evaluating one of the arguments to an outer println. So, I changed println to call a helper function called print_fmt, which takes a core::fmt::Arguments. I used the format_args macro (https://doc.rust-lang.org/n...) to evaluate the arguments and produce the core::fmt::Arguments, which I pass to print_fmt. Only within print_fmt do I actually take the lock on the WRITER, which means that all the expressions in the println! have been fully evaluated by then.
The advantage being you can nest println's as far as you want and you won't deadlock :)
See my implementation here: https://github.com/anurse/Oxygen/blob/dfda170b3f3d45eca20d4a1366e5d62384d7b2e4/src/vga.rs
Great posts by the way, loving the series!
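To make the idea above concrete, here is a minimal sketch of the print_fmt approach. It runs on a hosted target and makes several assumptions that are not from the linked implementation: `std::sync::Mutex` stands in for the kernel's spinlock, a `String` stands in for the VGA buffer, and the names `Writer`, `kprintln!` and `noisy_answer` are hypothetical.

```rust
use std::fmt::{self, Write};
use std::sync::Mutex;

// Stand-in for the VGA text buffer writer (assumption): collects output in a String.
pub struct Writer {
    pub buffer: String,
}

impl Write for Writer {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.buffer.push_str(s);
        Ok(())
    }
}

// Stand-in for the global WRITER; the kernel would use a spinlock instead.
pub static WRITER: Mutex<Writer> = Mutex::new(Writer { buffer: String::new() });

// The lock is taken only here. The caller has already evaluated every
// argument expression while building `args`.
pub fn print_fmt(args: fmt::Arguments) {
    let mut writer = WRITER.lock().unwrap();
    writer.write_fmt(args).unwrap();
}

// Hypothetical println replacement: format_args! evaluates the arguments,
// then print_fmt locks and writes.
macro_rules! kprintln {
    ($($arg:tt)*) => {
        print_fmt(format_args!("{}\n", format_args!($($arg)*)))
    };
}

// An argument expression that itself prints, i.e. the nested case.
pub fn noisy_answer() -> u32 {
    kprintln!("computing...");
    42
}
```

With this shape, `kprintln!("answer = {}", noisy_answer())` first runs `noisy_answer` (whose inner print locks and unlocks on its own) and only then takes the lock for the outer line. Note that a `Display` implementation that prints from inside its `fmt` method would still run while the lock is held, so this sketch only covers printing in argument expressions.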
Thanks! I really like your solution. I wanted to fix the nested-println-deadlocks too, but totally forgot to do it… However, the print_error solution also has advantages. For example, it always displays the error, even when we handle asynchronous hardware interrupts in the future.
So I think that I'd like to keep the print_error solution but also integrate your println changes (in order to fix deadlocks on nested printlns). I'm just not sure how to integrate it. Maybe I'll just add another section to the end of this post (just before `What's next`)…What do you think?
Hey. This post seems to break returning from interrupts.
E.g. int!(3) produces a double fault, while it returned correctly in the previous post.
Aside from that, thanks for this series. It is superbly written with lots of useful info :)
You're right, I don't know how I didn't notice this.
It seems like the problem is our new GDT, which doesn't have a data segment descriptor. This alone is fine, but the `ss` register still holds a value which is saved to the stack when an exception occurs and checked by `iretq` when we return. One possible solution is to load 0 into the `ss` register after entering long mode (null descriptors are explicitly allowed in `ss` in 64-bit mode). Another solution is to add a valid writable data segment to our new GDT.
This issue is tracked in #277 and I plan to fix it in the next few days. Thanks a lot for reporting!
> Aside from that, thanks for this series. It is superbly written with lots of useful info :)
Thanks so much!
For all people that have the same problem:
The problem is that the ss segment register still contains a selector of the previous GDT that is no longer valid. The iretq instruction expects a valid data segment selector or a null selector. So the easiest way to fix this problem is to add the following to the long_mode_init.asm:
long_mode_start:
    ; load 0 into all data segment registers
    mov ax, 0
    mov ss, ax
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax
(You only need to reload ss. However, the other registers also no longer point to valid data segments, so it's cleaner to invalidate them, too.)
I just asked on IRC because this seemed like it could be possible with lifetimes. They suggested using PhantomData to add a lifetime parameter to the index, and the stack pointers could be replaced by Option<&_>, which is easily obtained by the as_ref() method on a raw pointer.
The biggest issue here is verifying that the Option is the correct size.
> The biggest issue here is verifying that the Option is the correct size.
As far as I know, an Option<&X> always has the same size as a &X, since references implement the NonZero trait. We could also use a struct StackPointer(usize) and implement NonZero for it. Then an Option<StackPointer> has the same size as a usize.
However, I don't think that it suffices to add a lifetime parameter to the index. For example, we could create two static TSSs A and B. Now we can load TSS A in the CPU but use an index from TSS B in our IDT.
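The size claim can be checked directly. The sketch below is an illustration under assumptions: it uses the stable `NonZeroUsize` type from the standard library in place of the `NonZero` trait mentioned above, and the `StackPointer` wrapper and helper function names are hypothetical.

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

// Hypothetical wrapper for a stack pointer that can never be null.
// repr(transparent) keeps the layout identical to NonZeroUsize, so the
// zero value stays available to encode Option::None.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct StackPointer(NonZeroUsize);

impl StackPointer {
    // Returns None for a null address instead of creating an invalid pointer.
    pub fn new(addr: usize) -> Option<StackPointer> {
        NonZeroUsize::new(addr).map(StackPointer)
    }
}

// Option<&T> uses the null pointer value for None, so it needs no extra space.
pub fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&u64>>() == size_of::<&u64>()
}

// The same holds for the wrapper above.
pub fn option_stack_pointer_is_usize_sized() -> bool {
    size_of::<Option<StackPointer>>() == size_of::<usize>()
}

// as_ref() on a raw pointer yields Option<&T>; a null pointer becomes None.
pub fn null_pointer_becomes_none() -> bool {
    let ptr: *const u64 = std::ptr::null();
    // Safety: the pointer is null, so no reference is ever created.
    unsafe { ptr.as_ref() }.is_none()
}
```

This only addresses the size question. It does nothing about the second concern, mixing an index from one TSS with a different loaded TSS.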
Hi Phil, sir. I implemented the OS following the blog up to this point, and then tried to add system calls to it. I used the "syscall" crate to implement a "write" syscall, but it caused a double fault. Is this due to a page fault? Here is the stack frame:
IP : 0x110e43
code_segment : 8
cpu_flags : 0x200006
stack_pointer : 0x121dd0
stack_segment : 0
Guard page is at 0x11b000
The problem might be that you didn't define a handler function for the syscall interrupt or that the privilege level of the IDT entry doesn't allow invocations from userland. In that case, a general protection exception occurs. If you didn't define a handler for this exception, it causes a double fault.
Very interesting series, thank you. Please feel free to add some sort of subscriber feed (RSS, Atom, ...)! If there is one, Firefox didn't find it. Haven't tried inoreader yet.
There should be a feed at https://os.phil-opp.com/rss.xml
Hey! I've been going through your blog, and I think it's splendidly written. I was wondering when the next post would be out
Thanks a lot! I'm currently working on a second edition of this blog, which reorders the posts (exceptions before page tables) and uses its own bootloader. So the plan is to rewrite the earlier posts, reuse the posts about exceptions, and then write some new posts about hardware interrupts and keyboard input.
I created an issue to track this.
Are you running your own blog? I've read the first half and already want to point out that this is all I need from such a great programmer. Otherwise I would have asked my boss for such a course, but I think this can bring me onto the path I wanted. I am a web developer and want to serve JSON files on the internet, but to be ISO 27001 compliant I needed this information...
Wanted to share a project I am working on, I started from this awesome blog! https://github.com/arbel03/os
Looks like you created your own bootloader and already have some kind of filesystem. Really cool!
We have just created the rust-osdev organization on GitHub, where we plan to host and maintain all kinds of libraries needed for OS development in Rust (e.g. the x86_64 crate, a bootloader, etc.). Let me know if you'd like to become a member, maybe we can join forces.
Sadly, this appears to no longer compile, as some of the dependencies are now rather different and some language features have changed. I know you're busy with the second edition effort, but is there any chance there are updates waiting in the wings to the first edition parts?
Sorry, I don't have the time to keep the first edition up to date. I'll try my best to incorporate the first edition posts into the second edition soon, but it will take some time.
Hi, thank you for the blog posts, finding them really accessible and interesting.
The test_long_mode function doesn't look quite right:
test_long_mode:
    mov eax, 0x80000000    ; Set the A-register to 0x80000000.
    cpuid                  ; CPU identification.
    cmp eax, 0x80000001    ; Compare the A-register with 0x80000001.
    jb .no_long_mode       ; It is less, there is no long mode.
    mov eax, 0x80000000    ; Set the A-register to 0x80000000.
    cpuid                  ; CPU identification.
    cmp eax, 0x80000001    ; Compare the A-register with 0x80000001.
    jb .no_long_mode       ; It is less, there is no long mode.
    ret
^ should probably be (according to your linked OSDev page):
test_long_mode:
    mov eax, 0x80000000    ; Set the A-register to 0x80000000.
    cpuid                  ; CPU identification.
    cmp eax, 0x80000001    ; Compare the A-register with 0x80000001.
    jb .no_long_mode       ; It is less, there is no long mode.
    mov eax, 0x80000001    ; Set the A-register to 0x80000001.
    cpuid                  ; CPU identification.
    test edx, 1 << 29      ; Test if the LM-bit, which is bit 29, is set in the D-register.
    jz .no_long_mode       ; It isn't set, there is no long mode.
    ret
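The two-step check in the corrected version can also be written as a small pure function, which makes the difference from the buggy version easy to see. This is only a sketch: the function and constant names are hypothetical, and the two CPUID results are passed in as plain values rather than read from the CPU.

```rust
// Extended CPUID leaf that reports the feature bits (from the assembly above).
pub const EXTENDED_INFO_LEAF: u32 = 0x8000_0001;
// LM bit: bit 29 of EDX for leaf 0x80000001.
pub const LONG_MODE_BIT: u32 = 1 << 29;

/// `max_extended_leaf` is EAX after CPUID with EAX = 0x80000000.
/// `extended_edx` is EDX after CPUID with EAX = 0x80000001; it is ignored
/// when that leaf is not available.
pub fn has_long_mode(max_extended_leaf: u32, extended_edx: u32) -> bool {
    // Step 1 (cmp/jb): the CPU must support the extended info leaf at all.
    if max_extended_leaf < EXTENDED_INFO_LEAF {
        return false;
    }
    // Step 2 (test/jz): the LM bit must be set. The buggy version repeated
    // step 1 here and never looked at this bit.
    extended_edx & LONG_MODE_BIT != 0
}
```

For example, `has_long_mode(0x8000_0008, 0)` is false because the LM bit is clear, even though the buggy version would have let such a CPU pass the test.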
You're right, thank you! I created a GitHub issue and will fix it soon: https://github.com/phil-opp...
Thank you! For any further issues I run into, would you prefer that I post them to the GitHub page?
I ran into another snag, at the end of the paging setup section (just before the GDT section) you state:
"To test it we execute make run. If the green OK is still printed, we have successfully enabled paging!"
I'm assuming (based on trying it out) it would not be bootable at that stage, though it may in some way be down to the specifics of my setup or mistakes on my part.
When run with qemu it repeatedly restarts and doesn't reach an 'OK' boot.
Removing the `enable_paging` call allows the OS to boot properly, though paging will not be set up. Further on, once the GDT is implemented, the OS once more boots without a hitch.
I checked out the repo and stripped out the later steps to compensate for any errors I made in copying.
Thanks again for these posts.
Post issues wherever you want (and thank you for doing it).
Hmm, I can't reproduce it on my machine. I checked out commit 457a613 (see link below) and it ran without problems. Could you try the code from this commit?
Link to 457a613: https://github.com/phil-opp...
This is fun!
Continuing my saga of trying to get this running under Ubuntu 14.04 LTS on a MacBook Pro, if you find that enabling long paging causes an infinite reboot cycle (triple fault?) in QEMU, then you might want to check your qemu-system-x86_64 version. Version 2.0 will reboot infinitely as soon as you try to turn on paging. Version 2.1.2, however, works fine.
Is this perhaps a problem with huge pages? Would it help to add another feature test?
In order to prevent more debugging fun, I've downloaded and built your blog_os repo, and I can now see it print "Hello world", so it should be smooth sailing from here. :-)
Once again, thank you for a cool series of blog posts! Are there any OS development books that you recommend for ideas on further enhancing this basic system?
You're right, support for 1GB pages was introduced in QEMU 2.1 in 2014. Intel CPUs have supported it since Westmere (2010).
There is indeed a way to test support: CPUID 0x80000001, EDX bit 26. But I'm not quite sure if it's good to rely on such a "new" feature at all... Maybe I'll change it to use 2MB pages instead...
I opened an issue for it. Thank you very much for the hint!
Edit: Updated the code and article to use 2MiB pages instead of 1GiB pages. It now works on my old PC from 2005 again :).
> Are there any OS development books that you recommend for ideas on further enhancing this basic system?
Well, there is the Three Easy Pieces book I linked in the post, which gives a theoretical overview of different OS concepts. Then there's the little book about OS development, which is more practical and contains C example code. Of course there are many paid books, too.
Besides books, the OSDev Wiki is also a good resource for many topics. Looking at the source of e.g. Redox can be helpful, too.
For exotic ideas, I really like the concept of Phantom OS, and Rust's memory safety might allow something similar… We'll see ;)
There's still some text in the article referring to the gigabyte page, like "Now the first gigabyte of our kernel is identity mapped", but otherwise, immense thanks for this article; even though I'm not going to use Rust, these two articles actually got me up and running in long mode *without* hassle!
> An entry in the P4, P3, P2, and P1 tables consists of the page aligned 52-bit physical address of the page/next page table and the following bits that can be OR-ed in:
I can't quite make sense of that - so the physical addresses which are available to virtual addressing are only 52bit (instead of all 64bit)? There appear to be 24 flags which can be or'ed in, but wouldn't that necessitate overwriting parts of the physical address (52bit + 22bit > 64bit) of the page/page table?
The key is that the physical addresses are page aligned. The last 12 bits are thus guaranteed to be 0 and can be used to store some flags. So there are 24 bits for the various flags and 52-12=40 bits for the aligned physical address.
I'm confused about this as well. Why say "52-bit physical address" if the address is only 40 bits? Is it because the address is between sets of flags? Meaning, do the table entries really look like this?
+-------+----------------------------------------+-------+
| flags | physical address (frame or next table) | flags |
+-------+----------------------------------------+-------+
63       51                                       11     0
Can you check my understanding:
* Virtual addresses are effectively 48 bits:
  * Highest 16 bits are sign extension of 48th bit
  * Next 36 bits are used to navigate the paging tables
  * Lowest 12 bits are used as offset from physical address found in P1
* Physical addresses are effectively 40 bits and page aligned
* Paging table entries are 64 bits:
  * Highest 12 bits are flags
  * Next 40 bits are the physical address of a table or frame
  * Lowest 12 bits are flags
Thus physical addresses identify the start of each aligned frame, and virtual addresses identify the location within the frame.
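The virtual address layout listed above can be sketched in a few lines. This is an illustration only: the names are hypothetical, and the split of the 36 table bits into four 9-bit indexes (one each for P4, P3, P2 and P1) assumes 4KiB pages with 512-entry tables.

```rust
/// The pieces of a 48-bit virtual address (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct AddressParts {
    pub p4_index: usize,
    pub p3_index: usize,
    pub p2_index: usize,
    pub p1_index: usize,
    pub offset: usize,
}

/// Bits 47..=63 must all be equal, i.e. the highest 16 bits are a sign
/// extension of bit 47 (the 48th bit).
pub fn is_canonical(addr: u64) -> bool {
    let upper = addr >> 47; // 17 bits: bit 47 plus the 16 bits above it
    upper == 0 || upper == 0x1_ffff
}

/// Splits the low 48 bits into four 9-bit table indexes and a 12-bit offset.
pub fn split(addr: u64) -> AddressParts {
    AddressParts {
        p4_index: ((addr >> 39) & 0x1ff) as usize,
        p3_index: ((addr >> 30) & 0x1ff) as usize,
        p2_index: ((addr >> 21) & 0x1ff) as usize,
        p1_index: ((addr >> 12) & 0x1ff) as usize,
        offset: (addr & 0xfff) as usize,
    }
}
```

For instance, `split(0xffff_8000_0020_1abc)` yields P4 index 256, P3 index 0, P2 index 1, P1 index 1 and offset 0xabc.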
The physical address is 52 bits. It is possible to address up to 2^52 bytes of memory with it. Operating systems without paging (e.g. MS-DOS) directly use the physical address to access memory. And so do we before we enable paging.
As soon as we enable paging, the CPU uses the memory management unit (MMU) to translate used addresses (“virtual addresses”) to the real memory addresses. These virtual addresses are effectively 48 bits on x86_64 and behave exactly as you stated.
So why are only 40 physical address bits stored in the page table? The reason is that the physical memory is split into page sized chunks, which are called frames. The first frame starts at physical address 0, the second frame at physical address 4096, and so on. Thus the physical address of a frame is always page aligned. There are still non-page-aligned physical addresses but they can't be the start of a frame.
So the lowest 12 bits of a valid physical frame address are always 0. We don't need to store anything if we know that it is always 0. Thus these bits can be used to store useful information instead (flags in our case).
I hope this helps in clearing up your confusion.
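As a small illustration of the explanation above, the following sketch packs a page-aligned frame address and flags into one 64-bit entry and takes them apart again. The constant and function names are hypothetical; only the bit positions (address in bits 12 to 51, flags elsewhere, present = bit 0, writable = bit 1) follow the layout discussed in this thread.

```rust
/// Bits 12..=51 of an entry hold the frame address (40 stored bits).
pub const ADDRESS_MASK: u64 = 0x000f_ffff_ffff_f000;
pub const PRESENT: u64 = 1 << 0;
pub const WRITABLE: u64 = 1 << 1;

/// Builds an entry, or returns None if the address is not page aligned,
/// exceeds 52 bits, or the flags would overlap the address bits.
pub fn make_entry(frame_start: u64, flags: u64) -> Option<u64> {
    if frame_start & !ADDRESS_MASK != 0 || flags & ADDRESS_MASK != 0 {
        return None;
    }
    // The low 12 address bits are known to be zero, so OR-ing flags in is lossless.
    Some(frame_start | flags)
}

/// Recovers the full 52-bit physical address of the frame or next table.
pub fn frame_start(entry: u64) -> u64 {
    entry & ADDRESS_MASK
}

/// Recovers the flag bits (low 12 and high 12 bits).
pub fn flags(entry: u64) -> u64 {
    entry & !ADDRESS_MASK
}
```

So the address is still a 52-bit value; the entry simply does not spend bits on the 12 positions that are always zero.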
Thanks, this has indeed become more clear as I've worked with it. I wrote (and just revised due to better understanding) a detailed comment and that helped nail it down for me.
In case it's not clear to anyone else, the reason the lower bits are always 0 is because 4096 = 0x1000.
Another question then: since we're aligning on 2MiB pages here (0x200000), can we use the extra few bits (21 vs 12)?
I'll try this myself once I'm allocating pages.
Edit:
It seems like this idea works. I added the following lines after the paging table setup and didn't encounter any processor exceptions:
    ; try writing within reserved address space,
    ; in a middle entry of P4
    mov eax, (1 << 31)
    or [p4_table + (256*8)], eax
I guess this works, just be sure you're acting on a 2MiB page and not a 4KiB page.
That's an interesting question! The AMD manual says no in section 5.3.4 in Figure 5-25 on page 135. The bits between 13 and 20 are marked as “Reserved, must be zero”. So it seems like setting them causes an exception as soon as the entry is used for a translation (as far as I know a page fault with the reserved-bit flag set in the error code, rather than a general protection fault).
Your example works because you only set a bit of a non-present entry. AFAIK all bits of non-present entries are available to the OS (except the present bit). If you want to test it, you can set a bit between 13 and 20 in the currently used P2 table. The P3 and P4 table entries still need 40 bits for storing the physical address of the next table since page tables only need to be 4KiB aligned.
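As a rough sketch of that rule in Rust (names are hypothetical, and the non-present case follows the "AFAIK" above rather than a checked reading of the manual):

```rust
const PRESENT: u64 = 1 << 0;
const HUGE: u64 = 1 << 7;
/// Bits 13-20 of a 2MiB P2 entry ("Reserved, must be zero").
const HUGE_RESERVED_MASK: u64 = 0x1f_e000;

/// Returns true if a present, huge P2 entry has one of the reserved bits set.
/// Non-present entries are treated as fully available to the OS, and for
/// non-huge entries bits 13-20 belong to the address of the P1 table.
fn violates_reserved_bits(p2_entry: u64) -> bool {
    let present = (p2_entry & PRESENT) != 0;
    let huge = (p2_entry & HUGE) != 0;
    present && huge && (p2_entry & HUGE_RESERVED_MASK) != 0
}
```

A present huge entry such as `0x20_0000 | 0x83` passes the check, the same entry with bit 13 set does not, and a non-present entry with bit 13 set passes again.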
You should probably mention that setting bit 16 in cr0 turns on write protection for read only pages, even in kernel mode.
Good catch! I copied the code from my experimental kernel and it seems like I have missed that… I'm not quite sure if I should keep and explain it, or just remove it. What do you think?
I opened an issue for this.
Philipp,
Just an FYI, in my baremetal-x86_64 repo I ported your boot.asm to boot.gas.S so I could use the code with the GNU Assembler.
Nice! You are porting it to C?
Yes, I'm using boot to launch my C based system. Your code was the best and most straightforward code to get to long mode that I've seen. I found your code through Eric Kidd's posts to the Rust mailing list on the interrupt issues, and I'm glad I'm not going to have to solve that problem yet again :)
I have an interesting problem, that probably has something to do with alignment (as usual while dealing with assembly), though I can't say for sure.
I tried to run the code that does all the checks, but with no paging yet (so prior to the "Paging" header). Unfortunately, it always gets into some kind of loop, and sometimes QEMU throws an exception:
`qemu: fatal: Trying to execute code outside RAM or ROM at 0x000000002b100044`
So it probably tries to execute some random code.
If I delete the call to `check_long_mode`, everything works properly, and the green OK is printed to the screen. I don't even need to delete the whole call; it is enough to put `ret` after `test edx, 1 << 29`, so it seems as if the jump to the error code (`jz .no_long_mode`) is somehow to blame.
During the course of debugging, I added a small function, almost identical to `error`, and discovered that just adding the function makes the error go away.
Here are both versions of my code: https://gist.github.com/anu...
The first one (boot.asm) enters the strange loop (executing random instructions?) on my laptop, the second one (boot2.asm) executes properly. And the only difference is the addition of some code that is never called anyway.
Any ideas what may cause it?
EDIT:
Aligning the stack to 4096 (bss is above the text section in my code) also seems to solve the issue. Still, I don't really understand why this is happening. I thought that x86 doesn't need instructions to be aligned to anything specific?
That was an interesting debugging session :D
I tried every debugging trick I knew, read the manual entries for all involved instructions, and even tried to use GDB. But I could not find the bug.
Then I gave up and just looked at the source code in the repo and created a diff to your code. And the problem was surprisingly simple:
You swapped `stack_bottom` and `stack_top`.
But this small change causes big problems. Every `push` or `call` instruction overwrites some bits of the `.text` section below. The last function in the source file and thus the last function in the `.text` section is `check_long_mode`. If you add something behind it, e.g. another error function, it is no longer overwritten and works again.
I think the counter-intuitive thing is that stuff further down in the source file ends up further up in memory. And the stack grows downwards to make it even more confusing. Maybe we should add a small note in the text, why `stack_bottom` needs to be _above_ `stack_top` in the file?
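To illustrate why the swap matters, here is a small simulation in Rust. The addresses are invented for the example, and the function names are hypothetical; the only hardware fact it relies on is that a 32-bit `push` first decrements `esp` by 4 and then stores at the new address:

```rust
/// Addresses written by `count` 4-byte pushes when `esp` starts at `start`.
/// A push first decrements the stack pointer, then stores at the new address.
fn push_addresses(start: u32, count: u32) -> Vec<u32> {
    (1..=count).map(|i| start - 4 * i).collect()
}

/// True if every pushed value lands inside the reserved range [bottom, top).
fn stays_in_stack(start: u32, count: u32, bottom: u32, top: u32) -> bool {
    push_addresses(start, count)
        .iter()
        .all(|&addr| addr >= bottom && addr < top)
}
```

With a reserved range of, say, 0x104000 to 0x108000, starting `esp` at the upper end keeps four pushes inside the range. Starting at the lower end (which is what swapping the two labels amounts to) makes the very first push land below the reserved area, in whatever the linker placed there.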
Uh, that is an... embarrassing error. I checked all registers twice (it is easy to mistake eax for ecx) but somehow never thought to check that... I guess that when you see top above bottom in code you unconsciously decide that it is ok.
About the note: it would probably make sense. Maybe it will make someone check their code twice, and it will surely be a good reminder for people who have little experience with low-level things like that.
Thanks very much for the help. I guess it would have taken me a lot of time to debug this later, when it would start to fail mysteriously after I add another function call in Rust.
Philipp,
Previously I mentioned I'm using a derivative of your boot.S code to boot a C kernel. Things are going pretty well so far, but today I wanted to try to get interrupts going and have run into a brick wall.
I've simplified my test program to something very simple. All that happens is that the boot code jumps to the C code, which enables interrupts, loops for a short period of time, and then exits. There should be no interrupt sources, so I'd expect this to run for as long as I'd like and then exit. And it does if the loop time is very short, but if I lengthen the loop it stops prematurely.
In a more sophisticated version of my program I initialize the Interrupt Descriptor Table and use the APIC to generate a one-shot timer interrupt. Here too, all is well if the delay is short, but when I lengthen the delay I get a Double Fault interrupt!
It almost feels like there is a watchdog timer or .......
Any suggestions welcome.
Thanks,
Wink
A double fault occurs when you don't handle an exception/interrupt or your exception handler causes another exception. Do you enable interrupts (sti) or do you just catch cpu exceptions? Maybe you forgot to handle the interrupts from the hardware timer? But it's difficult to help without the actual code…
Agreed, and I see that in my more sophisticated program; the question is what I'm doing wrong. I believe I've set up the Interrupt Descriptor Table to handle all interrupts, i.e. I have an array of 256 interrupt gates. That program is here (https://github.com/winksaville/sadie), but it's too complicated to debug and I haven't yet checked in my non-working APIC timer code. But with that code I'm able to do software interrupts, and when my APIC timer code fires an interrupt fast enough it does work. So it would seem I've done most of the initialization "properly". Note, I'm also compiling my code with -mno-red-zone so that shouldn't be the problem.
So my debug strategy in situations such as this is to simplify. The first step was to just enable interrupts, do nothing that should cause an interrupt to occur, delay a while in the code, and see what happens. But sure enough, I'm still getting a double fault. Of course, according to the documentation in the Intel SDM Volume 3 section 6.15 "Interrupt 8--Double Fault Exception (#DF)" the error code is 0 and the CS and EIP registers are undefined :(
Anyway, I then simplified as far as I could. I modified your boot.asm program, adding the code below after the esp initialization; it outputs characters to the VGA display.
```
start:
    mov esp, stack_top

    ; Save registers
    push edx
    push ecx
    push ebx
    push eax

    ; Enable interrupts
    ;sti

    ; Initialize edx to VGA buffer, ah = attribute, al = character
    mov edx, 0xb8000
    mov ax, 0x0f60

    ; ebx number of loops
    mov ebx, 10000

.loop:
    ; Output next character and attribute
    mov word [edx], ax

    ; Increment to next character with wrap
    inc al
    cmp al, 0x7f
    jne .nextloc
    mov al, 60

    ; Next location with wrap
.nextloc:
    add edx, 2
    and edx, 0x7ff
    or edx, 0xb8000

    ; Delay
    mov ecx, 0x2000
.delay:
    loop .delay

    ; Continue looping until ebx is 0
    dec ebx
    jnz .loop

    ; Disable interrupts
    cli

    ; Restore registers
    pop eax
    pop ebx
    pop ecx
    pop edx
```
Here is a github repo: (https://github.com/winksaville/baremetal-po-x86_64/tree/test_enable_interrupts). If you add the above code to your boot.asm it will print 10,000 characters to the VGA display and then continue with the normal code paths. If the "sti" instruction is commented out, as it is above, then all is well. But if I uncomment the "sti" thus enabling interrupts then it fails.
I anticipated that enabling interrupts would succeed as I wouldn't expect any interrupts because the hardware is in a state where no interrupts should be generated. Or if grub or the BIOS is using interrupts then I'd expect things to also be OK.
Obviously I'm wrong and I'd hope you'd be able to suggest where my flaw is.
Thanks for the overview and the simplified example! I haven't had the time to look at it in detail, but the problem in your simplified example could be the Programmable Interval Timer (PIT). From the “Outputs” section:
The output from PIT channel 0 is connected to the PIC chip, so that it generates an "IRQ 0". Typically during boot the BIOS sets channel 0 with a count of 65535 or 0 (which translates to 65536), which gives an output frequency of 18.2065 Hz (or an IRQ every 54.9254 ms).
So it seems like the BIOS turns it on by default, so that it causes an interrupt every ~55ms. This causes a double fault, since there is no interrupt handler for IRQ 0.
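The numbers in the quote can be reproduced with a few lines of Rust. This is just the arithmetic, assuming the commonly cited PIT input clock of about 1.193182 MHz (that value is not from this thread); the function names are made up:

```rust
/// Input clock of the PIT in Hz (about 1.193182 MHz, assumed here).
const PIT_BASE_HZ: f64 = 1_193_182.0;

/// A programmed count of 0 is treated as 65536 by the hardware.
fn effective_count(programmed: u16) -> u32 {
    if programmed == 0 {
        65_536
    } else {
        programmed as u32
    }
}

/// Frequency of IRQ 0 for a given channel 0 count.
fn irq_frequency_hz(programmed: u16) -> f64 {
    PIT_BASE_HZ / effective_count(programmed) as f64
}

/// Time between two IRQ 0 interrupts in milliseconds.
fn irq_period_ms(programmed: u16) -> f64 {
    1000.0 / irq_frequency_hz(programmed)
}
```

With a count of 0 this gives roughly 18.2065 Hz and a period of roughly 54.9254 ms, which matches the quoted values.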
Philipp, you were correct, the PIT was the culprit causing the "Double Fault". Although it turns out the PIT is actually generating an Interrupt 8, so it's not really a Double Fault, it's just a PIT interrupt.
My short term solution is to add a pit_isr as the interrupt 8 handler and at the end of pit_isr send an EOI to the PIC using outb(0x20, 0x20). I also needed to issue an APIC EOI for my apic_timer_isr and I cleaned up the initialization. So now my system is cleanly handling these interrupts at least.
For the PIT I really want to disable it, and I'd like to suggest that disabling the PIT be part of boot.asm so that my simple sti, delay, cli test works. If/when I figure that out I'll let you know. Oh, and if you know how to disable the PIT please let me know.
Thanks again for your help!
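The reason IRQ 0 shows up as interrupt 8 is the vector offset the PIC is programmed with. Here is a sketch in Rust, assuming the offsets a BIOS typically leaves behind (0x08 for the master PIC and 0x70 for the slave; these values and all names below are assumptions, not taken from this thread):

```rust
/// Vector offsets typically left behind by the BIOS (assumed here).
const BIOS_MASTER_OFFSET: u8 = 0x08;
const BIOS_SLAVE_OFFSET: u8 = 0x70;

/// Interrupt vector that a legacy PIC pair raises for IRQ line 0..=15.
/// Returns None for invalid lines or if the vector would overflow.
fn irq_to_vector(irq: u8, master_offset: u8, slave_offset: u8) -> Option<u8> {
    match irq {
        0..=7 => master_offset.checked_add(irq),
        8..=15 => slave_offset.checked_add(irq - 8),
        _ => None,
    }
}
```

With the BIOS defaults, `irq_to_vector(0, BIOS_MASTER_OFFSET, BIOS_SLAVE_OFFSET)` is `Some(8)`, the same vector the CPU uses for a double fault. After remapping the master PIC to, for example, 0x20, the timer would arrive as vector 0x20 and no longer collide with the CPU exceptions.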
Here is a solution. There doesn't seem to be a way to disable the PIT, but you can disable all IRQs from the PIC. Adding the following code to my test_enable_interrupts branch allows the code to work even with interrupts enabled:
```
; Disable PIC interrupts so we don't get interrupts if the PIC
; was being used by grub or BIOS. See Disabling section of
; http://wiki.osdev.org/PIC. If the application wants to use devices
; connected to the PIC, such as the PIT, it will probably want
; to remap the PIC interrupts to be above 0 .. 31 which are
; used or reserved by Intel. See the Initialisation section of
; the same page for the PIC_remap subroutine.
mov al,0xff
out 0xa1, al
out 0x21, al
```
Thanks again for your help.
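For anyone wondering where the 0xff comes from: each bit in the mask byte disables one IRQ line, so 0xff masks all eight lines of a PIC. A small Rust sketch of how such mask bytes could be computed (the function name is hypothetical, and it only computes the bytes, it does not touch any ports):

```rust
/// Compute the mask bytes for the master (port 0x21) and slave (port 0xa1) PIC.
/// A set bit masks (disables) the corresponding IRQ line, so we start with
/// everything masked and clear the bits of the lines that should stay enabled.
fn pic_masks(enabled_irqs: &[u8]) -> (u8, u8) {
    let mut master = 0xffu8;
    let mut slave = 0xffu8;
    for &irq in enabled_irqs {
        match irq {
            0..=7 => master &= !(1u8 << irq),
            8..=15 => slave &= !(1u8 << (irq - 8)),
            _ => {} // not a legacy PIC line; ignore it
        }
    }
    (master, slave)
}
```

`pic_masks(&[])` gives `(0xff, 0xff)`, which corresponds to the two `out` instructions above; enabling only the timer line would be `pic_masks(&[0])`, i.e. `(0xfe, 0xff)`.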
To identity map the first gigabyte of our kernel with 512 2MiB pages, we need one P4, one P3, and one P2 table.
Why don't we need to set up a P1 table? We don't even reserve the space for one since there's no p1_table label in the .bss. Is the CPU able to read the paging tables such that it knows to stop translating once it reaches an entry in P2 marked "huge"? What happens to bits 12-20 of the virtual address?
Hi, Philipp! Thanks so much for creating this for us--it's been very fun to go from 0-OKAY with the ASM here, and I can't wait to get to the Rust portion (which is what drew me to this project in the first place). I'm a little confused, though, about the 4-level paging structure. Is there exactly one each of P2, P3, and P4, and then 512 different P1's that each point to various 4K physical pages?
Thanks!
There is always exactly one P4. For each P4 entry, there is a P3. For each P3 entry, there is a P2. And for each P2 entry, there is a P1. Each entry of the P1 then points to a physical memory page.
So there is one P4 table, 1…512 P3 tables, 1…(512*512) P2 tables, and 1…(512*512*512) P1 tables. (And 1…(512*512*512*512) mapped 4k pages. 512^4 * 4k = 256TiB = 2^48 bytes is the maximum amount of addressable virtual memory.)
If we wanted to identity map the first 2MiB, it would require 512 4k pages and thus exactly 512 P1 entries. Every page table has 512 entries, so we need exactly one P1 (and one P2, P3, P4).
If we wanted to identity map the first 513 4k pages, we would need another P1 entry. Our first P1 is full, so we create another P1. Its first entry points to the 513th 4k page and the other entries are empty. Now we map the second P2 entry (which is currently empty) to the P1 table.
In our case, we want to identity map the first 512*2MiB. This requires 512*512 4k pages and thus 512 P1 tables. Fortunately, there is a useful hardware feature: huge pages. A huge page is 2MiB instead of 4k and is mapped directly by the P2 (so we completely skip the P1 table). This allows us to avoid the 512 P1 tables. Instead we map the 512 P2 entries to huge pages.
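The table counts from the last three paragraphs can be checked with a short Rust sketch (function names are made up for illustration):

```rust
const ENTRIES_PER_TABLE: u64 = 512;

/// Number of P1 tables needed to map `pages` consecutive 4k pages,
/// i.e. `pages / 512` rounded up.
fn p1_tables_needed(pages: u64) -> u64 {
    (pages + ENTRIES_PER_TABLE - 1) / ENTRIES_PER_TABLE
}

/// Number of P2 entries needed to map the same range with 2MiB huge pages.
/// One huge page covers exactly what one full P1 table would cover.
fn huge_p2_entries_needed(pages: u64) -> u64 {
    p1_tables_needed(pages)
}
```

So 512 pages need one P1, 513 pages need two, and the 512*512 pages of the first gigabyte would need 512 P1 tables, or 512 huge-page entries in a single P2.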
The big advantage of a multilevel page table is that we don't need to create the page tables / page table entries for memory areas we don't use. In contrast, a single level page table would need 68719476736 entries to address the same amount of virtual memory. So the page table alone would need 68719476736*8=512GiB memory, which is much more than the total amount of RAM in a consumer PC.
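Here is the same arithmetic as a Rust sketch, in case someone wants to play with the numbers (names are hypothetical; the last function assumes the setup from the post with one P4, one P3, and one P2 table):

```rust
const PAGE_SIZE: u64 = 4096;
const ENTRY_SIZE: u64 = 8;
const ENTRIES_PER_TABLE: u64 = 512;

/// Bytes of virtual memory addressable through four levels of 512-entry tables.
fn addressable_bytes() -> u64 {
    ENTRIES_PER_TABLE.pow(4) * PAGE_SIZE
}

/// Entries a single flat table would need to cover the same range.
fn flat_table_entries() -> u64 {
    addressable_bytes() / PAGE_SIZE
}

/// Size in bytes of that flat table.
fn flat_table_bytes() -> u64 {
    flat_table_entries() * ENTRY_SIZE
}

/// Page table memory for identity mapping 1GiB with huge pages:
/// one P4, one P3, and one P2 table.
fn huge_mapping_table_bytes() -> u64 {
    3 * ENTRIES_PER_TABLE * ENTRY_SIZE
}
```

The flat table comes out at 68719476736 entries and 512GiB, while the three tables used for the huge page mapping take 12KiB in total.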
Thank you for the very clear blog and explanations.
Just a remark, would be clearer to add in the Paging section the meaning of bits 12-31 containing the physical address of the next P or the physical address.
What I don't understand is why P1 is not used, and how the CPU knows that there is no P1 and that the entry links directly to the physical page. Is that also the role of the huge bit? And how is the offset defined for a 2MiB page?
Thanks!
We don't use a P1 because it would be cumbersome to set up 512 P1 tables in assembly. Instead, we set the huge bit in the P2 entries, which signals to the CPU that the entry directly points to the physical start address of a 2MiB page frame. This address has to be 2MiB aligned, so its lowest 21 bits (bits 0-20) have to be zero. When translating an address, the lowest 21 bits of the virtual address specify the offset in the 2MiB page.
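As a sketch of what the CPU does with a virtual address in this case (written in Rust, with made-up names; it assumes the standard 9-bit table indices and ignores flags and the upper sign-extension bits):

```rust
/// Parts of a virtual address when the P2 entry maps a 2MiB huge page.
#[derive(Debug, PartialEq)]
struct HugeTranslation {
    p4_index: u64,
    p3_index: u64,
    p2_index: u64,
    /// Low 21 bits: offset inside the 2MiB page.
    offset: u64,
}

/// Split a virtual address into the three table indices and the page offset.
fn split_huge(virt: u64) -> HugeTranslation {
    HugeTranslation {
        p4_index: (virt >> 39) & 0x1ff,
        p3_index: (virt >> 30) & 0x1ff,
        p2_index: (virt >> 21) & 0x1ff,
        offset: virt & 0x1f_ffff,
    }
}

/// Physical address, given the 2MiB-aligned frame address from the P2 entry.
fn translate_huge(frame_2mib: u64, virt: u64) -> u64 {
    frame_2mib + split_huge(virt).offset
}
```

For the address 0x401234 this yields P2 index 2 and offset 0x1234. With an identity mapping, the P2 entry at index 2 points to frame 0x400000, so the translated address is 0x401234 again.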
Just a remark, would be clearer to add in the Paging section the meaning of bits 12-31 containing the physical address of the next P or the physical address.
Thanks for the suggestion! I opened #314 to track it.
Thanks for the blog post series. It is very useful for those who develop their own x86 operating system.
In my own project (unrelated to this Rust OS) I try to initialize segment registers with null descriptor like you do 'mov XX, 0'. Setting ds/es/fs/gs works fine, but when I try to set SS with null descriptor I get a crash. Looking at the documentation 'Intel 64 developers manual Vol. 2B 4-37' I see that 'MOV SS, 0' is prohibited and causes #GP(0).
I wonder why 'MOV SS, 0' works for you...
I'm not certain why there is a limitation, but in the blog post the data is written to `ax` first and then loaded from `ax` to `ss`.
It seems that "mov" to a segment register requires a general purpose register as source. In my code I also use 'movw %ax, %ds'; I just made it a bit easier to read by using a const value.
Anyway it is unrelated to my original question. Writing null descriptor to all segment registers (except %ss) is fine. Documentation also states that null descriptor cannot be used for the stack segment.
Hmm, do you have a link to the documentation? I can't find anything relevant on page 4-37 in this document: https://www.intel.com/Assets/en_US/PDF/manual/253667.pdf
The AMD64 manual states on page 253:
Normally, an IRET that pops a null selector into the SS register causes a general-protection exception (#GP) to occur. However, in long mode, the null selector indicates the existence of nested interrupt handlers and/or privileged software in 64-bit mode. Long mode allows an IRET to pop a null selector into SS from the stack under the following conditions:
• The target mode is 64-bit mode.
• The target CPL<3.
In this case, the processor does not load an SS descriptor, and the null selector is loaded into SS without causing a #GP exception
Maybe I interpreted that wrong, though…
Hi Philipp, your link points to a six-year-old Intel document. Here is the same one, but much more recent: https://software.intel.com/...
Scroll to 'MOV' instruction, page 4-37. There is a block algorithm for MOV that says
IF segment selector is NULL
THEN #GP(0); FI;
I believe I hit this issue.
Thanks for the link!
Hmm, the listing is preceded by “Loading a segment register while in protected mode results in special checks and actions, as described in the following listing.” (emphasis mine)
Under “64-Bit Mode Exceptions” (page 4-39) there are only 3 cases for a #GP(0):
If the memory address is in a non-canonical form.
If an attempt is made to load SS register with NULL segment selector when CPL = 3.
If an attempt is made to load SS register with NULL segment selector when CPL < 3 and CPL ≠ RPL.
I see no reason why we should hit any of these…
I have one more question. In your example you do a jump to long mode. As far as I know, a long 'call' can be used here as well. In fact the call works in KVM and VMware, but for some reason the operation crashes with a #GP error. Do you know why that might be?
You need to do a so-called far jump, which updates the code segment. I'm not sure right now if a far call is supported in long mode. Either way, returning to 32-bit code might not be a good idea anyway, since the opcodes might be interpreted differently.
Hi, I can't get the boot.asm file to assemble because it gives me this error: src/arch/x86_64/boot.asm:(.text+0x4a): undefined reference to `long_mode_start'
Does the error occur when invoking nasm? Then you need to add extern long_mode_start somewhere inside the boot.asm (e.g. at the beginning). If it occurs while invoking ld, make sure that the long_mode_init.asm file is assembled and passed to ld (and it should of course define a global long_mode_start: label).
Hi, I want to ask something about assembly. Why do I have to move p4_table to eax before moving eax into cr3 ? Why can't I move p4_table directly into cr3 ?
Because the CR3 register can only be loaded from a register. So you have to load the p4_table address into a register first.
Hi,
out of curiosity: Does it make sense to keep the 32bit print instructions as "dead code" in the program? It can never be reached, right?
; print `OK` to screen
mov dword [0xb8000], 0x2f4b2f4f
hlt
Yeah, it should be unreachable after entering long mode (we would need to enter protected mode again). So it does not make much sense to keep it.
You should probably mention that the "set_up_page_tables" function works with 32-bit addresses and 32-bit (4-byte) PTE/PDE entries, each holding the 20-bit, page-aligned, physical address of the next data structure (plus 12 bits of 0s, since each level is page aligned). Readers may be confused by the preceding explanation of 64-bit PTEs, which are not used there (certainly I was).
We do use 8 byte PTEs with 64 bit addresses, but we only write the bottom 32 bits, since the higher 32 bits are zero.
I guess that what's unclear to me is why you say that each PTE entry contains the 52-bit physical address of the next frame/entry but in the table it looks like only bits 12-51 (40 bits) are used for that.
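One way to reconcile the two numbers, as a minimal sketch: bits 12-51 are 40 stored bits, but because every table and frame is 4 KiB aligned, the low 12 bits of the address are always zero and do not need to be stored. Masking an entry therefore yields the full 52-bit physical address. The constant and function names below are made up for illustration and are not taken from the post.

```rust
/// Bits 12-51 of a 64-bit page table entry hold the physical address.
const ADDRESS_MASK: u64 = 0x000f_ffff_ffff_f000;

/// Two of the flag bits that live in the low 12 bits of an entry.
const PRESENT: u64 = 1 << 0;
const WRITABLE: u64 = 1 << 1;

/// Combines a page-aligned physical address with flag bits.
fn make_entry(phys_addr: u64, flags: u64) -> u64 {
    // The address must be 4 KiB aligned and fit into 52 bits.
    assert_eq!(phys_addr & !ADDRESS_MASK, 0, "unaligned or too large address");
    phys_addr | flags
}

/// Recovers the physical address by masking away the flag bits.
fn entry_address(entry: u64) -> u64 {
    entry & ADDRESS_MASK
}
```

Since the addresses used during boot are small, the upper 32 bits of each entry stay zero, which fits the remark above that only the bottom 32 bits get written.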
Is this rust or assembly? I've never used rust before although I've used assembly.
Hi,
I'm trying to follow your steps while I'm trying to build a kernel in Rust. I have some questions at this point:
On OSDev they also mention something about a P5 coming in the future.
Thanks, very informative reading!
error[E0425]: cannot find function `int3` in module `x86_64::instructions::interrupts`
--> src/lib.rs:55:39
|
55 | x86_64::instructions::interrupts::int3();
| ^^^^ not found in `x86_64::instructions::interrupts`
I am using x86_64 v0.1.0. I looked in both the x86_64 and x86 documentation on crates.io. There is no function called int3() in either of them. Maybe it was dropped in newer versions?
Sorry, I completely forgot to push my latest x86_64 updates to crates.io. It's in x86_64 0.1.2 now, so it should work after a `cargo update`.
Nice article, thanks
"We have some well tested B-tree" Where can I find the source code for that B-tree implementation?
In the btree module of libcollections: https://github.com/rust-lan...
The rendered documentation is here.
There is some discussion on /r/rust, hacker news, and /r/programming.
Love this series of articles! I'm very new to Rust and kernel development, and I've really enjoyed following along and trying to experiment a bit with alternative implementations. In that vein, I ported the inimitable gz's rust-slabmalloc (https://github.com/gz) to run in my implementation of these tutorials: https://github.com/ryanbree...
One potentially interesting approach I tried, taking a bit of a page from Linux which I know uses a dumbed down allocator for the early allocation during kernel boot, is to have my Rust allocator be tiered: during early kernel boot, it uses a bump allocator. The only allocations done by the bump allocator are to set up the memory to be used by the slab_allocator. This meant I could get the benefit of collections when porting slab_allocator, so I dropped its internal data structure in favor of a plain old vec.
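For readers who have not seen one, the early-boot stage of such a tiered setup could look roughly like this. It is only a sketch of a generic bump allocator over an address range, not the code from the linked repository; the struct and method names are assumptions, and it hands out plain addresses instead of implementing Rust's allocator interface.

```rust
/// A bump allocator over the address range [heap_start, heap_end).
/// It only ever moves `next` forward and never reclaims memory.
struct BumpAllocator {
    heap_end: usize,
    next: usize,
}

impl BumpAllocator {
    fn new(heap_start: usize, heap_size: usize) -> Self {
        BumpAllocator {
            heap_end: heap_start + heap_size,
            next: heap_start,
        }
    }

    /// Returns the start address of a block of `size` bytes aligned to
    /// `align` (a power of two), or `None` if the range is exhausted.
    fn allocate(&mut self, size: usize, align: usize) -> Option<usize> {
        debug_assert!(align.is_power_of_two());
        // Round `next` up to the requested alignment.
        let start = self.next.checked_add(align - 1)? & !(align - 1);
        let end = start.checked_add(size)?;
        if end > self.heap_end {
            return None;
        }
        self.next = end;
        Some(start)
    }
}
```

In a tiered design, the more capable allocator would take over once its own bookkeeping memory has been handed out by this one.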
Thanks for this series! You're doing awesome work and giving people a world of new educational opportunities.
Thanks so much!
I really like your approach of building allocators on top of each other (and I will take a closer look when I have some time). Maybe it's even possible to create an allocator based on a B-tree…?
Ahh, I see that the API to custom allocators changed :-0 I see that the code in git is updated but not for the bump_allocator. Even if one can work around it to conform to the new interface it is puzzling before you figure out what the problem is.
A guide to the new allocator:
https://github.com/rust-lang/rfcs/blob/master/text/1974-global-allocators.md
Best strategy might be to go directly to the hole_list_allocator but to start up simple and ignore trying to reclaim blocks; that way the transition is easier.
See https://github.com/phil-opp/blog_os/issues/341 for more information
great work in explaining how all the different pieces of hardware/software come together
On mac OS X, for some reason,
dd - (0xe85250d6 + 0 + (header_end - header_start))
had no compiler warnings, while
dd 0x100000000 - (0xe85250d6 + 0 + (header_end - header_start))
led to
multiboot_header.asm:7: warning: numeric constant 0x100000000 does not fit in 32 bits
Mac OS X 10.11.1
NASM version 0.98.40 (Apple Computer, Inc. build 11) compiled on Oct 5 2015
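Both spellings should produce the same 32-bit value, since subtracting from 0x100000000 and negating are identical modulo 2^32. A small sketch of the arithmetic, assuming a header length of 24 bytes purely for illustration (the function names are made up as well):

```rust
/// Multiboot 2 magic number, as in the header quoted above.
const MAGIC: u32 = 0xe85250d6;
/// Architecture field; 0 is the value used in the header above.
const ARCHITECTURE: u32 = 0;

/// Checksum chosen so that magic + architecture + length + checksum
/// wraps around to 0 in 32-bit arithmetic (the `dd - (...)` spelling).
fn checksum(header_length: u32) -> u32 {
    let sum = MAGIC.wrapping_add(ARCHITECTURE).wrapping_add(header_length);
    0u32.wrapping_sub(sum)
}

/// The `0x100000000 - (...)` spelling: compute in 64 bits, then
/// truncate the result to 32 bits.
fn checksum_wide(header_length: u32) -> u32 {
    let sum = MAGIC as u64 + ARCHITECTURE as u64 + header_length as u64;
    0x1_0000_0000u64.wrapping_sub(sum) as u32
}
```

The warning appears to come only from the literal 0x100000000 itself, which needs 33 bits; the final value fits into 32 bits either way.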
Well, that's unfortunate… Thank you for the hint, I opened an issue: https://github.com/phil-opp...
If (in my case on Mac OS X) grub-mkrescue (after you've installed it) gives the error "grub-mkrescue: warning: Your xorriso doesn't support `--grub2-boot-info'.", you just need to install xorriso. You probably don't have it at all yet.
For me (running it in an ubuntu docker container), grub-mkrescue silently fails until you add the -v flag - only with that can you see the error about xorriso (took me a lot of head-scratching to figure it out).
./rs_decoder.h:2:Unknown pseudo-op: .macosx_version_min
./rs_decoder.h:2:Rest of line ignored. 1st junk character valued 49 (1).
clang: error: assembler command failed with exit code 1 (use -v to see invocation)
make[3]: *** [boot/i386/pc/lzma_decompress_image-startup_raw.o] Error 1
This is an awesome series of blog posts!
If you don't see a green "OK", look for a "GRUB" message. If you don't see "GRUB", then the weak link is probably grub-mkrescue. Two common failure modes:
1. If your grub-mkrescue isn't installed correctly, it may silently do nothing or make bad ISO files. Try mounting your ISO file to make sure that it has your kernel and grub.cfg.
2. If you run Linux on an EFI machine, grub-mkrescue will produce EFI boot images that don't work with BIOS-based systems like QEMU. To fix this, see this article, which recommends installing grub-pc-bin and running:
grub-mkrescue /usr/lib/grub/i386-pc -o myos.iso isodir
Thank you, I didn't know about the EFI issue...
This is definitely fun! I tried to do this from my Mac OS X (Yosemite) and could not properly boot my fresh ISO disk. Compilation works fine, I have installed a cross-compiler for x86_64-elf architecture, compiled grub following instructions here http://wiki.osdev.org/GRUB_...... I generate a correct ISO file (checked it by mounting using Disk Utility) but it does not boot and I cannot see the GRUB message.
Not sure how to troubleshoot this issue.... I suspect this might be a problem with incorrect format in grub as the last stage of compilation shows this message:
../grub/configure --build=x86_64-elf --target=x86_64-elf --disable-werror TARGET_CC=x86_64-elf-gcc TARGET_OBJCOPY=x86_64-elf-objcopy TARGET_STRIP=x86_64-elf-strip TARGET_NM=x86_64-elf-nm TARGET_RANLIB=x86_64-elf-ranlib LD_FLAGS=/usr/local/opt/flex/ CPP_FLAGS=/usr/local/opt/flex/include/
[..]
config.status: linking ../grub/include/grub/i386 to include/grub/cpu
config.status: linking ../grub/include/grub/i386/pc to include/grub/machine
config.status: executing depfiles commands
config.status: executing po-directories commands
config.status: creating po/POTFILES
config.status: creating po/Makefile
*******************************************************
GRUB2 will be compiled with following components:
Platform: i386-pc
With devmapper support: No (need libdevmapper header)
With memory debugging: No
With disk cache statistics: No
With boot time statistics: No
efiemu runtime: Yes
grub-mkfont: Yes
grub-mount: No (need FUSE headers)
starfield theme: No (No DejaVu found)
With libzfs support: No (need zfs library)
Build-time grub-mkfont: No (no fonts)
Without unifont (no build-time grub-mkfont)
With liblzma from -llzma (support for XZ-compressed mips images)
*******************************************************
I don't know what i386-pc refers to, but if this is the target platform then it's probably incorrect. Note that I tried to boot using qemu-system-i386 but to no avail.
Regards,
Forget it: grub-mkrescue was not correctly installed so it failed to add needed boot files.
Thanks again for sharing this!
What would happen if we didn't put hlt? Would the cpu start reading random bytes and execute them as code? I tried without hlt and qemu seems to go into an infinite boot loop, but I'm just wondering what's going on.
Yes, that's exactly what happens. The CPU simply tries to read the next instruction, even if it doesn't exist, until it causes some exception. QEMU can print these exceptions; the "Setup Rust" post explains how. I just tried it and it hits an Invalid Opcode exception at some point because some memory does not contain a valid instruction.
Bonus: You can use GDB to disassemble the “code” behind the start label. You need to start `qemu-system-x86_64 -hda build/os-x86_64.iso -s -S` in one console and `gdb build/kernel-x86_64.bin` in another. Then you need the following gdb commands:
- `set architecture i386` because we are still in 32-bit mode
- `target remote :1234` to connect to QEMU
(- `disas /r start,+250` to disassemble the 250 bytes after the `start` label. Everything will be 0 as GRUB did not load our kernel yet)
- `break start` to set a breakpoint at `start`
- `continue` to continue execution until start is reached. Now the kernel is loaded and we can use
- `disas /r start,+250` to disassemble the 250 bytes after the `start` label
Then you can look at the faulting address you got from the QEMU debugging to see your invalid instruction. For me it seems to be an `add (%eax),%al` with the Opcode `02 00`.
oh! What a wonderful article to read!
Nice post! I am on OS X, but I find it easier to use Linux for this assembly stuff. Using VirtualBox, I have created a minimal Debian machine running an SSH server and with a folder shared between the OS X host and the Debian guest. So, I may install all the needed tools and cross-compile in Debian and have the final .iso accessible in OS X (to use it with QEMU), all of this while working in Terminal.app as usual.
As a side note, I had to set LDEMULATION="elf_x86_64" before linking, because I was getting this error: `ld: i386:x86-64 architecture of input file `multiboot_header.o' is incompatible with i386 output`. This may be because I have used Debian's 32-bit PC netinst iso instead of the 64-bit version.
Thanks for sharing your experiences! There is an issue about Mac OS support, but it seems like using a virtual machine is the easiest way…
On my system and on some others, grub-mkrescue is actually called grub2-mkrescue and should be represented accordingly in the Makefile. Perhaps this merits a comment in the text, since I was not alone (https://www.reddit.com/r/os...) in spending some time trying to figure out what was happening after a rather meaningless error message from make.
When I ran grub-mkrescue I got no output, just silence.
After installing xorriso I got an error like this:
-----
xorriso 1.3.2 : RockRidge filesystem manipulator, libburnia project.
Drive current: -outdev 'stdio:os.iso'
Media current: stdio file, overwriteable
Media status : is blank
Media summary: 0 sessions, 0 data blocks, 0 data, 861g free
Added to ISO image: directory '/'='/tmp/grub.pI5jyq'
xorriso : UPDATE : 276 files added in 1 seconds
Added to ISO image: directory '/'='/path/to/my/work/isofiles'
xorriso : FAILURE : Cannot find path '/efi.img' in loaded ISO image
xorriso : UPDATE : 280 files added in 1 seconds
xorriso : aborting : -abort_on 'FAILURE' encountered 'FAILURE'
-----
Searching for a fix for this error led me here: https://bugs.archlinux.org/42334
After installing mtools, grub-mkrescue created os.iso.
After creating the ISO, I can boot it in QEMU with no problem. Even burning it onto a disc and booting on a different machine works like a charm. However, I am having trouble getting it onto a USB thumb drive. I have tried packing it onto a USB stick with UNetbootin, but once the UNetbootin screen appears after booting from the USB device (the OS selection screen, giving you the options [Default] and [my_os]), I can select either option, but nothing happens.
EDIT: Got it to work using the command line tool dd!
I just wanted to suggest dd! For the record, the command is `sudo dd if=build/os.iso of=/dev/sdX && sync`, where sdX is the device name of your USB stick. It overwrites everything on that device, so be careful to choose the correct device name.
@phil_opp:disqus i created a GitHub repository where i work through your great guide step-by-step. It is located here: https://github.com/peacememories/rust-kernel-experiments
Please let me know if there are problems with the attribution. =)
Thank you for your great articles.
I have created my own OS in Rust, and they are really useful for me.
I have been revising my OS based on your articles.
I have also been writing an article that is similar to yours:
http://mopp.github.io/articles/os/os00_intro
I added a link to this website in my articles.
If that bothers you, please tell me and I will remove it.
Thanks
Thanks! I don't speak Japanese, so I can only read the rough Google translation. However, your article seems to be a really good introduction to OS development!
I really enjoyed your accessible blog format and your awesome osdev tutorials!
I was inspired by your articles and decided to write my own :)
Let me know what you think.
http://tutorialsbynick.com/...
Thanks,
Nick
It's awesome! I really like that you start without a bootloader and interact with the BIOS directly in real mode. I never programmed at this level, so it was a really great read!
Hi guys if you want to boot the kernel in VirtualBox just modify the grub cfg file by setting the following variable properly, check https://www.gnu.org/softwar... for the options.
GRUB_TERMINAL_OUTPUT
This blog is just a treasure! I'm so happy that I found it. Thank you so much Phil!
By the way, my Arch Linux is booted in legacy BIOS mode (my BIOS doesn't even support EFI), but without '-d /usr/lib/grub/i386-pc/' grub-mkrescue didn't work for me.
P.S. Aside from this project, I think I will refer to your Makefile a lot in the future just to learn the techniques that you used. I think it is the shortest example of so many Makefile best practices.
Thanks a lot! :)
Unfortunately I have no idea what's the problem here. I've never had this error.
Oops, you probably replied to the first version of my comment, but I hadn't reloaded the page and didn't see your reply. Then I found my mistake (I had put the grub directory into the root of the image instead of into the boot directory). Thinking that nobody had seen my comment yet, I edited it and removed the question about the error. Sorry for my carelessness.
Why use Rust to develop the core of an operating system? Why not C++? Does Rust offer the same capabilities as C++?
Rust aims to be comparable to C++, both in terms of capabilities and in terms of performance. However, it has some great advantages over C++:
The greatest advantage of Rust is its memory safety. It prevents common bugs such as use after free or dangling pointers at compile time. So you get the safety of a garbage collected language, but without garbage collection. In fact, the safety guarantees go even further: The compiler also prevents data races and iterator invalidation bugs. So we should get a much safer kernel compared to C++.
(One caveat: Sometimes we need unsafe blocks for OS development, which weaken some safety guarantees. However, we try to use them only when it's absolutely needed and try to check them thoroughly.)
Another advantage of Rust is the great type system. It allows us to create powerful, generic abstractions, even for low level things such as page tables.
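To give a rough idea of what such an abstraction can look like, here is a minimal sketch that encodes the page table level in the type, so that only tables of levels 4 to 2 offer a way to reach a next table. All names are illustrative, and `next_table` is a stand-in: a real kernel would follow the address stored in the entry instead of creating a fresh table.

```rust
use std::marker::PhantomData;

/// Marker types for the four page table levels.
enum Level4 {}
enum Level3 {}
enum Level2 {}
enum Level1 {}

trait TableLevel {
    const NUMBER: u8;
}
impl TableLevel for Level4 { const NUMBER: u8 = 4; }
impl TableLevel for Level3 { const NUMBER: u8 = 3; }
impl TableLevel for Level2 { const NUMBER: u8 = 2; }
impl TableLevel for Level1 { const NUMBER: u8 = 1; }

/// Only levels 4 to 2 point to another table; level 1 points to frames.
trait HierarchicalLevel: TableLevel {
    type NextLevel: TableLevel;
}
impl HierarchicalLevel for Level4 { type NextLevel = Level3; }
impl HierarchicalLevel for Level3 { type NextLevel = Level2; }
impl HierarchicalLevel for Level2 { type NextLevel = Level1; }

struct Table<L: TableLevel> {
    entries: [u64; 512],
    level: PhantomData<L>,
}

impl<L: TableLevel> Table<L> {
    fn empty() -> Self {
        Table { entries: [0; 512], level: PhantomData }
    }

    fn level_number(&self) -> u8 {
        L::NUMBER
    }

    fn set_entry(&mut self, index: usize, value: u64) {
        self.entries[index] = value;
    }
}

impl<L: HierarchicalLevel> Table<L> {
    /// Returns a table of the next lower level if the entry's present
    /// bit (bit 0) is set. Stand-in only: no memory is actually followed.
    fn next_table(&self, index: usize) -> Option<Table<L::NextLevel>> {
        if self.entries[index] & 1 == 0 {
            None
        } else {
            Some(Table::empty())
        }
    }
}
```

Calling `next_table` on a `Table<Level1>` is rejected at compile time, because `Level1` does not implement `HierarchicalLevel`.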
The tooling is great, too. Rust uses a package manager called “cargo”, which makes it easy to add various libraries to our project. Cargo automatically downloads the correct version and compiles/links it. Thus, we can use awesome libraries such as x86 easily.
For anyone else struggling with "Boot failed: Could not read from CDROM (code 0009)", you need to install `grub-pc-bin` and then regenerate the .iso. Solution from here: http://intermezzos.github.io/book/appendix/troubleshooting.html#could-not-read-from-cdrom-code-0009.
By the way, I'm loving the tutorial style. Very clear, thank you!
This is completely awesome!
YMMV but FWIW in Fedora 25, I needed to install three packages, `sudo dnf install nasm xorriso qemu-system-x86`. The last one installs fewer packages than installing "qemu" which adds two dozen ARM, m68k, S390, sparc, ... emulators as well 8-)
I find the example displays "OK" fine, but it erases the console before this so the boot messages disappear. I'm not sure if the fix lies in grub configuration or the qemu command line.
It is interesting to look at the contents of the CD-ROM image, though it mostly reveals the complexity of the GRUB bootloader. I used `mkdir temp_mount && sudo mount -t iso9660 -o loop os.iso temp_mount` then looked around in temp_mount.
I had two issues with making the ISO. First, there was no output file, yet grub-mkrescue didn't complain; I fixed this by running "apt install xorriso" (Ubuntu). The other issue was that qemu couldn't read the CD-ROM (error 0009); I fixed that one by running "apt install grub-pc-bin". Hope this helps some of you... and thanks for the awesome post Phil :)
How to shutdown?
The Makefile doesn't work for me. It gives only the error No rule to make target ' build/arch/x86_64/boot.o' needed by 'build/kernel-x86_64.bin'.
I don't know what's going wrong....
It seems like there is some problem with these lines:
build/arch/$(arch)/%.o: src/arch/$(arch)/%.asm
build/arch/$(arch)/boot.o: src/arch/$(arch)/boot.asm
(Note that you need to copy this rule for every .asm file without wildcards.)
I'm interested whether it would run on actual hardware, with a real CD. I doubt anyone tried it though...
This is incredible, just fantastic..
I did have a couple of hiccups following along using Win10 WSL on a UEFI PC, maybe these details can be folded in to the tutorial?
Booting from ROM...
Solution: sudo -S apt-get install grub-pc-bin
I am having the same issue as you have mentioned in number one. It says warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
And it is not printing OK. I tried the -curses option for qemu, but that doesn't work either.
I'm trying to do this, but I can't get the OK to actually display and I've kind of run out of ideas. Trying to run with QEMU on Arch Linux.
Things I've tried: Adding the multiboot tag that should tell grub I want a text mode, 80x25. Just gives me a black screen, instead of saying "Booting 'my os'"
Switching grub to text mode with every possible switch I can find that looks related, with and without ^. Just gives me a black screen for all of them too.
I can confirm my code actually seems to be executed - or, at least, hits the hlt instruction. Just that there's no output, which makes me think VGA problems, hence me trying all of the above. That seems to leave trying to parse the multiboot header or something, and that seems like... something I don't really want to try to do in assembly, including pushing it over assembly? I don't really want to move unless this works, though, because I see you still are using text mode extensively further on. :/
...Never mind. I just figured out, and it was a very tiny mistake. I accidentally typed 0xb800 instead of 0xb8000, so, of course, no output ever because I was copying to the wrong region of memory.
Hi. Thanks for writing this, but I get an error when I run it in QEMU: "error: no multiboot header found, error: you need to load kernel first". What's wrong?
It means that GRUB couldn't find a multiboot header at the beginning of your kernel. So either your multiboot header is invalid (maybe a typo somewhere?) or it is not at the beginning of the file (is your linker script correct? did you use the --nmagic flag?).
I just found this and it is great. Clear, practical explanations without fuss but that don't hide what's going on. That's perfect for how I like my tech explanations
Thank you so much for the article. It is of great help to understand the basics of developing OS. Especially the in hand experience.
Thank you for this series, it's exceptional! Clear, deep into details, and fascinating :)
For anyone trying to push themselves into using the GNU assembler (i.e. as), if you're getting "no multiboot header" errors with QEMU, put the line:
.align 8
before the end tags.
There is some interesting discussion on /r/programming and /r/rust.
Hi Philipp,
First of all thanks a lot for sharing this series of article. Just a question: do you have an idea why doubling the stack size would not be sufficient to avoid the silent stack overflow you mentioned? To make my code work I had to triple it...
What are you doing in your main? Maybe there is something stack intensive before the `test_paging` call? My main function looks like this. I removed most of the example code from the previous posts, so maybe that's the reason..
Only the stuff from the OSDev wiki but I think that you are aware of it already.
I would link the Rust code to the higher half but keep all startup assembly identity mapped. Then map the Rust code from the long mode assembly and jump to it.
Maybe it's even possible to use the same linker script as before since rustc generates position independent code by default (AFAIK).
Is there a typo in the code in the huge pages section?
`TableEntryFlags` is mentioned exactly once. Where does it come from?
It should be `EntryFlags`. Thanks for reporting, I pushed an update.
In order to decide when a page table should be freed, you can use the 52nd bit in the first 9 entries, to keep a score of how many present entries there are. It might be pretty bad for caching though.
I like the idea. Alternatively, we could use bits 52 to 61 of the first entry. What's your concern about caching?
My main concern with using all the bits in the first entry is that if ever in the future you want to go back and add more things, it would need to be changed.
As far as caching goes, it would depend on how big a cache line is, but if it is smaller than 9 table entries then there could be multiple cache misses on a single update (theoretically 9 bits could be changed for one addition / deletion of a page). Obviously not a _huge_ deal, but I think it's worth pointing out.
Neither choice seems like it works. Bits 62:MAXPHYADDR (where MAXPHYADDR is at most 52) are reserved and supposed to be set to 0. However, bits 11:9 appear to be free at every level of the page table hierarchy.
Somewhat annoyingly, 10 bits are needed since 513 values need to be represented. Thus one could use three bits from each of the first four entries.
x86-64 has a 64-byte cache line size so the four accesses do fit in a single cache line.
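To make the idea above concrete, here is a minimal sketch of such a counter. It assumes, as suggested in this thread, that bits 9 to 11 of every entry are free for software use. The names `read_count` and `write_count` are hypothetical and not part of the tutorial code, and the table is modelled as a plain array of `u64` values.

```rust
/// Bits 9..=11 of an entry, assumed here to be ignored by the CPU.
const FREE_SHIFT: u32 = 9;
const FREE_MASK: u64 = 0b111 << FREE_SHIFT;
/// Four entries with three bits each give 12 bits, enough for 0..=512.
const COUNTER_ENTRIES: usize = 4;

/// Reassembles the counter from the free bits of the first four entries.
pub fn read_count(entries: &[u64; 512]) -> u16 {
    let mut count = 0u16;
    for i in 0..COUNTER_ENTRIES {
        let chunk = ((entries[i] & FREE_MASK) >> FREE_SHIFT) as u16;
        count |= chunk << (3 * i);
    }
    count
}

/// Stores the counter without touching the address or flag bits.
pub fn write_count(entries: &mut [u64; 512], count: u16) {
    assert!(count <= 512, "a table has at most 512 present entries");
    for i in 0..COUNTER_ENTRIES {
        let chunk = ((count >> (3 * i)) & 0b111) as u64;
        entries[i] = (entries[i] & !FREE_MASK) | (chunk << FREE_SHIFT);
    }
}
```

All four entries touched here sit in the first 32 bytes of the table, so an update stays within one 64-byte cache line.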
Hey Phillip, started doing this a couple days ago and was able to make it this far. Unfortunately I am now having some compilation issues that appear to be a result of the x86 crate. I get "error: 'raw::Slice' does not name a structure" when compiling raw-cpuid, a dependency for x86. Thoughts?
edit: Ah-ha! I see you mentioned this a few days ago. Thanks for everything!
Yeah, the `raw::Slice` struct was deprecated and removed in the latest nightlies. However, the current version of the raw-cpuid crate still depends on it. The author is aware of the issue and will publish a new version in the next few days. Until then, you can try an older nightly as a workaround.
Can you explain this self-referencing trick for page tables to me?
Of course! I tried to give a short overview in the Recursive Mapping section. Which part of it is unclear? Or do you have a specific question?
The virtual->physical address calculation is done in hardware, which expects 4 levels of page tables. In order to access the entries of a P1 table, we need to remove one level of translation. The trick is that all page tables have the (almost) same format, independent of the table level. So the CPU doesn't see a difference between e.g. a P4 and a P3 table.
This allows us to implement the recursive mapping trick. We lead the CPU to believe that the P2 table is the P1 table and that the P3 table is the P2 table. Thus we end up on the memory page of the P1 table and are able to modify its entries.
Likewise, the CPU interprets the P4 table as P3 table. But which table do we use as P4 table then? Well, we use the same P4 table as before. So our P4 is used twice by the CPU: At the first time, it is interpreted as a P4 table and at the second time as a P3 table.
Now only one piece is missing: We need a special P4 entry, which points to its own table again. This way we can construct a virtual address for which the P4 table is used twice (once as P4 and once as P3).
I hope it helps :).
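The address construction described above can be sketched in a few lines. This is only an illustration: it assumes the recursive entry lives at P4 index 511, and the helper names `sign_extend`, `p1_table_address`, and `p4_table_address` are made up for this example.

```rust
/// Assumption for this sketch: the last P4 entry points back to the P4 table.
const RECURSIVE_INDEX: u64 = 511;

/// x86-64 requires bits 48..=63 to be copies of bit 47.
fn sign_extend(addr: u64) -> u64 {
    if addr & (1 << 47) != 0 {
        addr | 0xffff_0000_0000_0000
    } else {
        addr
    }
}

/// Virtual address of the P1 table that maps the page with the given indices.
/// The recursive entry is walked once, so each real index moves down one level.
pub fn p1_table_address(p4_index: u64, p3_index: u64, p2_index: u64) -> u64 {
    sign_extend(
        (RECURSIVE_INDEX << 39) | (p4_index << 30) | (p3_index << 21) | (p2_index << 12),
    )
}

/// Virtual address of the P4 table itself: the recursive entry is walked four times.
pub fn p4_table_address() -> u64 {
    let r = RECURSIVE_INDEX;
    sign_extend((r << 39) | (r << 30) | (r << 21) | (r << 12))
}
```

With index 511, `p4_table_address()` evaluates to `0xffff_ffff_ffff_f000`.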
Hi everybody!
When I try to compile the code from this article, I get the following error:
error: private trait in public interface (error E0445)
It's fired here: impl<L> Table<L> where L: HierarchicalLevel {...}
This seems to be one of those errors caused by the fact that Rust is still under development and some features change from time to time.
It compiles if I make the trait public:
pub trait HierarchicalLevel: TableLevel {....}
I wrote the code on my own and to make sure this is not an error caused by myself, I cloned the github repo and tried to compile it with the same result.
Maybe one of the more skilled OS devs here has an idea if it is a problem to mark the trait as public.
Thanks in advance!
Christian
Thanks a lot for reporting this! I thought that we've fixed this issue, but it seems like we've messed up somehow. I opened this issue for it.
That said, I think that a public HierarchicalLevel is the correct solution. It shouldn't be a problem since you can't do anything bad by implementing HierarchicalLevel (e.g. you still can't construct a `Table`).
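For readers who hit the same error, here is a rough, simplified sketch of the public-trait variant. It is not the tutorial's actual code: the `DEPTH` constant and the `next_level_depth` method are hypothetical additions that only give the example something to return.

```rust
use std::marker::PhantomData;

/// Marker trait for all table levels (DEPTH is a hypothetical addition).
pub trait TableLevel {
    const DEPTH: u8;
}

pub enum Level4 {}
pub enum Level3 {}
pub enum Level2 {}
pub enum Level1 {}

impl TableLevel for Level4 { const DEPTH: u8 = 4; }
impl TableLevel for Level3 { const DEPTH: u8 = 3; }
impl TableLevel for Level2 { const DEPTH: u8 = 2; }
impl TableLevel for Level1 { const DEPTH: u8 = 1; }

/// Public, so it may appear in the bounds of a public impl (avoids E0445).
pub trait HierarchicalLevel: TableLevel {
    type NextLevel: TableLevel;
}

impl HierarchicalLevel for Level4 { type NextLevel = Level3; }
impl HierarchicalLevel for Level3 { type NextLevel = Level2; }
impl HierarchicalLevel for Level2 { type NextLevel = Level1; }

/// Fields stay private, so outside code still cannot construct a Table.
pub struct Table<L: TableLevel> {
    entries: [u64; 512],
    level: PhantomData<L>,
}

impl<L> Table<L> where L: HierarchicalLevel {
    /// Only available on P4, P3, and P2 tables, never on P1.
    pub fn next_level_depth(&self) -> u8 {
        <L::NextLevel as TableLevel>::DEPTH
    }
}
```

Making the trait public costs nothing here, because the struct fields remain private to the module.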
Hello,
You said that the P4 recursive loop must be set before paging is enabled.
But I wonder - the memory is currently identity mapped, so what difference is there in the P4_table address with or without paging?
Hmm, good point! The P4 table is part of the identity mapped area, so it should work even if we do it after enabling paging.
That sentence was added in #246, but I don't know the reason anymore. I just tested it and it still works if I do the recursive mapping after paging is enabled. So maybe we should revert that PR…
Hi Phil,
First off, I just want to say thanks for this tutorial, it's really great.
I've run into an issue on this section when implementing the memory::paging::test_paging function. Specifically, with the test for unmap. If everything up to and including the unmap function call is implemented it operates as expected and unmap panics, but if the corresponding println! is added the kernel goes into a boot loop. Based on what you have said in this section that seems to indicate a page fault, but I don't really understand why the existence of that println! causes it. It's more confusing because execution never actually reaches that macro, so it's just the inclusion of that line in the code that triggers it. To make things even weirder, it quits boot-looping whenever I add a second (or third, etc.) instance of that println! call.
So my question is: Do you have any insight as to what could be the cause of this? Or, if nothing else, some avenues I could pursue for debugging?
Thanks
Hey Philipp, regarding the Testing and Bugfixing section, can you explain why only a P2 and a P1 table is created after running map_to? Why isn't a P3 table created?
I think I figured it out. Is it because we already mapped index 0 of P4 to a P3 table from boot.asm?
I've implemented paging and everything seems to work correctly, but reading from an unmapped page causes a page fault even without flushing the translation lookaside buffer. I also tried it out with your repo and it exhibited the same behavior after I commented out the call to tlb::flush. Is there some QEMU setting that I need to change?
If the address isn't cached in the TLB, it works without a flush. The cache has limited space, so some translations are evicted when space runs out. Maybe try accessing the to be unmapped page right before unmapping it?
Hi! First of all: this is an amazing project, thank you very much for your time and effort, and the work put into doing this!
I am a complete beginner in Rust, and follow this guide mainly to get a better grasp on operating system "basics". Your tutorials have been very good at explaining the code snippets in a simple manner, but sometimes it gets a bit confusing as to where functions and other code snippets should be placed in the file tree. If you find the time, could you write the path/filename of where each code snippet goes? I'm sure it would be very helpful to other people who, like me, are not yet intuitive about "what parts go where".
Thanks for the suggestion! I opened https://github.com/phil-opp/blog_os/issues/382 on Github to track this issue.
Hi Phil, when I try to add the test code to test the unmap with the line of code below, the system can't boot up, and qemu just keeps rebooting. But if I remove this line, the code works perfectly. Could you please take a look?
println!("{:#x}", unsafe { *(Page::containing_address(addr).start_address() as *const u64) });
The issue has been solved. The rebooting was caused by a page fault, and the root cause was a misused index. After correcting the index, the issue is gone.
Hey Phil. This is a fairly basic question compared to some of the other comments here and I probably am missing something simple. When the Entry::pointed_frame function is made, it uses 0x000fffff_fffff000 to mask bits 12-51. Why that number? It doesn't only mask those bits. What am I missing?
It clears the lowest and highest 12 bits. So bits 12 to 51 should be the only bits set afterwards.
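A quick way to convince yourself is to look at the mask bit by bit. The snippet below is just a check of the constant from the question. The function name `pointed_frame_address` is made up for this example and works on a raw `u64` instead of the tutorial's `Entry` type.

```rust
/// The mask from Entry::pointed_frame: 12 zero bits, 40 one bits, 12 zero bits.
pub const ADDRESS_MASK: u64 = 0x000fffff_fffff000;

/// Keeps bits 12..=51 of a raw entry value and clears everything else.
pub fn pointed_frame_address(entry: u64) -> u64 {
    entry & ADDRESS_MASK
}
```

For example, an entry with the highest bit set, the frame address 0x201000, and the two lowest flag bits set (`0x8000_0000_0020_1003`) is reduced to `0x201000`.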
hi phil, quick question
It seems that as soon as I enable x86 paging the VGA buffer is not accessible anymore (because 0xb8000 is not identity mapped yet?). So essentially the test_paging routine doesn't print anything... so my thinking tells me the identity map is the first thing to do after enabling paging, yet it's the subject of the next chapter. Am I missing something?
I didn't realise the boot.asm p2 p3 p4 setup was already a preliminary identity paging with huge pages, that's awesome!
I'm working on x86 protected mode so I only have p1,p2 and my huge pages are 4MiB.
Please add #[repr(C)] to some of your structs, in particular the ones that depend on having a specific layout.
Thanks for the hint! On `ScreenChar` it is definitely required. But I'm not quite sure if it's needed for the `ColorCode` newtype…
Where (else) would you add #[repr(C)]?
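As an illustration of what the attribute guarantees, here is a small sketch. It assumes a `ScreenChar` with one character byte followed by one color byte. Using `#[repr(transparent)]` on the newtype is one possible choice, not necessarily what the post ends up doing, and `to_vga_bytes` is a hypothetical helper that exists only to make the layout visible.

```rust
/// Newtype around the color byte. repr(transparent) guarantees the same
/// layout as the wrapped u8 (one possible choice for this sketch).
#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct ColorCode(u8);

/// repr(C) fixes the field order: character byte first, color byte second.
#[derive(Clone, Copy)]
#[repr(C)]
pub struct ScreenChar {
    pub ascii_character: u8,
    pub color_code: ColorCode,
}

/// Hypothetical helper: reinterprets a ScreenChar as the two bytes that
/// would be written to the VGA buffer.
pub fn to_vga_bytes(c: ScreenChar) -> [u8; 2] {
    // Sound only because of the repr attributes above: both types are
    // exactly two bytes with no padding.
    unsafe { std::mem::transmute::<ScreenChar, [u8; 2]>(c) }
}
```

Without `#[repr(C)]` the compiler is free to reorder the fields, so the character and color bytes could in principle end up swapped.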
You could also link to: http://embedded.hannobraun....
for another example of getting no_std rust working.
If anyone else (like me) is running in to some minor issues due to the incompleteness of the source code please have a look at my version that compiles: https://github.com/obscuren...
@phil_opp:disqus FYI the link up top that says (full file) is missing and comes up with an anchor to #TODO :-)
Thanks for the hint! I removed that link and added the rest of the Color enum.
Btw: you accidentally skipped the number 2 in your Color enum :).
Ah oops :)
I'm fairly new to rust and I couldn't get https://github.com/obscuren... that line (in your original example) to work. Throwing something about "borrowed something something couldn't be moved".
Also thanks for the excellent examples and post! :D
I made it this far, and then started on implementing interrupts, which are needed for keyboard input. Along the way, I discovered the rust-x86 crate, which provides data structures for major x86_64 CPU data types. This looks like it would save a lot of debugging time and digging around in manuals. Also of interest is the hilarious and extremely helpful After 5 days, my OS doesn't crash when I press a key.
My interrupt-descriptor-table-in-progress is also available on GitHub for anybody who's interested in laying the groundwork for keyboard I/O.
This is definitely a fun idea, and your explanations are great. Highly recommended.
Thanks for the links! The rust-x86 crate looks useful indeed (but may need some minor additions/changes for 64bit). Julia Evans's blog is great and was my first resource about rust OS development :). Just keep in mind that it's from 2013 and thus uses an early version of Rust.
Your interrupt table code looks good! I think the handler functions need to be assembly functions until support for something like naked functions is added. Am I right?
Yup. x86 interrupt handler functions appear to be completely non-standard in any case—it seems like every x86(_64) OS I've looked at has different rules for saving registers, mostly because the architecture is such a mess of ancient hacks. I also have another comment-in-progress on the XMM registers, which cause serious headaches implementing interrupts.
rust-x86 appears to actually be x86_64-only, despite the name. The data structures I looked at were all what I'd expect on a 64-bit chip. I'll probably throw away a lot of my carefully debugged code and just use the rust-x86 versions of stuff.
It would be nice to have a bunch of crates on cargo which handle common processor and I/O stuff. And maybe a Rust-only ELF loader library for loading and relocating user-space binaries. :-)
I think it should be 80 columns and 25 rows, when describing VGA.
When you mention the write!/writeln! macros can be now used, the example uses Writer::new(...) although Writer doesn't have a `new` method.
The print macro should be `let mut writer`.
hey, great articles so far!
i've been following along, and i've run into some issues with the ::core::fmt::Write implementation for our writer class.
if i add that code in, i get these linker errors:
core.0.rs:(.text._ZN4core3fmt5write17hdac96890aec66a9aE+0x324): undefined reference to `_Unwind_Resume'
core.0.rs:(.text._ZN4core3fmt5write17hdac96890aec66a9aE+0x3eb): undefined reference to `_Unwind_Resume'
core.0.rs:(.text._ZN4core3fmt5write17hdac96890aec66a9aE+0x3f3): undefined reference to `_Unwind_Resume'
i've gone back and checked that i set panic to "abort" for both dev and release profiles in my config.toml, the same way you did to fix the unwinding issues. everything seems to match up with what you have. what have i missed?
thanks in advance.
It becomes more and more difficult for me to understand it... Mostly because of difficulty of Rust. I just follow every step and everything works for me, but I feel like if I make any step away from the instruction - everything will get broken.
But still I learn a lot from this blog. Thank you again!
One thing I can't understand: we implement ::core::fmt::Write trait for our Writer, and we implement 'write_str' method for it. But how can we use 'write_fmt' method if we didn't define it here?
The core::fmt::Write defines 3 methods: `write_str`, `write_char`, and `write_fmt`. However, the latter two have default implementations, so we only need to implement `write_str`.
I try to explain uncommon features of Rust in the posts, but providing a complete Rust introduction is out of scope. I recommend you to read the official Rust book or its new iteration (still work-in-progress).
It becomes more and more difficult for me to understand it...
That's unfortunate :(. Feel free to ask if something else is unclear!
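The default methods can be seen in a small standalone example. It uses `std::fmt::Write` (the same trait as `core::fmt::Write`) and a stand-in `Writer` that collects its output in a `String` instead of writing to the VGA buffer. The `demo` function is made up for illustration.

```rust
use std::fmt::{self, Write};

/// Stand-in for the VGA writer: it just appends to a String.
pub struct Writer {
    pub output: String,
}

impl Write for Writer {
    // The only required method. write_char and write_fmt are provided
    // by the trait and end up calling write_str.
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.output.push_str(s);
        Ok(())
    }
}

pub fn demo() -> String {
    let mut writer = Writer { output: String::new() };
    // write! expands to a call to the provided write_fmt method.
    write!(writer, "{} + {} = {}", 1, 2, 3).unwrap();
    // write_char also comes for free.
    writer.write_char('!').unwrap();
    writer.output
}
```

Calling `demo()` yields the string `1 + 2 = 3!`, although only `write_str` was implemented by hand.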
Aa, ok. Thanks for clarification and quick reply.
Yes, you explain everything very well. Very surprisingly well for such difficult topic. Of course complete Rust intro is out of scope. But I didn't know about new iteration of Rust book. Thank you for the hint.
And thanks a lot for such welcoming and kind manner of communication with your "students" :)
Hey!
If you are interested in adding some build information, you can do it this way:
Add the following to your Makefile:
BUILD_NUMBER_FILE := buildno.txt
BUILD_NUMBER_LDFLAGS = --defsym _BUILD_NUMBER=$$(cat $(BUILD_NUMBER_FILE)) --defsym _BUILD_DATE=$$(date +'%Y%m%d')
$(kernel): builddata cargo ..
builddata:
touch $(BUILD_NUMBER_FILE)
@echo $$(($$(cat $(BUILD_NUMBER_FILE)) + 1)) > $(BUILD_NUMBER_FILE)
The above will make the current date and a build number available in the kernel, incremented each time you build it.
To access build information:
Add the following to your lib.rs:
extern {
fn _BUILD_NUMBER();
fn _BUILD_DATE();
}
Now you can do the following, for example in your rust_main():
let build_number = _BUILD_NUMBER as u32;
let build_date = _BUILD_DATE as u32;
println!("Build {}: {} ", build_number, build_date);
I'm curious - why does this code write from the bottom of the screen and then up, when most other VGA drivers go from the top of the screen down?
As soon as the screen is full (i.e. scrolling begins) both approaches do the same thing: writing to the bottom line and shifting the other lines up. By starting at the bottom, we don't need any special case for the first 25 lines.
(…and maybe I just wanted to do things a bit differently :D)
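The "always write to the bottom line, shift everything else up" idea can be sketched as follows. This is a simplified model, not the code from the post: `Screen` is a hypothetical in-memory buffer that stores plain bytes instead of VGA character cells.

```rust
pub const HEIGHT: usize = 25;
pub const WIDTH: usize = 80;

/// Hypothetical in-memory stand-in for the VGA text buffer.
pub struct Screen {
    rows: [[u8; WIDTH]; HEIGHT],
    column: usize,
}

impl Screen {
    pub fn new() -> Self {
        Screen { rows: [[b' '; WIDTH]; HEIGHT], column: 0 }
    }

    /// Always writes into the last row, so there is no "screen not yet full" case.
    pub fn write_byte(&mut self, byte: u8) {
        if byte == b'\n' {
            self.new_line();
            return;
        }
        if self.column >= WIDTH {
            self.new_line();
        }
        self.rows[HEIGHT - 1][self.column] = byte;
        self.column += 1;
    }

    /// Shifts every row up by one and clears the bottom row.
    fn new_line(&mut self) {
        self.rows.copy_within(1.., 0);
        self.rows[HEIGHT - 1] = [b' '; WIDTH];
        self.column = 0;
    }

    pub fn row(&self, index: usize) -> &[u8] {
        &self.rows[index]
    }
}
```

Because `write_byte` never tracks a current row, the same `new_line` path handles both the first 25 lines and every line after that.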
I went ahead and made the "panic_fmt" function complete :) Now I get the filename, line, and an error message when Rust panics! Very helpful. Here is the code:
#[lang = "panic_fmt"]
extern fn panic_fmt(args: fmt::Arguments, file: &'static str, line: u32) -> ! {
println!("Panic in file {} line {}: {}", file, line, args);
loop {}
}
Thanks for the awesome tutorials!
Great! We also do this in the next post, but it's a good idea to mention it here, too.
Thanks for the code by the way, I always thought that the `file` argument is of type `&str` (instead of `&'static str`). I opened #256 to fix this.
Thanks for the awesome tutorials!
You're welcome, I'm glad you like them!
So I'm trying to make `println!("{}: some number", 1);` work, but when I add that line to my rust_main function, the emulator does the whole triple exception thing starting with a 0xd error, which according to OSDev.org is a "General protection fault":
```check_exception old: 0xffffffff new 0xd
0: v=0d e=0000 i=0 cpl=0 IP=0008:ec834853e5894855 pc=ec834853e5894855 SP=0010:000000000012ec18 env->regs[R_EAX]=0000000000000a00```
`println!("Hello {}!", "world");` works just fine - it just doesn't seem to be able to interpolate non-string types. Would you have any idea on what's going wrong? I'm not sure where to even look. If you'd like to clone and run my code and take a look: https://github.com/ocamlmycaml/rust-moss/
btw ++good tutorial, i'm learning a lot!
I figured it out - I had OUTPUT_FORMAT(elf32-i386) in my linker.ld file. I was trying a few things to run the kernel without making an iso
I'm not gonna pretend like I understand what happened, but removing that OUTPUT_FORMAT statement fixed my problems
Solution for the compilation problems:
1. Go to vga_buffer.rs.
2. Find the line `buffer: unsafe { Unique::new(0xb8000 as *mut _) },`.
3. Change it to `buffer: unsafe { Unique::new_unchecked(0xb8000 as *mut _) },`.
Thanks for your attention (sorry for my English).
Great tutorials, just a quick question for learning purposes.
Could the values in enum be defined implicitly like so?
pub enum Color {
Blue,
...
}
I don't know actually. It seems to work: https://play.rust-lang.org/?gist=7e8684f332ece651836c80b4d7439c1c&version=stable. However, I'm not sure if this is specified behavior or if it might be changed some day (e.g. by new compiler optimizations).
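For reference, a small sketch of the implicit variant. As far as the Rust reference describes it, a field-less enum without explicit discriminants starts at 0 and each following variant is one higher than the previous one, so the values below should match the explicit ones as long as the variants stay in this order. The variant list is the usual 16-color VGA palette; treat the exact names as an assumption if your copy of the post differs.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Color {
    Black,      // 0
    Blue,       // 1
    Green,      // 2
    Cyan,       // 3
    Red,        // 4
    Magenta,    // 5
    Brown,      // 6
    LightGray,  // 7
    DarkGray,   // 8
    LightBlue,  // 9
    LightGreen, // 10
    LightCyan,  // 11
    LightRed,   // 12
    Pink,       // 13
    Yellow,     // 14
    White,      // 15
}

/// Converts a color to the numeric value used in the VGA attribute byte.
pub fn color_value(color: Color) -> u8 {
    color as u8
}
```

The trade-off is that reordering or inserting a variant silently changes every value after it, which is one reason to keep the explicit numbers for hardware-defined constants.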
I followed everything in this tutorial to the letter, and had a question. If I were to try to print a string to the screen, how would I do it? I have been using
print!("{}", string)
with string containing what I want to print. I know this works in normal Rust, but would it work with the VGA buffer you made? Thanks!
Sure, why shouldn't it? The whole formatting part is handled by Rust's format_args! macro, so everything should work the same.
Question. How would I go about changing the color of the text on the fly? Like if I wanted to print
Hello World
and have "Hello" be green and "World" be white. How would I go about doing this?
You could use two separate print statements and change the color on the global writer in between. Or alternatively, you could add support for ANSI escape codes like in a normal terminal.
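A rough sketch of the first suggestion, under some assumptions: `ColorWriter`, `set_color`, and the two color constants are hypothetical names, and the writer records (byte, color) pairs in a `Vec` (so it needs an allocator) instead of touching VGA memory.

```rust
/// Foreground colors on a black background (low nibble of the attribute byte).
pub const GREEN: u8 = 0x02;
pub const WHITE: u8 = 0x0f;

/// Hypothetical writer that remembers the color that was active for each byte.
pub struct ColorWriter {
    color: u8,
    cells: Vec<(u8, u8)>,
}

impl ColorWriter {
    pub fn new(color: u8) -> Self {
        ColorWriter { color, cells: Vec::new() }
    }

    /// Changes the color used for all following writes.
    pub fn set_color(&mut self, color: u8) {
        self.color = color;
    }

    /// Appends each byte together with the currently active color.
    pub fn write_str(&mut self, s: &str) {
        for byte in s.bytes() {
            self.cells.push((byte, self.color));
        }
    }

    pub fn cells(&self) -> &[(u8, u8)] {
        &self.cells
    }
}
```

Writing "Hello " with `GREEN`, calling `set_color(WHITE)`, and then writing "World" gives the two-colored line from the question.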
I keep getting the error of the trait `core::marker::Copy` is not implemented for `vga_buffer::ScreenChar`
Why exactly does this happen despite everything looking up to snuff?
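One possible cause (an assumption, since the failing code is not shown here) is a missing `Copy` derive on `ScreenChar` or on a type it contains: a struct can only derive `Copy` if all of its fields are `Copy` too. A minimal sketch with simplified versions of the two types:

```rust
/// Simplified color attribute; must itself be Copy for ScreenChar to derive Copy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ColorCode(pub u8);

/// Simplified character cell.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(C)]
pub struct ScreenChar {
    pub ascii_character: u8,
    pub color_code: ColorCode,
}

/// Builds one blank row of 80 cells.
pub fn blank_row(color_code: ColorCode) -> [ScreenChar; 80] {
    let blank = ScreenChar { ascii_character: b' ', color_code };
    // The array repeat expression below requires ScreenChar: Copy.
    [blank; 80]
}
```

If the derive on `ColorCode` is removed, the derive on `ScreenChar` stops compiling as well, which can make the error look like it comes from the wrong place.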
There is some discussion on hacker news, /r/rust, and /r/programming.
Error while using x86_64::shared::control_regs.
There was no `shared` in x86_64.
Thanks for the help. :)
Trying to get this to work, my code looks identical to yours, save for the occasional twist for aesthetics, or different variable name, but after enabling the nxe bit, when according to you it should boot successfully, it crashes for me.
A bit of sleuthing on my part deduced the issue, I'm getting a double fault when I try to write to the cr3 register. A bit more debugging helped me find the culprit, when I write to cr3 in the switch method, something happens and the CPU double faults.
The exact instruction that the pc points to in the register dump is "add $0x18, %rsp"
Thanks in advance for helping me resolve this.
Looked a bit further, the original fault is a page fault with the present, write, and reserved write bits set
Hmm, sounds like your CPU somehow thinks that you set a reserved bit. If it works fine before setting the NXE bit, it could be caused by:
IA32_EFER)Hope this helps!
Hi, just leaving this here for future reference.
I had the same problem and discovered that it was actually a typo, I didn't notice the ! on the if checking for ELF_SECTION_EXECUTABLE in EntryFlags::from_elf_section_flags. Maybe this will shed some light on your problem, if you still have it.
Note on the footnote: I paste in your "most useful GDB command", and it tells me "syntax error in expression, near `int*)0xfffffffffffff000)@512' "
I think it's a problem across gdb versions. I had a similar problem recently. It seems like newer versions no longer understand some casts, but I couldn't find out whether that's a bug or an intentional syntax change.
This issue is merged: https://github.com/rust-lang/rust/issues/16012#issuecomment-160380183
There are also great comments on hackernews and /r/rust!
Hi everybody!
When I try to modify the page fault handler to define accesses to 0xdeadbeaf as legal, I get an error for this line:
let stack_frame = &mut *(stack_frame as *mut ExceptionStackFrame);
error: casting `&interrupts::ExceptionStackFrame` as `*mut interrupts::ExceptionStackFrame` is invalid
Thanks in advance!
Christian
Hmm, I updated this post two days ago and removed this section. Before that, we used to take stack_frame as a *const pointer. Since the update, we take stack_frame as a & reference, which makes the cast illegal.
But this doesn't make any sense since I've pushed this update after your comment?..
This is impressive and pedagogical work. Phil, how long did it take to acquire the necessary technical knowledge, and what did you do to achieve this technical competence?
Thanks a lot!
I took (and still take) some operating system classes at university. However, most of the details of these posts I learned from various blogs, tutorials, and wikis (e.g. the awesome OSDev wiki). The x86 details come from the official manuals from AMD and Intel.
I started my own little toy kernels a few years ago, at first in C. At some point I discovered Rust. It was still highly unstable at that time, but I loved to play with it and I learned a lot.
So I think that I learned most things from writing my own toy kernels and experimenting with them.
If you decide to add interrupt support to your OS (for keyboard input, for example), you may not want Rust to be generating SSE code. If you use SSE code in the kernel, then you need to save SSE registers in interrupts, and saving SSE registers is slow and takes a lot of RAM. As far as I can tell, a lot of kernels simply avoid floating point to help keep interrupts and system calls efficient.
Also, as you noted in your bug on GitHub, you'll also want to set no-redzone to prevent memory corruption during interrupts.
Since we need to set a bunch of compiler flags for all generated code, including libcore, the right answer may be to replace the target x86_64-unknown-linux-gnu with a custom target that uses the right options by default. There's a discussion here and an example target file in the zinc OS.
OK, it took almost a day, but I think I've got this figured out. This is probably overkill for your great blog posts, but I'll leave it here for the next person to pass this way.
Here's the basic strategy to getting an SSE-free, redzone-free kernel:
1. Define a new target x86_64-unknown-none-gnu, where none means running on bare metal without an OS. This can be done by creating a file x86_64-unknown-none-gnu.json and filling it in with the right options. See below. You can just drop this in your top-level build directory and Rust will find it.
2. Check out the same Rust you're compiling with, and patch libcore to remove floating point. You can usually find a current libcore patch in thepowersgang/rust-barebones-kernel on GitHub.
3. Build libcore with --target $(target) --cfg disable_float, and put it in ~/.multirust/toolchains/nightly/lib/rustlib/$(target)/lib.
4. Run cargo normally, specifying your custom target with --target $(target).
Here's my custom x86_64-unknown-none-gnu.json file
{
"llvm-target": "x86_64-unknown-none-gnu",
"target-endian": "little",
"target-pointer-width": "64",
"os": "none",
"arch": "x86_64",
"data-layout": "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128",
"pre-link-args": [ "-m64" ],
"cpu": "x86-64",
"features": "-mmx,-sse,-sse2,-sse3,-ssse3",
"disable-redzone": true,
"eliminate-frame-pointer": false,
"linker-is-gnu": true,
"no-compiler-rt": true,
"archive-format": "gnu"
}
This seems like a good approach, because I'm not compiling Rust code for the Linux user-space and then trying to convince it to run on bare metal. Instead, I'm compiling Rust code against a properly configured bare metal target, and if I don't like the compiler options, I can quickly change them for all crates. And if any floating point code tries to sneak into kernel space, I'll get an error immediately, instead of finding it when floating point registers get clobbered by an interrupt that used MMX code.
The OSDev wiki claims that this is the harder but wiser course of action:
Common examples [of beginner mistakes] include being too lazy to use a Cross-Compiler, developing in Real Mode instead of Protected Mode or Long Mode, relying on BIOS calls rather than writing real hardware drivers, using flat binaries instead of ELF, and so on.
Since they know way more about this than I do, I'm going with their suggestions for now. :-)
In theory, "-mmx" means "disable mmx", and so on. I'm attempting to convince Rust and LLVM to leave all those registers alone in kernel space, and to never generate any code which uses them. The goal here is not to need to save that huge block of registers (plus FPU state) on every interrupt. This seems to be a popular choice for x86 kernels.
Does it work? We'll see.
Ah, I see... It seems like there is no documentation about this. Hopefully the libcore-without-sse issue gets resolved soon. Manually patching libcore seems like a really ugly solution.
I think I will choose the "slow" solution and just save the sse registers on every interrupt. It's required anyway when switching between (user) processes.
In my opinion the best solution would be an annotation to disable SSE just for the interrupt handlers.
This is currently also hard on ARM, because I can't work out the features to disable to avoid emitting fpu instructions. I've raised a bug on llvm to seek clarification: https://llvm.org/bugs/show_...
Thanks heaps Phil. Just a comment for others: the rlib dependency strategy you describe won't work when cross-compiling (i.e. ARM / R-Pi), because the multirust nightly won't include the libcore necessary for the dependent crates to build, and they won't refer to the ones inside /target. See here: https://github.com/rust-lan...
Right, have learned a lot in the last month, following you on ARM. I expect I'll need rlibc, but I haven't yet.
What I have needed is `compiler-rt`, which you have avoided because you are building on a (tier 3) supported build target which is [not the case](http://stackoverflow.com/qu...) for `arm-none-eabi`.
You can avoid compiler-rt by using a custom target file that contains `"no-compiler-rt": true`.
You can then use nightly-libcore to cross compile libcore for your new target.
For OS X/Darwin users who have made it this far:
https://github.com/rust-lan...
It's been a good run, but Apple's modifications to clang and ld for the Darwin system completely destroy Rust's existing cross-compile capabilities, which means building an x86_64-compatible libcore simply isn't possible without monumental amounts of work.
You might want to try again, I've been following along with all of the latest posts on macOS 10.11 without problems.
I guess there's a reason you can't enable SSE in your 32-bit assembly file before you switch to long mode?
It should be totally possible to do it in the 32-bit file. I just tested it and it works without problems. I think I used to do more things in the `long_mode_init` file in a previous system so that there was already an error function. But since it's now the only function that needs the 64-bit error function, we could remove that function if we moved the `setup_SSE` function.
Thanks for the hint! I opened an issue for this.
on my box, I couldn't reproduce the SSE error. It looks like "a += 1;" doesn't generate SSE instructions anymore by default.
objdump still shows some in what looks like exception handling code but seems to never execute
~/dev/rustOS/$rustc --version
rustc 1.7.0-nightly (81dd3824f 2015-12-11)
~/dev/rustOS/$cargo --version
cargo 0.8.0-nightly (028ac34 2015-12-10)
Thank you very much for this series though, it's probably one of the most interesting ways I've seen to learn about the OS boot sequence.
Thanks a lot! I opened an issue for this.
How about the following example?:
let mut a = ("hello", 42);
a.1 += 1;
so much fun ! Thanks for this 💥🍾🍻 ! Can we have a emoji Hello World ? Just kidding.
Actually there are two smileys in code page 437.
Smiley Hello World:
....
let hello = b"\x02\x01 Hello World! \x01\x02";
let color_byte = 0x1f;
let mut hello_colored = [color_byte; 36];
...
Awesome! I made a buffer overrun error because I added a comma to "Hello, World!", and Rust actually caught it at run time and started looping.
Using
#[lang = "panic_fmt" ]
extern fn panic_fmt() -> ! {
let buffer_ptr = (0xb8000) as *mut _;
let red = 0x4f;
unsafe {
*buffer_ptr = [b'P', red, b'a', red, b'n', red, b'i', red, b'c', red, b'!', red];
};
loop { }
}
Helped a lot.
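To make the overrun concrete: each character needs two bytes (character plus color), so a 13-character string no longer fits into a 24-byte array. The sketch below assumes the hello-world array was sized for the 12-character "Hello World!"; `colorize` is a hypothetical helper that checks the length up front instead of panicking on an out-of-bounds index.

```rust
/// Interleaves `text` with `color_byte` into an N-byte array.
/// Returns None if the text needs more than N bytes (two bytes per character).
pub fn colorize<const N: usize>(text: &[u8], color_byte: u8) -> Option<[u8; N]> {
    if text.len() * 2 > N {
        return None;
    }
    // Start with every byte set to the color, then overwrite the even slots.
    let mut cells = [color_byte; N];
    for (i, byte) in text.iter().enumerate() {
        cells[i * 2] = *byte;
    }
    Some(cells)
}
```

Here `colorize::<24>(b"Hello World!", 0x1f)` succeeds, while the same call with `b"Hello, World!"` returns `None`, because 13 characters need 26 bytes.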
Using the most recent nightly build for this, the no-landing-pads snippet also generates SSE, so it's a bit of a two-for-one. :)
I think that libcore needs both SSE and SSE2 to be supported. Shouldn't you check the SSE2 CPUID flag to be sure that both SSE and SSE2 are present? I'm not sure if it could cause any problems within libcore later on.
Good catch! However, SSE2 should always be available if the long mode is available. Citing the OSDev wiki:
When the X86-64 architecture was introduced, AMD demanded a minimum level of SSE support to simplify OS code. Any system capable of long mode should support at least SSE and SSE2
So SSE and SSE2 should always be available in our case (if the wiki is correct). So we could even remove the SSE check. However, I think it's better to leave it in, because we enable SSE before switching to long mode.
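If you do want to check both flags, the test itself is small. CPUID leaf 1 reports SSE in EDX bit 25 and SSE2 in EDX bit 26. The sketch below only covers the bit test on an already-read EDX value; the function name is made up for illustration, and actually executing `cpuid` (in the 32-bit assembly or otherwise) is left out.

```rust
/// Feature bits in EDX as returned by CPUID leaf 1.
pub const CPUID_EDX_SSE: u32 = 1 << 25;
pub const CPUID_EDX_SSE2: u32 = 1 << 26;

/// Returns true only if both the SSE and the SSE2 bit are set in `edx`.
pub fn has_sse_and_sse2(edx: u32) -> bool {
    let required = CPUID_EDX_SSE | CPUID_EDX_SSE2;
    (edx & required) == required
}
```

In the assembly check, the equivalent would be testing both bits of EDX after `cpuid` with EAX set to 1.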
Thanks again for sharing this! FYI, the link https://doc.rust-lang.org/std/rt/unwind/ in http://os.phil-opp.com/set-... is broken.
#CODE FOR ANIMATED TEXT
If you want text that moves around the screen like a snake, take this code and replace your rust_main function with it:
#[no_mangle]
pub extern fn rust_main() {
let color_byte: u16 = 0x1f;
let ascii_byte: u16 = 32;
let empty_2byte_character: u16 = (color_byte << 8) | ascii_byte;
let mut poz = 0;
while poz < 4000
{
let buffer = (0xb8000 + poz ) as *mut _;
unsafe{*buffer = empty_2byte_character };
poz+=2;
}
let text = b"ANIMATED TEXT!!!! ->";
let color_byte = 0x1f;
let mut text_colored = [color_byte; 44];
for(i, char_byte) in text.into_iter().enumerate(){
text_colored[i*2] = *char_byte;
}
//animate
let mut offset = 0;
let mut done = false;
let mut delay_counter = 0;
while !done
{
let poprzedni = (0xb8000 + offset) as *mut _;
unsafe{*poprzedni = empty_2byte_character };
offset+=2;
let buffer_ptr = (0xb8000 + offset) as *mut _;
unsafe{*buffer_ptr = text_colored };
if offset == (4000-44)
{
done = true;
}
while delay_counter < 10000000
{
delay_counter+=1;
}
delay_counter = 0;
}
loop
{
}
}
With the latest Rust nightly I was getting linker errors after pulling in the rlibc crate:
target/x86_64-unknown-linux-gnu/debug/libblog_os.a(core-93f19628b61beb76.0.o): In function `core::panicking::panic_fmt':
/buildslave/rust-buildbot/slave/nightly-dist-rustc-linux/build/src/libcore/panicking.rs:69: undefined reference to `rust_begin_unwind'
make: *** [build/kernel-x86_64.bin] Error 1
Apparently the later versions of the compiler are pretty strict about mangling almost anything they can for optimization. Usually the panic_fmt symbol becomes rust_begin_unwind (for some reason), but now it's getting mangled and so the linker can't find that symbol - it's a pretty cryptic error with discussion at https://github.com/rust-lan...
To fix it, you need to mark panic_fmt with no_mangle as well, so the line in lib.rs becomes:
#[lang = "panic_fmt"] #[no_mangle] extern fn panic_fmt() -> ! {loop{}}
This allows it to build properly.
Hello, I have reached the stage of panic = "abort", but when I run make run I get this error:
target/x86_64-os/debug/libos.a(core-9a5ada2b08448709.0.o): In function `core::panicking::panic_fmt':
core.cgu-0.rs:(.text.cold._ZN4core9panicking9panic_fmt17h6b6d64bae0e8a2c2E+0x88): undefined reference to `rust_begin_unwind'
I really have no clue what is happening here, as I have exactly the same code as you do above.
EDIT: I have read some of the above comments. It turns out that other people were having the same issue, and I have used their solutions. Sorry to waste your time.
There is an error when changing the Makefile (roughly in the middle of this post). This part
which stands for “garbage collect sections”. Let's add it to the $(kernel) target in our Makefile:
$(kernel): xargo $(rust_os) $(assembly_object_files) $(linker_script)
$(kernel): kernel $(rust_os) $(assembly_object_files) $(linker_script)
    @ld -n --gc-sections -T $(linker_script) -o $(kernel) \
        $(assembly_object_files) $(rust_os)
Hey, loving the tutorials :) though I'm running into an issue when using xargo build.
When I do xargo build --target=x86_64-blog_os, I get the following error:
error: failed to parse manifest at '/home/max/TesterOS/src/Cargo.toml'
Because when I saw src/lib.rs in the tutorial, I just saved lib.rs in the src folder we created.
Is it something to do with where I placed my Cargo.toml and/or x86_64-blog_os.json file?
Really confused here.
Cargo assumes that the lib.rs file is in a subfolder named src. So it doesn't work if you put the lib.rs next to the Cargo.toml.
Thanks a million, it's all sorted now :). One more issue though.
About rlibc: make run seems to work fine without extern crate rlibc, but it fails when I add it, saying it can't compile rlibc.
Sorry to be a bother lol I'm a newb.
There is currently a problem with cargo/xargo; maybe this affects you: https://github.com/phil-opp/blog_os/issues/379
There is some interesting discussion on Reddit.