Merge branch 'master' of github.com:fxlin/p1-kernel
fxlin committed Feb 11, 2024
2 parents c5fb0b8 + 9796b9d commit 3c1026f
Showing 2 changed files with 21 additions and 8 deletions.
20 changes: 12 additions & 8 deletions docs/exp5/rpi-os.md
@@ -33,16 +33,19 @@ Please: do a `git pull` even if you have cloned the p1-kenel repo previously, in

Each system call is a synchronous exception. A user program prepares all necessary arguments and then executes the `svc` instruction. Such exceptions are handled at EL1 by the kernel. The kernel validates all arguments, does the syscall, and exits from the exception. After that, the user task resumes at EL0 right after the `svc` instruction.

* EL0 files: sys.h and sys.S
* EL1 files: sys.h and sys.c

![](figures/timeline-0.png)

We have 4 simple syscalls:
We have 4 simple syscalls (cf. sys.h):

1. `write` outputs to UART. It accepts a buffer with the text to be printed as the first argument.
1. `clone` creates a new user thread. The location of the stack for the newly created thread is passed as the first argument.
1. `malloc` allocates a memory page for a user process. There is no analog of this syscall in Linux (and I think in any other OS as well.) The only reason we need it is that we have no virtual memory yet, and all user processes work with physical memory addresses. Each process needs a way to figure out which memory page can be used. `malloc` returns a pointer to the newly allocated page or -1 in case of an error.
1. `exit` Each process must call this syscall after it finishes execution. It will do cleanup.

All syscalls are defined in `sys.c`. There is also an array [sys_call_table](https://github.com/fxlin/p1-kernel/blob/master/src/exp5/src/sys.c) that contains pointers to all syscall handlers. Each syscall has a "syscall number" — this is just an index in the `sys_call_table` array. All syscall numbers are defined [here](https://github.com/fxlin/p1-kernel/blob/master/src/exp5/include/sys.h#L6) — they are used by the assembler code to look up syscall.
An array [sys_call_table](https://github.com/fxlin/p1-kernel/blob/master/src/exp5/src/sys.c) (sys.c) contains pointers to all syscall handlers. Each syscall has a "syscall number" — this is just an index in the `sys_call_table` array. All syscall numbers are defined [here](https://github.com/fxlin/p1-kernel/blob/master/src/exp5/include/sys.h#L6) — they are used by the assembler code to look up syscall.
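
To make the table lookup concrete, here is a minimal C sketch of the idea. It is not the code from `sys.c`: the `demo_` names, the handler bodies, the single-argument handler signature, and the particular numbering are assumptions made for illustration only.

```c
#include <stddef.h>

/* Hypothetical syscall numbers; the real ones are defined in sys.h. */
enum {
    DEMO_SYS_WRITE  = 0,
    DEMO_SYS_MALLOC = 1,
    DEMO_SYS_CLONE  = 2,
    DEMO_SYS_EXIT   = 3,
    DEMO_NR_SYSCALLS
};

/* Every handler has the same shape so that it can sit in one table. */
typedef long (*demo_syscall_fn)(long arg0);

/* Placeholder handlers: each returns a distinct value so calls can be told apart. */
static long demo_write(long arg0)  { return arg0; }
static long demo_malloc(long arg0) { (void)arg0; return 100; }
static long demo_clone(long arg0)  { (void)arg0; return 200; }
static long demo_exit(long arg0)   { (void)arg0; return 300; }

/* The syscall number is simply an index into this array. */
static const demo_syscall_fn demo_sys_call_table[DEMO_NR_SYSCALLS] = {
    [DEMO_SYS_WRITE]  = demo_write,
    [DEMO_SYS_MALLOC] = demo_malloc,
    [DEMO_SYS_CLONE]  = demo_clone,
    [DEMO_SYS_EXIT]   = demo_exit,
};

/* Validate the number first, then call through the table. */
long demo_dispatch(unsigned long scno, long arg0)
{
    if (scno >= DEMO_NR_SYSCALLS)
        return -1; /* unknown syscall */
    return demo_sys_call_table[scno](arg0);
}
```

In the kernel, the same bounds check and indexed call are done in assembly, which is why the syscall numbers have to be visible to the assembler code.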

Let's use `write` syscall as an example:

@@ -55,18 +58,18 @@ call_sys_write:
ret
```

Simple -- the wrapper stores the syscall number in the `w8` register and does `svc`. Convention: registers `x0`-`x7` are used for syscall arguments and `x8` is used to store the syscall number. This allows a syscall to have up to 8 arguments.
The wrapper stores the syscall number in the `w8` register and does `svc`. Convention: registers `x0`-`x7` are used for syscall arguments and `x8` is used to store the syscall number. This allows a syscall to have up to 8 arguments.

In commodity OSes, such wrapper functions are usually in user library such as [glibc](https://www.gnu.org/software/libc/) but not in the kernel.

### Switching between EL0 and EL1

We need this new mechanism. It's in the same spirit as we move from EL2/3 to EL1. (Recall: how did we do it?)
Our kernel should support switches between EL1/EL1, and between EL1/EL0.

Previously, our kernel ran at EL1; when an interrupt occurred, it took the interrupt at EL1. Now, we need to take an exception (svc) from EL0 to EL1. To accommodate this, both the `kernel_entry` and `kernel_exit` macros accept an additional argument `el`, indicating the EL an exception is taken from. The information is required to properly save/restore the stack pointer. Here are the two relevant parts from the `kernel_entry` and `kernel_exit` macros.
Previously, our kernel ran at EL1; when an interrupt occurred, it took the interrupt at EL1. Now, we need to take an exception (svc) from EL0 to EL1. To accommodate this, both the `kernel_entry` and `kernel_exit` macros accept an additional argument `el`, indicating the EL an exception is taken from. The information is required to properly save/restore the stack pointer.

```assembly
// kernel_entry
// kernel_entry (entry.S)
.if \el == 0
mrs x21, sp_el0
.else
@@ -83,13 +86,13 @@ msr sp_el0, x21
eret
```

Even for the same task, we are using 2 distinct stacks for EL0 and EL1. This is a common design because we want to separate user/kernel.
Even for the same task, we are using 2 distinct stacks for EL0 and EL1. This is needed to separate user/kernel.

Supported by CPU hardware, after taking an exception from EL0 to EL1, the CPU automatically starts using the SP for EL1. The SP for EL0 can be found in the `sp_el0` register.

The value of this register must be stored and restored upon entering/exiting the kernel, even if the kernel does not use `sp_el0` in the exception handler. Reason: we need to virtualize `sp_el0` for each task because each task has its own user stack. Try to visualize this in your mind.

When we do `kernel_exit`, how do we specify which EL to return to, EL0 or EL1? This EL level is encoded in the `spsr_el1` register that was saved, e.g. when syscall enters the kernel. So we always return to the level from which the exception was taken.
When we do `kernel_exit`, the EL to return to (EL0 or EL1) is encoded in the `spsr_el1` register that was saved, e.g. when syscall enters the kernel. So we always return to the level from which the exception was taken.
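
As a rough illustration of where the return level sits inside the saved `spsr_el1` value: for an exception taken from AArch64 state, bits [3:2] of the mode field hold the exception level. The C sketch below decodes it; the `demo_` names are made up for this example and do not appear in the kernel source.

```c
#include <stdint.h>

/* Mode field encodings for AArch64 state (low 4 bits of SPSR_EL1). */
#define DEMO_SPSR_MODE_EL0t 0x0u /* EL0, using SP_EL0 */
#define DEMO_SPSR_MODE_EL1t 0x4u /* EL1, using SP_EL0 */
#define DEMO_SPSR_MODE_EL1h 0x5u /* EL1, using SP_EL1 */

/* Bits [3:2] of the saved value give the EL that eret will return to. */
static inline unsigned demo_spsr_return_el(uint64_t spsr)
{
    return (unsigned)((spsr >> 2) & 0x3u);
}
```

A value saved on a syscall from user space decodes to 0, while a value saved when an interrupt hits the kernel itself decodes to 1. The kernel never has to make this decision in software: `eret` reads the restored `spsr_el1` and switches level accordingly.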

> How did we treat SP when taking interrupts (from EL1)? Revisit the figures in previous experiments.
@@ -115,6 +118,7 @@ el0_sync:
* `esr_el1` (Exception Syndrome Register) is checked. This register contains the "exception class" field at offset [ESR_ELx_EC_SHIFT](https://github.com/fxlin/p1-kernel/blob/master/src/exp5/include/arm/sysregs.h#L46). If the exception class is equal to [ESR_ELx_EC_SVC64](https://github.com/fxlin/p1-kernel/blob/master/src/exp5/include/arm/sysregs.h#L47), the current exception was caused by the `svc` instruction and is a system call. In that case, we jump to the `el0_svc` label; otherwise, we show an error message.
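
The same check can be written in C as a sketch. The field position (bits [31:26]) and the class values come from the ARMv8 architecture; the `DEMO_` names are stand-ins for the macros in `sysregs.h`, and the explicit mask is an addition that the assembly does not need.

```c
#include <stdint.h>

#define DEMO_ESR_EC_SHIFT    26    /* exception class starts at bit 26 */
#define DEMO_ESR_EC_MASK     0x3Fu /* the field is 6 bits wide */
#define DEMO_ESR_EC_SVC64    0x15u /* svc executed in AArch64 state */
#define DEMO_ESR_EC_DABT_LOW 0x24u /* data abort from a lower EL */

/* Extract the exception class from a raw esr_el1 value. */
static inline unsigned demo_esr_class(uint64_t esr)
{
    return (unsigned)((esr >> DEMO_ESR_EC_SHIFT) & DEMO_ESR_EC_MASK);
}

/* Non-zero if the exception was caused by an svc instruction. */
static inline int demo_esr_is_svc64(uint64_t esr)
{
    return demo_esr_class(esr) == DEMO_ESR_EC_SVC64;
}
```

The lower bits of the register carry class-specific details (for `svc`, the immediate operand), so only the class field should be compared.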

```
// entry.S
sc_nr .req x25 // number of system calls
scno .req x26 // syscall number
stbl .req x27 // syscall table pointer
9 changes: 9 additions & 0 deletions docs/exp6/rpi-os.md
@@ -154,6 +154,7 @@ As I mentioned in the previous section, each block descriptor contains a set of
ARMv8 architecture introduces `mair_el1` register. See [its definition](https://developer.arm.com/docs/ddi0595/b/aarch64-system-registers/mair_el1). This register consists of 8 slots, each spanning 8 bits. Each slot configures a common set of attributes. A descriptor then specifies just an index of the `mair` slot, instead of specifying all attributes directly. This allows using only 3 bits in the descriptor to reference a `mair` slot. We are using only a few of available attribute options. [Here](https://github.com/fxlin/p1-kernel/blob/master/src/exp6/include/arm/mmu.h#L11) is the code that prepares values for the `mair` register.
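
To see how the slots and indices fit together, here is a small C sketch. It assumes two memory types, device memory in slot 0 and normal non-cacheable memory in slot 1, with the architectural attribute encodings `0x00` and `0x44`. The `DEMO_` names are illustrative and are not copied from `mmu.h`.

```c
#include <stdint.h>

/* Slot indices: these are what a descriptor stores (3 bits). */
#define DEMO_MT_DEVICE_nGnRnE 0x0u
#define DEMO_MT_NORMAL_NC     0x1u

/* 8-bit attribute encodings: these are what each mair slot stores. */
#define DEMO_MT_DEVICE_nGnRnE_FLAGS 0x00u /* Device-nGnRnE */
#define DEMO_MT_NORMAL_NC_FLAGS     0x44u /* Normal, non-cacheable */

/* Place an 8-bit attribute into the given slot of the 64-bit register. */
static inline uint64_t demo_mair_slot(unsigned index, uint8_t attr)
{
    return (uint64_t)attr << (8u * index);
}

/* Combine the slots into the value written to mair_el1. */
static inline uint64_t demo_mair_value(void)
{
    return demo_mair_slot(DEMO_MT_DEVICE_nGnRnE, DEMO_MT_DEVICE_nGnRnE_FLAGS) |
           demo_mair_slot(DEMO_MT_NORMAL_NC, DEMO_MT_NORMAL_NC_FLAGS);
}

/* A descriptor refers to a slot through its AttrIndx field, bits [4:2]. */
static inline uint64_t demo_desc_attr_index(unsigned index)
{
    return (uint64_t)(index & 0x7u) << 2;
}
```

With these assumptions the register value works out to `0x4400`, and a descriptor that maps normal memory carries `0x4` in its attribute index bits.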

```
// arm/mmu.h
/*
* Memory region attributes:
*
@@ -232,6 +233,7 @@ As we know that no code will use `x29` during `__create_page_tables` execution,
> Q: What could go wrong if we push x30 to stack here?
```
// boot.S
adrp x0, pg_dir // adrp: form PC-relative address to 4KB page
mov x1, #PG_DIR_SIZE
bl memzero
@@ -256,6 +258,7 @@ Now we are going to step outside `__create_page_tables` function and take a look
`create_table_entry` is responsible for allocating a new page table (in our case, either PGD or PUD). The source code is listed below.

```
// boot.S
.macro create_table_entry, tbl, virt, shift, tmp1, tmp2
lsr \tmp1, \virt, #\shift
and \tmp1, \tmp1, #PTRS_PER_TABLE - 1 // table index
@@ -319,6 +322,7 @@ Finally, we change `tbl` parameter to point to the next page table in the hierar
The next important macro is `create_block_map`. As you might guess, this macro is responsible for populating entries of the PMD table. It looks like the following.

```
// boot.S
.macro create_block_map, tbl, phys, start, end, flags, tmp1
lsr \start, \start, #SECTION_SHIFT
and \start, \start, #PTRS_PER_TABLE - 1 // table index
@@ -396,6 +400,7 @@ The final part of the function is executed inside a loop. Here we first store cu
Now that you understand how the `create_table_entry` and `create_block_map` macros work, it will be straightforward to understand the rest of the `__create_page_tables` function.

```
// boot.S
adrp x0, pg_dir
mov x1, #VA_START
create_pgd_entry x0, x1, x2, x3
@@ -503,6 +508,7 @@ Another question: why `ldr x2, =kernel_main` itself must be executed before we t
Commodity kernels load user programs as ELF from filesystems. We won't be building a filesystem or ELF loader in this experiment. As a workaround, we will embed user programs in the kernel binary at link time, and load them at run time. For easy loading, we will store the user program in a separate ELF section of the kernel binary. Here is the relevant section of the linker script that is responsible for doing this.

```
//linker-qemu.ld (or linker.ld)
. = ALIGN(0x00001000);
user_begin = .;
.text.user : { build/user* (.text) }
@@ -828,6 +834,7 @@ If you go back and take a look at the `move_to_user_mode` function, you may noti
When a process tries to access an address that belongs to a page that is not yet mapped, a synchronous exception is generated. This is the second type of synchronous exception that we are going to support (the first type is an exception generated by the `svc` instruction, which is a system call). The synchronous exception handler now looks like the following.

```
// entry.S
el0_sync:
kernel_entry 0
mrs x25, esr_el1 // read the syndrome register
@@ -842,6 +849,7 @@ el0_sync:
Here we use the `esr_el1` register to determine the exception type. If it is a page fault exception (or, which is the same, a data access exception), the `el0_da` function is called.

```
// entry.S
el0_da:
bl enable_irq
mrs x0, far_el1
@@ -865,6 +873,7 @@ el0_da:
`do_mem_abort` is listed below.

```
// mm.c
int do_mem_abort(unsigned long addr, unsigned long esr) {
unsigned long dfs = (esr & 0b111111);
if ((dfs & 0b111100) == 0b100) {
