[RFC] MPU support for debugging
Boie, Andrew P
This is not a fully-fleshed out specification, but since there is a lot of interest in memory protection in Zephyr on multiple fronts I thought this would be a good way to get the conversation going on how we can use an MPU in Zephyr and what implications it has for the OS design. Hoping to get a consensus on the scope and capabilities of this before diving too deep into implementation details.
Memory protection could be used to greatly assist in kernel debugging. At this time I think it should be positioned as an optional debug feature and not a security feature. We want to target common MPU features found on MCUs. Systems without MPU hardware, but have an MMU, can use MMU with identity page tables instead. We want to be able to create "user" threads with reduced privileges on what memory they can write to, so that if they overflow their stack, branch to non-code, or write to memory that doesn't belong to them, the system will trigger a fatal error instead of silently wreaking havoc. We want to implement this in a way that is not invasive to the current Zephyr driver/OS APIs.
As I'm sure we all know, debugging multi-threaded applications is a complex task, made much more complex if threads can silently corrupt each other or the OS. Thread stack overflows are also very difficult to debug, with the usual symptom being strange, unpredictable behavior until the system eventually crashes.
We'd like to introduce support for memory protection units in Zephyr to help with these debug issues. To me the following characteristics would be desirable:
- Introduce the notion of User/Supervisor threads. User threads would only have write access to their own stack, plus some additional memory regions parameterized at runtime.
- Illegal memory access by a user thread, stack overflow, or branching the PC to memory that is not code should result in a fatal system error, using SysFatalErrorHandler.
- Kernel and driver APIs should be the same regardless of whether memory protection is used. For example, during debugging the end user may enable memory protection to catch issues, but for the final production build it would be switched off for performance reasons.
- For at least the initial implementation of this feature we should try to do it changing any existing public kernel or driver APIs as little as possible, although we will need to introduce some new ones.
It's important to keep in mind the capabilities of a typical MPU unit. In all the implementations I have seen, the memory regions need to be aligned to a multiple of their size, and the size must be a power of two. (ARM lets you further subdivide any given region into 8 sub-regions which can be selectively enabled). Regions may overlap, but there can only be a fixed number of regions active at any given time, with 8 or 16 being common.
Some arches like x86 don't have MPU hardware, but if an MMU is present it can be configured to act like an MPU using an identity page table.
THIS IS NOT A SECURITY FEATURE
If you look at all the kernel APIs in kernel.h, they all take pointers to various data structures owned by the caller. Let's take a Zephyr timer as an example. If you want to use a timer you need to declare a struct k_timer and then make kernel API calls passing its pointer value, which will involve a privilege elevation for the system to do its work. Some of the members of the k_timer struct are really private to the kernel and we could cause all kinds of havoc if garbage data is passed in, or the k_timer struct is corrupted in some way, or someone was trying to do something malicious.
We have a similar problem with drivers, where pointers to struct device are passed around and dereferenced in order to get at their API calls. At a fundamental level, Zephyr in its current state is designed to be application+kernel all rolled into one address space.
I do not see any good way around this problem unless we completely redesign all kernel and driver APIs:
- Data structures private to the kernel for kernel objects would need to be completely inaccessible to user threads.
- User threads wanting to work with some kernel object would not allocate memory for it and pass in a pointer. We'd need an API where the user thread requests an object (like a timer), the kernel allocates space for it somehow, and returns to the user thread a descriptor or handle to use when performing API calls.
- For situations where we only want to use memory protection for debugging, this layer of abstraction in all the APIs to use descriptors instead of pointers would impose a lot of unnecessary overhead for situations where MPU is disabled or not available.
Ideas are very welcome here, but my feeling is that "preventing threads from crashing the OS in all cases" or "prevent threads from doing malicious things" is probably not completely feasible in Zephyr unless we transform it into a remarkably different OS than it is now. Given the limitations of a typical MPU I suspect this would probably require an MMU to do it right. I feel there are a lot of things we can do to help with debugging if we assume good intent and are not trying to protect against malicious behavior, or garbage being passed to kernel APIs.
In this proposal, threads can switch between supervisor and user privileges at will. This greatly simplifies the implementation:
- No need for system calls for privilege elevation. Through macro magic, if memory protection is turned on we can wrap kernel/driver APIs with some code that elevates privileges before calling into the kernel, check that any supplied pointer values are actually owned by the calling thread, and drop privileges when returning to the caller.
- Many drivers and subsystems allow for the registration of callbacks upon events. Some of these run in ISR context and should be left as-is. However for example in the network stack you can supply callbacks invoked by supervisor network stack worker threads to perform protocol parsing on the application side, and it would be great if these callbacks could run in a reduced set of privileges. The worker threads in the subsystem that make the callbacks could lower their privileges (granting access to buffer memory regions if necessary) when making the callback, restoring them when it returns.
By default all threads start their life in supervisor mode. I think what Vincenzo recently uploaded is a good policy for supervisor threads: https://gerrit.zephyrproject.org/r/#/q/topic:zephyr_mpu.
Threads may drop down to user mode with an API call. This would grant the thread write access to only its stack memory, plus additional memory regions parameterized at runtime. Threads may go back to supervisor mode with another API call, there's no restrictions on moving back and forth between these states.
Even though we allow threads to move between user and supervisor mode at will, we are trying to catch mistakes. So some kernel/driver APIs that are intended to be called from user mode will have an implicit privilege elevation, but others which are intended to be supervisor-only will not.
For globals (data/bss) I think we should separate the data used by the OS (libzephyr.a) and globals declared by the application, put them in different memory regions. We should make it easy to grant any given thread write access to globals defined within the application. I'm thinking of a programming model similar to pthreads: each pthread's stack is private, and you can't touch OS data structures, but all the pthreads can access/modify globals within the process they run in. In Zephyr we more or less have many threads running in exactly one process.
Additionally, application globals that are never intended to be touched by user threads could be tagged with a __supervisor variable attribute which will roll them in with the other OS globals.
Rough to-do list to make this happen:
- Fully specify the relevant API calls for this feature
- Move the struct k_thread out of the thread stack where it currently lives. We don't want the thread clobbering this if too much stack is used, which is currently what happens since it's at the lowest memory addresses of the stack. May need to reserve some additional memory on the kernel side for this data to live in.
- Some APIs which assume use of the system heap for all allocations may need to be extended for user-supplied memory pools
- Separate out kernel globals (bss/data) from application globals. A typical user thread may need to have access to application globals, but should never need to write to kernel data structures in a non-supervisor state. Application globals tagged with __supervisor would be put with the OS data structures.
- Implement magic for implicit privilege elevation in kernel/driver APIs when calling them from user mode
- Define a memory protection API specification between the core kernel and arch/ code which actually talks to the MPU (or MMU)
- Possible adjustments to how the network stack does callbacks
- Test cases to prove that this all works