Re: [RFC] MPU support for debugging


Benjamin Walsh <benjamin.walsh@...>
 

Some random thoughts on this, do with them as you wish. :)

To get more detailed feedback, I think you need to document how exactly
you intend on using the MPU.

Like:

- region 0 to protect the kernel test+rodata (supervisor, ro)
- region 1 to protect the kernel data and kernel thread stacks
(supervisor, rw)
- region 2 for text+rodata for all user threads (user, ro)
- region 3 for data+stacks of threads in group A (user, rw, nx (if
available))
- region 4 for data+stacks of threads in group B (user, rw, nx (if
available))
- region 5 for shared memory between group A and group B (always
accessible) (user, rw, nx (if available))
- regions are fixed
- privileges for region 3 change from user to supervisor when a thread
in group B gets context switched in
- privileges for region 4 change from user to supervisor when a thread
in group A gets context switched in
- privileges for region 3 change from supervisor to user when a thread
in group A gets context switched in
- privileges for region 4 change from supervisor to user when a thread
in group B gets context switched in

or

- region 0 to protect the kernel test+rodata (supervisor, ro)
- region 1 to protect the kernel data and kernel thread stacks
(supervisor, rw)
- region 2 of text+rodata for all user threads (user, ro)
- region 3 for data+stack of currently running user thread (user, rw, nx
(if available)).
- region 4 for shared memory between all threads (always accessible)
(user, rw, nx (if available)).
- region 3 base addr + size get reprogrammed on user threads context
switches to match incoming thread's data+stack section
- kernel can reprogram region 3 base addr + size to match a thread it
wants to pass data to in a user-space buffer

or something else.

On Fri, Mar 10, 2017 at 10:11:20PM +0000, Boie, Andrew P wrote:
This is not a fully-fleshed out specification, but since there is a
lot of interest in memory protection in Zephyr on multiple fronts I
thought this would be a good way to get the conversation going on how
we can use an MPU in Zephyr and what implications it has for the OS
design. Hoping to get a consensus on the scope and capabilities of
this before diving too deep into implementation details.

TL;DR SUMMARY

Memory protection could be used to greatly assist in kernel debugging.
At this time I think it should be positioned as an optional debug
feature and not a security feature. We want to target common MPU
features found on MCUs. Systems without MPU hardware, but have an MMU,
can use MMU with identity page tables instead. We want to be able to
create "user" threads with reduced privileges on what memory they can
write to, so that if they overflow their stack, branch to non-code, or
write to memory that doesn't belong to them, the system will trigger a
fatal error instead of silently wreaking havoc. We want to implement
this in a way that is not invasive to the current Zephyr driver/OS
APIs.

PROBLEM STATEMENTS

As I'm sure we all know, debugging multi-threaded applications is a
complex task, made much more complex if threads can silently corrupt
each other or the OS. Thread stack overflows are also very difficult
to debug, with the usual symptom being strange, unpredictable behavior
until the system eventually crashes.

We'd like to introduce support for memory protection units in Zephyr
to help with these debug issues. To me the following characteristics
would be desirable:

- Introduce the notion of User/Supervisor threads. User threads would
only have write access to their own stack, plus some additional memory
^^^^^^^^^^^^^^^
And whatever other stacks are in the same protection region: stack
overflow will only corrupt the current region, but they can still happen
and wreck havoc within the protection area. Or are you thinking of
rewriting the memory protection area boundaries on every context switch
to with bounds around the current thread's stack ? I don't mean only
changing the permissions, but the boundaries themselves ?

regions parameterized at runtime.

- Illegal memory access by a user thread, stack overflow, or branching
the PC to memory that is not code should result in a fatal system
error, using SysFatalErrorHandler.

- Kernel and driver APIs should be the same regardless of whether
memory protection is used. For example, during debugging the end user
may enable memory protection to catch issues, but for the final
production build it would be switched off for performance reasons.

- For at least the initial implementation of this feature we should
try to do it changing any existing public kernel or driver APIs as
little as possible, although we will need to introduce some new ones.

MPU LIMITATIONS

It's important to keep in mind the capabilities of a typical MPU unit.
In all the implementations I have seen, the memory regions need to be
aligned to a multiple of their size, and the size must be a power of
two. (ARM lets you further subdivide any given region into 8
sub-regions which can be selectively enabled). Regions may overlap,
but there can only be a fixed number of regions active at any given
time, with 8 or 16 being common.

Some arches like x86 don't have MPU hardware, but if an MMU is present
it can be configured to act like an MPU using an identity page table.

THIS IS NOT A SECURITY FEATURE
This bothers me a bit. If you ever want to turn this into a more
fleshed-out user-space model, would you introduce a third API/model or
modify them again ?

My gut feeling would be to design for the full "user-space" model, and
then have the reduced-capability MPU-model be a special case of it.

Then again, depends on the timeframe and how future-proof you want to
design this.

If you look at all the kernel APIs in kernel.h, they all take pointers
to various data structures owned by the caller. Let's take a Zephyr
timer as an example. If you want to use a timer you need to declare a
struct k_timer and then make kernel API calls passing its pointer
value, which will involve a privilege elevation for the system to do
its work. Some of the members of the k_timer struct are really private
to the kernel and we could cause all kinds of havoc if garbage data is
passed in, or the k_timer struct is corrupted in some way, or someone
was trying to do something malicious.

We have a similar problem with drivers, where pointers to struct
device are passed around and dereferenced in order to get at their API
calls. At a fundamental level, Zephyr in its current state is designed
to be application+kernel all rolled into one address space.

I do not see any good way around this problem unless we completely
redesign all kernel and driver APIs:
Yeah, I think you would need a different API. With pre-zephyr, when we
added user-space and allowed the microkernel APIs from user-space, we
were playing with IDs already (the microkernel API referenced IDs, not
addresses), and we still had to put a bunch of restrictions on what the
APIs could do from user-space, which made the API clunky to use.

You could supplement the API with user-space "wrappers" that allocate
and initialize objects and return and operate on an ID, rather than
playing with addresses.

e.g.

k_sem_id_t k_sem_create(int limit, int count)

or maybe

z_sem_id_t z_sem_create(int limit, int count)

since they are not really kernel APIs when called from users-space, and then

int z_sem_take(z_sem_id_t id, int32_t timeout)

Then you need a pool of object or a real kernel memory allocator.

- Data structures private to the kernel for kernel objects would need
to be completely inaccessible to user threads.

- User threads wanting to work with some kernel object would not
allocate memory for it and pass in a pointer. We'd need an API where
the user thread requests an object (like a timer), the kernel
allocates space for it somehow, and returns to the user thread a
descriptor or handle to use when performing API calls.
Oh well, I wrote the previous comment before getting to this part. :)

- For situations where we only want to use memory protection for
debugging, this layer of abstraction in all the APIs to use
descriptors instead of pointers would impose a lot of unnecessary
overhead for situations where MPU is disabled or not available.

Ideas are very welcome here, but my feeling is that "preventing
threads from crashing the OS in all cases" or "prevent threads from
doing malicious things" is probably not completely feasible in Zephyr
unless we transform it into a remarkably different OS than it is now.
I think this needs more of a "split" between kernel and "user-space"
APIs than a redesign. But then, kernel applications could not be rebuilt
as user-space applications. Applications using the user-space API could
probably be rebuilt as kernel applications, if we provided the
user-space API as a very thin layer around the current kernel APIs in
the kernel. Nothing would prevent removing the user-space layer and run
kernel-only.

Given the limitations of a typical MPU I suspect this would probably
require an MMU to do it right. I feel there are a lot of things we can
do to help with debugging if we assume good intent and are not trying
to protect against malicious behavior, or garbage being passed to
kernel APIs.

In this proposal, threads can switch between supervisor and user
privileges at will. This greatly simplifies the implementation:

- No need for system calls for privilege elevation. Through macro
magic, if memory protection is turned on we can wrap kernel/driver
What do you mean by "macro magic" ? There has to be a mechanism to
trigger the privilege elevation. On ARM, you have to use the SVC
instruction, handle the exception, write the CONTROL register, and come
back to the next instruction after SVC. That's basically one system
call, which purpose is to elevate the privilege of the current thread.

APIs with some code that elevates privileges before calling into the
kernel, check that any supplied pointer values are actually owned by
the calling thread, and drop privileges when returning to the caller.

- Many drivers and subsystems allow for the registration of callbacks
upon events. Some of these run in ISR context and should be left
as-is. However for example in the network stack you can supply
callbacks invoked by supervisor network stack worker threads to
perform protocol parsing on the application side, and it would be
great if these callbacks could run in a reduced set of privileges. The
worker threads in the subsystem that make the callbacks could lower
their privileges (granting access to buffer memory regions if
necessary) when making the callback, restoring them when it returns.

THREAD PERMISSIONS

By default all threads start their life in supervisor mode. I think
what Vincenzo recently uploaded is a good policy for supervisor
threads: https://gerrit.zephyrproject.org/r/#/q/topic:zephyr_mpu.

Threads may drop down to user mode with an API call. This would grant
the thread write access to only its stack memory, plus additional
memory regions parameterized at runtime. Threads may go back to
supervisor mode with another API call, there's no restrictions on
moving back and forth between these states.

Even though we allow threads to move between user and supervisor mode
at will, we are trying to catch mistakes. So some kernel/driver APIs
that are intended to be called from user mode will have an implicit
privilege elevation, but others which are intended to be
supervisor-only will not.

For globals (data/bss) I think we should separate the data used by the
OS (libzephyr.a) and globals declared by the application, put them in
different memory regions. We should make it easy to grant any given
thread write access to globals defined within the application. I'm
thinking of a programming model similar to pthreads:

each pthread's stack is private
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

That's not really true: within a process, threads can stomp on each
other's stacks (that said, I haven't done a lot of pthread programming,
and I don't know if e.g. the Linux kernel allows protecting a thread's
stack from the other threads within a process).

and you can't touch OS data structures, but all the
pthreads can access/modify globals within the process they run in. In
Zephyr we more or less have many threads running in exactly one
process.

Additionally, application globals that are never intended to be
touched by user threads could be tagged with a __supervisor variable
attribute which will roll them in with the other OS globals.

ZEPHYR IMPLICATIONS

Rough to-do list to make this happen:

- Fully specify the relevant API calls for this feature

- Move the struct k_thread out of the thread stack where it currently
lives. We don't want the thread clobbering this if too much stack is
used, which is currently what happens since it's at the lowest memory
addresses of the stack. May need to reserve some additional memory on
the kernel side for this data to live in.
Normally, you need a kernel stack in addition to the user-space stack to
do this right. In pre-zephyr user-space, we tried doing everything on
the user-space stack to save space, but finally abandoned that approach
when facing too many issues.

- Some APIs which assume use of the system heap for all allocations may
need to be extended for user-supplied memory pools

- Separate out kernel globals (bss/data) from application globals. A
typical user thread may need to have access to application globals,
but should never need to write to kernel data structures in a
non-supervisor state. Application globals tagged with __supervisor
would be put with the OS data structures.

- Implement magic for implicit privilege elevation in kernel/driver
APIs when calling them from user mode

- Define a memory protection API specification between the core kernel
and arch/ code which actually talks to the MPU (or MMU)

- Possible adjustments to how the network stack does callbacks

- Test cases to prove that this all works

Join devel@lists.zephyrproject.org to automatically receive all group messages.