Re: Design Philosophy 'Minimal runtime error checking'...

Benjamin Walsh <benjamin.walsh@...>

Hi Marcus,

The discussion in this issue has prompted me to
raise this topic in the wider forum.

Zephyr has a philosophy of "minimal runtime error checking", it says
so here

There is little other written material on this topic (that I can
find), but in various jira issue comments and gerrit reviews there are
occasional references to this topic.

In the absence of any more detailed written material on this topic it
is difficult to reason about what any arbitrary piece of the code base
should or should not do in order to implement this philosophy.

The approach taken by the core kernel is pretty simple:

- if an error happens because it's the user's fault, __ASSERT(): the
kernel doesn't care, you did something stupid :)

- if the error comes from proper usage and thus an internal error path,
return an -errno to user
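
As a rough sketch of that split (not actual kernel code): the queue
type, the function and the SKETCH_ASSERT() macro below are all made up
for illustration, and the macro just stands in for __ASSERT() by
mapping to the standard assert().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for __ASSERT(); maps to the standard assert(). */
#define SKETCH_ASSERT(cond, msg) assert((cond) && (msg))

#define SKETCH_QUEUE_DEPTH 2

/* Hypothetical fixed-depth queue, not a real kernel object. */
struct sketch_queue {
	int items[SKETCH_QUEUE_DEPTH];
	size_t count;
};

/*
 * Caller misuse (a NULL queue) is asserted, not reported.
 * A full queue can happen under correct usage, so it is reported
 * to the caller as -ENOMEM.
 */
int sketch_queue_put(struct sketch_queue *q, int item)
{
	SKETCH_ASSERT(q != NULL, "queue must not be NULL");

	if (q->count == SKETCH_QUEUE_DEPTH) {
		return -ENOMEM;
	}

	q->items[q->count++] = item;
	return 0;
}
```

The caller only ever has to handle -ENOMEM here; the NULL check costs
nothing once assertions are compiled out.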

This is why you will see almost no -EINVAL being returned from the core
kernel API. The only -EINVAL values returned come from a parameter that
should have been valid from the user's point-of-view, but the kernel
state changed and made it invalid. I did a quick grep for the kernel,
and there are only two instances of it.
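
To illustrate the kind of -EINVAL I mean, here is a sketch under
assumed names (sketch_work and its functions are hypothetical, not the
two instances from the grep): the caller passes a perfectly valid
object, but by the time the call is made the state has moved on.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item; 'pending' models state owned by the kernel. */
struct sketch_work {
	bool pending;
};

/* Queue the work item for later execution. */
void sketch_work_submit(struct sketch_work *w)
{
	assert(w != NULL);
	w->pending = true;
}

/* Models the kernel running the work item behind the caller's back. */
void sketch_work_run(struct sketch_work *w)
{
	assert(w != NULL);
	w->pending = false;
}

/*
 * A NULL pointer is the caller's fault and is asserted.
 * A work item that has already run is not the caller's fault:
 * the parameter was valid when the caller obtained it, so the
 * failure is reported as -EINVAL.
 */
int sketch_work_cancel(struct sketch_work *w)
{
	assert(w != NULL);

	if (!w->pending) {
		return -EINVAL;
	}

	w->pending = false;
	return 0;
}
```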

You can take a look at how I used __ASSERT() and errnos in the new
k_poll() API to give you an idea, but it's pretty straightforward.


Looking through the code base there is a fair amount of evidence to
suggest that interpretation of the philosophy varies considerably. By
way of example, if we look in the drivers tree, and look at the
prevalence of -E* return codes and ASSERT* static trapping, we find:

133 source files use -E* return codes
53 source files use ASSERT traps

Breaking down the use of specific -E* return codes and ASSERT* macros
across those files, we find:

443 -EIO


This suggests that many, if not most, drivers detect and pass error
codes dynamically rather than ASSERTing.

-EIO is used extensively for general failure in the driver;
perhaps this is one group where the design philosophy might suggest
ASSERT should be used more aggressively.

The use of -EINVAL is predominantly associated with API 'configure'
implementation, sanity checking of input parameters. This seems
reasonable to me. But it could be argued that it is a static error
for a user to attempt an illegal driver configuration.

The -ENOTSUP uses are often returned by drivers that only partially
implement the API for a specific class of driver. This is primarily
useful for an application that catches the code and then implements an
alternative strategy. Is this consistent with the design
philosophy? It is perhaps more appropriate to apply the philosophy
rigorously and ASSERT().

The -EBUSY uses vary. At least some instances appear to be cases
where the EBUSY code is returned by the driver API (i.e. not caught
internally), thus effectively mandating that all users of that API
must check the return code (and presumably busy loop). This, at least
superficially, appears to be an anti-pattern of the design philosophy.

The -EAGAIN uses also appear to be examples of a design philosophy
anti-pattern.
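
To make the burden on the caller concrete, here is a sketch with a
made-up driver (sketch_dev and both functions are hypothetical, not
taken from any driver in the tree). Once the API can return -EBUSY,
every caller ends up carrying some variant of the retry loop in
sketch_write_retry().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device; busy_polls_left models a transfer in progress. */
struct sketch_dev {
	int busy_polls_left;
};

/* Driver API that leaks -EBUSY to the caller instead of handling it. */
int sketch_dev_write(struct sketch_dev *dev, unsigned char byte)
{
	assert(dev != NULL);
	(void)byte; /* the payload is irrelevant to this sketch */

	if (dev->busy_polls_left > 0) {
		dev->busy_polls_left--;
		return -EBUSY;
	}

	return 0;
}

/* What each caller must then write: check the code and retry. */
int sketch_write_retry(struct sketch_dev *dev, unsigned char byte,
		       int max_tries)
{
	int ret = -EBUSY;

	for (int i = 0; i < max_tries && ret == -EBUSY; i++) {
		ret = sketch_dev_write(dev, byte);
	}

	return ret;
}
```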

Many of the remaining less frequent -E error codes are often used
interchangeably for the same purpose between different drivers. I
don't have a view right now as to whether most of these fit the
philosophy or not, but at a minimum we should probably be more
consistent with -E code choice.

If(f) we intend to hold to the "minimal runtime error checking"
philosophy I think it would be highly desirable to elaborate some more
detailed guidance on what to assert and what to error return. Doing so
will enable more effective review of new code coming into the tree,
and provide a basis to tidy up some of the existing code.

This quick rummage through drivers/* also suggests that for a given
API, driver implementations are inconsistent w.r.t the return codes
they pass back to the caller. I think we should extend the
include/*.h driver API documentation for each API function with a
return code to indicate what E* codes are possible in what
circumstances. Doing so will help driver users to understand the
failure scenarios they must handle (-EAGAIN and -EBUSY were a surprise
to me) and, perhaps more important right now, help device driver
writers to reason about how their driver is expected to conform.
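
Something along these lines is what I have in mind, sketched with
Doxygen @retval entries. The device type, the function and the set of
supported baud rates are all hypothetical; the point is only that the
header comment lists each possible code and when it occurs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device instance, for illustration only. */
struct sketch_uart {
	unsigned int baud;
};

/**
 * @brief Configure the baud rate of a hypothetical UART.
 *
 * @param dev  Device instance; must not be NULL (asserted, not reported).
 * @param baud Requested baud rate.
 *
 * @retval 0        Configuration applied.
 * @retval -ENOTSUP Baud rate is not implemented by this driver.
 */
int sketch_uart_configure(struct sketch_uart *dev, unsigned int baud)
{
	assert(dev != NULL);

	/* This sketch pretends the hardware supports only two rates. */
	if (baud != 9600U && baud != 115200U) {
		return -ENOTSUP;
	}

	dev->baud = baud;
	return 0;
}
```

A driver writer can then check the implementation against the list,
and a user knows exactly which codes need handling.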

Incidentally, I've written about drivers/* here, but much of this
applies across other substantial parts of the tree...

Thoughts welcome, in particular:
- does the design philosophy still stand
- where should we document/elaborate implementation guidance
- should we tighten the include/*.h driver API documentation w.r.t
expected behaviour

Zephyr-devel mailing list
Benjamin Walsh, SMTS
WR VxWorks Virtualization Profile
Zephyr kernel maintainer
