Re: RFC: return type of functions passed to DEVICE_INIT()


Daniel Leung <daniel.leung@...>
 

On Thu, Feb 18, 2016 at 05:54:56PM -0500, Benjamin Walsh wrote:
My take on it is that for Zephyr a failed device initialization
should be considered a fatal event. My expectation is that the
Zephyr user will only be enabling relevant (and important) devices to
their project. If one of these devices should fail, then that is a
serious system error and _NanoFatalErrorHandler() should be invoked.

If this train of thought holds up to scrutiny, and if the aim is to
save a few bytes then I would think that it would be better to have
the device initialization routines return a failure code and have
_sys_device_do_config_level() check for it and invoke the fatal error
handler upon the detection of failure. Otherwise we duplicate the
overhead of calling the fatal error handler in each device
initialization routine.
Sorry for the slow response. I agree with Peter here I think we should
be checking the return value and doing something useful with the
result. Maybe not _NanoFatalErrorHandler() but something notifying
the application that something bad happened. A given device not
initializing may not be fatal to the the whole application, just one
feature is currently unavailable.
For the kind of systems we are targeting, do we really expect the
application to handle devices not initializing correctly, being designed
so that parts are disabled if some parts of the initialization fail
(devices or others), or do we expect applications to require everything
to be present for them to function correctly ? I would have thought the
latter, but I can be convinced.
Delving into the realm of the hypothetical :-)

What about devices that have drivers in the system but are not present
(pluggable) or can't initialize because some resource external to the
device can't be contacted (network server).

The application may be able to still do useful work albeit with reduced
functionality.

Then, if the latter, do we expect the application catching the errors at
runtime when deployed or during development (basically catching software
errors mostly) not malfunctionning hardware. Here, I was thinking the
latter as well, which is why I was proposing __ASSERT() calls catching
initialization errors in debug loads only. And this fits with one of the
core values of the OS, which is small footprint.
Both models are useful for different reasons :-D

Any of those could be a valid approach I think, but we have to decide on
one. And right now, we have the worst since we return those error codes
which are meant for runtime handling, but they just go into the void.
Agreed we need to pick and stay with it for some amount of time until
we see a few real uses/applications/platforms.
OK, following your proposal below, what we could put in place is
standardizing on error codes that init routines must return if they want
the kernel init system to automatically trigger a fatal error.

Then, we could also allow configuring out the error handling if someone
needs to squeeze that last amount of space. One more Kconfig option! :)
The error handling would be enabled by default of course.
For those non-fatal errors, what should we do for runtime driver behaviors?
Should the drivers themselves fail API calls? Or should we let
device_get_binding() return NULL?

How we could/should report this type of error is an open question :-).
Brainstorming:

If we want to let the application handle the initialization issues, we
probably need some kind of queue that gets filled by the init system
when init functions return errors, and that the application drains to
see what failed. We might want to queue the associated device objects,
and have an errno field in there, or something like that.
How about having the driver return an error code saying whether the
failure is a fatal error or not. For the drivers that we have now where
we *know* that if it fails it is a hardware or configuration error
which is fatal. So we go with the _NanoFatalErrorHandler() error path.
That sounds good.

If a non-fatal error occurred (may work at next reset) just ignore it
an move on. The application can detect if the device is dead/not
present by the return codes from the driver call(s). Then the
application can decide what and how to report the error to the user.
That's another way of doing it. It's a bit less explicit than a list
of errors, but less overhead, and reuses what's already available.
-----
Daniel Leung

Join devel@lists.zephyrproject.org to automatically receive all group messages.