RFC: return type of functions passed to DEVICE_INIT()


Dirk Brandewie <dirk.j.brandewie@...>
 

On 02/24/2016 01:26 PM, Thomas, Ramesh wrote:
On Mon, 2016-02-22 at 07:39 -0800, Dirk Brandewie wrote:

On 02/19/2016 05:31 PM, Thomas, Ramesh wrote:
On Fri, 2016-02-19 at 10:05 -0800, Daniel Leung wrote:
On Fri, Feb 19, 2016 at 08:31:48AM -0800, Dirk Brandewie wrote:


On 02/18/2016 03:04 PM, Daniel Leung wrote:
On Thu, Feb 18, 2016 at 05:54:56PM -0500, Benjamin Walsh wrote:
My take on it is that for Zephyr a failed device initialization
should be considered a fatal event. My expectation is that the
Zephyr user will only be enabling relevant (and important) devices to
their project. If one of these devices should fail, then that is a
serious system error and _NanoFatalErrorHandler() should be invoked.

If this train of thought holds up to scrutiny, and if the aim is to
save a few bytes then I would think that it would be better to have
the device initialization routines return a failure code and have
_sys_device_do_config_level() check for it and invoke the fatal error
handler upon the detection of failure. Otherwise we duplicate the
overhead of calling the fatal error handler in each device
initialization routine.
Sorry for the slow response. I agree with Peter here I think we should
be checking the return value and doing something useful with the
result. Maybe not _NanoFatalErrorHandler() but something notifying
the application that something bad happened. A given device not
initializing may not be fatal to the the whole application, just one
feature is currently unavailable.
For the kind of systems we are targeting, do we really expect the
application to handle devices not initializing correctly, being designed
so that parts are disabled if some parts of the initialization fail
(devices or others), or do we expect applications to require everything
to be present for them to function correctly ? I would have thought the
latter, but I can be convinced.
Delving into the realm of the hypothetical :-)

What about devices that have drivers in the system but are not present
(pluggable) or can't initialize because some resource external to the
device can't be contacted (network server).

The application may be able to still do useful work albeit with reduced
functionality.

Then, if the latter, do we expect the application catching the errors at
runtime when deployed or during development (basically catching software
errors mostly) not malfunctionning hardware. Here, I was thinking the
latter as well, which is why I was proposing __ASSERT() calls catching
initialization errors in debug loads only. And this fits with one of the
core values of the OS, which is small footprint.
Both models are useful for different reasons :-D

Any of those could be a valid approach I think, but we have to decide on
one. And right now, we have the worst since we return those error codes
which are meant for runtime handling, but they just go into the void.
Agreed we need to pick and stay with it for some amount of time until
we see a few real uses/applications/platforms.
OK, following your proposal below, what we could put in place is
standardizing on error codes that init routines must return if they want
the kernel init system to automatically trigger a fatal error.

Then, we could also allow configuring out the error handling if someone
needs to squeeze that last amount of space. One more Kconfig option! :)
The error handling would be enabled by default of course.
For those non-fatal errors, what should we do for runtime driver behaviors?
Should the drivers themselves fail API calls? Or should we let
device_get_binding() return NULL?
How about something like this:

diff --git a/kernel/nanokernel/device.c b/kernel/nanokernel/device.c
index f86f95f..82774c4 100644
--- a/kernel/nanokernel/device.c
+++ b/kernel/nanokernel/device.c
@@ -58,7 +58,7 @@ struct device *device_get_binding(char *name)
struct device *info;

for (info = __device_init_start; info != __device_init_end; info++) {
- if (!strcmp(name, info->config->name)) {
+ if (!strcmp(name, info->config->name) && info->driver_api) {
return info;
}
}

if there is a non-fatal error the driver will not mark itself available
and we can get rid of the null checks in the API headers since the user of the
driver will never get a reference to it if init() did not complete successfully
Why not create a dedicated status field in device object?
It would take more RAM :-) Not sure exactly what you would want to
return in the status? If device init() fails there is nothing the
application code can do about it other that fail gracefully.
I do agree with you that the app cannot do much in failures. But then
kernel need not do even what you propose with failing binding.
Let the app figure out when it tries to use the port.
If get binding fails the app will know the device failed or is
otherwise unavailable. The return code from init() tells the kernel
whether it's a fatal error and to call _NanoFatalErrorHandler() or
continue initialization.

Also drivers can provide
apis for diagnosis if they need e.g. applications that can be remotely
diagnosed and fixed.


Additionally we can provide an API to query status of devices. This
will help query status of devices it does not bind to.
If you are not binding to a device why do you need the status? If
there is a good use case propose the API with the patch ;-)
The API proposal was based on your proposal using get_binding(). Binding
will not work for devices that do not have a name e.g. loapic/ioapic.
What do you need to control from the application in loapic/ioapic?


But, I guess first we need to see if there is a real need for an error
logging mechanism with kernel involvment :-) It should be noted that
some error logging can be achieved by driver api itself (without kernel
help) if the application needs.


This should work. We need to rework some of the drivers as they make
the driver_api assignment unconditionally.


----------
Daniel Leung