Dirk Brandewie <dirk.j.brandewie@...>
On 02/24/2016 01:26 PM, Thomas, Ramesh wrote: On Mon, 2016-02-22 at 07:39 -0800, Dirk Brandewie wrote:
On 02/19/2016 05:31 PM, Thomas, Ramesh wrote:
On Fri, 2016-02-19 at 10:05 -0800, Daniel Leung wrote:
On Fri, Feb 19, 2016 at 08:31:48AM -0800, Dirk Brandewie wrote:
On 02/18/2016 03:04 PM, Daniel Leung wrote:
On Thu, Feb 18, 2016 at 05:54:56PM -0500, Benjamin Walsh wrote:
My take on it is that for Zephyr a failed device initialization should be considered a fatal event. My expectation is that the Zephyr user will only be enabling relevant (and important) devices to their project. If one of these devices should fail, then that is a serious system error and _NanoFatalErrorHandler() should be invoked.
If this train of thought holds up to scrutiny, and if the aim is to save a few bytes then I would think that it would be better to have the device initialization routines return a failure code and have _sys_device_do_config_level() check for it and invoke the fatal error handler upon the detection of failure. Otherwise we duplicate the overhead of calling the fatal error handler in each device initialization routine. Sorry for the slow response. I agree with Peter here I think we should be checking the return value and doing something useful with the result. Maybe not _NanoFatalErrorHandler() but something notifying the application that something bad happened. A given device not initializing may not be fatal to the the whole application, just one feature is currently unavailable. For the kind of systems we are targeting, do we really expect the application to handle devices not initializing correctly, being designed so that parts are disabled if some parts of the initialization fail (devices or others), or do we expect applications to require everything to be present for them to function correctly ? I would have thought the latter, but I can be convinced. Delving into the realm of the hypothetical :-)
What about devices that have drivers in the system but are not present (pluggable) or can't initialize because some resource external to the device can't be contacted (network server).
The application may be able to still do useful work albeit with reduced functionality.
Then, if the latter, do we expect the application catching the errors at runtime when deployed or during development (basically catching software errors mostly) not malfunctionning hardware. Here, I was thinking the latter as well, which is why I was proposing __ASSERT() calls catching initialization errors in debug loads only. And this fits with one of the core values of the OS, which is small footprint. Both models are useful for different reasons :-D
Any of those could be a valid approach I think, but we have to decide on one. And right now, we have the worst since we return those error codes which are meant for runtime handling, but they just go into the void. Agreed we need to pick and stay with it for some amount of time until we see a few real uses/applications/platforms. OK, following your proposal below, what we could put in place is standardizing on error codes that init routines must return if they want the kernel init system to automatically trigger a fatal error.
Then, we could also allow configuring out the error handling if someone needs to squeeze that last amount of space. One more Kconfig option! :) The error handling would be enabled by default of course. For those non-fatal errors, what should we do for runtime driver behaviors? Should the drivers themselves fail API calls? Or should we let device_get_binding() return NULL?
How about something like this:
diff --git a/kernel/nanokernel/device.c b/kernel/nanokernel/device.c index f86f95f..82774c4 100644 --- a/kernel/nanokernel/device.c +++ b/kernel/nanokernel/device.c @@ -58,7 +58,7 @@ struct device *device_get_binding(char *name) struct device *info;
for (info = __device_init_start; info != __device_init_end; info++) { - if (!strcmp(name, info->config->name)) { + if (!strcmp(name, info->config->name) && info->driver_api) { return info; } }
if there is a non-fatal error the driver will not mark itself available and we can get rid of the null checks in the API headers since the user of the driver will never get a reference to it if init() did not complete successfully
Why not create a dedicated status field in device object?
It would take more RAM :-) Not sure exactly what you would want to return in the status? If device init() fails there is nothing the application code can do about it other that fail gracefully. I do agree with you that the app cannot do much in failures. But then kernel need not do even what you propose with failing binding. Let the app figure out when it tries to use the port. If get binding fails the app will know the device failed or is otherwise unavailable. The return code from init() tells the kernel whether it's a fatal error and to call _NanoFatalErrorHandler() or continue initialization. Also drivers can provide apis for diagnosis if they need e.g. applications that can be remotely diagnosed and fixed.
Additionally we can provide an API to query status of devices. This will help query status of devices it does not bind to.
If you are not binding to a device why do you need the status? If there is a good use case propose the API with the patch ;-) The API proposal was based on your proposal using get_binding(). Binding will not work for devices that do not have a name e.g. loapic/ioapic.
What do you need to control from the application in loapic/ioapic? But, I guess first we need to see if there is a real need for an error logging mechanism with kernel involvment :-) It should be noted that some error logging can be achieved by driver api itself (without kernel help) if the application needs.
This should work. We need to rework some of the drivers as they make the driver_api assignment unconditionally.
---------- Daniel Leung
|