Hi Joakim, On Fri, 2016-02-19 at 09:04 +0100, Joakim Eriksson wrote: Hi Jukka!
On 19 Feb 2016, at 08:28, Jukka Rissanen <jukka.rissanen(a)linux.inte l.com> wrote:
Hi Joakim,
On Thu, 2016-02-18 at 16:48 +0000, Joakim Eriksson wrote:
Hello!
I just cloned the Zephyr repository and saw that the 6LoWPAN stack was the same as the one I am used to - e,g, the Contiki 6LoWPAN stack. How are you planning to develop it further? Are you going to sync it from Contiki or would you like Contiki developers to post pull-requests to fixes also to Zephyr? The plan is to sync it from Contiki when applicable. If you look the code, we needed to make lot of changes to uIP stack in order to make it work in Zephyr. The biggest change is that the IP stack is now re- entrant e.g., there is no global buffer for network packets. Because of this it is not so simple to just pull changes from Contiki as manual work needs to be done anyway.
Ok, yes - I expected that there would be quite som changes. But I guess that they will be the same changes at each sync! I will take a look at the codebase and see how much differences you have.
I guess if Zephyr is ported to more platforms (some of the ones we have in Contiki) maybe that would make developers do pull requests to both OS:es.
Quick question - what size of the compiled code did you get on the IP stack when you moved from one buffer to re-entrant / multi buffers? I have not measured this. At the moment the IP stack is configured to have separate TX and RX buffers. So in minimal configuration there would be one TX and one RX buffer vs. in Contiki there would be only one buffer that is used for both TX and RX. User can also define at compile time how many network buffers he wants.
Anyway - congratulations - it looks like an exciting project and I will be happy contribute to it!
-- Joakim Eriksson, SICS Thanks, and patches are welcome! I will take a look and see if some of our work would be good to get in! (we have LWM2M and other things fairly recently put into Contiki).
If have ported the lwm2m code to zephyr but it is not yet tested so I have not sent it to review. Cheers, Jukka
|
|
Re: RFC: return type of functions passed to DEVICE_INIT()
Benjamin Walsh <benjamin.walsh@...>
On Thu, Feb 18, 2016 at 10:49:01PM +0000, Thomas, Ramesh wrote: On Thu, 2016-02-18 at 12:34 -0800, Dirk Brandewie wrote:
On 02/18/2016 10:03 AM, Benjamin Walsh wrote:
My take on it is that for Zephyr a failed device initialization should be considered a fatal event. My expectation is that the Zephyr user will only be enabling relevant (and important) devices to their project. If one of these devices should fail, then that is a serious system error and _NanoFatalErrorHandler() should be invoked.
How about a CONFIG option to determine if kernel should abort boot? The
What I was proposing in my last reply, too keep things simple, is one Kconfig option to enable/disable handling of error codes by init functions. If it's disabled, nothing is handled automatically and it relies on everything working (when deployed) and __ASSERTs (when debugging). If it's enabled, some runtime handling is already needed, so you can piggyback on it. One way is to standardize on a specific error code to be returned by init functions to tell the kernel to abort boot. No need for a Kconfig option for that case. behavior should be determined by the kind of application. Some can live with less features while for others it is all or nothing. In both cases a logging feature that can be queried by the app would be useful.
Any of those could be a valid approach I think, but we have to decide on one. And right now, we have the worst since we return those error codes which are meant for runtime handling, but they just go into the void.
+1 _sys_device_do_config_level() should at least pass on the error returned even if one device fails. This as well as the "resume_all/suspend_all" functions can be called by app in power management case. It would be useful for the app to know that something went wrong.
Question still remains whether _sys_device_do_config_level() should continue with other devices if one fails. IMO it should continue but should return error if one device had failed. At boot time, kernel can save the error condition somewhere to be queried by the app. The thing is that you can have several failures, one per-device object basically. That's why I was proposing saving the errors in some field in the device object and queuing them to be retrieved by the application. Brainstorming:
If we want to let the application handle the initialization issues, we probably need some kind of queue that gets filled by the init system when init functions return errors, and that the application drains to see what failed. We might want to queue the associated device objects, and have an errno field in there, or something like that. The app would be anyway getting a failure when it tries to use the device. It most probably cannot rectify any device error on the fly with more detailed info. Just my thought
A simple status from _sys_device_do_config_level() is probably enough so the app can decide whether to continue or abort.
This still has to be saved somewhere that can be retrieved by the app. The kernel does not pass anything to main() currently, and in the microkernel, there isn't necessarily a main() either.
|
|
Re: RFC: make _fiber_start() return a handle on the fiber
Benjamin Walsh <benjamin.walsh@...>
When we start a fiber via the _fiber_start() API family, we don't get back a handle on the created fiber. The fiber identifier is actually the start of the fiber's stack. This hasn't been a problem until now since no API requires a handle on the fiber, except one, fiber_delayed_start_cancel(): that API is part of a pair, where the other API, fiber_delayed_start() starts the fiber and returns a handle.
However, Jukka asked me an API could be created that cancels a fiber_sleep() call, something like fiber_wakeup(). The implementation of such an API is very simple, but it requires a handle on the fiber we want to wake up. This in turn requires the signature of the _fiber_start() family to return a handle to the fiber that gets started.
The signature of _fiber_start() et al. would then change from a void return type to a void * return type.
Objections, comments, etc ? No comments -> no objects -> perhaps we can continue with this route then?
Looks like it. You want to do the implementation yourself Jukka ?
|
|
Re: RFC: make _fiber_start() return a handle on the fiber
On Feb 16, 2016, at 11:35, Benjamin Walsh <benjamin.walsh(a)windriver.com> wrote:
Folks,
When we start a fiber via the _fiber_start() API family, we don't get back a handle on the created fiber. The fiber identifier is actually the start of the fiber's stack. This hasn't been a problem until now since no API requires a handle on the fiber, except one, fiber_delayed_start_cancel(): that API is part of a pair, where the other API, fiber_delayed_start() starts the fiber and returns a handle.
However, Jukka asked me an API could be created that cancels a fiber_sleep() call, something like fiber_wakeup(). The implementation of such an API is very simple, but it requires a handle on the fiber we want to wake up. This in turn requires the signature of the _fiber_start() family to return a handle to the fiber that gets started.
The signature of _fiber_start() et al. would then change from a void return type to a void * return type.
Objections, comments, etc ?
Sounds good, but we need to do it in away that keeps APIs compatible I guess. Anas Cheers, Ben
-- Benjamin Walsh, SMTS Wind River Rocket Zephyr kernel maintainer www.windriver.com
|
|
Re: How to setup git-review
Thanks for documenting this but I don't think forcing people to use git-review is the right approach. It worked for us before in many other projects, why is this different and why can't I use vanilla git to gerrit interaction?
What is the issue here exactly?
Anas
toggle quoted messageShow quoted text
On Feb 18, 2016, at 14:05, Andre Guedes <andre.guedes(a)intel.com> wrote:
Hi all,
I was having some trouble to submit a change to Gerrit which depends on someone else's change. If I use the 'git push' command as described in [1], I get the following error:
remote: ERROR: In commit <commit id> remote: ERROR: author email address <someone(a)mail.com> remote: ERROR: does not match your user account. remote: ERROR: remote: ERROR: The following addresses are currently registered: remote: ERROR: <your email> remote: ERROR: remote: ERROR: To register an email address, please visit: remote: ERROR: https://gerrit.zephyrproject.org/r/#/settings/contact
Talking to Andrew Grimberg from Linux Foundation, he recommended using git-review and it worked around the issue.
Here are the instructions I followed to setup git-review:
$ cat > .gitreview [gerrit] host=gerrit.zephyrproject.org port=29418 project=zephyr.git
$ git-review -s
To submit your patch(es) to review: $ git-review -T
I hope this can save you time in case you run into a similar problem ;)
BTW, Andrew submitted a patch adding the .gitreview file so we don't have to keep it locally.
Regards,
Andre
[1] https://www.zephyrproject.org/doc/collaboration/code/gerrit.html#gerrit [2] https://gerrit.zephyrproject.org/r/#/c/390/
|
|
Re: RFC: make _fiber_start() return a handle on the fiber
Dirk Brandewie <dirk.j.brandewie@...>
On 02/16/2016 11:34 AM, Benjamin Walsh wrote: Folks,
When we start a fiber via the _fiber_start() API family, we don't get back a handle on the created fiber. The fiber identifier is actually the start of the fiber's stack. This hasn't been a problem until now since no API requires a handle on the fiber, except one, fiber_delayed_start_cancel(): that API is part of a pair, where the other API, fiber_delayed_start() starts the fiber and returns a handle.
However, Jukka asked me an API could be created that cancels a fiber_sleep() call, something like fiber_wakeup(). The implementation of such an API is very simple, but it requires a handle on the fiber we want to wake up. This in turn requires the signature of the _fiber_start() family to return a handle to the fiber that gets started.
The signature of _fiber_start() et al. would then change from a void return type to a void * return type.
Makes sense but why not tell the compiler the truth about the type? Objections, comments, etc ?
Cheers, Ben
|
|
Re: RFC: return type of functions passed to DEVICE_INIT()
Dirk Brandewie <dirk.j.brandewie@...>
On 02/18/2016 03:04 PM, Daniel Leung wrote: On Thu, Feb 18, 2016 at 05:54:56PM -0500, Benjamin Walsh wrote:
My take on it is that for Zephyr a failed device initialization should be considered a fatal event. My expectation is that the Zephyr user will only be enabling relevant (and important) devices to their project. If one of these devices should fail, then that is a serious system error and _NanoFatalErrorHandler() should be invoked.
If this train of thought holds up to scrutiny, and if the aim is to save a few bytes then I would think that it would be better to have the device initialization routines return a failure code and have _sys_device_do_config_level() check for it and invoke the fatal error handler upon the detection of failure. Otherwise we duplicate the overhead of calling the fatal error handler in each device initialization routine. Sorry for the slow response. I agree with Peter here I think we should be checking the return value and doing something useful with the result. Maybe not _NanoFatalErrorHandler() but something notifying the application that something bad happened. A given device not initializing may not be fatal to the the whole application, just one feature is currently unavailable. For the kind of systems we are targeting, do we really expect the application to handle devices not initializing correctly, being designed so that parts are disabled if some parts of the initialization fail (devices or others), or do we expect applications to require everything to be present for them to function correctly ? I would have thought the latter, but I can be convinced. Delving into the realm of the hypothetical :-)
What about devices that have drivers in the system but are not present (pluggable) or can't initialize because some resource external to the device can't be contacted (network server).
The application may be able to still do useful work albeit with reduced functionality.
Then, if the latter, do we expect the application catching the errors at runtime when deployed or during development (basically catching software errors mostly) not malfunctionning hardware. Here, I was thinking the latter as well, which is why I was proposing __ASSERT() calls catching initialization errors in debug loads only. And this fits with one of the core values of the OS, which is small footprint. Both models are useful for different reasons :-D
Any of those could be a valid approach I think, but we have to decide on one. And right now, we have the worst since we return those error codes which are meant for runtime handling, but they just go into the void. Agreed we need to pick and stay with it for some amount of time until we see a few real uses/applications/platforms. OK, following your proposal below, what we could put in place is standardizing on error codes that init routines must return if they want the kernel init system to automatically trigger a fatal error.
Then, we could also allow configuring out the error handling if someone needs to squeeze that last amount of space. One more Kconfig option! :) The error handling would be enabled by default of course. For those non-fatal errors, what should we do for runtime driver behaviors? Should the drivers themselves fail API calls? Or should we let device_get_binding() return NULL?
How about something like this: diff --git a/kernel/nanokernel/device.c b/kernel/nanokernel/device.c index f86f95f..82774c4 100644 --- a/kernel/nanokernel/device.c +++ b/kernel/nanokernel/device.c @@ -58,7 +58,7 @@ struct device *device_get_binding(char *name) struct device *info; for (info = __device_init_start; info != __device_init_end; info++) { - if (!strcmp(name, info->config->name)) { + if (!strcmp(name, info->config->name) && info->driver_api) { return info; } } if there is a non-fatal error the driver will not mark itself available and we can get rid of the null checks in the API headers since the user of the driver will never get a reference to it if init() did not complete successfully How we could/should report this type of error is an open question :-). Brainstorming:
If we want to let the application handle the initialization issues, we probably need some kind of queue that gets filled by the init system when init functions return errors, and that the application drains to see what failed. We might want to queue the associated device objects, and have an errno field in there, or something like that. How about having the driver return an error code saying whether the failure is a fatal error or not. For the drivers that we have now where we *know* that if it fails it is a hardware or configuration error which is fatal. So we go with the _NanoFatalErrorHandler() error path. That sounds good.
If a non-fatal error occurred (may work at next reset) just ignore it an move on. The application can detect if the device is dead/not present by the return codes from the driver call(s). Then the application can decide what and how to report the error to the user. That's another way of doing it. It's a bit less explicit than a list of errors, but less overhead, and reuses what's already available. ----- Daniel Leung
|
|
Re: How to setup git-review
Andre Guedes <andre.guedes@...>
Hi Anas, Quoting Nashif, Anas (2016-02-19 14:07:36) Thanks for documenting this but I don't think forcing people to use git-review is the right approach. Yes, I agree. It worked for us before in many other projects, why is this different and why can't I use vanilla git to gerrit interaction?
What is the issue here exactly? I'm not a gerrit expert but from I could understand, this is related to the 'forge author' settings from submitter rights. I'm CCing Andrew so he might add more details about it. Regards, Andre
|
|
Re: RFC: make _fiber_start() return a handle on the fiber
Benjamin Walsh <benjamin.walsh@...>
On Fri, Feb 19, 2016 at 08:12:04AM -0800, Dirk Brandewie wrote:
On 02/16/2016 11:34 AM, Benjamin Walsh wrote:
Folks,
When we start a fiber via the _fiber_start() API family, we don't get back a handle on the created fiber. The fiber identifier is actually the start of the fiber's stack. This hasn't been a problem until now since no API requires a handle on the fiber, except one, fiber_delayed_start_cancel(): that API is part of a pair, where the other API, fiber_delayed_start() starts the fiber and returns a handle.
However, Jukka asked me an API could be created that cancels a fiber_sleep() call, something like fiber_wakeup(). The implementation of such an API is very simple, but it requires a handle on the fiber we want to wake up. This in turn requires the signature of the _fiber_start() family to return a handle to the fiber that gets started.
The signature of _fiber_start() et al. would then change from a void return type to a void * return type.
Makes sense but why not tell the compiler the truth about the type?
Ugh, you're right, it's a nano_thread_id_t.
|
|
Re: RFC: make _fiber_start() return a handle on the fiber
Benjamin Walsh <benjamin.walsh@...>
On Fri, Feb 19, 2016 at 11:07:33AM -0500, Nashif, Anas wrote:
On Feb 16, 2016, at 11:35, Benjamin Walsh <benjamin.walsh(a)windriver.com> wrote:
Folks,
When we start a fiber via the _fiber_start() API family, we don't get back a handle on the created fiber. The fiber identifier is actually the start of the fiber's stack. This hasn't been a problem until now since no API requires a handle on the fiber, except one, fiber_delayed_start_cancel(): that API is part of a pair, where the other API, fiber_delayed_start() starts the fiber and returns a handle.
However, Jukka asked me an API could be created that cancels a fiber_sleep() call, something like fiber_wakeup(). The implementation of such an API is very simple, but it requires a handle on the fiber we want to wake up. This in turn requires the signature of the _fiber_start() family to return a handle to the fiber that gets started.
The signature of _fiber_start() et al. would then change from a void return type to a void * return type.
Objections, comments, etc ?
Sounds good, but we need to do it in away that keeps APIs compatible I guess.
We're just returning a thread ID when we start a fiber now, you can ignore it. I don't see an issue here...
|
|
Re: RFC: make _fiber_start() return a handle on the fiber
On 19/02/2016, 08:40, "Benjamin Walsh" <benjamin.walsh(a)windriver.com> wrote: On Fri, Feb 19, 2016 at 11:07:33AM -0500, Nashif, Anas wrote:
On Feb 16, 2016, at 11:35, Benjamin Walsh <benjamin.walsh(a)windriver.com> wrote:
Folks,
When we start a fiber via the _fiber_start() API family, we don't get back a handle on the created fiber. The fiber identifier is actually the start of the fiber's stack. This hasn't been a problem until now since no API requires a handle on the fiber, except one, fiber_delayed_start_cancel(): that API is part of a pair, where the other API, fiber_delayed_start() starts the fiber and returns a handle.
However, Jukka asked me an API could be created that cancels a fiber_sleep() call, something like fiber_wakeup(). The implementation of such an API is very simple, but it requires a handle on the fiber we want to wake up. This in turn requires the signature of the _fiber_start() family to return a handle to the fiber that gets started.
The signature of _fiber_start() et al. would then change from a void return type to a void * return type.
Objections, comments, etc ?
Sounds good, but we need to do it in away that keeps APIs compatible I guess. We're just returning a thread ID when we start a fiber now, you can ignore it. I don't see an issue here... You are right, this will not break existing code. Nevertheless, we should track such API changes and document them in release notes. Anas
|
|
Re: RFC: make _fiber_start() return a handle on the fiber
Benjamin Walsh <benjamin.walsh@...>
When we start a fiber via the _fiber_start() API family, we don't get back a handle on the created fiber. The fiber identifier is actually the start of the fiber's stack. This hasn't been a problem until now since no API requires a handle on the fiber, except one, fiber_delayed_start_cancel(): that API is part of a pair, where the other API, fiber_delayed_start() starts the fiber and returns a handle.
However, Jukka asked me an API could be created that cancels a fiber_sleep() call, something like fiber_wakeup(). The implementation of such an API is very simple, but it requires a handle on the fiber we want to wake up. This in turn requires the signature of the _fiber_start() family to return a handle to the fiber that gets started.
The signature of _fiber_start() et al. would then change from a void return type to a void * return type.
Objections, comments, etc ?
Sounds good, but we need to do it in away that keeps APIs compatible I guess. We're just returning a thread ID when we start a fiber now, you can ignore it. I don't see an issue here... You are right, this will not break existing code. Nevertheless, we should track such API changes and document them in release notes.
Sure.
|
|
Re: RFC: return type of functions passed to DEVICE_INIT()
Daniel Leung <daniel.leung@...>
On Fri, Feb 19, 2016 at 08:31:48AM -0800, Dirk Brandewie wrote:
On 02/18/2016 03:04 PM, Daniel Leung wrote:
On Thu, Feb 18, 2016 at 05:54:56PM -0500, Benjamin Walsh wrote:
My take on it is that for Zephyr a failed device initialization should be considered a fatal event. My expectation is that the Zephyr user will only be enabling relevant (and important) devices to their project. If one of these devices should fail, then that is a serious system error and _NanoFatalErrorHandler() should be invoked.
If this train of thought holds up to scrutiny, and if the aim is to save a few bytes then I would think that it would be better to have the device initialization routines return a failure code and have _sys_device_do_config_level() check for it and invoke the fatal error handler upon the detection of failure. Otherwise we duplicate the overhead of calling the fatal error handler in each device initialization routine. Sorry for the slow response. I agree with Peter here I think we should be checking the return value and doing something useful with the result. Maybe not _NanoFatalErrorHandler() but something notifying the application that something bad happened. A given device not initializing may not be fatal to the the whole application, just one feature is currently unavailable. For the kind of systems we are targeting, do we really expect the application to handle devices not initializing correctly, being designed so that parts are disabled if some parts of the initialization fail (devices or others), or do we expect applications to require everything to be present for them to function correctly ? I would have thought the latter, but I can be convinced. Delving into the realm of the hypothetical :-)
What about devices that have drivers in the system but are not present (pluggable) or can't initialize because some resource external to the device can't be contacted (network server).
The application may be able to still do useful work albeit with reduced functionality.
Then, if the latter, do we expect the application catching the errors at runtime when deployed or during development (basically catching software errors mostly) not malfunctionning hardware. Here, I was thinking the latter as well, which is why I was proposing __ASSERT() calls catching initialization errors in debug loads only. And this fits with one of the core values of the OS, which is small footprint. Both models are useful for different reasons :-D
Any of those could be a valid approach I think, but we have to decide on one. And right now, we have the worst since we return those error codes which are meant for runtime handling, but they just go into the void. Agreed we need to pick and stay with it for some amount of time until we see a few real uses/applications/platforms. OK, following your proposal below, what we could put in place is standardizing on error codes that init routines must return if they want the kernel init system to automatically trigger a fatal error.
Then, we could also allow configuring out the error handling if someone needs to squeeze that last amount of space. One more Kconfig option! :) The error handling would be enabled by default of course. For those non-fatal errors, what should we do for runtime driver behaviors? Should the drivers themselves fail API calls? Or should we let device_get_binding() return NULL?
How about something like this:
diff --git a/kernel/nanokernel/device.c b/kernel/nanokernel/device.c index f86f95f..82774c4 100644 --- a/kernel/nanokernel/device.c +++ b/kernel/nanokernel/device.c @@ -58,7 +58,7 @@ struct device *device_get_binding(char *name) struct device *info; for (info = __device_init_start; info != __device_init_end; info++) { - if (!strcmp(name, info->config->name)) { + if (!strcmp(name, info->config->name) && info->driver_api) { return info; } }
if there is a non-fatal error the driver will not mark itself available and we can get rid of the null checks in the API headers since the user of the driver will never get a reference to it if init() did not complete successfully
This should work. We need to rework some of the drivers as they make the driver_api assignment unconditionally. ---------- Daniel Leung
|
|
issue setup galileo board
Hi, I had set up all the dev libaries, SDK and downloaded required files ( https://www.zephyrproject.org/doc/board/galileo.html ). but under $ZEPHYR_BASE directory when I did make BOARD=galileo it showed Makefile:11: *** Invoking make from top-level kernel directory is not supported. Stop. Anyone please let me know where the problem is? Regards Anupam
|
|
issue setup galileo board
Hi, I had set up all the dev libaries, SDK and downloaded required files ( https://www.zephyrproject.org/doc/board/galileo.html ). but under $ZEPHYR_BASE directory when I did make BOARD=galileo it showed Makefile:11: *** Invoking make from top-level kernel directory is not supported. Stop. Anyone please let me know where the problem is? Regards
|
|
Hi, I had set up all the dev libaries, SDK and downloaded required files ( https://www.zephyrproject.org/doc/board/galileo.html ). but under $ZEPHYR_BASE directory when I did make BOARD=galileo it showed Makefile:11: *** Invoking make from top-level kernel directory is not supported. Stop. Anyone please let me know where the problem is?
|
|
Re: galileo building problem
Hi, You need to go into an application directory and build in the application directory, for example:
$ cd samples/hello_world/nanokernel $ make BOARD=galileo
Regards, Anas
toggle quoted messageShow quoted text
On 19/02/2016, 13:55, "Anupam Datta" <adbd04(a)gmail.com> wrote: Hi, I had set up all the dev libaries, SDK and downloaded required files ( https://www.zephyrproject.org/doc/board/galileo.html ). but under $ZEPHYR_BASE directory when I did make BOARD=galileo it showed Makefile:11: *** Invoking make from top-level kernel directory is not supported. Stop. Anyone please let me know where the problem is?
|
|
Re: RFC: return type of functions passed to DEVICE_INIT()
On Fri, 2016-02-19 at 10:31 -0500, Benjamin Walsh wrote: On Thu, Feb 18, 2016 at 10:49:01PM +0000, Thomas, Ramesh wrote:
On Thu, 2016-02-18 at 12:34 -0800, Dirk Brandewie wrote:
On 02/18/2016 10:03 AM, Benjamin Walsh wrote:
My take on it is that for Zephyr a failed device initialization should be considered a fatal event. My expectation is that the Zephyr user will only be enabling relevant (and important) devices to their project. If one of these devices should fail, then that is a serious system error and _NanoFatalErrorHandler() should be invoked.
How about a CONFIG option to determine if kernel should abort boot? The What I was proposing in my last reply, too keep things simple, is one Kconfig option to enable/disable handling of error codes by init functions. If it's disabled, nothing is handled automatically and it relies on everything working (when deployed) and __ASSERTs (when debugging).
If it's enabled, some runtime handling is already needed, so you can piggyback on it. One way is to standardize on a specific error code to be returned by init functions to tell the kernel to abort boot. No need for a Kconfig option for that case. That should be fine. So the kernel decision to abort will be based on a "fatal" error return from any of the device's init function. We still need to decide what does "abort" mean for kernel. It could mean - 1. Break out of _sys_device_do_config_level() when one device init fails and continue boot. App takes necessary action when it finds device inits had failed. 2. Kernel continues with all device inits and boot. Basically ignores the error returned but devices have them logged so app will know when it tries to use them. 3. Kernel itself aborts boot. I prefer #2 or #1 (in that order) because that gives app flexibility of dealing with the situation. One more thing to note is that app will not get binding for all required devices e.g. ioapic, loapic, sys_clock etc. So it might not know if they failed unless it queries all devices explicitly which is unlikely unless we provide some API for that. IMO these should never fail and we need not consider any kernel action for this (so we can still stay away from #3 above). Just wanted to bring this up as app's ability to know device failure is being considered as part of the design.
behavior should be determined by the kind of application. Some can live with less features while for others it is all or nothing. In both cases a logging feature that can be queried by the app would be useful.
Any of those could be a valid approach I think, but we have to decide on one. And right now, we have the worst since we return those error codes which are meant for runtime handling, but they just go into the void.
+1 _sys_device_do_config_level() should at least pass on the error returned even if one device fails. This as well as the "resume_all/suspend_all" functions can be called by app in power management case. It would be useful for the app to know that something went wrong.
Question still remains whether _sys_device_do_config_level() should continue with other devices if one fails. IMO it should continue but should return error if one device had failed. At boot time, kernel can save the error condition somewhere to be queried by the app. The thing is that you can have several failures, one per-device object basically. That's why I was proposing saving the errors in some field in the device object and queuing them to be retrieved by the application.
Storing status in device object sounds good. I was considering the possible use case of PM app calling _sys_device_do_config_level(). In that case a return value of 0 or 1 would help it know some thing went wrong instead of having to query all devices to find out. But I think this does not impact the main topic of your RFC and can be addressed separately if necessary.
Brainstorming:
If we want to let the application handle the initialization issues, we probably need some kind of queue that gets filled by the init system when init functions return errors, and that the application drains to see what failed. We might want to queue the associated device objects, and have an errno field in there, or something like that. The app would be anyway getting a failure when it tries to use the device. It most probably cannot rectify any device error on the fly with more detailed info. Just my thought
A simple status from _sys_device_do_config_level() is probably enough so the app can decide whether to continue or abort. This still has to be saved somewhere that can be retrieved by the app. The kernel does not pass anything to main() currently, and in the microkernel, there isn't necessarily a main() either.
Yes, passing in main() or task entry points are messy. Storing somewhere for app to query is better. May be even provide an API to help in the query if that helps.
|
|
Re: RFC: return type of functions passed to DEVICE_INIT()
On Fri, 2016-02-19 at 10:05 -0800, Daniel Leung wrote: On Fri, Feb 19, 2016 at 08:31:48AM -0800, Dirk Brandewie wrote:
On 02/18/2016 03:04 PM, Daniel Leung wrote:
On Thu, Feb 18, 2016 at 05:54:56PM -0500, Benjamin Walsh wrote:
My take on it is that for Zephyr a failed device initialization should be considered a fatal event. My expectation is that the Zephyr user will only be enabling relevant (and important) devices to their project. If one of these devices should fail, then that is a serious system error and _NanoFatalErrorHandler() should be invoked.
If this train of thought holds up to scrutiny, and if the aim is to save a few bytes then I would think that it would be better to have the device initialization routines return a failure code and have _sys_device_do_config_level() check for it and invoke the fatal error handler upon the detection of failure. Otherwise we duplicate the overhead of calling the fatal error handler in each device initialization routine. Sorry for the slow response. I agree with Peter here I think we should be checking the return value and doing something useful with the result. Maybe not _NanoFatalErrorHandler() but something notifying the application that something bad happened. A given device not initializing may not be fatal to the the whole application, just one feature is currently unavailable. For the kind of systems we are targeting, do we really expect the application to handle devices not initializing correctly, being designed so that parts are disabled if some parts of the initialization fail (devices or others), or do we expect applications to require everything to be present for them to function correctly ? I would have thought the latter, but I can be convinced. Delving into the realm of the hypothetical :-)
What about devices that have drivers in the system but are not present (pluggable) or can't initialize because some resource external to the device can't be contacted (network server).
The application may be able to still do useful work albeit with reduced functionality.
Then, if the latter, do we expect the application catching the errors at runtime when deployed or during development (basically catching software errors mostly) not malfunctionning hardware. Here, I was thinking the latter as well, which is why I was proposing __ASSERT() calls catching initialization errors in debug loads only. And this fits with one of the core values of the OS, which is small footprint. Both models are useful for different reasons :-D
Any of those could be a valid approach I think, but we have to decide on one. And right now, we have the worst since we return those error codes which are meant for runtime handling, but they just go into the void. Agreed we need to pick and stay with it for some amount of time until we see a few real uses/applications/platforms. OK, following your proposal below, what we could put in place is standardizing on error codes that init routines must return if they want the kernel init system to automatically trigger a fatal error.
Then, we could also allow configuring out the error handling if someone needs to squeeze that last amount of space. One more Kconfig option! :) The error handling would be enabled by default of course. For those non-fatal errors, what should we do for runtime driver behaviors? Should the drivers themselves fail API calls? Or should we let device_get_binding() return NULL?
How about something like this:
diff --git a/kernel/nanokernel/device.c b/kernel/nanokernel/device.c index f86f95f..82774c4 100644 --- a/kernel/nanokernel/device.c +++ b/kernel/nanokernel/device.c @@ -58,7 +58,7 @@ struct device *device_get_binding(char *name) struct device *info; for (info = __device_init_start; info != __device_init_end; info++) { - if (!strcmp(name, info->config->name)) { + if (!strcmp(name, info->config->name) && info->driver_api) { return info; } }
if there is a non-fatal error the driver will not mark itself available and we can get rid of the null checks in the API headers since the user of the driver will never get a reference to it if init() did not complete successfully
Why not create a dedicated status field in device object? Additionally we can provide an API to query status of devices. This will help query status of devices it does not bind to. This should work. We need to rework some of the drivers as they make the driver_api assignment unconditionally.
---------- Daniel Leung
|
|
Source the project environment file from zsh
Oliver Hahm <oliver.hahm@...>
Dear Zephyr development team, following the instructions in your Getting Started Guide [1], I tried to run source zephyr-env.sh in my zsh 5.2, but get: zephyr-env.sh:2: = not found With my limited knowledge about shell the syntax of this script seems indeed not to be POSIX shell compatible, but rather resemble bash syntax. Cheers, Oleg P.S. Manually copying&pasting the last lines of the script seems to work though. [1] https://www.zephyrproject.org/doc/getting_started/installation_linux.html#environment-variables
|
|