Re: RFC: return type of functions passed to DEVICE_INIT()

Dirk Brandewie <dirk.j.brandewie@...>
 

On 02/18/2016 03:04 PM, Daniel Leung wrote:
On Thu, Feb 18, 2016 at 05:54:56PM -0500, Benjamin Walsh wrote:
My take on it is that for Zephyr a failed device initialization
should be considered a fatal event. My expectation is that the
Zephyr user will only be enabling relevant (and important) devices to
their project. If one of these devices should fail, then that is a
serious system error and _NanoFatalErrorHandler() should be invoked.

If this train of thought holds up to scrutiny, and if the aim is to
save a few bytes then I would think that it would be better to have
the device initialization routines return a failure code and have
_sys_device_do_config_level() check for it and invoke the fatal error
handler upon the detection of failure. Otherwise we duplicate the
overhead of calling the fatal error handler in each device
initialization routine.
Sorry for the slow response. I agree with Peter here I think we should
be checking the return value and doing something useful with the
result. Maybe not _NanoFatalErrorHandler() but something notifying
the application that something bad happened. A given device not
initializing may not be fatal to the whole application, just one
feature is currently unavailable.
For the kind of systems we are targeting, do we really expect the
application to handle devices not initializing correctly, being designed
so that parts are disabled if some parts of the initialization fail
(devices or others), or do we expect applications to require everything
to be present for them to function correctly ? I would have thought the
latter, but I can be convinced.
Delving into the realm of the hypothetical :-)

What about devices that have drivers in the system but are not present
(pluggable) or can't initialize because some resource external to the
device can't be contacted (network server).

The application may be able to still do useful work albeit with reduced
functionality.

Then, if the latter, do we expect the application catching the errors at
runtime when deployed or during development (basically catching software
errors mostly) not malfunctioning hardware. Here, I was thinking the
latter as well, which is why I was proposing __ASSERT() calls catching
initialization errors in debug loads only. And this fits with one of the
core values of the OS, which is small footprint.
Both models are useful for different reasons :-D

Any of those could be a valid approach I think, but we have to decide on
one. And right now, we have the worst since we return those error codes
which are meant for runtime handling, but they just go into the void.
Agreed we need to pick and stay with it for some amount of time until
we see a few real uses/applications/platforms.
OK, following your proposal below, what we could put in place is
standardizing on error codes that init routines must return if they want
the kernel init system to automatically trigger a fatal error.

Then, we could also allow configuring out the error handling if someone
needs to squeeze that last amount of space. One more Kconfig option! :)
The error handling would be enabled by default of course.
For those non-fatal errors, what should we do for runtime driver behaviors?
Should the drivers themselves fail API calls? Or should we let
device_get_binding() return NULL?
How about something like this:

diff --git a/kernel/nanokernel/device.c b/kernel/nanokernel/device.c
index f86f95f..82774c4 100644
--- a/kernel/nanokernel/device.c
+++ b/kernel/nanokernel/device.c
@@ -58,7 +58,7 @@ struct device *device_get_binding(char *name)
 	struct device *info;
 
 	for (info = __device_init_start; info != __device_init_end; info++) {
-		if (!strcmp(name, info->config->name)) {
+		if (!strcmp(name, info->config->name) && info->driver_api) {
 			return info;
 		}
 	}

If there is a non-fatal error, the driver will not mark itself available,
and we can get rid of the NULL checks in the API headers, since the user of the
driver will never get a reference to it if init() did not complete successfully.
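
To make the idea concrete, here is a minimal, self-contained sketch of the lookup behaviour the patch is after. It is not the kernel code: the structures are simplified stand-ins, the device table is a plain array rather than the linker section bounded by __device_init_start and __device_init_end, and the names good_init, bad_init, init_all and get_binding are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct device;

/* Simplified stand-ins for the kernel structures (assumed layout). */
struct device_config {
	const char *name;
	int (*init)(struct device *dev);
};

struct device {
	struct device_config *config;
	const void *driver_api; /* stays NULL until init() succeeds */
};

static const int dummy_api; /* placeholder for a real driver API table */

static int good_init(struct device *dev)
{
	dev->driver_api = &dummy_api; /* driver marks itself available */
	return 0;
}

static int bad_init(struct device *dev)
{
	(void)dev; /* driver_api is deliberately left NULL */
	return -1; /* non-fatal failure */
}

static struct device_config good_cfg = { "GOOD", good_init };
static struct device_config bad_cfg = { "BAD", bad_init };

/* Hypothetical device table standing in for the linker section. */
static struct device devices[] = {
	{ &good_cfg, NULL },
	{ &bad_cfg, NULL },
};
#define NUM_DEVICES (sizeof(devices) / sizeof(devices[0]))

/* Run every init function; return codes are ignored in this sketch. */
static void init_all(void)
{
	for (size_t i = 0; i < NUM_DEVICES; i++) {
		(void)devices[i].config->init(&devices[i]);
	}
}

/* Same test as in the patch: the name must match AND the API must be set. */
static struct device *get_binding(const char *name)
{
	for (size_t i = 0; i < NUM_DEVICES; i++) {
		struct device *info = &devices[i];

		if (!strcmp(name, info->config->name) && info->driver_api) {
			return info;
		}
	}
	return NULL;
}
```

With this shape, a caller that gets a non-NULL binding can assume the driver API is usable, which is what would allow the NULL checks in the API headers to go away.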


How we could/should report this type of error is an open question :-).
Brainstorming:

If we want to let the application handle the initialization issues, we
probably need some kind of queue that gets filled by the init system
when init functions return errors, and that the application drains to
see what failed. We might want to queue the associated device objects,
and have an errno field in there, or something like that.
How about having the driver return an error code saying whether the
failure is a fatal error or not. For the drivers that we have now where
we *know* that if it fails it is a hardware or configuration error
which is fatal. So we go with the _NanoFatalErrorHandler() error path.
That sounds good.

If a non-fatal error occurred (may work at next reset) just ignore it
and move on. The application can detect if the device is dead/not
present by the return codes from the driver call(s). Then the
application can decide what and how to report the error to the user.
That's another way of doing it. It's a bit less explicit than a list
of errors, but less overhead, and reuses what's already available.
-----
Daniel Leung

Re: RFC: make _fiber_start() return a handle on the fiber

Dirk Brandewie <dirk.j.brandewie@...>
 

On 02/16/2016 11:34 AM, Benjamin Walsh wrote:
Folks,

When we start a fiber via the _fiber_start() API family, we don't get
back a handle on the created fiber. The fiber identifier is actually the
start of the fiber's stack. This hasn't been a problem until now since
no API requires a handle on the fiber, except one,
fiber_delayed_start_cancel(): that API is part of a pair, where the
other API, fiber_delayed_start() starts the fiber and returns a handle.

However, Jukka asked me if an API could be created that cancels a
fiber_sleep() call, something like fiber_wakeup(). The implementation of
such an API is very simple, but it requires a handle on the fiber we
want to wake up. This in turn requires the signature of the
_fiber_start() family to return a handle to the fiber that gets started.

The signature of _fiber_start() et al. would then change from a void
return type to a void * return type.
Makes sense but why not tell the compiler the truth about the type?
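
As a rough illustration of what "telling the compiler the truth" could look like, the sketch below returns a typed handle instead of a void *. Everything in it is hypothetical: fiber_id_t, struct fiber and the sketch_* functions are not existing kernel APIs, and the sketch uses a small static table rather than the start of the fiber's stack as the identifier.

```c
#include <stddef.h>

#define MAX_FIBERS 4

/* Hypothetical per-fiber bookkeeping; the real kernel keeps this
 * at the start of the fiber's stack. */
struct fiber {
	int in_use;
	int sleeping;
};

/* Typed handle: passing anything else is a compile-time diagnostic,
 * which a void * return type would not give us. */
typedef struct fiber *fiber_id_t;

static struct fiber fibers[MAX_FIBERS];

/* Stand-in for the _fiber_start() family: returns a handle, or NULL
 * if no slot is free. */
static fiber_id_t sketch_fiber_start(void)
{
	for (size_t i = 0; i < MAX_FIBERS; i++) {
		if (!fibers[i].in_use) {
			fibers[i].in_use = 1;
			fibers[i].sleeping = 0;
			return &fibers[i];
		}
	}
	return NULL;
}

/* Stand-in for fiber_sleep(): only records the state. */
static void sketch_fiber_sleep(fiber_id_t f)
{
	f->sleeping = 1;
}

/* Stand-in for the proposed fiber_wakeup(): returns 0 if the fiber
 * was sleeping and has been woken, -1 otherwise. */
static int sketch_fiber_wakeup(fiber_id_t f)
{
	if (!f->sleeping) {
		return -1;
	}
	f->sleeping = 0;
	return 0;
}
```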

Objections, comments, etc ?

Cheers,
Ben

Re: How to setup git-review

Nashif, Anas
 

Thanks for documenting this but I don't think forcing people to use git-review is the right approach. It worked for us before in many other projects, why is this different and why can't I use vanilla git to gerrit interaction?

What is the issue here exactly?

Anas

On Feb 18, 2016, at 14:05, Andre Guedes <andre.guedes(a)intel.com> wrote:

Hi all,

I was having some trouble to submit a change to Gerrit which depends on
someone else's change. If I use the 'git push' command as described in [1],
I get the following error:

remote: ERROR: In commit <commit id>
remote: ERROR: author email address <someone(a)mail.com>
remote: ERROR: does not match your user account.
remote: ERROR:
remote: ERROR: The following addresses are currently registered:
remote: ERROR: <your email>
remote: ERROR:
remote: ERROR: To register an email address, please visit:
remote: ERROR: https://gerrit.zephyrproject.org/r/#/settings/contact

Talking to Andrew Grimberg from Linux Foundation, he recommended using
git-review and it worked around the issue.

Here are the instructions I followed to setup git-review:

$ cat > .gitreview
[gerrit]
host=gerrit.zephyrproject.org
port=29418
project=zephyr.git

$ git-review -s

To submit your patch(es) to review:
$ git-review -T

I hope this can save you time in case you run into a similar problem ;)

BTW, Andrew submitted a patch adding the .gitreview file so we don't have
to keep it locally.

Regards,

Andre

[1] https://www.zephyrproject.org/doc/collaboration/code/gerrit.html#gerrit
[2] https://gerrit.zephyrproject.org/r/#/c/390/

Re: RFC: make _fiber_start() return a handle on the fiber

Nashif, Anas
 

On Feb 16, 2016, at 11:35, Benjamin Walsh <benjamin.walsh(a)windriver.com> wrote:

Folks,

When we start a fiber via the _fiber_start() API family, we don't get
back a handle on the created fiber. The fiber identifier is actually the
start of the fiber's stack. This hasn't been a problem until now since
no API requires a handle on the fiber, except one,
fiber_delayed_start_cancel(): that API is part of a pair, where the
other API, fiber_delayed_start() starts the fiber and returns a handle.

However, Jukka asked me if an API could be created that cancels a
fiber_sleep() call, something like fiber_wakeup(). The implementation of
such an API is very simple, but it requires a handle on the fiber we
want to wake up. This in turn requires the signature of the
_fiber_start() family to return a handle to the fiber that gets started.

The signature of _fiber_start() et al. would then change from a void
return type to a void * return type.

Objections, comments, etc ?
Sounds good, but we need to do it in a way that keeps APIs compatible I guess.

Anas

Cheers,
Ben

--
Benjamin Walsh, SMTS
Wind River Rocket
Zephyr kernel maintainer
www.windriver.com

Re: RFC: make _fiber_start() return a handle on the fiber

Benjamin Walsh <benjamin.walsh@...>
 

When we start a fiber via the _fiber_start() API family, we don't
get back a handle on the created fiber. The fiber identifier is
actually the start of the fiber's stack. This hasn't been a problem
until now since no API requires a handle on the fiber, except one,
fiber_delayed_start_cancel(): that API is part of a pair, where the
other API, fiber_delayed_start() starts the fiber and returns a
handle.

However, Jukka asked me if an API could be created that cancels a
fiber_sleep() call, something like fiber_wakeup(). The
implementation of such an API is very simple, but it requires a
handle on the fiber we want to wake up. This in turn requires the
signature of the _fiber_start() family to return a handle to the
fiber that gets started.

The signature of _fiber_start() et al. would then change from a void
return type to a void * return type.

Objections, comments, etc ?
No comments -> no objections -> perhaps we can continue with this route
then?
Looks like it. You want to do the implementation yourself Jukka ?

Re: RFC: return type of functions passed to DEVICE_INIT()

Benjamin Walsh <benjamin.walsh@...>
 

On Thu, Feb 18, 2016 at 10:49:01PM +0000, Thomas, Ramesh wrote:

On Thu, 2016-02-18 at 12:34 -0800, Dirk Brandewie wrote:

On 02/18/2016 10:03 AM, Benjamin Walsh wrote:

My take on it is that for Zephyr a failed device initialization
should be considered a fatal event. My expectation is that the
Zephyr user will only be enabling relevant (and important) devices to
their project. If one of these devices should fail, then that is a
serious system error and _NanoFatalErrorHandler() should be invoked.
How about a CONFIG option to determine if kernel should abort boot? The
What I was proposing in my last reply, to keep things simple, is one
Kconfig option to enable/disable handling of error codes by init
functions. If it's disabled, nothing is handled automatically and it
relies on everything working (when deployed) and __ASSERTs (when
debugging).

If it's enabled, some runtime handling is already needed, so you can
piggyback on it. One way is to standardize on a specific error code to
be returned by init functions to tell the kernel to abort boot. No need
for a Kconfig option for that case.
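
A minimal sketch of that scheme, under stated assumptions: the DEV_INIT_* codes, CONFIG_INIT_ERROR_HANDLING and run_init_level() are hypothetical names, and fatal_error_handler() only counts calls here, whereas _NanoFatalErrorHandler() would not return.

```c
#include <stddef.h>

/* Hypothetical standardized return codes for init functions. */
#define DEV_INIT_OK    0
#define DEV_INIT_FAIL  1 /* non-fatal: keep booting */
#define DEV_INIT_FATAL 2 /* asks the kernel to abort boot */

/* Stand-in for the proposed Kconfig option (enabled by default). */
#define CONFIG_INIT_ERROR_HANDLING 1

typedef int (*init_fn)(void);

static int fatal_count;

/* Stand-in for _NanoFatalErrorHandler(); just records the call. */
static void fatal_error_handler(void)
{
	fatal_count++;
}

/* Example init functions used to exercise the loop. */
static int init_ok(void) { return DEV_INIT_OK; }
static int init_soft_fail(void) { return DEV_INIT_FAIL; }
static int init_fatal(void) { return DEV_INIT_FATAL; }

/* Run one init level. Returns the number of non-fatal failures,
 * or -1 if a fatal error stopped the level early. */
static int run_init_level(const init_fn *fns, size_t n)
{
	int failures = 0;

	for (size_t i = 0; i < n; i++) {
		int rc = fns[i]();

#if CONFIG_INIT_ERROR_HANDLING
		if (rc == DEV_INIT_FATAL) {
			fatal_error_handler();
			return -1;
		}
		if (rc != DEV_INIT_OK) {
			failures++;
		}
#else
		(void)rc; /* handling configured out: rely on __ASSERT() */
#endif
	}
	return failures;
}
```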

behavior should be determined by the kind of application. Some can live
with less features while for others it is all or nothing. In both cases
a logging feature that can be queried by the app would be useful.

Any of those could be a valid approach I think, but we have to decide on
one. And right now, we have the worst since we return those error codes
which are meant for runtime handling, but they just go into the void.
+1 _sys_device_do_config_level() should at least pass on the error
returned even if one device fails. This as well as the
"resume_all/suspend_all" functions can be called by app in power
management case. It would be useful for the app to know that something
went wrong.

Question still remains whether _sys_device_do_config_level() should
continue with other devices if one fails. IMO it should continue but
should return error if one device had failed. At boot time, kernel can
save the error condition somewhere to be queried by the app.
The thing is that you can have several failures, one per-device object
basically. That's why I was proposing saving the errors in some field in
the device object and queuing them to be retrieved by the application.

Brainstorming:

If we want to let the application handle the initialization issues, we
probably need some kind of queue that gets filled by the init system
when init functions return errors, and that the application drains to
see what failed. We might want to queue the associated device objects,
and have an errno field in there, or something like that.
The app would be anyway getting a failure when it tries to use the
device. It most probably cannot rectify any device error on the fly with
more detailed info. Just my thought

A simple status from _sys_device_do_config_level() is probably enough so
the app can decide whether to continue or abort.
This still has to be saved somewhere that can be retrieved by the app.
The kernel does not pass anything to main() currently, and in the
microkernel, there isn't necessarily a main() either.
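
One possible shape for "saved somewhere that can be retrieved by the app", sketched with hypothetical names (struct dev_obj, record_init_failure(), next_init_failure()): each device object carries an errno field, and failed devices are chained into a FIFO that the application drains.

```c
#include <stddef.h>

/* Hypothetical device object with the extra fields discussed above. */
struct dev_obj {
	const char *name;
	int init_errno;              /* 0 if init succeeded */
	struct dev_obj *next_failed; /* link in the failure queue */
};

static struct dev_obj *failed_head;
static struct dev_obj **failed_tail = &failed_head;

/* Called by the init system when an init function returns an error. */
static void record_init_failure(struct dev_obj *dev, int err)
{
	dev->init_errno = err;
	dev->next_failed = NULL;
	*failed_tail = dev;
	failed_tail = &dev->next_failed;
}

/* Called by the application: returns the next failed device in the
 * order the failures occurred, or NULL once the queue is empty. */
static struct dev_obj *next_init_failure(void)
{
	struct dev_obj *dev = failed_head;

	if (dev) {
		failed_head = dev->next_failed;
		if (!failed_head) {
			failed_tail = &failed_head;
		}
		dev->next_failed = NULL;
	}
	return dev;
}
```

This costs one pointer and one int per device object, so it would presumably sit behind the same Kconfig option as the rest of the error handling.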

Re: 6LoWPAN Stack

Jukka Rissanen
 

Hi Joakim,

On Fri, 2016-02-19 at 09:04 +0100, Joakim Eriksson wrote:
Hi Jukka!

On 19 Feb 2016, at 08:28, Jukka Rissanen <jukka.rissanen(a)linux.intel.com> wrote:

Hi Joakim,

On Thu, 2016-02-18 at 16:48 +0000, Joakim Eriksson wrote:
Hello!

I just cloned the Zephyr repository and saw that the 6LoWPAN stack
was the same as the one I am used to - e.g. the Contiki 6LoWPAN
stack. How are you planning to develop it further? Are you going to
sync it from Contiki or would you like Contiki developers to post
pull-requests to fixes also to Zephyr?
The plan is to sync it from Contiki when applicable. If you look at the
code, we needed to make a lot of changes to the uIP stack in order to make
it work in Zephyr. The biggest change is that the IP stack is now
re-entrant, i.e., there is no global buffer for network packets. Because of
this it is not so simple to just pull changes from Contiki as manual
work needs to be done anyway.
Ok, yes - I expected that there would be quite some changes. But I guess that they
will be the same changes at each sync! I will take a look at the codebase and
see how many differences you have.

I guess if Zephyr is ported to more platforms (some of the ones we have in Contiki)
maybe that would make developers do pull requests to both OS:es.

Quick question - what size of the compiled code did you get on the IP stack when
you moved from one buffer to re-entrant / multi buffers?
I have not measured this. At the moment the IP stack is configured to
have separate TX and RX buffers. So in minimal configuration there
would be one TX and one RX buffer vs. in Contiki there would be only
one buffer that is used for both TX and RX.
User can also define at compile time how many network buffers he wants.



Anyway - congratulations - it looks like an exciting project and I
will be happy to contribute to it!

-- Joakim Eriksson, SICS
Thanks, and patches are welcome!
I will take a look and see if some of our work would be good to get in!
(we have LWM2M and other things fairly recently put into Contiki).
I have ported the lwm2m code to Zephyr but it is not yet tested so I
have not sent it to review.


Cheers,
Jukka

Re: 6LoWPAN Stack

Joakim Eriksson
 

Hi Jukka!

On 19 Feb 2016, at 08:28, Jukka Rissanen <jukka.rissanen(a)linux.intel.com> wrote:

Hi Joakim,

On Thu, 2016-02-18 at 16:48 +0000, Joakim Eriksson wrote:
Hello!

I just cloned the Zephyr repository and saw that the 6LoWPAN stack
was the same as the one I am used to - e.g. the Contiki 6LoWPAN
stack. How are you planning to develop it further? Are you going to
sync it from Contiki or would you like Contiki developers to post
pull-requests to fixes also to Zephyr?
The plan is to sync it from Contiki when applicable. If you look at the
code, we needed to make a lot of changes to the uIP stack in order to make
it work in Zephyr. The biggest change is that the IP stack is now
re-entrant, i.e., there is no global buffer for network packets. Because of
this it is not so simple to just pull changes from Contiki as manual
work needs to be done anyway.
Ok, yes - I expected that there would be quite some changes. But I guess that they
will be the same changes at each sync! I will take a look at the codebase and
see how many differences you have.

I guess if Zephyr is ported to more platforms (some of the ones we have in Contiki)
maybe that would make developers do pull requests to both OS:es.

Quick question - what size of the compiled code did you get on the IP stack when
you moved from one buffer to re-entrant / multi buffers?


Anyway - congratulations - it looks like an exciting project and I
will be happy to contribute to it!

-- Joakim Eriksson, SICS
Thanks, and patches are welcome!
I will take a look and see if some of our work would be good to get in!
(we have LWM2M and other things fairly recently put into Contiki).

Best regards,
— Joakim Eriksson, SICS


Cheers,
Jukka

Re: RFC: make _fiber_start() return a handle on the fiber

Jukka Rissanen
 

Hi,

On Tue, 2016-02-16 at 14:34 -0500, Benjamin Walsh wrote:
Folks,

When we start a fiber via the _fiber_start() API family, we don't get
back a handle on the created fiber. The fiber identifier is actually the
start of the fiber's stack. This hasn't been a problem until now since
no API requires a handle on the fiber, except one,
fiber_delayed_start_cancel(): that API is part of a pair, where the
other API, fiber_delayed_start() starts the fiber and returns a handle.

However, Jukka asked me if an API could be created that cancels a
fiber_sleep() call, something like fiber_wakeup(). The implementation of
such an API is very simple, but it requires a handle on the fiber we
want to wake up. This in turn requires the signature of the
_fiber_start() family to return a handle to the fiber that gets started.

The signature of _fiber_start() et al. would then change from a void
return type to a void * return type.

Objections, comments, etc ?
No comments -> no objections -> perhaps we can continue with this route
then?


Cheers,
Jukka

Re: 6LoWPAN Stack

Jukka Rissanen
 

Hi Joakim,

On Thu, 2016-02-18 at 16:48 +0000, Joakim Eriksson wrote:
Hello!

I just cloned the Zephyr repository and saw that the 6LoWPAN stack
was the same as the one I am used to - e.g. the Contiki 6LoWPAN
stack. How are you planning to develop it further? Are you going to
sync it from Contiki or would you like Contiki developers to post
pull-requests to fixes also to Zephyr?
The plan is to sync it from Contiki when applicable. If you look at the
code, we needed to make a lot of changes to the uIP stack in order to make
it work in Zephyr. The biggest change is that the IP stack is now
re-entrant, i.e., there is no global buffer for network packets. Because of
this it is not so simple to just pull changes from Contiki as manual
work needs to be done anyway.


Anyway - congratulations - it looks like an exciting project and I
will be happy to contribute to it!

-- Joakim Eriksson, SICS
Thanks, and patches are welcome!


Cheers,
Jukka

Re: compile error for ARM build

Ravi Sahita <rsahita@...>
 

that worked - thanks Ben!

Ravi

RFC: Kernel's Object Tracing API

Cruz Alcaraz, Juan M <juan.m.cruz.alcaraz@...>
 

Zephyr kernel provides debug hooks to allow debugging tools to access nanokernel and microkernel objects (FIFO, LIFO, semaphores, etc.) and display their attributes for content inspection.

Background
=========

The current implementation of Zephyr declares, inside each nanokernel structure, a "next" field that allows the objects to be accessed as a simple linked list:
Example:
struct nano_fifo {
	union {
		struct _nano_queue wait_q;
		struct _nano_queue data_q;
	};
	int stat;
#ifdef CONFIG_DEBUG_TRACING_KERNEL_OBJECTS
	struct nano_fifo *next;
#endif
};

Each kernel object type has a global pointer to the first object of its linked list. A debug tool can use each pointer as a handle to iterate over the corresponding data structures.
E.g.
struct nano_sem *_track_list_nano_sem;
struct nano_fifo *_track_list_nano_fifo;
struct nano_lifo *_track_list_nano_lifo;
struct nano_timer *_track_list_nano_timer;

The current implementation has the following issues:

- The handlers are buried inside the kernel implementation

- There is no documentation on how they can be used to implement object inspection.

- The implementation does not have a clear pattern that a developer can follow to extend the functionality (e.g. add new kernel objects to the tracing hooks)

Proposal
=======
The following is a proposal of a public API that will give the following improvements to the current implementation without modifying the gist of the original implementation.

- The API is published as a header file in the public API directory (/include)

- The API will propose a clear pattern to follow to extend the scope of the tracing functionality in the future.

- The API is easily documented through Doxygen and it can be added as a section in the Zephyr Kernel Primer.

The API will be defined in the header file: include/misc/object_tracing.h
The API will implement the following macros:

DECLARE_TRACE_LIST(name)
Declares a list of traceable objects with a particular name.
This macro declares the global pointer to the first element of the list.
The elements are of an existing type _dbg_obj_<name> and the pointer will have the unique name _trace_list_<name>

DEBUG_TRACING_OBJ_INIT(name, obj)
Adds the object "obj" to the list "name". This macro will be used by the kernel when a specific object is created. After creation, the object is added to the linked list.

DEBUG_TRACING_HEAD(name)
Gives the access to the first element of the trace list "name"

DEBUG_TRACING_NEXT(name, obj)
Gives access to the element that follows "obj" in the trace list "name"

Typedef aliases are needed to give the kernel object types uniform names and allow a macro to relate them with the list pointers.
These typedefs would only be used in debug mode, when CONFIG_DEBUG_TRACING_KERNEL_OBJECTS is enabled.

The following code is an example of how the implementation could follow:

typedef struct nano_sem _dbg_obj_nano_sem;
typedef struct nano_fifo _dbg_obj_nano_fifo;
typedef struct nano_lifo _dbg_obj_nano_lifo;
typedef struct nano_timer _dbg_obj_nano_timer;
typedef struct pool_struct _dbg_obj_micro_mem_pool;

/* Declares the global pointer to the first element of the list. */
#define DECLARE_TRACE_LIST(name) _dbg_obj_##name *_trace_list_##name

#define DEBUG_TRACING_HEAD(name) ((_dbg_obj_##name *)_trace_list_##name)

#define DEBUG_TRACING_NEXT(name, obj) (((_dbg_obj_##name *)(obj))->next)

/* Pushes obj onto the front of the list. */
#define DEBUG_TRACING_OBJ_INIT(name, obj) \
	do { \
		_dbg_obj_##name *temp = _trace_list_##name; \
		_trace_list_##name = (obj); \
		(obj)->next = temp; \
	} while (0)
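
For illustration only, here is a self-contained sketch of how a debug tool might walk one of these lists. It carries its own versions of the macros described above so that it compiles on its own; struct nano_sem is reduced to a hypothetical two-field stand-in and count_traced_sems() is an invented helper, not part of the proposal.

```c
#include <stddef.h>

/* Reduced stand-in for the kernel's semaphore structure (assumed). */
struct nano_sem {
	int nsig;
	struct nano_sem *next; /* tracing link */
};

typedef struct nano_sem _dbg_obj_nano_sem;

#define DECLARE_TRACE_LIST(name) _dbg_obj_##name *_trace_list_##name
#define DEBUG_TRACING_HEAD(name) ((_dbg_obj_##name *)_trace_list_##name)
#define DEBUG_TRACING_NEXT(name, obj) (((_dbg_obj_##name *)(obj))->next)
#define DEBUG_TRACING_OBJ_INIT(name, obj) \
	do { \
		(obj)->next = _trace_list_##name; \
		_trace_list_##name = (obj); \
	} while (0)

/* One global list head per traced object type. */
DECLARE_TRACE_LIST(nano_sem);

/* Hypothetical debug-tool helper: count the traced semaphores by
 * walking from the head until the end of the list. */
static int count_traced_sems(void)
{
	int n = 0;
	_dbg_obj_nano_sem *s;

	for (s = DEBUG_TRACING_HEAD(nano_sem); s != NULL;
	     s = DEBUG_TRACING_NEXT(nano_sem, s)) {
		n++;
	}
	return n;
}
```

Because DEBUG_TRACING_OBJ_INIT() pushes at the front, the most recently created object is the one returned by DEBUG_TRACING_HEAD().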

All comments are welcome :)

Re: RFC: return type of functions passed to DEVICE_INIT()

Daniel Leung <daniel.leung@...>
 

On Thu, Feb 18, 2016 at 05:54:56PM -0500, Benjamin Walsh wrote:
My take on it is that for Zephyr a failed device initialization
should be considered a fatal event. My expectation is that the
Zephyr user will only be enabling relevant (and important) devices to
their project. If one of these devices should fail, then that is a
serious system error and _NanoFatalErrorHandler() should be invoked.

If this train of thought holds up to scrutiny, and if the aim is to
save a few bytes then I would think that it would be better to have
the device initialization routines return a failure code and have
_sys_device_do_config_level() check for it and invoke the fatal error
handler upon the detection of failure. Otherwise we duplicate the
overhead of calling the fatal error handler in each device
initialization routine.
Sorry for the slow response. I agree with Peter here I think we should
be checking the return value and doing something useful with the
result. Maybe not _NanoFatalErrorHandler() but something notifying
the application that something bad happened. A given device not
initializing may not be fatal to the whole application, just one
feature is currently unavailable.
For the kind of systems we are targeting, do we really expect the
application to handle devices not initializing correctly, being designed
so that parts are disabled if some parts of the initialization fail
(devices or others), or do we expect applications to require everything
to be present for them to function correctly ? I would have thought the
latter, but I can be convinced.
Delving into the realm of the hypothetical :-)

What about devices that have drivers in the system but are not present
(pluggable) or can't initialize because some resource external to the
device can't be contacted (network server).

The application may be able to still do useful work albeit with reduced
functionality.

Then, if the latter, do we expect the application catching the errors at
runtime when deployed or during development (basically catching software
errors mostly) not malfunctioning hardware. Here, I was thinking the
latter as well, which is why I was proposing __ASSERT() calls catching
initialization errors in debug loads only. And this fits with one of the
core values of the OS, which is small footprint.
Both models are useful for different reasons :-D

Any of those could be a valid approach I think, but we have to decide on
one. And right now, we have the worst since we return those error codes
which are meant for runtime handling, but they just go into the void.
Agreed we need to pick and stay with it for some amount of time until
we see a few real uses/applications/platforms.
OK, following your proposal below, what we could put in place is
standardizing on error codes that init routines must return if they want
the kernel init system to automatically trigger a fatal error.

Then, we could also allow configuring out the error handling if someone
needs to squeeze that last amount of space. One more Kconfig option! :)
The error handling would be enabled by default of course.
For those non-fatal errors, what should we do for runtime driver behaviors?
Should the drivers themselves fail API calls? Or should we let
device_get_binding() return NULL?

How we could/should report this type of error is an open question :-).
Brainstorming:

If we want to let the application handle the initialization issues, we
probably need some kind of queue that gets filled by the init system
when init functions return errors, and that the application drains to
see what failed. We might want to queue the associated device objects,
and have an errno field in there, or something like that.
How about having the driver return an error code saying whether the
failure is a fatal error or not. For the drivers that we have now where
we *know* that if it fails it is a hardware or configuration error
which is fatal. So we go with the _NanoFatalErrorHandler() error path.
That sounds good.

If a non-fatal error occurred (may work at next reset) just ignore it
and move on. The application can detect if the device is dead/not
present by the return codes from the driver call(s). Then the
application can decide what and how to report the error to the user.
That's another way of doing it. It's a bit less explicit than a list
of errors, but less overhead, and reuses what's already available.
-----
Daniel Leung

Re: RFC: return type of functions passed to DEVICE_INIT()

Benjamin Walsh <benjamin.walsh@...>
 

My take on it is that for Zephyr a failed device initialization
should be considered a fatal event. My expectation is that the
Zephyr user will only be enabling relevant (and important) devices to
their project. If one of these devices should fail, then that is a
serious system error and _NanoFatalErrorHandler() should be invoked.

If this train of thought holds up to scrutiny, and if the aim is to
save a few bytes then I would think that it would be better to have
the device initialization routines return a failure code and have
_sys_device_do_config_level() check for it and invoke the fatal error
handler upon the detection of failure. Otherwise we duplicate the
overhead of calling the fatal error handler in each device
initialization routine.
Sorry for the slow response. I agree with Peter here I think we should
be checking the return value and doing something useful with the
result. Maybe not _NanoFatalErrorHandler() but something notifying
the application that something bad happened. A given device not
initializing may not be fatal to the whole application, just one
feature is currently unavailable.
For the kind of systems we are targeting, do we really expect the
application to handle devices not initializing correctly, being designed
so that parts are disabled if some parts of the initialization fail
(devices or others), or do we expect applications to require everything
to be present for them to function correctly ? I would have thought the
latter, but I can be convinced.
Delving into the realm of the hypothetical :-)

What about devices that have drivers in the system but are not present
(pluggable) or can't initialize because some resource external to the
device can't be contacted (network server).

The application may be able to still do useful work albeit with reduced
functionality.

Then, if the latter, do we expect the application catching the errors at
runtime when deployed or during development (basically catching software
errors mostly) not malfunctioning hardware. Here, I was thinking the
latter as well, which is why I was proposing __ASSERT() calls catching
initialization errors in debug loads only. And this fits with one of the
core values of the OS, which is small footprint.
Both models are useful for different reasons :-D

Any of those could be a valid approach I think, but we have to decide on
one. And right now, we have the worst since we return those error codes
which are meant for runtime handling, but they just go into the void.
Agreed we need to pick and stay with it for some amount of time until
we see a few real uses/applications/platforms.
OK, following your proposal below, what we could put in place is
standardizing on error codes that init routines must return if they want
the kernel init system to automatically trigger a fatal error.

Then, we could also allow configuring out the error handling if someone
needs to squeeze that last amount of space. One more Kconfig option! :)
The error handling would be enabled by default of course.

How we could/should report this type of error is an open question :-).
Brainstorming:

If we want to let the application handle the initialization issues, we
probably need some kind of queue that gets filled by the init system
when init functions return errors, and that the application drains to
see what failed. We might want to queue the associated device objects,
and have an errno field in there, or something like that.
How about having the driver return an error code saying whether the
failure is a fatal error or not. For the drivers that we have now where
we *know* that if it fails it is a hardware or configuration error
which is fatal. So we go with the _NanoFatalErrorHandler() error path.
That sounds good.

If a non-fatal error occurred (may work at next reset) just ignore it
and move on. The application can detect if the device is dead/not
present by the return codes from the driver call(s). Then the
application can decide what and how to report the error to the user.
That's another way of doing it. It's a bit less explicit than a list
of errors, but less overhead, and reuses what's already available.

Re: RFC: return type of functions passed to DEVICE_INIT()

Thomas, Ramesh
 

On Thu, 2016-02-18 at 12:34 -0800, Dirk Brandewie wrote:

On 02/18/2016 10:03 AM, Benjamin Walsh wrote:

My take on it is that for Zephyr a failed device initialization
should be considered a fatal event. My expectation is that the
Zephyr user will only be enabling relevant (and important) devices to
their project. If one of these devices should fail, then that is a
serious system error and _NanoFatalErrorHandler() should be invoked.
How about a CONFIG option to determine whether the kernel should abort
boot? The behavior should be determined by the kind of application. Some
can live with fewer features, while for others it is all or nothing. In
both cases a logging feature that can be queried by the app would be useful.

Any of those could be a valid approach I think, but we have to decide on
one. And right now, we have the worst since we return those error codes
which are meant for runtime handling, but they just go into the void.
+1. _sys_device_do_config_level() should at least pass on the error
returned even if one device fails. This, as well as the
"resume_all/suspend_all" functions, can be called by the app in the power
management case. It would be useful for the app to know that something
went wrong.

Question still remains whether _sys_device_do_config_level() should
continue with other devices if one fails. IMO it should continue but
should return error if one device had failed. At boot time, kernel can
save the error condition somewhere to be queried by the app.

Brainstorming:

If we want to let the application handle the initialization issues, we
probably need some kind of queue that gets filled by the init system
when init functions return errors, and that the application drains to
see what failed. We might want to queue the associated device objects,
and have an errno field in there, or something like that.
The app would get a failure anyway when it tries to use the device. It
most probably cannot rectify any device error on the fly, even with
more detailed info. Just my thought.

A simple status from _sys_device_do_config_level() is probably enough so
the app can decide whether to continue or abort.
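
A rough sketch of the "continue, but report" behavior, under stated
assumptions: do_config_level() here is only a stand-in for
_sys_device_do_config_level(), and saved_init_status and
init_status_get() are hypothetical names for the "saved somewhere to be
queried by the app" idea.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*init_fn_t)(void);

/* Hypothetical place where the kernel keeps the boot-time result. */
static int saved_init_status;

/*
 * Call every init routine even if an earlier one fails. Returns 0 if
 * all succeeded, otherwise the first non-zero code that was seen.
 */
static int do_config_level(const init_fn_t *fns, size_t count)
{
	int status = 0;

	assert(fns != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		int rc = fns[i]();

		if (rc != 0 && status == 0) {
			status = rc;   /* remember the first failure only */
		}
	}
	saved_init_status = status;
	return status;
}

/* Application-facing query for the saved status. */
static int init_status_get(void)
{
	return saved_init_status;
}

/* Example init routines; 'calls' counts how many were invoked. */
static int calls;
static int dev_ok(void)   { calls++; return 0; }
static int dev_fail(void) { calls++; return -5; }
```

The same return convention would also work for the suspend/resume
callers mentioned above, since they only need a single pass/fail status.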

How to setup git-review

Andre Guedes <andre.guedes@...>
 

Hi all,

I was having some trouble submitting a change to Gerrit which depends on
someone else's change. If I use the 'git push' command as described in [1],
I get the following error:

remote: ERROR: In commit <commit id>
remote: ERROR: author email address <someone(a)mail.com>
remote: ERROR: does not match your user account.
remote: ERROR:
remote: ERROR: The following addresses are currently registered:
remote: ERROR: <your email>
remote: ERROR:
remote: ERROR: To register an email address, please visit:
remote: ERROR: https://gerrit.zephyrproject.org/r/#/settings/contact

Talking to Andrew Grimberg from Linux Foundation, he recommended using
git-review and it worked around the issue.

Here are the instructions I followed to set up git-review:

$ cat > .gitreview <<'EOF'
[gerrit]
host=gerrit.zephyrproject.org
port=29418
project=zephyr.git
EOF

$ git-review -s

To submit your patch(es) for review:
$ git-review -T

I hope this can save you time in case you run into a similar problem ;)

BTW, Andrew submitted a patch [2] adding the .gitreview file so we don't
have to keep it locally.

Regards,

Andre

[1] https://www.zephyrproject.org/doc/collaboration/code/gerrit.html#gerrit
[2] https://gerrit.zephyrproject.org/r/#/c/390/

RFC: Microkernel tasks pending on nanokernel objects

Mitsis, Peter <Peter.Mitsis@...>
 

The following is proposed to permit microkernel tasks to pend on nanokernel
objects.

Background:
-----------
When the nanokernel's background task must wait for a nanokernel object
(N.O.) it polls. The polling is done at a frequency of no more than once
per interrupt. Thus, if the timer is the only source of interrupts, then
the polling is done no more than once per tick.

This is fine for the nanokernel, but in a microkernel where there are
multiple tasks, it is very inefficient.

Proposal:
---------
To allow multiple microkernel tasks to pend on N.O.'s, let there be a
second queue per N.O. and let it be used strictly for pending tasks.

NOTE: This second queue would only apply to N.O.'s in a microkernel system.
Since there is only one task in a nanokernel system, there is no need for
this second queue.

    +-------------------------------+
    |       Nanokernel Object       |
    +-------------------------------+
          |                   |
        Fiber1              Task1
          |                   |
        Fiber2              Task2
          |                   |
         ---                Task3
          -                   |
                            Task4
                              |
                             ---
                              -

Fig 1: A N.O. with both waiting fibers and tasks

Proposed N.O. Rules:
--------------------
1. When a task must wait on a N.O., it is added to the task wait queue.
2. When a fiber must wait on a N.O., it is added to the fiber wait queue.
3. When someone posts to the N.O. ...
   A. If there is a waiting fiber, let it claim it immediately.
   B. If there are no waiting fibers, mark it as available and wake all
      pending tasks.
4. An awakened task must check for timeout before trying to claim the N.O.
   If it cannot claim it, then it adds itself to the task wait queue.

Explanation of Rules:
---------------------
Preference is to be given to fibers just as in a nanokernel system.

By waking all the tasks waiting on that N.O. ...
1. The waiting tasks are automatically sorted by the scheduler by priority.
2. The meaning of a task's timeout parameter remains unchanged from the
nanokernel.
3. Timeouts are handled neatly with minimal new code.
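
The proposed rules can be modeled in a few lines of C. This is only a
simplified simulation under my own assumptions, not kernel code: struct
nano_obj, obj_wait_fiber(), obj_wait_task(), obj_post() and task_retry()
are hypothetical names, waiters are plain integer ids, and scheduling is
left out (the caller decides the order in which woken tasks retry).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAITERS 8

/* Simplified model of a nanokernel object with two wait queues. */
struct nano_obj {
	int available;             /* number of unclaimed posts */
	int fibers[MAX_WAITERS];   /* fiber wait queue (ids), FIFO */
	size_t n_fibers;
	int tasks[MAX_WAITERS];    /* task wait queue (ids) */
	size_t n_tasks;
};

/* Rules 1 and 2: a waiter joins the queue matching its kind. */
static void obj_wait_fiber(struct nano_obj *o, int id)
{
	assert(o->n_fibers < MAX_WAITERS);
	o->fibers[o->n_fibers++] = id;
}

static void obj_wait_task(struct nano_obj *o, int id)
{
	assert(o->n_tasks < MAX_WAITERS);
	o->tasks[o->n_tasks++] = id;
}

/*
 * Rule 3: post to the object. Returns the id of the fiber that claimed
 * the post, or -1 if no fiber was waiting. In that case the object is
 * marked available and every pending task is copied to woken[] (which
 * must hold MAX_WAITERS ids) and removed from the task wait queue.
 */
static int obj_post(struct nano_obj *o, int *woken, size_t *n_woken)
{
	*n_woken = 0;

	if (o->n_fibers > 0) {
		int id = o->fibers[0];

		/* shift the remaining fibers forward */
		for (size_t i = 1; i < o->n_fibers; i++) {
			o->fibers[i - 1] = o->fibers[i];
		}
		o->n_fibers--;
		return id;
	}

	o->available++;
	for (size_t i = 0; i < o->n_tasks; i++) {
		woken[(*n_woken)++] = o->tasks[i];
	}
	o->n_tasks = 0;
	return -1;
}

/*
 * Rule 4: an awakened task checks its timeout, then tries to claim.
 * Returns 1 on claim, 0 if it went back on the task wait queue,
 * -1 if it timed out.
 */
static int task_retry(struct nano_obj *o, int id, int timed_out)
{
	if (timed_out) {
		return -1;
	}
	if (o->available > 0) {
		o->available--;
		return 1;
	}
	obj_wait_task(o, id);
	return 0;
}
```

In a real system the scheduler would run the woken tasks in priority
order, so the highest-priority one reaches task_retry() first and the
rest re-queue themselves.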

Additional Changes
------------------
In a microkernel system, the nanokernel would also require the ability to
notify the kernel of tasks that are to be set to a wait state as well as
tasks that are to be cleared of their wait state. As the nanokernel deals
with thread control structures (TCS) and the microkernel deals with k_task
structures, a mechanism is required to convert a nanokernel TCS pointer to
a k_task pointer. This can be done easily by ...
1. Adding a new field to the TCS, e.g.
       void *uk_task_ptr; /* ptr to the microkernel k_task */
2. Modifying _new_thread() to accept the microkernel k_task pointer as a
   parameter. If it is a fiber that is being created, use NULL.

To mark a task as waiting on a nanokernel object, a new TF_xxx bit will
be defined:
    #define TF_NANO 0x00000400
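
To illustrate how the TCS-to-k_task conversion and the new bit could fit
together, here is a small sketch. Only uk_task_ptr and TF_NANO come from
the proposal; the reduced struct layouts, the 'state' field name,
new_thread_sketch(), tcs_to_task() and task_nano_wait_set() are
assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TF_NANO 0x00000400   /* value taken from the proposal */

/* Reduced stand-ins for the real structures; only the fields used here. */
struct k_task {
	uint32_t state;      /* TF_xxx wait bits; field name is an assumption */
};

struct tcs {
	void *uk_task_ptr;   /* ptr to the microkernel k_task, NULL for fibers */
};

/* Hypothetical analogue of _new_thread() with the extra parameter. */
static void new_thread_sketch(struct tcs *tcs, struct k_task *task)
{
	assert(tcs != NULL);
	tcs->uk_task_ptr = task;
}

/* Convert a TCS pointer to its k_task, or NULL if the thread is a fiber. */
static struct k_task *tcs_to_task(const struct tcs *tcs)
{
	return (struct k_task *)tcs->uk_task_ptr;
}

/* Set or clear "waiting on a nanokernel object" for a task's thread. */
static int task_nano_wait_set(const struct tcs *tcs, int waiting)
{
	struct k_task *task = tcs_to_task(tcs);

	if (task == NULL) {
		return -1;   /* fibers have no k_task to update */
	}
	if (waiting) {
		task->state |= TF_NANO;
	} else {
		task->state &= ~(uint32_t)TF_NANO;
	}
	return 0;
}
```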

Known Limitations and Problems
------------------------------
#1.
This will not be done for the nanokernel stacks (nano_stack.c).

#2.
Since many tasks may be actively waiting on a N.O., it may take an
indeterminate amount of time to find the next task to run.
This is non-real time behavior!

For example:
Imagine 100 tasks of high priority that are actively waiting on a
nanokernel FIFO. That is, there are 100 tasks that are ready to execute
but they are waiting for data from that FIFO. Each one of those 100
tasks would have to put themselves back onto the FIFO's task wait queue
before a low priority task would get to run. That low priority task
might be the idle task, but it could just as easily be something else.

Re: RFC: return type of functions passed to DEVICE_INIT()

Dirk Brandewie <dirk.j.brandewie@...>
 

On 02/18/2016 10:03 AM, Benjamin Walsh wrote:

My take on it is that for Zephyr a failed device initialization
should be considered a fatal event. My expectation is that the
Zephyr user will only be enabling relevant (and important) devices to
their project. If one of these devices should fail, then that is a
serious system error and _NanoFatalErrorHandler() should be invoked.

If this train of thought holds up to scrutiny, and if the aim is to
save a few bytes then I would think that it would be better to have
the device initialization routines return a failure code and have
_sys_device_do_config_level() check for it and invoke the fatal error
handler upon the detection of failure. Otherwise we duplicate the
overhead of calling the fatal error handler in each device
initialization routine.
Sorry for the slow response. I agree with Peter here; I think we should
be checking the return value and doing something useful with the
result. Maybe not _NanoFatalErrorHandler() but something notifying
the application that something bad happened. A given device not
initializing may not be fatal to the whole application, just one
feature is currently unavailable.
For the kind of systems we are targeting, do we really expect the
application to handle devices not initializing correctly, being designed
so that parts are disabled if some parts of the initialization fail
(devices or others), or do we expect applications to require everything
to be present for them to function correctly? I would have thought the
latter, but I can be convinced.
Delving into the realm of the hypothetical :-)

What about devices that have drivers in the system but are not present
(pluggable) or can't initialize because some resource external to the
device can't be contacted (network server).

The application may be able to still do useful work albeit with reduced
functionality.

Then, if the latter, do we expect the application to catch the errors at
runtime when deployed, or during development (basically catching software
errors mostly, not malfunctioning hardware)? Here, I was thinking the
latter as well, which is why I was proposing __ASSERT() calls catching
initialization errors in debug loads only. And this fits with one of the
core values of the OS, which is small footprint.
Both models are useful for different reasons :-D

Any of those could be a valid approach I think, but we have to decide on
one. And right now, we have the worst since we return those error codes
which are meant for runtime handling, but they just go into the void.
Agreed we need to pick and stay with it for some amount of time until
we see a few real uses/applications/platforms.


How we could/should report this type of error is an open question :-).
Brainstorming:

If we want to let the application handle the initialization issues, we
probably need some kind of queue that gets filled by the init system
when init functions return errors, and that the application drains to
see what failed. We might want to queue the associated device objects,
and have an errno field in there, or something like that.
How about having the driver return an error code saying whether the
failure is a fatal error or not? For the drivers that we have now, we
*know* that if one fails it is a hardware or configuration error, which
is fatal, so we go with the _NanoFatalErrorHandler() error path.

If a non-fatal error occurred (may work at next reset), just ignore it
and move on. The application can detect if the device is dead/not
present by the return codes from the driver call(s). Then the
application can decide what and how to report the error to the user.


Re: compile error for ARM build

Benjamin Walsh <benjamin.walsh@...>
 

Hi Ravi,

anyone else run into this error?
You probably built for one target, then built for a different one
without cleaning up the first build.

e.g.

$ make <- x86 by default
$ make BOARD=qemu_cortex_m3 <- bombs

By default, the output goes into the 'outdir' directory, and the builds
cannot coexist in there. You can choose where the output goes via the O=
makefile variable.

e.g.

$ make O=out-x86
$ make O=out-arm BOARD=qemu_cortex_m3

Or you can clean by doing 'make pristine' between them.

Cheers,
Ben


--
Benjamin Walsh, SMTS
Wind River Rocket
Zephyr kernel maintainer
zephyrproject.org
www.windriver.com

compile error for ARM build

Ravi Sahita <rsahita@...>
 

anyone else run into this error?

thanks,
Ravi

rlsahita(a)ubuntu:~/zeph/zephyr-project/samples/hello_world/microkernel$ make BOARD=qemu_cortex_m3 ARCH=arm
make[1]: Entering directory `/home/rlsahita/zeph/zephyr-project'
make[2]: Entering directory `/home/rlsahita/zeph/zephyr-project/samples/hello_world/microkernel/outdir'
Using /home/rlsahita/zeph/zephyr-project as source for kernel
GEN ./Makefile
CHK include/generated/version.h
CHK misc/generated/configs.c
In file included from /home/rlsahita/zeph/zephyr-project/arch/arm/core/offsets/offsets.c:37:0:
/home/rlsahita/zeph/zephyr-project/arch/arm/include/nano_private.h: In function 'nanoArchInit':
/home/rlsahita/zeph/zephyr-project/arch/arm/include/nano_private.h:168:2: warning: implicit declaration of function '_InterruptStackSetup' [-Wimplicit-function-declaration]
_InterruptStackSetup();
^
/home/rlsahita/zeph/zephyr-project/arch/arm/include/nano_private.h:169:2: warning: implicit declaration of function '_ExcSetup' [-Wimplicit-function-declaration]
_ExcSetup();
^
/home/rlsahita/zeph/zephyr-project/arch/arm/include/nano_private.h: In function 'fiberRtnValueSet':
/home/rlsahita/zeph/zephyr-project/arch/arm/include/nano_private.h:192:6: error: dereferencing pointer to incomplete type 'tESF {aka struct __esf}'
pEsf->a1 = value;
^
In file included from /home/rlsahita/zeph/zephyr-project/include/toolchain.h:29:0,
from /home/rlsahita/zeph/zephyr-project/kernel/nanokernel/include/gen_offset.h:86,
from /home/rlsahita/zeph/zephyr-project/arch/arm/core/offsets/offsets.c:36:
/home/rlsahita/zeph/zephyr-project/arch/arm/core/offsets/offsets.c: In function '_OffsetAbsSyms':
/home/rlsahita/zeph/zephyr-project/arch/arm/core/offsets/offsets.c:67:40: error: invalid application of 'sizeof' to incomplete type 'tESF {aka struct __esf}'
GEN_ABSOLUTE_SYM(__tESF_SIZEOF, sizeof(tESF));
^
/home/rlsahita/zeph/zephyr-project/include/toolchain/gcc.h:272:43: note: in definition of macro 'GEN_ABSOLUTE_SYM'
"\n\t.type\t" #name ",@object" : : "n"(value))
^
make[3]: *** [arch/arm/core/offsets/offsets.o] Error 1
make[2]: *** [prepare] Error 2
make[2]: Leaving directory `/home/rlsahita/zeph/zephyr-project/samples/hello_world/microkernel/outdir'
make[1]: *** [sub-make] Error 2
make[1]: Leaving directory `/home/rlsahita/zeph/zephyr-project'
make: *** [all] Error 2
rlsahita(a)ubuntu:~/zeph/zephyr-project/samples/hello_world/microkernel$