
Re: RFC: return type of functions passed to DEVICE_INIT()

Dirk Brandewie <dirk.j.brandewie@...>
 

On 02/12/2016 11:38 AM, Mitsis, Peter wrote:
See [Peter]

-----Original Message-----
From: Benjamin Walsh [mailto:benjamin.walsh(a)windriver.com]
Sent: February-12-16 1:37 PM
To: devel(a)lists.zephyrproject.org
Subject: [devel] RFC: return type of functions passed to DEVICE_INIT()

Folks,

For some reason, the signature of functions passed to the DEVICE_INIT()
<init_fn> parameter has a return type of 'int', but the return value is never
checked within _sys_device_do_config_level(). Some init functions do return an
error code, such as the ARC init code and the bluetooth init routines, but that
just gets ignored.

Question: should we have init functions of return type 'void' then ?
That would shave a few bytes in every init function if we don't have to return a
value.
[Peter] - We generally try to operate under the assumption that failures will not occur.
That being said, we do have some instances where we do check for errors, and some of these are considered fatal.

My take on it is that for Zephyr a failed device initialization should be considered a fatal event.
My expectation is that the Zephyr user will only be enabling relevant (and important) devices to their project.
If one of these devices should fail, then that is a serious system error and _NanoFatalErrorHandler() should be invoked.

If this train of thought holds up to scrutiny, and if the aim is to save a few bytes then I would think that it would be better to have the device initialization routines return a failure code and have _sys_device_do_config_level() check for it and invoke the fatal error handler upon the detection of failure. Otherwise we duplicate the overhead of calling the fatal error handler in each device initialization routine.
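
A minimal sketch of the centralized check described above, assuming a simplified device table. The names device_sketch, fatal_error_sketch() and do_config_level_sketch() are hypothetical stand-ins, not the real _sys_device_do_config_level() or _NanoFatalErrorHandler() signatures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device entry; the real structure differs. */
struct device_sketch {
	const char *name;
	int (*init)(struct device_sketch *dev);
};

static int fatal_count; /* number of times the stand-in handler ran */

/* Stand-in for the fatal error handler; a real one would not return. */
static void fatal_error_sketch(const struct device_sketch *dev, int rc)
{
	(void)dev;
	(void)rc;
	fatal_count++;
}

/* Run every init function and handle failures in exactly one place. */
static void do_config_level_sketch(struct device_sketch *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int rc = devs[i].init(&devs[i]);

		if (rc != 0) {
			fatal_error_sketch(&devs[i], rc);
		}
	}
}

/* Two hypothetical init functions for illustration. */
static int init_ok(struct device_sketch *dev) { (void)dev; return 0; }
static int init_bad(struct device_sketch *dev) { (void)dev; return -1; }
```

The point of the layout is that each init routine only returns a code; the call to the handler exists once, in the loop, rather than once per driver.
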
Sorry for the slow response. I agree with Peter here; I think we should
be checking the return value and doing something useful with the
result. Maybe not _NanoFatalErrorHandler() but something notifying
the application that something bad happened. A given device not
initializing may not be fatal to the whole application, just one
feature is currently unavailable.

How we could/should report this type of error is an open question :-).



Just my two cents.

Cheers,
Ben

--
Benjamin Walsh, SMTS
Wind River Rocket
Zephyr kernel maintainer
www.windriver.com


Re: [RFC] Add DEV_NOT_IMPLEMENTED error code

Andre Guedes <andre.guedes@...>
 

Hi Jesus,

Quoting Jesus Sanchez-Palencia (2016-02-18 12:42:07)
Just so we avoid mixing two different topics, it looks like this deserves a separate RFC.
Yes. I already had this in mind and I'm going to send the errno.h RFC later
today.

I was basically replying to Daniel's comment and took the opportunity to get
early feedback from him.

Regards,

Andre


Re: [RFC] Add DEV_NOT_IMPLEMENTED error code

Dirk Brandewie <dirk.j.brandewie@...>
 

On 02/18/2016 06:42 AM, Jesus Sanchez-Palencia wrote:
Hi,


On Thu, 18 Feb 2016 09:11:59 -0200
Andre Guedes <andre.guedes(a)intel.com> wrote:

Hi Daniel,

Quoting Kalowsky, Daniel (2016-02-17 05:09:07)
DEV_NO_SUPPORT seems to cover the concept. Not sure we need a
DEV_NOT_IMPLEMENTED. Or as Peter points out ENOSYS works as well.

Yes, DEV_NO_SUPPORT might cover the concept however it brings another
connotation which is "the hardware actually lacks support" (e.g. the
controller doesn't support that feature) and for that case I think the error
code DEV_INVALID_OP might be more suitable.
Good point. Your alternate point (in another post) of ENOSYS not matching is also valid.
I think ENOSYS would perfectly match what we want here since it means
"Function not implemented" (as Peter pointed out in the other reply).

Speaking of which, I was talking to Dirk the other day and we both agreed
that it would be a good idea to use errno.h codes instead of DEV_* at the
driver's layer. The main points are 1) errno.h is a well-known error convention
which pretty much every developer is familiar with, 2) errno.h codes address what
we need, 3) no need to create new codes such as DEV_NOT_IMPLEMENTED for instance,
and 4) changing the current drivers to use errno.h codes instead of DEV_* is a
feasible task.
A good first step is to redefine DEV_* in terms of -E* error codes :-)

Adding the include of errno.h and the removal of DEV_* can happen
at our leisure :-) The apps will be getting a consistent set of error
codes in any case.
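
A rough sketch of that first step. The specific mapping below is an assumption made for illustration; the thread only agrees that ENOSYS means "Function not implemented", and dummy_driver_op() is a hypothetical function:

```c
#include <assert.h>
#include <errno.h>

/* Assumed mapping for illustration; the thread does not settle on one. */
#define DEV_OK               0
#define DEV_INVALID_OP       (-EINVAL)
#define DEV_NO_SUPPORT       (-ENOTSUP)
#define DEV_NOT_IMPLEMENTED  (-ENOSYS) /* "Function not implemented" */

/* Hypothetical dummy driver entry point: the API exists but does nothing. */
static int dummy_driver_op(void)
{
	return DEV_NOT_IMPLEMENTED;
}
```

With definitions like these, existing drivers keep compiling unchanged while applications already see errno-style values.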

What do you think?
Just so we avoid mixing two different topics, it looks like this deserves a separate RFC.

The original problem I raised on that code review a few weeks ago -- not having a defined return
code to state that a given driver API impl is dummy -- seems to have had some conclusion reached
here already. Looks like adding DEV_NOT_IMPLEMENTED has some appeal in the end. I also support that
for now, until the errno idea has been discussed.


I think you've convinced me. The challenge now is to make sure it's used appropriately. As we've shown, we'll need to be explicit with why some functions actually need to return DEV_OK or whatever vs DEV_NO_SUPPORT.
Yes, this is the challenge. I volunteer to review and fix what we have
upstream. However, we have to pay attention to this kind of thing
while reviewing new patches on Gerrit to ensure they are merged with
the proper return codes.
Agreed


Guidelines, folks. Let's make sure this is documented somewhere so we can just copy&paste a link on
reviews afterwards and people can refer to it when coding.
Great idea! This will help with readability of driver code and if the
app developer actually tries to decode the error they will know where
to get the *meaning*

Thanks for preparing this RFC, by the way!

Regards,
jesus



Thanks,

Andre


6LoWPAN Stack

Joakim Eriksson
 

Hello!

I just cloned the Zephyr repository and saw that the 6LoWPAN stack was the same as the one I am used to, i.e. the Contiki 6LoWPAN stack. How are you planning to develop it further? Are you going to sync it from Contiki, or would you like Contiki developers to post pull requests with fixes to Zephyr as well?

Anyway - congratulations - it looks like an exciting project and I will be happy to contribute to it!

-- Joakim Eriksson, SICS


Re: RFC: return type of functions passed to DEVICE_INIT()

Benjamin Walsh <benjamin.walsh@...>
 

For some reason, the signature of functions passed to the
DEVICE_INIT() <init_fn> parameter has a return type of 'int', but
the return value is never checked within
_sys_device_do_config_level(). Some init functions do return an
error code, such as the ARC init code and the bluetooth init
routines, but that just gets ignored.

Question: should we have init functions of return type 'void' then ?
That would shave a few bytes in every init function if we don't have
to return a value.
We generally try to operate under the assumption that failures will
not occur. That being said, we do have some instances where we do
check for errors, and some of these are considered fatal.

My take on it is that for Zephyr a failed device initialization
should be considered a fatal event. My expectation is that the
Zephyr user will only be enabling relevant (and important) devices to
their project. If one of these devices should fail, then that is a
serious system error and _NanoFatalErrorHandler() should be invoked.

If this train of thought holds up to scrutiny, and if the aim is to
save a few bytes then I would think that it would be better to have
the device initialization routines return a failure code and have
_sys_device_do_config_level() check for it and invoke the fatal error
handler upon the detection of failure. Otherwise we duplicate the
overhead of calling the fatal error handler in each device
initialization routine.
Sorry for the slow response. I agree with Peter here; I think we should
be checking the return value and doing something useful with the
result. Maybe not _NanoFatalErrorHandler() but something notifying
the application that something bad happened. A given device not
initializing may not be fatal to the whole application, just one
feature is currently unavailable.
For the kind of systems we are targeting, do we really expect the
application to handle devices not initializing correctly, being designed
so that parts are disabled if some parts of the initialization fail
(devices or others), or do we expect applications to require everything
to be present for them to function correctly ? I would have thought the
latter, but I can be convinced.

Then, if the latter, do we expect the application to catch the errors at
runtime when deployed, or during development (basically catching mostly
software errors, not malfunctioning hardware)? Here, I was thinking the
latter as well, which is why I was proposing __ASSERT() calls catching
initialization errors in debug loads only. And this fits with one of the
core values of the OS, which is small footprint.
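
To illustrate the debug-only idea, here is a small sketch. INIT_ASSERT and INIT_DEBUG_SKETCH are hypothetical names, not the kernel's __ASSERT() or an existing config option, and this version only records the failure where a real assert would halt:

```c
#include <assert.h>
#include <stdio.h>

static int init_assert_failures; /* incremented only in debug loads */

#ifdef INIT_DEBUG_SKETCH
/* Debug load: record and report the failed check. */
#define INIT_ASSERT(cond)                                                  \
	do {                                                               \
		if (!(cond)) {                                             \
			init_assert_failures++;                            \
			fprintf(stderr, "init check failed: %s\n", #cond); \
		}                                                          \
	} while (0)
#else
/* Production load: the check compiles away; cond is not evaluated. */
#define INIT_ASSERT(cond) do { (void)sizeof(cond); } while (0)
#endif

/* Hypothetical init function that always fails. */
static int failing_init(void)
{
	return -1;
}

/* Stand-in for the body of the init loop. */
static void run_init_sketch(int (*init)(void))
{
	int rc = init();

	INIT_ASSERT(rc == 0);
}
```

Built without INIT_DEBUG_SKETCH, the check costs no code in the loop, which is the footprint argument made above.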

Any of those could be a valid approach I think, but we have to decide on
one. And right now, we have the worst since we return those error codes
which are meant for runtime handling, but they just go into the void.

How we could/should report this type of error is an open question :-).
Brainstorming:

If we want to let the application handle the initialization issues, we
probably need some kind of queue that gets filled by the init system
when init functions return errors, and that the application drains to
see what failed. We might want to queue the associated device objects,
and have an errno field in there, or something like that.
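
A rough sketch of such a queue, as a fixed-size ring buffer. Everything here (struct init_failure, init_err_put(), init_err_get(), the capacity of 4) is hypothetical and only meant to make the brainstorm concrete:

```c
#include <assert.h>
#include <stddef.h>

#define INIT_ERR_QUEUE_LEN 4 /* arbitrary capacity for the sketch */

struct init_failure {
	const char *dev_name; /* which device failed */
	int err;              /* errno-style code from its init function */
};

static struct init_failure init_err_queue[INIT_ERR_QUEUE_LEN];
static size_t init_err_head;  /* index of the oldest record */
static size_t init_err_count; /* number of stored records */

/* Called by the init system; returns -1 and drops the record if full. */
static int init_err_put(const char *dev_name, int err)
{
	if (init_err_count == INIT_ERR_QUEUE_LEN) {
		return -1;
	}

	size_t tail = (init_err_head + init_err_count) % INIT_ERR_QUEUE_LEN;

	init_err_queue[tail].dev_name = dev_name;
	init_err_queue[tail].err = err;
	init_err_count++;
	return 0;
}

/* Called by the application; returns 0 and fills *out, or -1 if empty. */
static int init_err_get(struct init_failure *out)
{
	if (init_err_count == 0) {
		return -1;
	}

	*out = init_err_queue[init_err_head];
	init_err_head = (init_err_head + 1) % INIT_ERR_QUEUE_LEN;
	init_err_count--;
	return 0;
}
```

The application would call init_err_get() in a loop after boot until it returns -1.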


Zephyr SDK and building

Kalowsky, Daniel <daniel.kalowsky@...>
 

Because my main dev box has been in a steady state of failing for a while now, I moved everything over to a new (to me) development machine (dleung’s dregs, thanks Daniel!) with a fresh install of Fedora 23.

While following the directions on ZephyrProject.org, I questioned the need for the required packages. I believe these portions are left over requirements from the days when we built our own cross-compiler toolchain.

Is there any reason why these are still needed?

The short answer is yes. We still need these for the build tools/system to work.

Why is Zephyr’s SDK not capable of compiling everything for the OS?
Why are we still dependent upon the host computer having GCC when we just downloaded a version of it?

Seems like we just required someone to download a 350 MB SDK, plus all the dev tools for their platform at an additional 300 MB.


RFC: Use error codes from errno.h

Andre Guedes <andre.guedes@...>
 

Hi all,

While we were discussing adding a new error code for device.h (see
"[RFC] Add DEV_NOT_IMPLEMENTED error code" thread), we had an initial
agreement that it does make sense to use error codes from errno.h instead
of the ones from include/device.h. Since this topic deserves its own RFC,
I'm sending this email so we can have a proper discussion.

So the main points in favor of this change are 1) errno.h is a well-known
error convention which pretty much every developer is familiar with, 2)
errno.h codes address what we need, 3) no need to create new codes such
as DEV_NOT_IMPLEMENTED for instance, and 4) changing the current drivers
to use errno.h codes instead of DEV_* is a feasible task.

The initial discussion was about using errno.h codes at the driver's layer
but I think we can expand it to the whole system. Actually, errno.h codes
are already used in net/bluetooth and net/ip.

Regards,

Andre


compile error for ARM build

Ravi Sahita <rsahita@...>
 

anyone else run into this error?

thanks,
Ravi

rlsahita(a)ubuntu:~/zeph/zephyr-project/samples/hello_world/microkernel$ make BOARD=qemu_cortex_m3 ARCH=arm
make[1]: Entering directory `/home/rlsahita/zeph/zephyr-project'
make[2]: Entering directory `/home/rlsahita/zeph/zephyr-project/samples/hello_world/microkernel/outdir'
Using /home/rlsahita/zeph/zephyr-project as source for kernel
GEN ./Makefile
CHK include/generated/version.h
CHK misc/generated/configs.c
In file included from /home/rlsahita/zeph/zephyr-project/arch/arm/core/offsets/offsets.c:37:0:
/home/rlsahita/zeph/zephyr-project/arch/arm/include/nano_private.h: In function 'nanoArchInit':
/home/rlsahita/zeph/zephyr-project/arch/arm/include/nano_private.h:168:2: warning: implicit declaration of function '_InterruptStackSetup' [-Wimplicit-function-declaration]
_InterruptStackSetup();
^
/home/rlsahita/zeph/zephyr-project/arch/arm/include/nano_private.h:169:2: warning: implicit declaration of function '_ExcSetup' [-Wimplicit-function-declaration]
_ExcSetup();
^
/home/rlsahita/zeph/zephyr-project/arch/arm/include/nano_private.h: In function 'fiberRtnValueSet':
/home/rlsahita/zeph/zephyr-project/arch/arm/include/nano_private.h:192:6: error: dereferencing pointer to incomplete type 'tESF {aka struct __esf}'
pEsf->a1 = value;
^
In file included from /home/rlsahita/zeph/zephyr-project/include/toolchain.h:29:0,
from /home/rlsahita/zeph/zephyr-project/kernel/nanokernel/include/gen_offset.h:86,
from /home/rlsahita/zeph/zephyr-project/arch/arm/core/offsets/offsets.c:36:
/home/rlsahita/zeph/zephyr-project/arch/arm/core/offsets/offsets.c: In function '_OffsetAbsSyms':
/home/rlsahita/zeph/zephyr-project/arch/arm/core/offsets/offsets.c:67:40: error: invalid application of 'sizeof' to incomplete type 'tESF {aka struct __esf}'
GEN_ABSOLUTE_SYM(__tESF_SIZEOF, sizeof(tESF));
^
/home/rlsahita/zeph/zephyr-project/include/toolchain/gcc.h:272:43: note: in definition of macro 'GEN_ABSOLUTE_SYM'
"\n\t.type\t" #name ",@object" : : "n"(value))
^
make[3]: *** [arch/arm/core/offsets/offsets.o] Error 1
make[2]: *** [prepare] Error 2
make[2]: Leaving directory `/home/rlsahita/zeph/zephyr-project/samples/hello_world/microkernel/outdir'
make[1]: *** [sub-make] Error 2
make[1]: Leaving directory `/home/rlsahita/zeph/zephyr-project'
make: *** [all] Error 2
rlsahita(a)ubuntu:~/zeph/zephyr-project/samples/hello_world/microkernel$


Re: compile error for ARM build

Benjamin Walsh <benjamin.walsh@...>
 

Hi Ravi,

anyone else run into this error?
You probably built for one target, then built for a different one
without cleaning up the first build.

e.g.

$ make <- x86 by default
$ make BOARD=qemu_cortex_m3 <- bombs

By default, the output goes into the 'outdir' directory, and the builds
cannot coexist in there. You can choose where the output goes via the O=
makefile variable.

e.g.

$ make O=out-x86
$ make O=out-arm BOARD=qemu_cortex_m3

Or you can clean by doing 'make pristine' between them.

Cheers,
Ben


--
Benjamin Walsh, SMTS
Wind River Rocket
Zephyr kernel maintainer
zephyrproject.org
www.windriver.com


Re: RFC: return type of functions passed to DEVICE_INIT()

Dirk Brandewie <dirk.j.brandewie@...>
 

On 02/18/2016 10:03 AM, Benjamin Walsh wrote:

My take on it is that for Zephyr a failed device initialization
should be considered a fatal event. My expectation is that the
Zephyr user will only be enabling relevant (and important) devices to
their project. If one of these devices should fail, then that is a
serious system error and _NanoFatalErrorHandler() should be invoked.

If this train of thought holds up to scrutiny, and if the aim is to
save a few bytes then I would think that it would be better to have
the device initialization routines return a failure code and have
_sys_device_do_config_level() check for it and invoke the fatal error
handler upon the detection of failure. Otherwise we duplicate the
overhead of calling the fatal error handler in each device
initialization routine.
Sorry for the slow response. I agree with Peter here; I think we should
be checking the return value and doing something useful with the
result. Maybe not _NanoFatalErrorHandler() but something notifying
the application that something bad happened. A given device not
initializing may not be fatal to the whole application, just one
feature is currently unavailable.
For the kind of systems we are targeting, do we really expect the
application to handle devices not initializing correctly, being designed
so that parts are disabled if some parts of the initialization fail
(devices or others), or do we expect applications to require everything
to be present for them to function correctly ? I would have thought the
latter, but I can be convinced.
Delving into the realm of the hypothetical :-)

What about devices that have drivers in the system but are not present
(pluggable) or can't initialize because some resource external to the
device can't be contacted (network server).

The application may be able to still do useful work albeit with reduced
functionality.

Then, if the latter, do we expect the application to catch the errors at
runtime when deployed, or during development (basically catching mostly
software errors, not malfunctioning hardware)? Here, I was thinking the
latter as well, which is why I was proposing __ASSERT() calls catching
initialization errors in debug loads only. And this fits with one of the
core values of the OS, which is small footprint.
Both models are useful for different reasons :-D

Any of those could be a valid approach I think, but we have to decide on
one. And right now, we have the worst since we return those error codes
which are meant for runtime handling, but they just go into the void.
Agreed we need to pick and stay with it for some amount of time until
we see a few real uses/applications/platforms.


How we could/should report this type of error is an open question :-).
Brainstorming:

If we want to let the application handle the initialization issues, we
probably need some kind of queue that gets filled by the init system
when init functions return errors, and that the application drains to
see what failed. We might want to queue the associated device objects,
and have an errno field in there, or something like that.
How about having the driver return an error code saying whether the
failure is a fatal error or not? For the drivers that we have now, we
*know* that if one fails it is a hardware or configuration error, which
is fatal, so we go with the _NanoFatalErrorHandler() error path.

If a non-fatal error occurred (may work at next reset) just ignore it
and move on. The application can detect if the device is dead/not
present by the return codes from the driver call(s). Then the
application can decide what and how to report the error to the user.
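
One possible shape for that split, as a sketch only. Which codes count as fatal is an assumption made here for illustration (-EIO for a hardware fault, -EINVAL for a configuration error), and init_error_is_fatal() is a hypothetical helper:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Decide whether an init return code should take the fatal error path.
 * The choice of codes is assumed for this sketch, not taken from Zephyr.
 */
static bool init_error_is_fatal(int rc)
{
	switch (rc) {
	case -EIO:    /* assumed: hardware fault */
	case -EINVAL: /* assumed: configuration error */
		return true;
	default:      /* success, or a failure the application can live with */
		return false;
	}
}
```

The init loop would call the fatal handler only when this returns true; for anything else it moves on and the application finds out from later driver calls.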



RFC: Microkernel tasks pending on nanokernel objects

Mitsis, Peter <Peter.Mitsis@...>
 

The following is proposed to permit microkernel tasks to pend on nanokernel
objects.

Background:
-----------
When the nanokernel's background task must wait for a nanokernel object
(N.O.) it polls. The polling is done at a frequency of no more than once
per interrupt. Thus, if the timer is the only source of interrupts then
the polling is done no more than once per tick.

This is fine for the nanokernel, but in a microkernel where there are
multiple tasks, it is very inefficient.

Proposal:
---------
To allow multiple microkernel tasks to pend on N.O.'s, let there be a
second queue per N.O. and let it be used strictly for pending tasks.

NOTE: This second queue would only apply to N.O.'s in a microkernel system.
Since there is only one task in a nanokernel system, there is no need for
this second queue.

        +-------------------------------+
        |       Nanokernel Object       |
        +-------------------------------+
            |                     |
          Fiber1                Task1
            |                     |
          Fiber2                Task2
            |                     |
           ---                  Task3
            -                     |
                                Task4
                                  |
                                 ---
                                  -

Fig 1: A N.O. with both waiting fibers and tasks

Proposed N.O. Rules:
--------------------
1. When a task must wait on a N.O., it is added to the task wait queue.
2. When a fiber must wait on a N.O., it is added to the fiber wait queue.
3. When someone posts to the N.O. ...
   A. If there is a waiting fiber, let it claim it immediately.
   B. If there are no waiting fibers, mark it as available and wake all
      pending tasks.
4. An awakened task must check for timeout before trying to claim the N.O.
If it can not claim it, then it adds itself to the task wait queue.
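
The rules above can be sketched with plain counters standing in for the two wait queues. This is an illustrative model under that simplification, not kernel code; nano_obj_sketch, nano_obj_post() and nano_obj_task_retry() are hypothetical names:

```c
#include <assert.h>

/* Counters stand in for the real wait queues in this model. */
struct nano_obj_sketch {
	int available;      /* items posted but not yet claimed */
	int waiting_fibers; /* length of the fiber wait queue */
	int waiting_tasks;  /* length of the task wait queue */
};

/* Rule 3: returns the number of tasks woken by this post. */
static int nano_obj_post(struct nano_obj_sketch *obj)
{
	if (obj->waiting_fibers > 0) {
		/* Rule 3A: a waiting fiber claims the item immediately. */
		obj->waiting_fibers--;
		return 0;
	}

	/* Rule 3B: mark the item available and wake every pending task. */
	int woken = obj->waiting_tasks;

	obj->available++;
	obj->waiting_tasks = 0;
	return woken;
}

/*
 * Rule 4: an awakened task checks its timeout, then tries to claim.
 * Returns 1 if it claimed the item, 0 if it went back onto the task
 * wait queue, and -1 if its timeout had already expired.
 */
static int nano_obj_task_retry(struct nano_obj_sketch *obj, int timed_out)
{
	if (timed_out) {
		return -1;
	}
	if (obj->available > 0) {
		obj->available--;
		return 1;
	}
	obj->waiting_tasks++;
	return 0;
}
```

With three pending tasks and one posted item, one retry returns 1 and the other two return 0, so both re-queue themselves.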

Explanation of Rules:
---------------------
Preference is to be given to fibers just as in a nanokernel system.

By waking all the tasks waiting on that N.O. ...
1. The waiting tasks are automatically sorted by the scheduler by priority.
2. The meaning of a task's timeout parameter remains unchanged from the
nanokernel.
3. Timeouts are handled neatly with minimal new code.

Additional Changes
------------------
In a microkernel system, the nanokernel would also require the ability to
notify the kernel of tasks that are to be set to a wait state as well as
tasks that are to be cleared of their wait state. Because the nanokernel deals
with thread control structures (TCS) and the microkernel deals with k_task
structures, a mechanism is required to convert a nanokernel TCS pointer to
a k_task pointer. This can be done easily by ...
1. Adding a new field to the TCS, e.g.
       void *uk_task_ptr; /* ptr to the microkernel k_task */
2. Modifying _new_thread() to accept the microkernel k_task pointer as a
   parameter. If it is a fiber that is being created, use NULL.

To mark a task as waiting on a nanokernel object, a new TF_xxx bit will
be defined.
#define TF_NANO 0x00000400
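
A small sketch of how the new field and flag might fit together. The structures are reduced to the members needed here; tcs_sketch, k_task_sketch and the helper functions are hypothetical, and only uk_task_ptr and the TF_NANO value come from the proposal:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TF_NANO 0x00000400 /* value from the proposal */

/* Reduced stand-ins for the real structures. */
struct k_task_sketch {
	uint32_t state; /* TF_xxx wait-state bits */
};

struct tcs_sketch {
	void *uk_task_ptr; /* ptr to the microkernel k_task; NULL for fibers */
};

/* Convert a TCS pointer to its k_task, or NULL if the TCS is a fiber. */
static struct k_task_sketch *tcs_to_task(struct tcs_sketch *tcs)
{
	return tcs->uk_task_ptr;
}

/* Mark the task as waiting on a nanokernel object; -1 for fibers. */
static int mark_waiting_on_nano(struct tcs_sketch *tcs)
{
	struct k_task_sketch *task = tcs_to_task(tcs);

	if (task == NULL) {
		return -1;
	}
	task->state |= TF_NANO;
	return 0;
}

/* Clear the wait state again; -1 for fibers. */
static int clear_waiting_on_nano(struct tcs_sketch *tcs)
{
	struct k_task_sketch *task = tcs_to_task(tcs);

	if (task == NULL) {
		return -1;
	}
	task->state &= ~(uint32_t)TF_NANO;
	return 0;
}
```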

Known Limitations and Problems
------------------------------
#1.
This will not be done for the nanokernel stacks (nano_stack.c).

#2.
Since many tasks may be actively waiting on a N.O., it may take an
indeterminate amount of time to find the next task to run.
This is non-real time behavior!

For example:
Imagine 100 tasks of high priority that are actively waiting on a
nanokernel FIFO. That is, there are 100 tasks that are ready to execute
but they are waiting for data from that FIFO. Each one of those 100
tasks would have to put themselves back onto the FIFO's task wait queue
before a low priority task would get to run. That low priority task
might be the idle task, but it could just as easily be something else.


How to setup git-review

Andre Guedes <andre.guedes@...>
 

Hi all,

I was having some trouble submitting a change to Gerrit which depends on
someone else's change. If I use the 'git push' command as described in [1],
I get the following error:

remote: ERROR: In commit <commit id>
remote: ERROR: author email address <someone(a)mail.com>
remote: ERROR: does not match your user account.
remote: ERROR:
remote: ERROR: The following addresses are currently registered:
remote: ERROR: <your email>
remote: ERROR:
remote: ERROR: To register an email address, please visit:
remote: ERROR: https://gerrit.zephyrproject.org/r/#/settings/contact

Talking to Andrew Grimberg from Linux Foundation, he recommended using
git-review and it worked around the issue.

Here are the instructions I followed to set up git-review:

$ cat > .gitreview
[gerrit]
host=gerrit.zephyrproject.org
port=29418
project=zephyr.git

$ git-review -s

To submit your patch(es) for review:
$ git-review -T

I hope this can save you time in case you run into a similar problem ;)

BTW, Andrew submitted a patch adding the .gitreview file so we don't have
to keep it locally.

Regards,

Andre

[1] https://www.zephyrproject.org/doc/collaboration/code/gerrit.html#gerrit
[2] https://gerrit.zephyrproject.org/r/#/c/390/


Re: RFC: return type of functions passed to DEVICE_INIT()

Thomas, Ramesh
 

On Thu, 2016-02-18 at 12:34 -0800, Dirk Brandewie wrote:

On 02/18/2016 10:03 AM, Benjamin Walsh wrote:

My take on it is that for Zephyr a failed device initialization
should be considered a fatal event. My expectation is that the
Zephyr user will only be enabling relevant (and important) devices to
their project. If one of these devices should fail, then that is a
serious system error and _NanoFatalErrorHandler() should be invoked.
How about a CONFIG option to determine if kernel should abort boot? The
behavior should be determined by the kind of application. Some can live
with fewer features while for others it is all or nothing. In both cases
a logging feature that can be queried by the app would be useful.

Any of those could be a valid approach I think, but we have to decide on
one. And right now, we have the worst since we return those error codes
which are meant for runtime handling, but they just go into the void.
+1 _sys_device_do_config_level() should at least pass on the error
returned even if one device fails. This as well as the
"resume_all/suspend_all" functions can be called by app in power
management case. It would be useful for the app to know that something
went wrong.

Question still remains whether _sys_device_do_config_level() should
continue with other devices if one fails. IMO it should continue but
should return error if one device had failed. At boot time, kernel can
save the error condition somewhere to be queried by the app.
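
A minimal sketch of the "continue, but report" behaviour suggested above. The function do_config_level_status() and the init_fn_t type are hypothetical, and returning the first error seen is just one possible choice:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*init_fn_t)(void);

static int init_calls; /* counts how many init functions actually ran */

/* Hypothetical init functions for illustration. */
static int ok_init(void)  { init_calls++; return 0; }
static int bad_init(void) { init_calls++; return -5; }

/*
 * Run every init function even if an earlier one failed, and return
 * the first non-zero code seen (0 if all succeeded).
 */
static int do_config_level_status(const init_fn_t *inits, size_t n)
{
	int first_err = 0;

	for (size_t i = 0; i < n; i++) {
		int rc = inits[i]();

		if (rc != 0 && first_err == 0) {
			first_err = rc;
		}
	}
	return first_err;
}
```

The kernel could store the returned value at boot so that the application can query it later and decide whether to continue or abort.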

Brainstorming:

If we want to let the application handle the initialization issues, we
probably need some kind of queue that gets filled by the init system
when init functions return errors, and that the application drains to
see what failed. We might want to queue the associated device objects,
and have an errno field in there, or something like that.
The app would be anyway getting a failure when it tries to use the
device. It most probably cannot rectify any device error on the fly with
more detailed info. Just my thought

A simple status from _sys_device_do_config_level() is probably enough so
the app can decide whether to continue or abort.


Re: RFC: return type of functions passed to DEVICE_INIT()

Benjamin Walsh <benjamin.walsh@...>
 

My take on it is that for Zephyr a failed device initialization
should be considered a fatal event. My expectation is that the
Zephyr user will only be enabling relevant (and important) devices to
their project. If one of these devices should fail, then that is a
serious system error and _NanoFatalErrorHandler() should be invoked.

If this train of thought holds up to scrutiny, and if the aim is to
save a few bytes then I would think that it would be better to have
the device initialization routines return a failure code and have
_sys_device_do_config_level() check for it and invoke the fatal error
handler upon the detection of failure. Otherwise we duplicate the
overhead of calling the fatal error handler in each device
initialization routine.
Sorry for the slow response. I agree with Peter here; I think we should
be checking the return value and doing something useful with the
result. Maybe not _NanoFatalErrorHandler() but something notifying
the application that something bad happened. A given device not
initializing may not be fatal to the whole application, just one
feature is currently unavailable.
For the kind of systems we are targeting, do we really expect the
application to handle devices not initializing correctly, being designed
so that parts are disabled if some parts of the initialization fail
(devices or others), or do we expect applications to require everything
to be present for them to function correctly ? I would have thought the
latter, but I can be convinced.
Delving into the realm of the hypothetical :-)

What about devices that have drivers in the system but are not present
(pluggable) or can't initialize because some resource external to the
device can't be contacted (network server).

The application may be able to still do useful work albeit with reduced
functionality.

Then, if the latter, do we expect the application to catch the errors at
runtime when deployed, or during development (basically catching mostly
software errors, not malfunctioning hardware)? Here, I was thinking the
latter as well, which is why I was proposing __ASSERT() calls catching
initialization errors in debug loads only. And this fits with one of the
core values of the OS, which is small footprint.
Both models are useful for different reasons :-D

Any of those could be a valid approach I think, but we have to decide on
one. And right now, we have the worst since we return those error codes
which are meant for runtime handling, but they just go into the void.
Agreed we need to pick and stay with it for some amount of time until
we see a few real uses/applications/platforms.
OK, following your proposal below, what we could put in place is
standardizing on error codes that init routines must return if they want
the kernel init system to automatically trigger a fatal error.

Then, we could also allow configuring out the error handling if someone
needs to squeeze that last amount of space. One more Kconfig option! :)
The error handling would be enabled by default of course.

How we could/should report this type of error is an open question :-).
Brainstorming:

If we want to let the application handle the initialization issues, we
probably need some kind of queue that gets filled by the init system
when init functions return errors, and that the application drains to
see what failed. We might want to queue the associated device objects,
and have an errno field in there, or something like that.
How about having the driver return an error code saying whether the
failure is a fatal error or not? For the drivers that we have now, we
*know* that if one fails it is a hardware or configuration error, which
is fatal, so we go with the _NanoFatalErrorHandler() error path.
That sounds good.

If a non-fatal error occurred (may work at next reset) just ignore it
and move on. The application can detect if the device is dead/not
present by the return codes from the driver call(s). Then the
application can decide what and how to report the error to the user.
That's another way of doing it. It's a bit less explicit than a list
of errors, but less overhead, and reuses what's already available.


Re: RFC: return type of functions passed to DEVICE_INIT()

Daniel Leung <daniel.leung@...>
 

On Thu, Feb 18, 2016 at 05:54:56PM -0500, Benjamin Walsh wrote:
My take on it is that for Zephyr a failed device initialization
should be considered a fatal event. My expectation is that the
Zephyr user will only be enabling relevant (and important) devices to
their project. If one of these devices should fail, then that is a
serious system error and _NanoFatalErrorHandler() should be invoked.

If this train of thought holds up to scrutiny, and if the aim is to
save a few bytes then I would think that it would be better to have
the device initialization routines return a failure code and have
_sys_device_do_config_level() check for it and invoke the fatal error
handler upon the detection of failure. Otherwise we duplicate the
overhead of calling the fatal error handler in each device
initialization routine.
Sorry for the slow response. I agree with Peter here; I think we should
be checking the return value and doing something useful with the
result. Maybe not _NanoFatalErrorHandler() but something notifying
the application that something bad happened. A given device not
initializing may not be fatal to the whole application, just one
feature is currently unavailable.
For the kind of systems we are targeting, do we really expect the
application to handle devices not initializing correctly, being designed
so that parts are disabled if some parts of the initialization fail
(devices or others), or do we expect applications to require everything
to be present for them to function correctly ? I would have thought the
latter, but I can be convinced.
Delving into the realm of the hypothetical :-)

What about devices that have drivers in the system but are not present
(pluggable) or can't initialize because some resource external to the
device can't be contacted (network server).

The application may be able to still do useful work albeit with reduced
functionality.

Then, if the latter, do we expect the application to catch the errors at
runtime when deployed, or during development (basically catching mostly
software errors, not malfunctioning hardware)? Here, I was thinking the
latter as well, which is why I was proposing __ASSERT() calls catching
initialization errors in debug loads only. And this fits with one of the
core values of the OS, which is small footprint.
Both models are useful for different reasons :-D

Any of those could be a valid approach I think, but we have to decide on
one. And right now, we have the worst since we return those error codes
which are meant for runtime handling, but they just go into the void.
Agreed, we need to pick one and stay with it for some amount of time
until we see a few real uses/applications/platforms.
OK, following your proposal below, what we could put in place is
standardizing on error codes that init routines must return if they want
the kernel init system to automatically trigger a fatal error.

Then, we could also allow configuring out the error handling if someone
needs to squeeze that last amount of space. One more Kconfig option! :)
The error handling would be enabled by default of course.
For those non-fatal errors, what should we do for runtime driver behaviors?
Should the drivers themselves fail API calls? Or should we let
device_get_binding() return NULL?
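
To make the second option concrete, here is a rough sketch of a lookup
that hides devices whose init failed. The init_status field and the
name lookup_binding() are assumptions made for illustration; this is
not the actual device_get_binding() implementation.

```c
#include <stddef.h>
#include <string.h>

struct device {
	const char *name;
	int init_status; /* hypothetical: 0 on success, non-zero if init failed */
};

/* Look up a device by name. A device that exists but failed to
 * initialize is reported the same way as a missing one: NULL.
 */
static struct device *lookup_binding(struct device *devs, size_t count,
				     const char *name)
{
	for (size_t i = 0; i < count; i++) {
		if (strcmp(devs[i].name, name) == 0) {
			return devs[i].init_status == 0 ? &devs[i] : NULL;
		}
	}
	return NULL;
}
```

With this approach the application needs only one NULL check at
binding time, and drivers do not have to guard every API call.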

How we could/should report this type of error is an open question :-).
Brainstorming:

If we want to let the application handle the initialization issues, we
probably need some kind of queue that gets filled by the init system
when init functions return errors, and that the application drains to
see what failed. We might want to queue the associated device objects,
and have an errno field in there, or something like that.
How about having the driver return an error code saying whether the
failure is a fatal error or not? For the drivers that we have now, we
*know* that if init fails it is a hardware or configuration error,
which is fatal, so we go with the _NanoFatalErrorHandler() error path.
That sounds good.

If a non-fatal error occurred (may work at next reset), just ignore it
and move on. The application can detect if the device is dead/not
present by the return codes from the driver call(s). Then the
application can decide what and how to report the error to the user.
That's another way of doing it. It's a bit less explicit than a list
of errors, but less overhead, and reuses what's already available.
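
For comparison, the queue idea brainstormed above could look roughly
like the following. Everything here (struct init_error, the fixed
capacity, the report/drain function names) is hypothetical and only
meant to show the overhead involved.

```c
#include <stddef.h>

#define INIT_ERR_QUEUE_LEN 4 /* hypothetical capacity */

struct device {
	const char *name;
};

/* One record per failed init: the device object plus an errno-style code. */
struct init_error {
	const struct device *dev;
	int err;
};

static struct init_error init_err_queue[INIT_ERR_QUEUE_LEN];
static size_t init_err_head;  /* index of the next entry to drain */
static size_t init_err_count; /* number of entries currently queued */

/* Called by the init system when an init function returns non-zero.
 * Returns 0 on success, -1 if the queue is full and the record is dropped.
 */
static int init_error_report(const struct device *dev, int err)
{
	if (init_err_count == INIT_ERR_QUEUE_LEN) {
		return -1;
	}
	size_t tail = (init_err_head + init_err_count) % INIT_ERR_QUEUE_LEN;

	init_err_queue[tail].dev = dev;
	init_err_queue[tail].err = err;
	init_err_count++;
	return 0;
}

/* Called by the application to see what failed.
 * Returns 0 and fills *out, or -1 when the queue is empty.
 */
static int init_error_drain(struct init_error *out)
{
	if (init_err_count == 0) {
		return -1;
	}
	*out = init_err_queue[init_err_head];
	init_err_head = (init_err_head + 1) % INIT_ERR_QUEUE_LEN;
	init_err_count--;
	return 0;
}
```

This costs a static array plus two functions, which is the extra
overhead compared with relying on the return codes of later driver
calls.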
-----
Daniel Leung


RFC: Kernel's Object Tracing API

Cruz Alcaraz, Juan M <juan.m.cruz.alcaraz@...>
 

Zephyr kernel provides debug hooks to allow debugging tools to access nanokernel and microkernel objects (FIFO, LIFO, semaphores, etc.) and display their attributes for content inspection.

Background
=========

The current implementation of Zephyr declares, inside each nanokernel structure, a "next" field that allows the objects to be accessed as a simple linked list:
Example:
struct nano_fifo {
	union {
		struct _nano_queue wait_q;
		struct _nano_queue data_q;
	};
	int stat;
#ifdef CONFIG_DEBUG_TRACING_KERNEL_OBJECTS
	struct nano_fifo *next;
#endif
};

Each kernel object type has a global pointer to the first object of its linked list. A debug tool can use each pointer as a handle to iterate over the objects of that type.
E.g.
struct nano_sem *_track_list_nano_sem;
struct nano_fifo *_track_list_nano_fifo;
struct nano_lifo *_track_list_nano_lifo;
struct nano_timer *_track_list_nano_timer;

The current implementation has the following issues:

- The handlers are buried inside the kernel implementation

- There is no documentation on how they can be used to implement object inspection.

- The implementation does not have a clear pattern that a developer can follow to extend the functionality (e.g. add new kernel objects to the tracing hooks)

Proposal
=======
The following is a proposal of a public API that will give the following improvements to the current implementation without modifying the gist of the original implementation.

- The API is published as a header file in the public API directory (/include)

- The API will propose a clear pattern to follow to extend the scope of the tracing functionality in the future.

- The API is easily documented through Doxygen and it can be added as a section in the Zephyr Kernel Primer.

The API will be defined in the header file: include/misc/object_tracing.h
The API will implement the following macros:

DECLARE_TRACE_LIST(name)
Declares a list of traceable objects with a particular name.
This macro declares the global pointer to the first element of the list.
The elements are of an existing type _dbg_obj_<name> and the pointer will have the unique name _trace_list_<name>

DEBUG_TRACING_OBJ_INIT(name, obj)
Adds the object "obj" to the list "name". This macro will be used by the kernel when a specific object is created. After creation, the object is added to the linked list.

DEBUG_TRACING_HEAD(name)
Gives the access to the first element of the trace list "name"

DEBUG_TRACING_NEXT(name, obj)
Gives access to the element that follows the element "obj" in the trace list "name"

Typedef aliases are needed to give the kernel object types a uniform naming scheme, so that a macro can relate them with the list pointers.
These typedefs would only be used in debug mode, when CONFIG_DEBUG_TRACING_KERNEL_OBJECTS is active.

The following code is an example of how the implementation could follow:

typedef struct nano_sem _dbg_obj_nano_sem;
typedef struct nano_fifo _dbg_obj_nano_fifo;
typedef struct nano_lifo _dbg_obj_nano_lifo;
typedef struct nano_timer _dbg_obj_nano_timer;
typedef struct pool_struct _dbg_obj_micro_mem_pool;

#define DECLARE_TRACE_LIST(name) _dbg_obj_##name *_trace_list_##name

#define DEBUG_TRACING_HEAD(name) ((_dbg_obj_##name *)_trace_list_##name)

#define DEBUG_TRACING_NEXT(name, obj) (((_dbg_obj_##name *)obj)->next)

#define DEBUG_TRACING_OBJ_INIT(name, obj) { \
	_dbg_obj_##name *temp = _trace_list_##name; \
	_trace_list_##name = obj; \
	obj->next = temp; \
}
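
As a usage sketch, a debug tool could walk a trace list as shown
below. The macro bodies are repeated here (with the init macro wrapped
in do/while(0) and its arguments parenthesized) so the example compiles
on its own; struct nano_sem is a simplified stand-in, and
count_traced_sems() is a hypothetical helper, not part of the proposal.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct nano_sem. */
struct nano_sem {
	int nsig;
	struct nano_sem *next; /* tracing link, as in the proposal */
};

typedef struct nano_sem _dbg_obj_nano_sem;

#define DECLARE_TRACE_LIST(name) _dbg_obj_##name *_trace_list_##name
#define DEBUG_TRACING_HEAD(name) ((_dbg_obj_##name *)_trace_list_##name)
#define DEBUG_TRACING_NEXT(name, obj) (((_dbg_obj_##name *)(obj))->next)
#define DEBUG_TRACING_OBJ_INIT(name, obj) \
	do { \
		_dbg_obj_##name *temp = _trace_list_##name; \
		_trace_list_##name = (obj); \
		(obj)->next = temp; \
	} while (0)

/* Global head pointer for the semaphore list; starts out NULL. */
DECLARE_TRACE_LIST(nano_sem);

/* Hypothetical helper: count the semaphores a debug tool would see. */
static int count_traced_sems(void)
{
	int n = 0;

	for (_dbg_obj_nano_sem *s = DEBUG_TRACING_HEAD(nano_sem);
	     s != NULL;
	     s = DEBUG_TRACING_NEXT(nano_sem, s)) {
		n++;
	}
	return n;
}
```

Because new objects are pushed at the head, the list is traversed in
reverse order of creation.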

All comments are welcome :)


Re: compile error for ARM build

Ravi Sahita <rsahita@...>
 

that worked - thanks Ben!

Ravi


Re: 6LoWPAN Stack

Jukka Rissanen
 

Hi Joakim,

On Thu, 2016-02-18 at 16:48 +0000, Joakim Eriksson wrote:
Hello!

I just cloned the Zephyr repository and saw that the 6LoWPAN stack
was the same as the one I am used to - e.g. the Contiki 6LoWPAN
stack. How are you planning to develop it further? Are you going to
sync it from Contiki or would you like Contiki developers to post
pull-requests to fixes also to Zephyr?
The plan is to sync it from Contiki when applicable. If you look at the
code, we needed to make a lot of changes to the uIP stack in order to make it
work in Zephyr. The biggest change is that the IP stack is now re-
entrant e.g., there is no global buffer for network packets. Because of
this it is not so simple to just pull changes from Contiki as manual
work needs to be done anyway.


Anyway - congratulations - it looks like an exciting project and I
will be happy to contribute to it!

-- Joakim Eriksson, SICS
Thanks, and patches are welcome!


Cheers,
Jukka


Re: RFC: make _fiber_start() return a handle on the fiber

Jukka Rissanen
 

Hi,

On Tue, 2016-02-16 at 14:34 -0500, Benjamin Walsh wrote:
Folks,

When we start a fiber via the _fiber_start() API family, we don't get
back a handle on the created fiber. The fiber identifier is actually
the start of the fiber's stack. This hasn't been a problem until now
since no API requires a handle on the fiber, except one,
fiber_delayed_start_cancel(): that API is part of a pair, where the
other API, fiber_delayed_start(), starts the fiber and returns a
handle.

However, Jukka asked me if an API could be created that cancels a
fiber_sleep() call, something like fiber_wakeup(). The implementation
of such an API is very simple, but it requires a handle on the fiber we
want to wake up. This in turn requires the signature of the
_fiber_start() family to return a handle to the fiber that gets
started.

The signature of _fiber_start() et al. would then change from a void
return type to a void * return type.

Objections, comments, etc ?
No comments -> no objections -> perhaps we can continue with this route
then?
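
For illustration only, here is a host-side model of what the change
would enable. The struct fiber layout, the fiber_handle_t typedef and
the model_* functions are all made up for this sketch; only the idea
(start returns a void * handle, wakeup takes it) comes from the RFC.

```c
#include <stddef.h>

/* Hypothetical model of a fiber; not the kernel's real structure. */
struct fiber {
	int sleeping; /* 1 while a sleep is pending */
};

typedef void *fiber_handle_t; /* hypothetical name for the void * handle */

/* Model of a start call that returns a handle instead of void. In the
 * kernel the identifier is the start of the fiber's stack; here the
 * model object itself stands in for it.
 */
static fiber_handle_t model_fiber_start(struct fiber *f)
{
	f->sleeping = 0;
	return f;
}

/* Model of fiber_sleep(): just marks the fiber as sleeping. */
static void model_fiber_sleep(struct fiber *f)
{
	f->sleeping = 1;
}

/* Model of the requested fiber_wakeup(): cancels a pending sleep.
 * Returns 0 if the fiber was woken, -1 if it was not sleeping.
 */
static int model_fiber_wakeup(fiber_handle_t handle)
{
	struct fiber *f = handle;

	if (f == NULL || !f->sleeping) {
		return -1;
	}
	f->sleeping = 0;
	return 0;
}
```

The caller keeps the value returned by the start call and passes it to
the wakeup call later, which is not possible while the start family
returns void.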


Cheers,
Jukka


Re: 6LoWPAN Stack

Joakim Eriksson
 

Hi Jukka!

On 19 Feb 2016, at 08:28, Jukka Rissanen <jukka.rissanen(a)linux.intel.com> wrote:

Hi Joakim,

On Thu, 2016-02-18 at 16:48 +0000, Joakim Eriksson wrote:
Hello!

I just cloned the Zephyr repository and saw that the 6LoWPAN stack
was the same as the one I am used to - e.g. the Contiki 6LoWPAN
stack. How are you planning to develop it further? Are you going to
sync it from Contiki or would you like Contiki developers to post
pull-requests to fixes also to Zephyr?
The plan is to sync it from Contiki when applicable. If you look at the
code, we needed to make a lot of changes to the uIP stack in order to make it
work in Zephyr. The biggest change is that the IP stack is now re-
entrant e.g., there is no global buffer for network packets. Because of
this it is not so simple to just pull changes from Contiki as manual
work needs to be done anyway.
Ok, yes - I expected that there would be quite some changes. But I guess that they
will be the same changes at each sync! I will take a look at the codebase and
see how many differences you have.

I guess if Zephyr is ported to more platforms (some of the ones we have in Contiki)
maybe that would make developers do pull requests to both OSes.

Quick question - what size of the compiled code did you get on the IP stack when
you moved from one buffer to re-entrant / multi buffers?


Anyway - congratulations - it looks like an exciting project and I
will be happy to contribute to it!

-- Joakim Eriksson, SICS
Thanks, and patches are welcome!
I will take a look and see if some of our work would be good to get in!
(we have LWM2M and other things fairly recently put into Contiki).

Best regards,
— Joakim Eriksson, SICS


Cheers,
Jukka