Date   

Re: Problems managing NBUF DATA pool in the networking stack

Jukka Rissanen
 

Hi Piotr,

On Tue, 2017-02-07 at 17:56 +0100, Piotr Mienkowski wrote:
Hi,
There seems to be a conceptual issue in a way networking buffers are
currently set up. I was thinking about entering Jira bug report but
maybe it's just me missing some information or otherwise
misunderstanding how the networking stack is supposed to be used.
I'll shortly describe the problem here based on Zephyr echo_server
sample application.
Currently if the echo_server application receives a large amount of
data, e.g. when a large file is sent via ncat the application will
lock up and stop responding. The only way out is to reset the device.
This problem is very easily observed with eth_sam_gmac Ethernet
driver and should be just as easy to spot with eth_mcux. Due to a
different driver architecture it may be more difficult to observe
with eth_enc28j60.
The problem is as follows. Via Kconfig we define RX, TX and data
buffers pool. Let's say like this:
CONFIG_NET_NBUF_RX_COUNT=14
CONFIG_NET_NBUF_TX_COUNT=14
CONFIG_NET_NBUF_DATA_COUNT=72
The number of RX and TX buffers corresponds to the number of RX/TX
frames which may be simultaneously received/send. The data buffers
count tells us how much storage we reserve for the actual data. This
pool is shared between RX and TX path. If we receive a large amount
of data the RX path will consume all available data buffers leaving
none for the TX path. If an application then tries to reserve data
buffers for the TX path, e.g. echo_server does it in
build_reply_buf() function, it will get stuck waiting forever for a
free data buffer. echo_server application gets stuck on the following
line
frag = net_nbuf_get_data(context);
The simplified sequence of events in the echo_server application is
as follows: receive RX frame -> reserve data buffers for TX frame ->
copy data from RX frame to TX frame -> free resources associated with
RX frame -> send TX frame.
One way to avoid it is to define number of data buffers large enough
so the RX path cannot exhaust available data pool. Taking into
account that data buffer size is 128 bytes, this is defined by the
following Kconfig parameter,
CONFIG_NET_NBUF_DATA_SIZE=128
and maximum frame size is 1518 or 1536 bytes one RX frame can use up
to 12 data buffers. In our example we would need to reserve more than
12*14 data buffers to ensure correct behavior. In case of
eth_sam_gmac Ethernet driver even more. 
After recent updates to the networking stack the functions reserving
RX/TX/DATA buffers have a timeout parameter. That would prevent lock
up but it still does not really solve the issue.
Is there a better way to manage this?
One option would be to split the DATA pool to two so one pool for
sending and receiving. Then again this does not solve much as you might
still get to a situation where all the buffers are exhausted.

The timeout to buffer API helps a bit but still we might run out of
buffers.

One should allocate as many buffers to DATA pool as possible but this
really depends on hw and applications of course.


Thanks and regards,
Piotr
Cheers,
Jukka


Re: Inconsistent and ever-changing device naming in Zephyr

Maureen Helm
 

Hi Paul,

-----Original Message-----
From: Paul Sokolovsky [mailto:paul.sokolovsky@linaro.org]
Sent: Monday, February 06, 2017 7:34 AM
To: devel@lists.zephyrproject.org; Anas Nashif <anas.nashif@intel.com>;
Maureen Helm <maureen.helm@nxp.com>; sakari.poussa@intel.com;
geoff@linux.intel.com; Daniel Thompson <daniel.thompson@linaro.org>
Subject: Inconsistent and ever-changing device naming in Zephyr

Hello,

Writing on this was in my TODO at least since November, but I expected this to
be a controversial topic, so kind of skipped it. Controversial, as in: different
parties finding to be "obvious" quite different, if not opposite solutions to the
problem. Well, the breakage continues, so this should be raised.

(Recent) facts:

1. Update to Kinetis (BOARD=frdm_k64f) port broke GPIO support in a
(complex) Zephyr-based application, Zephyr.js:
https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithu
b.com%2F01org%2Fzephyr.js%2Fissues%2F665&data=01%7C01%7Cmaureen.h
elm%40nxp.com%7C8e842c85d2ec427bb45308d44e94e0a3%7C686ea1d3bc2b4c
6fa92cd99c5c301635%7C0&sdata=%2BNQJiBZZiZO0DCFSerir9SPSDmLKfvWSzhT
wUJ3dgpg%3D&reserved=0

2. A day or two, Arduino 101 I2C support in Zephyr.js was updated, apparently
because Zephyr upgrade broke it. Today it became known that the fix for
Arduino 101 broke FRDM-K64F.
https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithu
b.com%2F01org%2Fzephyr.js%2Fissues%2F676&data=01%7C01%7Cmaureen.h
elm%40nxp.com%7C8e842c85d2ec427bb45308d44e94e0a3%7C686ea1d3bc2b4c
6fa92cd99c5c301635%7C0&sdata=u3JpPzybvD%2FcVNSzWzrw%2BsgGBbZ1vGK
AEwoLvmzyaaw%3D&reserved=0

Investigating the issues, the causes of them are:

1. frdm_k64f GPIO port naming changed from "GPIO_0" to (!!) "gpio_porta".
This is my fault. I was not aware the port names were being used this way. I tend to use the config names rather than the strings themselves. For example, in boards/arm/frdm_k64f/Kconfig.defconfig:

config FXOS8700_GPIO_NAME
default GPIO_MCUX_PORTC_NAME

Please, please let me know if you encounter problems like this again.


2. Name of (one of) I2C device for arduino_101 changed from "I2C_0" to
"I2C_SS_0".


This is not the first time when device names change in Zephyr (see reference
above to November 2016), it's the first time (at least for last half-year), when
the naming changes in relatively well developed ports, which leads to cascade
of breakage in the one. And one case, it can be seen how inconsistent naming
(and changes of it) can be.

The question what to do about this. Possible high-level approaches can
be:

a) Establish (well, reconfirm/clarify) and uphold consistent naming conventions
across ports, architectures, and boards.
Since you're most affected, mind making a proposal?


b) Send a strong message to application developers that they should not rely
on any patterns of Zephyr device naming, and even on the naming at all, and
instead should treat it as a configuration parameter, which ultimately should be
set by a user (and thus apps should make such configuration explicit and easy
for users).

Whether a) is a viable solution in turn depends on paradigmatic decision what
aim Zephyr is trying to achieve:

I. Is it a common platform, literally, an Operating System, providing consistent
means to develop software *across* various hardware?

II. Or is it loose collection of APIs of working with hardware, with the aim of
getting most (or something) of a particular hardware, without pledging much
for consistency and portability across hardware.


It would be nice to get opinions of both the core maintainers and the
maintainers of particular ports, as well as specific suggestions how to deal with
the bugs above.


Thanks,
Paul

Linaro.org | Open source software for ARM SoCs Follow Linaro:
https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.
facebook.com%2Fpages%2FLinaro&data=01%7C01%7Cmaureen.helm%40nxp.
com%7C8e842c85d2ec427bb45308d44e94e0a3%7C686ea1d3bc2b4c6fa92cd99c5
c301635%7C0&sdata=0U09i2dq9yMB0XQPrUBcduLolSF2e2zEo%2BqHtBALOg4%
3D&reserved=0
https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Ftwitte
r.com%2F%23!%2Flinaroorg&data=01%7C01%7Cmaureen.helm%40nxp.com%7
C8e842c85d2ec427bb45308d44e94e0a3%7C686ea1d3bc2b4c6fa92cd99c5c30163
5%7C0&sdata=n3rYqztnrOOxdvjXOY4zWhaADesQAle3S%2B%2FU0bFWzOk%3
D&reserved=0 -
https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.l
inaro.org%2Flinaro-
blog&data=01%7C01%7Cmaureen.helm%40nxp.com%7C8e842c85d2ec427bb45
308d44e94e0a3%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0&sdata=YQ3UqYJ
oivu24eyW1pNviWSouVTwpQMvmo2LjcxvfwE%3D&reserved=0


Problems managing NBUF DATA pool in the networking stack

Piotr Mienkowski
 

Hi,

There seems to be a conceptual issue in a way networking buffers are currently set up. I was thinking about entering Jira bug report but maybe it's just me missing some information or otherwise misunderstanding how the networking stack is supposed to be used. I'll shortly describe the problem here based on Zephyr echo_server sample application.

Currently if the echo_server application receives a large amount of data, e.g. when a large file is sent via ncat the application will lock up and stop responding. The only way out is to reset the device. This problem is very easily observed with eth_sam_gmac Ethernet driver and should be just as easy to spot with eth_mcux. Due to a different driver architecture it may be more difficult to observe with eth_enc28j60.

The problem is as follows. Via Kconfig we define RX, TX and data buffers pool. Let's say like this:

CONFIG_NET_NBUF_RX_COUNT=14
CONFIG_NET_NBUF_TX_COUNT=14
CONFIG_NET_NBUF_DATA_COUNT=72

The number of RX and TX buffers corresponds to the number of RX/TX frames which may be simultaneously received/send. The data buffers count tells us how much storage we reserve for the actual data. This pool is shared between RX and TX path. If we receive a large amount of data the RX path will consume all available data buffers leaving none for the TX path. If an application then tries to reserve data buffers for the TX path, e.g. echo_server does it in build_reply_buf() function, it will get stuck waiting forever for a free data buffer. echo_server application gets stuck on the following line

frag = net_nbuf_get_data(context);

The simplified sequence of events in the echo_server application is as follows: receive RX frame -> reserve data buffers for TX frame -> copy data from RX frame to TX frame -> free resources associated with RX frame -> send TX frame.

One way to avoid it is to define number of data buffers large enough so the RX path cannot exhaust available data pool. Taking into account that data buffer size is 128 bytes, this is defined by the following Kconfig parameter,

CONFIG_NET_NBUF_DATA_SIZE=128

and maximum frame size is 1518 or 1536 bytes one RX frame can use up to 12 data buffers. In our example we would need to reserve more than 12*14 data buffers to ensure correct behavior. In case of eth_sam_gmac Ethernet driver even more.

After recent updates to the networking stack the functions reserving RX/TX/DATA buffers have a timeout parameter. That would prevent lock up but it still does not really solve the issue.

Is there a better way to manage this?

Thanks and regards,
Piotr


Re: CONFIG_SEMAPHORE_GROUPS

Benjamin Walsh <benjamin.walsh@...>
 

On Tue, Feb 07, 2017 at 11:42:22AM +0000, Shtivelman, Bella wrote:
Hi,
I have a question regarding the impact of CONFIG_SEMAPHORE_GROUPS flag.

In my sample application I experienced weird failures in k_sem_give
being called from udp receive callback. After debugging this issue I
found out that the failure occurred in sem_give_common function, and
more precisely, when entering this if condition:

If(handle_sem_group(sem, thread))

where handle_sem_group is defined as follows:

#ifdef CONFIG_SEMAPHORE_GROUPS
static int handle_sem_group(struct k_sem *sem, struct k_thread *thread)
{

}
#else
#define handle_sem_group(sem, thread) 0
#endif

Disassembly shows the following (faulting instruction address is 0x0040f38c:)

0040f36c: ands.w r5, r3, #1
0040f370: beq.n 0x40f42a <sem_give_common+226>
177 list = (sys_dlist_t *)dummy->desc.thread->base.swap_data;
0040f372: ldr r3, [r4, #48] ; 0x30
0040f374: ldr.w r8, [r3, #12]
132 return list->head == list;
0040f378: ldr.w r4, [r8]
145 return sys_dlist_is_empty(list) ? NULL : list->head;
0040f37c: cmp r8, r4
0040f37e: it eq
0040f380: moveq r4, #0
176 return (!node || node == list->tail) ? NULL : node->next;
0040f382: cbz r4, 0x40f390 <sem_give_common+72>
0040f384: ldr.w r3, [r8, #4]
0040f388: cmp r3, r4
0040f38a: beq.n 0x40f394 <sem_give_common+76>
0040f38c: ldr r5, [r4, #0]
0040f38e: b.n 0x40f396 <sem_give_common+78>
0040f390: mov r5, r4
0040f392: b.n 0x40f396 <sem_give_common+78>
0040f394: movs r5, #0
0040f396: ldr r3, [r4, #12]
0040f398: cmp r6, r3
0040f39a: beq.n 0x40f3c8 <sem_give_common+128>
0040f39c: ldr.w r3, [r4, #-8]
0040f3a0: adds r3, #2
0040f3a2: beq.n 0x40f382 <sem_give_common+58>
0040f3a4: sub.w r9, r4, #40 ; 0x28
0040f3a8: sub.w r0, r4, #24
0040f3ac: bl 0x40f2a0 <_abort_timeout>
0040f3b0: mov r0, r9
0040f3b2: bl 0x40f290 <sys_dlist_remove>

The default value of CONFIG_SEMAPHORE_GROUPS is “y”, and after setting
it to “n” in my prj.conf I stopped facing this issue.
OK, so it seems that, since you can disable the CONFIG_SEMAPHORE_GROUPS
feature, you are _not_ using said feature. In theory, this if statement
should always be hit in handle_sem_group():

if (!(thread->base.thread_state & _THREAD_DUMMY)) {
/*
* The awakened thread is a real thread and thus was not
* involved in a semaphore group operation.
*/
return 0;
}

and handle_sem_group() should never do anything else, unless the
thread_state of some thread got corrupted.

Can you verify that the thread->base.thread.state _THREAD_DUMMY bit is
actually set when you get that crash ?

Looking at it, I think I found a bug in handle_sem_group():

if (desc->sem != sem) {
sem_thread = CONTAINER_OF(desc, struct _sem_thread,
desc);
struct k_thread *dummy_thread =
(struct k_thread *)&sem_thread->dummy;

if (_is_thread_timeout_expired(dummy_thread)) {
+ sys_dlist_remove(node);
+ node = next;
continue;
}
_abort_thread_timeout(dummy_thread);
_unpend_thread(dummy_thread);

sys_dlist_remove(node);
}

If you _are_ using sem groups, I would suggest using the new k_poll()
API instead: semaphore groups are part of the legacy API, and the
implemenation is pretty bad w.r.t. interrupt locking duration.

Regards,
Ben


But I’m not sure it is a correct handling of the issue.

Are you familiar with this issue?
Am I missing anything here?


Daily Gerrit Digest

donotreply@...
 

NEW within last 24 hours:
- https://gerrit.zephyrproject.org/r/10947 : samples/mbedtls_dtls_server: Use k_uptime_get_32()
- https://gerrit.zephyrproject.org/r/10946 : samples/mbedtls_dtls_client: Use k_uptime_get_32()
- https://gerrit.zephyrproject.org/r/10949 : checkpatch: Remove reference to legacy IP stack
- https://gerrit.zephyrproject.org/r/10918 : Xtensa port: Used plain C implementation for STACK_CANARY_INIT.
- https://gerrit.zephyrproject.org/r/10948 : samples/mbedtls_dtls_client: Fix wild write in entropy_source
- https://gerrit.zephyrproject.org/r/10945 : Xtensa port: Fixed defintion of MAX_HEAP_SIZE, thus, compilation of new_lib.
- https://gerrit.zephyrproject.org/r/10901 : xt-sim: add support for 'make debug'
- https://gerrit.zephyrproject.org/r/10910 : gpio: Error unsupported access_op consistently.
- https://gerrit.zephyrproject.org/r/10927 : tests/slist: Exercise CONTAINER macros
- https://gerrit.zephyrproject.org/r/10926 : slist: Introduce CONTAINER macros
- https://gerrit.zephyrproject.org/r/10896 : xtensa: cleanup fatal error handling
- https://gerrit.zephyrproject.org/r/10920 : kw41z: add base DTS support
- https://gerrit.zephyrproject.org/r/10919 : Xtensa port: Fixed memory corruption in interrupt handler exit function.
- https://gerrit.zephyrproject.org/r/10902 : eth/mcux: Add basic PHY support.
- https://gerrit.zephyrproject.org/r/10894 : doc/net: Add L2 and device driver document
- https://gerrit.zephyrproject.org/r/10886 : errno.h: Changed errno.h to make it safe to be included in assembly files.
- https://gerrit.zephyrproject.org/r/10913 : Add packer verify and merge jobs
- https://gerrit.zephyrproject.org/r/10912 : tests: kernel: import obj_tracing test to unified kernel
- https://gerrit.zephyrproject.org/r/10911 : kernel: Remove redundant TICKLESS_IDLE_SUPPORTED option
- https://gerrit.zephyrproject.org/r/10903 : sw_isr_table.h: clean up definition
- https://gerrit.zephyrproject.org/r/10909 : gpio: Error pin out of range consistently.
- https://gerrit.zephyrproject.org/r/10908 : gpio: Error GPIO_INT with GPIO_DIR_OUT consistently.
- https://gerrit.zephyrproject.org/r/10905 : toolchain: gcc.h: add indirection to _GENERIC_SECTION() macro
- https://gerrit.zephyrproject.org/r/10906 : section_tags.h: cleanup
- https://gerrit.zephyrproject.org/r/10904 : tests: remove old ARM vector table test
- https://gerrit.zephyrproject.org/r/10889 : make_zephyr_sdk.sh: Bump version to 0.9.1
- https://gerrit.zephyrproject.org/r/10890 : xt-run: don't leave dead emulator processes lying around
- https://gerrit.zephyrproject.org/r/10888 : doc: add permalinks to document headings
- https://gerrit.zephyrproject.org/r/10885 : sysgen: Fixed macro evaluation issue in kernel_main.c with compiling with xcc.
- https://gerrit.zephyrproject.org/r/10883 : boards: add panther board

UPDATED within last 24 hours:
- https://gerrit.zephyrproject.org/r/10859 : v2m_beetle: Add base DTS support
- https://gerrit.zephyrproject.org/r/10848 : tinycrypt: Fixed compilation error on Xtensa caused by bad definition of bool.
- https://gerrit.zephyrproject.org/r/10690 : tests: Introduced new config option to add extra stack size for tests.
- https://gerrit.zephyrproject.org/r/10881 : sensor/mma865x: Add driver for MMA865x 3 Axis Accelerometer Family
- https://gerrit.zephyrproject.org/r/10879 : sensor/mag3110: Add mag3110 three axis magnetometer driver.
- https://gerrit.zephyrproject.org/r/10720 : flash/stm32: driver for STM32F4x series
- https://gerrit.zephyrproject.org/r/10755 : flash/nrf5: fix invalid write access
- https://gerrit.zephyrproject.org/r/6720 : Bluetooth: A2DP: Stream End Point Registration
- https://gerrit.zephyrproject.org/r/10882 : bbc_microbit: Enable MMA8653
- https://gerrit.zephyrproject.org/r/10880 : bbc_microbit: Enable MAG3110
- https://gerrit.zephyrproject.org/r/7492 : Bluetooth: A2DP: Added Preset Structure
- https://gerrit.zephyrproject.org/r/6586 : misc: Let the compiler choose whether to omit frame pointer
- https://gerrit.zephyrproject.org/r/10853 : drivers: Add Atmel SAM RTC driver
- https://gerrit.zephyrproject.org/r/10860 : v2m_beetle: uart: Add DTS support to UART driver
- https://gerrit.zephyrproject.org/r/9460 : Bluetooth: AVDTP: Add AVDTP Discover Function Definition
- https://gerrit.zephyrproject.org/r/10807 : net/mqtt: Add BT support to MQTT publisher sample
- https://gerrit.zephyrproject.org/r/10628 : RFC: DTS: Kinetis: Add FRDM_K64F support
- https://gerrit.zephyrproject.org/r/9663 : Bluetooth: AVDTP: Add AVDTP Receive Function
- https://gerrit.zephyrproject.org/r/10806 : Bluetooth: AVDTP: Handling Discover response
- https://gerrit.zephyrproject.org/r/10670 : xtensa port: Added arch .ini file to support xt-sim
- https://gerrit.zephyrproject.org/r/10645 : Bluetooth: HFP HF: Handling AG Network error
- https://gerrit.zephyrproject.org/r/10834 : Xtensa port: Ensure tests exit when running in simulator.
- https://gerrit.zephyrproject.org/r/9449 : tests: add systhreads test case
- https://gerrit.zephyrproject.org/r/10858 : stm32: uart: Add DTS support to STM32 UART driver
- https://gerrit.zephyrproject.org/r/9820 : tests: add zephyr adc driver api test case
- https://gerrit.zephyrproject.org/r/10857 : nucleo_l476rg: Enable device tree usage for Nucleo
- https://gerrit.zephyrproject.org/r/10626 : RFC: DTS: Add base DTS and YAML definitions
- https://gerrit.zephyrproject.org/r/10854 : cc3200: dts: Add base DTS support for TI CC3200
- https://gerrit.zephyrproject.org/r/10627 : RFC: DTS: Kinetis: Add base support for Kinetis
- https://gerrit.zephyrproject.org/r/10856 : stm32: Add base DTS support for Nucleo board
- https://gerrit.zephyrproject.org/r/10855 : cc3200: Enable device tree usage for CC3200
- https://gerrit.zephyrproject.org/r/10629 : RFC: DTS: Kinetis: Add support for Hexiwear K64
- https://gerrit.zephyrproject.org/r/7738 : RFC: Add support for Device Tree
- https://gerrit.zephyrproject.org/r/10801 : board: add nucleo_l476rg documentation
- https://gerrit.zephyrproject.org/r/10718 : tests: dma: update dma loop transfer app
- https://gerrit.zephyrproject.org/r/10693 : tests: dma: update dma block transfer app
- https://gerrit.zephyrproject.org/r/10788 : xtensa: apply overlay to newlib
- https://gerrit.zephyrproject.org/r/10787 : xtensa: add recipes-devtools-xtensa
- https://gerrit.zephyrproject.org/r/10134 : drivers: QMSI PWM: simplify driver reentrancy code using IS_ENABLED macro
- https://gerrit.zephyrproject.org/r/9681 : quark_se: Add shared GDT in RAM memory to linker
- https://gerrit.zephyrproject.org/r/9682 : quark_d2000: Add shared GDT memory to linker
- https://gerrit.zephyrproject.org/r/10723 : DO NOT MERGE ext qmsi: update to QMSI 1.4
- https://gerrit.zephyrproject.org/r/9680 : quark_se: Fix restore info address
- https://gerrit.zephyrproject.org/r/10541 : drivers: dma_shim: update dma qmsi shim driver
- https://gerrit.zephyrproject.org/r/10877 : i2c: Name parameters consistently.
- https://gerrit.zephyrproject.org/r/10804 : net/mqtt: Add support for IBM BlueMix Watson topic format
- https://gerrit.zephyrproject.org/r/10646 : hexiwear_k64: Add RST board documentation
- https://gerrit.zephyrproject.org/r/9890 : sys_bitfield*(): use 'void *' instead of memaddr_t
- https://gerrit.zephyrproject.org/r/10599 : arm: cmsis: Convert _ClearFaults to use direct CMSIS register access
- https://gerrit.zephyrproject.org/r/10598 : arm: cmsis: Convert printing of MMFSR, BFSR, and UFSR to CMSIS
- https://gerrit.zephyrproject.org/r/10596 : arm: cmsis: Convert _Scb*FaultIs* & _ScbIs*Fault to use CMSIS register access
- https://gerrit.zephyrproject.org/r/7190 : build: Fix unconditional re-link of libzephyr.a
- https://gerrit.zephyrproject.org/r/10862 : ARC: fix I2C SPI and GPIO default name issue for ARC
- https://gerrit.zephyrproject.org/r/10725 : ext: qmsi: wait for AON periodic timer alarm's being cleared
- https://gerrit.zephyrproject.org/r/10849 : doc: rename doxygen configuration file and build from doc/
- https://gerrit.zephyrproject.org/r/10874 : i2c: Implement consistent i2c no msgs behaviour
- https://gerrit.zephyrproject.org/r/10875 : i2c/stm32lx: Fix layout.
- https://gerrit.zephyrproject.org/r/10876 : i2c/dw: Switch from EPERM to EIO
- https://gerrit.zephyrproject.org/r/10878 : i2c: Elaborate API return values
- https://gerrit.zephyrproject.org/r/10873 : i2c: Remove unused definition.

MERGED within last 24 hours:
- https://gerrit.zephyrproject.org/r/10943 : net: nbuf: Fix net_nbuf_compact() API
- https://gerrit.zephyrproject.org/r/10942 : net: nbuf: Remove unused net_nbuf_push() API
- https://gerrit.zephyrproject.org/r/10925 : net: tests: 15.4: Increase max data size and fix config option
- https://gerrit.zephyrproject.org/r/10937 : Bluetooth: L2CAP: Make l2cap_br_send_conn_rsp return void
- https://gerrit.zephyrproject.org/r/10936 : Bluetooth: SDP: Make bt_sdp_create_pdu static
- https://gerrit.zephyrproject.org/r/10934 : Bluetooth: Remove some dead code
- https://gerrit.zephyrproject.org/r/10928 : net: arp: Fix the ethernet header location
- https://gerrit.zephyrproject.org/r/10924 : net: fragment: Fix the 802.15.4 fragmentation
- https://gerrit.zephyrproject.org/r/10935 : Bluetooth: L2CAP: Remove dead code
- https://gerrit.zephyrproject.org/r/10930 : Bluetooth: Controller: Use LL_ASSERT instead of BT_ASSERT
- https://gerrit.zephyrproject.org/r/10933 : Bluetooth: Use assert when getting net buf with K_FOREVER
- https://gerrit.zephyrproject.org/r/10915 : net: nbuf: Add helper to print fragment chain
- https://gerrit.zephyrproject.org/r/10916 : net: nbuf: Remove dead code in net_nbuf_compact()
- https://gerrit.zephyrproject.org/r/10917 : net: nbuf: Fix double free in net_nbuf_compact()
- https://gerrit.zephyrproject.org/r/10914 : Bluetooth: samples: Add missing README.rst files
- https://gerrit.zephyrproject.org/r/10897 : REVERTME: disable xip test on xtensa
- https://gerrit.zephyrproject.org/r/10899 : Xtensa port: Fixed compilation of cpp_synchronization legacy test.
- https://gerrit.zephyrproject.org/r/10900 : Xtensa port: Implemented STACK_CANARY_INIT for Xtensa cores.
- https://gerrit.zephyrproject.org/r/10895 : Xtensa port: Connect Xtensa timer to tick IRQ in legacy test_context.
- https://gerrit.zephyrproject.org/r/10887 : doc: frdm_k64f: Document Eth PHY known issue
- https://gerrit.zephyrproject.org/r/10863 : net: nbuf: Add timeout to net_buf getters
- https://gerrit.zephyrproject.org/r/10861 : net: todo: Add CAN support entry
- https://gerrit.zephyrproject.org/r/10837 : net: context: keep randomly assigned port for TCP bind() calls
- https://gerrit.zephyrproject.org/r/10865 : drivers/net/ieee802154: Change configuration prefix
- https://gerrit.zephyrproject.org/r/10867 : net: ipv6: Fix pending buf leak when NA is received
- https://gerrit.zephyrproject.org/r/10868 : net: ipv6: Free received NA net_buf
- https://gerrit.zephyrproject.org/r/10869 : drivers/eth/mcux: Free net_buf using net_nbuf_unref
- https://gerrit.zephyrproject.org/r/10842 : Xtensa port: Set Swap function result to -EAGAIN.
- https://gerrit.zephyrproject.org/r/10644 : Bluetooth: HFP HF: Remove unused variable 'buf'
- https://gerrit.zephyrproject.org/r/10871 : Xtensa port: Removed unsupported c++ flags cuasing xt-c++ to throw an error.
- https://gerrit.zephyrproject.org/r/10872 : Bluetooth: GATT: set subscribe value to zero for unsubscription
- https://gerrit.zephyrproject.org/r/10835 : Xtensa port: xt*-run requires options to be passed before file to be ran.
- https://gerrit.zephyrproject.org/r/10547 : Bluetooth: HFP HF: Disconnect rfcomm on SLC error
- https://gerrit.zephyrproject.org/r/10798 : Bluetooth: GATT: introduce volatile subscription flag
- https://gerrit.zephyrproject.org/r/10830 : Xtensa port: Moved coprocessor stack area on bottom of stack, just after TCS.
- https://gerrit.zephyrproject.org/r/10870 : samples: net: Fix invalid memory access for TCP
- https://gerrit.zephyrproject.org/r/10683 : drivers: mcr20a: control access to SPI with semaphore
- https://gerrit.zephyrproject.org/r/10864 : drivers/ieee802154: Split drivers Kconfig


CONFIG_SEMAPHORE_GROUPS

Shtivelman, Bella <bella.shtivelman@...>
 

Hi,
I have a question regarding the impact of CONFIG_SEMAPHORE_GROUPS flag.

In my sample application I experienced weird failures in k_sem_give being called from udp receive callback.
After debugging this issue I found out that the failure occurred in sem_give_common function, and more precisely, when entering this if condition:

If(handle_sem_group(sem, thread))

where handle_sem_group is defined as follows:

#ifdef CONFIG_SEMAPHORE_GROUPS
static int handle_sem_group(struct k_sem *sem, struct k_thread *thread)
{

}
#else
#define handle_sem_group(sem, thread) 0
#endif

Disassembly shows the following (faulting instruction address is 0x0040f38c:)

0040f36c: ands.w r5, r3, #1
0040f370: beq.n 0x40f42a <sem_give_common+226>
177 list = (sys_dlist_t *)dummy->desc.thread->base.swap_data;
0040f372: ldr r3, [r4, #48] ; 0x30
0040f374: ldr.w r8, [r3, #12]
132 return list->head == list;
0040f378: ldr.w r4, [r8]
145 return sys_dlist_is_empty(list) ? NULL : list->head;
0040f37c: cmp r8, r4
0040f37e: it eq
0040f380: moveq r4, #0
176 return (!node || node == list->tail) ? NULL : node->next;
0040f382: cbz r4, 0x40f390 <sem_give_common+72>
0040f384: ldr.w r3, [r8, #4]
0040f388: cmp r3, r4
0040f38a: beq.n 0x40f394 <sem_give_common+76>
0040f38c: ldr r5, [r4, #0]
0040f38e: b.n 0x40f396 <sem_give_common+78>
0040f390: mov r5, r4
0040f392: b.n 0x40f396 <sem_give_common+78>
0040f394: movs r5, #0
0040f396: ldr r3, [r4, #12]
0040f398: cmp r6, r3
0040f39a: beq.n 0x40f3c8 <sem_give_common+128>
0040f39c: ldr.w r3, [r4, #-8]
0040f3a0: adds r3, #2
0040f3a2: beq.n 0x40f382 <sem_give_common+58>
0040f3a4: sub.w r9, r4, #40 ; 0x28
0040f3a8: sub.w r0, r4, #24
0040f3ac: bl 0x40f2a0 <_abort_timeout>
0040f3b0: mov r0, r9
0040f3b2: bl 0x40f290 <sys_dlist_remove>

The default value of CONFIG_SEMAPHORE_GROUPS is “y”, and after setting it to “n” in my prj.conf I stopped facing this issue.

But I’m not sure it is a correct handling of the issue.

Are you familiar with this issue?
Am I missing anything here?

Regards,
Bella.
 



---------------------------------------------------------------------
A member of the Intel Corporation group of companies

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: Design Philosophy 'Minimal runtime error checking'...

Chuck Jordan <Chuck.Jordan@...>
 

As Anas mentioned, I still think we need 2 level of "asserts". There might be one level which, as you state, is there only to detect early user errors during development and can be turned off on production. But then I feel we should have the "fatal" or "panic" level of assert macro that covers unrecoverable errors in a way that can *not* be disabled *at all*. The reason for those is to detect memory corruption or any other unrecoverable situation in a way that provides information to the developers if it ever happens in the wild. This typically takes very little ROM space (just injects an invalid instruction that triggers a fault which then recovers the PC and outputs/stores it somehow, then resets the IC). This is the approach that us (Nordic) have followed for unrecoverable BLE errors for years, and it has proven to be extremely valuable to find rare conditions and get information about those from customers, especially given the wide range of behavior that devices encounter in the wild from other peers.
Regards,
Carles

[Chuck Jordan] " injects an invalid instruction that triggers a fault" is an implementation detail. Actually, when you have power management in the mix or when you have multi-core, when a fault happens, there may be many things you need to do BEFORE triggering a reset. You may have to flush all files and close them, you may have to save state to non-volatile memories, you may have to signal other CPUs that you are about to reset, etc. It can be quite complicated depending on the topology of the hardware.


Re: Inconsistent and ever-changing device naming in Zephyr

Daniel Thompson <daniel.thompson@...>
 

On 06/02/17 13:34, Paul Sokolovsky wrote:
Hello,

Writing on this was in my TODO at least since November, but I expected
this to be a controversial topic, so kind of skipped it. Controversial,
as in: different parties finding to be "obvious" quite different, if
not opposite solutions to the problem. Well, the breakage continues,
so this should be raised.

(Recent) facts:

1. Update to Kinetis (BOARD=frdm_k64f) port broke GPIO support in a
(complex) Zephyr-based application, Zephyr.js:
https://github.com/01org/zephyr.js/issues/665

2. A day or two, Arduino 101 I2C support in Zephyr.js was updated,
apparently because Zephyr upgrade broke it. Today it became known that
the fix for Arduino 101 broke FRDM-K64F.
https://github.com/01org/zephyr.js/issues/676

Investigating the issues, the causes of them are:

1. frdm_k64f GPIO port naming changed from "GPIO_0" to (!!) "gpio_porta".

2. Name of (one of) I2C device for arduino_101 changed from "I2C_0" to
"I2C_SS_0".


This is not the first time when device names change in Zephyr (see
reference above to November 2016), it's the first time (at least for
last half-year), when the naming changes in relatively well developed
ports, which leads to cascade of breakage in the one. And one case, it
can be seen how inconsistent naming (and changes of it) can be.

The question what to do about this. Possible high-level approaches can
be:

a) Establish (well, reconfirm/clarify) and uphold consistent naming
conventions across ports, architectures, and boards.

b) Send a strong message to application developers that they should
not rely on any patterns of Zephyr device naming, and even on the naming
at all, and instead should treat it as a configuration parameter, which
ultimately should be set by a user (and thus apps should make such
configuration explicit and easy for users).
Or c) inherit the device names from device tree?

I don't actually remember that much about the proposed DT tooling but it should certainly be included in this discussion.

It might be made slightly more complex by the presence of aliased names in DT (often used to bridge SoC datasheet namings into board based names). I guess Zephyr might prefer to use the most-derived alias rather then the SoC datasheet name?


Daniel.


Daily Gerrit Digest

donotreply@...
 

NEW within last 24 hours:
- https://gerrit.zephyrproject.org/r/10865 : drivers/net/ieee802154: Change configuration prefix
- https://gerrit.zephyrproject.org/r/10872 : Bluetooth: GATT: set subscribe value to zero for unsubscription
- https://gerrit.zephyrproject.org/r/10877 : i2c: Name parameters consistently.
- https://gerrit.zephyrproject.org/r/10875 : i2c/stm32lx: Fix layout.
- https://gerrit.zephyrproject.org/r/10876 : i2c/dw: Switch from EPERM to EIO
- https://gerrit.zephyrproject.org/r/10878 : i2c: Elaborate API return values
- https://gerrit.zephyrproject.org/r/10874 : i2c: Implement consistent i2c no msgs behaviour
- https://gerrit.zephyrproject.org/r/10873 : i2c: Remove unused definition.
- https://gerrit.zephyrproject.org/r/10871 : Xtensa port: Removed unsupported c++ flags cuasing xt-c++ to throw an error.
- https://gerrit.zephyrproject.org/r/10860 : v2m_beetle: uart: Add DTS support to UART driver
- https://gerrit.zephyrproject.org/r/10868 : net: ipv6: Free received NA net_buf
- https://gerrit.zephyrproject.org/r/10867 : net: ipv6: Fix pending buf leak when NA is received
- https://gerrit.zephyrproject.org/r/10870 : samples: net: Fix invalid memory access for TCP
- https://gerrit.zephyrproject.org/r/10869 : drivers/etc/mcux: Free net_buf using net_nbuf_unref
- https://gerrit.zephyrproject.org/r/10858 : stm32: uart: Add DTS support to STM32 UART driver
- https://gerrit.zephyrproject.org/r/10856 : stm32: Add base DTS support for Nucleo board
- https://gerrit.zephyrproject.org/r/10864 : drivers/ieee802154: Split drivers Kconfig
- https://gerrit.zephyrproject.org/r/10863 : net: nbuf: Add timeout to net_buf getters
- https://gerrit.zephyrproject.org/r/10862 : ARC: fix I2C SPI and GPIO default name issue for ARC
- https://gerrit.zephyrproject.org/r/10861 : net: todo: Add CAN support entry
- https://gerrit.zephyrproject.org/r/10859 : v2m_beetle: Add base DTS support
- https://gerrit.zephyrproject.org/r/10855 : cc3200: Enable device tree usage for CC3200
- https://gerrit.zephyrproject.org/r/10857 : nucleo_l476rg: Enable device tree usage for Nucleo
- https://gerrit.zephyrproject.org/r/10854 : cc3200: dts: Add base DTS support for TI CC3200
- https://gerrit.zephyrproject.org/r/10853 : drivers: Add Atmel SAM RTC driver
- https://gerrit.zephyrproject.org/r/10849 : doc: rename doxygen configuration file and build from doc/

UPDATED within last 24 hours:
- https://gerrit.zephyrproject.org/r/10798 : Bluetooth: GATT: introduce volatile subscription flag
- https://gerrit.zephyrproject.org/r/10725 : ext: qmsi: wait for AON periodic timer alarm's being cleared
- https://gerrit.zephyrproject.org/r/10645 : Bluetooth: HFP HF: Handling AG Network error
- https://gerrit.zephyrproject.org/r/10644 : Bluetooth: HFP HF: Remove unused variable 'buf'
- https://gerrit.zephyrproject.org/r/10547 : Bluetooth: HFP HF: Disconnect rfcomm on SLC error
- https://gerrit.zephyrproject.org/r/7738 : RFC: Add support for Device Tree
- https://gerrit.zephyrproject.org/r/10629 : RFC: DTS: Kinetis: Add support for Hexiwear K64
- https://gerrit.zephyrproject.org/r/10627 : RFC: DTS: Kinetis: Add base support for Kinetis
- https://gerrit.zephyrproject.org/r/10626 : RFC: DTS: Add base DTS and YAML definitions
- https://gerrit.zephyrproject.org/r/10806 : Bluetooth: AVDTP: Handling Discover response
- https://gerrit.zephyrproject.org/r/9460 : Bluetooth: AVDTP: Add AVDTP Discover Function Definition
- https://gerrit.zephyrproject.org/r/9663 : Bluetooth: AVDTP: Add AVDTP Receive Function
- https://gerrit.zephyrproject.org/r/10807 : net/mqtt: Add BT support to MQTT publisher sample
- https://gerrit.zephyrproject.org/r/10475 : Simple PWM driver for nRF5
- https://gerrit.zephyrproject.org/r/10683 : drivers: mcr20a: control access to SPI with semaphore
- https://gerrit.zephyrproject.org/r/10793 : tests/gpio: fix test GPIO_INT_EDGE bug
- https://gerrit.zephyrproject.org/r/10841 : net/dns: Fix inline documentation
- https://gerrit.zephyrproject.org/r/10628 : RFC: DTS: Kinetis: Add FRDM_K64F support
- https://gerrit.zephyrproject.org/r/10837 : net: context: keep randomly assigned port for TCP bind() calls
- https://gerrit.zephyrproject.org/r/10832 : mbedtls: add arduino 101 configuration to ssl client sample
- https://gerrit.zephyrproject.org/r/10618 : Add hashicorp's packer 0.12.2 to basebuild
- https://gerrit.zephyrproject.org/r/9947 : tests/pwm: enable PWM case to work on D2000 board
- https://gerrit.zephyrproject.org/r/10601 : arm: cmsis: Remove last bits of scs/scb as we've converted to CMSIS
- https://gerrit.zephyrproject.org/r/10605 : arm: cmsis: cleanup use of _SCS_CPACR_CP1{0,1}_Pos define
- https://gerrit.zephyrproject.org/r/10600 : arm: cmsis: Convert _Scb*Fault*Reset to use direct CMSIS register access
- https://gerrit.zephyrproject.org/r/10597 : arm: cmsis: Convert _Scb*FaultAddrGet to use direct CMSIS register access
- https://gerrit.zephyrproject.org/r/10595 : arm: cmsis: Convert _ScbDivByZeroFaultEnable to use direct CMSIS register access
- https://gerrit.zephyrproject.org/r/10594 : arm: cmsis: Convert _ScbHardFaultIsForced to use direct CMSIS register access
- https://gerrit.zephyrproject.org/r/10593 : arm: cmsis: Convert _ScbActiveVectorGet to use direct CMSIS register access
- https://gerrit.zephyrproject.org/r/10592 : arm: cmsis: Convert FaultEnable to use direct CMSIS register access
- https://gerrit.zephyrproject.org/r/10591 : arm: cmsis: Convert _ScbIsNestedExc to use direct CMSIS register access

MERGED within last 24 hours:
- https://gerrit.zephyrproject.org/r/10546 : Bluetooth: HFP HF: Indicate disconnect to application
- https://gerrit.zephyrproject.org/r/10680 : gpio: Clarify API
- https://gerrit.zephyrproject.org/r/10682 : gpio/nrf5: Implement port read and write
- https://gerrit.zephyrproject.org/r/10681 : gpio/nrf5: Fix GPIO_ACCESS_BY_PIN behaviour
- https://gerrit.zephyrproject.org/r/10808 : Bluetooth: GATT: Fix not removing subscriptions safely
- https://gerrit.zephyrproject.org/r/10426 : Bluetooth: HFP HF: Enable extended AG Error Result Code
- https://gerrit.zephyrproject.org/r/10479 : Bluetooth: HFP HF: Service level connection completed
- https://gerrit.zephyrproject.org/r/10625 : samples/net/http: Add the HTTP client sample application
- https://gerrit.zephyrproject.org/r/10831 : drivers: enc28j60: Enables reception of multicast packets
- https://gerrit.zephyrproject.org/r/10846 : net: nbuf: Fix style of doxygen comment
- https://gerrit.zephyrproject.org/r/10754 : Bluetooth: Merge HCI command and connection TX threads


Re: Design Philosophy 'Minimal runtime error checking'...

Benjamin Walsh <benjamin.walsh@...>
 

On Mon, Feb 06, 2017 at 11:44:50AM +0000, Cufi, Carles wrote:
Hi Ben,

-----Original Message-----
From: zephyr-devel-bounces@lists.zephyrproject.org [mailto:zephyr-devel-
bounces@lists.zephyrproject.org] On Behalf Of Benjamin Walsh
Sent: Friday, February 03, 2017 22:24
To: Chuck Jordan <Chuck.Jordan@synopsys.com>
Cc: zephyr-devel@lists.zephyrproject.org
Subject: Re: [Zephyr-devel] Design Philosophy 'Minimal runtime error
checking'...

On Fri, Feb 03, 2017 at 08:37:13PM +0000, Chuck Jordan wrote:
Hi Zephyr, Realize that something like Assert should only be compiled
in when building for a "debugging" session. The Assert macro should
evaluate to NOTHING when you build for a deployed implementation,
where all testing has been done w/o issue.
Agreed, and that is why the kernel takes the approach below: __ASSERT()s
are use to catch user errors early during development, and are meant to
be turned off before deployment.
As Anas mentioned, I still think we need 2 level of "asserts". There
might be one level which, as you state, is there only to detect early
user errors during development and can be turned off on production.
But then I feel we should have the "fatal" or "panic" level of assert
macro that covers unrecoverable errors in a way that can *not* be
disabled *at all*. The reason for those is to detect memory corruption
or any other unrecoverable situation in a way that provides
information to the developers if it ever happens in the wild. This
typically takes very little ROM space (just injects an invalid
instruction that triggers a fault which then recovers the PC and
outputs/stores it somehow, then resets the IC). This is the approach
that us (Nordic) have followed for unrecoverable BLE errors for years,
and it has proven to be extremely valuable to find rare conditions and
get information about those from customers, especially given the wide
range of behavior that devices encounter in the wild from other peers.

I have tried to summarize this "unrecoverable error" support in this
Jira issue:

https://jira.zephyrproject.org/browse/ZEP-843
Sure, a panic() API is a good thing to have as well. Like you said, this
is for a completely different use-case then the current __ASSERT() we
have.


Inconsistent and ever-changing device naming in Zephyr

Paul Sokolovsky
 

Hello,

Writing on this was in my TODO at least since November, but I expected
this to be a controversial topic, so kind of skipped it. Controversial,
as in: different parties finding to be "obvious" quite different, if
not opposite solutions to the problem. Well, the breakage continues,
so this should be raised.

(Recent) facts:

1. Update to Kinetis (BOARD=frdm_k64f) port broke GPIO support in a
(complex) Zephyr-based application, Zephyr.js:
https://github.com/01org/zephyr.js/issues/665

2. A day or two, Arduino 101 I2C support in Zephyr.js was updated,
apparently because Zephyr upgrade broke it. Today it became known that
the fix for Arduino 101 broke FRDM-K64F.
https://github.com/01org/zephyr.js/issues/676

Investigating the issues, the causes of them are:

1. frdm_k64f GPIO port naming changed from "GPIO_0" to (!!) "gpio_porta".

2. Name of (one of) I2C device for arduino_101 changed from "I2C_0" to
"I2C_SS_0".


This is not the first time when device names change in Zephyr (see
reference above to November 2016), it's the first time (at least for
last half-year), when the naming changes in relatively well developed
ports, which leads to cascade of breakage in the one. And one case, it
can be seen how inconsistent naming (and changes of it) can be.

The question what to do about this. Possible high-level approaches can
be:

a) Establish (well, reconfirm/clarify) and uphold consistent naming
conventions across ports, architectures, and boards.

b) Send a strong message to application developers that they should
not rely on any patterns of Zephyr device naming, and even on the naming
at all, and instead should treat it as a configuration parameter, which
ultimately should be set by a user (and thus apps should make such
configuration explicit and easy for users).

Whether a) is a viable solution in turn depends on paradigmatic decision
what aim Zephyr is trying to achieve:

I. Is it a common platform, literally, an Operating System, providing
consistent means to develop software *across* various hardware?

II. Or is it loose collection of APIs of working with hardware, with
the aim of getting most (or something) of a particular hardware,
without pledging much for consistency and portability across hardware.


It would be nice to get opinions of both the core maintainers and the
maintainers of particular ports, as well as specific suggestions how to
deal with the bugs above.


Thanks,
Paul

Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog


how can i do bluetooth and wireless test on arduino_due?

曹子龙
 

Hi everyone:

  i have got a single arduino_due core board and  from the user spec, it did not find have any wireless device for me to do some tests with zephyr, so maybe some extension board needed,  so anyone can tell me how to prepare a  blue tooth or wireless test environment? buy some extension board that zephyr already support?   

    and this is my fist time, any help would be great appreciate.



 


Re: Design Philosophy 'Minimal runtime error checking'...

Carles Cufi
 

Hi Ben,

-----Original Message-----
From: zephyr-devel-bounces@lists.zephyrproject.org [mailto:zephyr-devel-
bounces@lists.zephyrproject.org] On Behalf Of Benjamin Walsh
Sent: Friday, February 03, 2017 22:24
To: Chuck Jordan <Chuck.Jordan@synopsys.com>
Cc: zephyr-devel@lists.zephyrproject.org
Subject: Re: [Zephyr-devel] Design Philosophy 'Minimal runtime error
checking'...

On Fri, Feb 03, 2017 at 08:37:13PM +0000, Chuck Jordan wrote:
Hi Zephyr, Realize that something like Assert should only be compiled
in when building for a "debugging" session. The Assert macro should
evaluate to NOTHING when you build for a deployed implementation,
where all testing has been done w/o issue.
Agreed, and that is why the kernel takes the approach below: __ASSERT()s
are use to catch user errors early during development, and are meant to
be turned off before deployment.
As Anas mentioned, I still think we need 2 level of "asserts". There might be one level which, as you state, is there only to detect early user errors during development and can be turned off on production. But then I feel we should have the "fatal" or "panic" level of assert macro that covers unrecoverable errors in a way that can *not* be disabled *at all*. The reason for those is to detect memory corruption or any other unrecoverable situation in a way that provides information to the developers if it ever happens in the wild. This typically takes very little ROM space (just injects an invalid instruction that triggers a fault which then recovers the PC and outputs/stores it somehow, then resets the IC). This is the approach that us (Nordic) have followed for unrecoverable BLE errors for years, and it has proven to be extremely valuable to find rare conditions and get information about those from customers, especially given the wide range of behavior that devices encounter in the wild from other peers.

I have tried to summarize this "unrecoverable error" support in this Jira issue:

https://jira.zephyrproject.org/browse/ZEP-843

Regards,

Carles


Re: Design Philosophy 'Minimal runtime error checking'...

Paul Sokolovsky
 

Hello,

On Fri, 3 Feb 2017 22:08:50 +0000
"Boie, Andrew P" <andrew.p.boie@intel.com> wrote:

On Fri, Feb 03, 2017 at 08:37:13PM +0000, Chuck Jordan wrote:
Hi Zephyr, Realize that something like Assert should only be
compiled in when building for a "debugging" session. The Assert
macro should evaluate to NOTHING when you build for a deployed
implementation, where all testing has been done w/o issue.
Agreed, and that is why the kernel takes the approach below:
__ASSERT()s are use to catch user errors early during development,
and are meant to be turned off before deployment.
I also concur. They should not be present in a production build.
But default should be "development" or "debug" build, and it's
probably even like that on Kconfig level (or maybe not), but there're
still many builtin examples with their own prj.conf which are perhaps
too light on both error checking and logging. And the point is that
likely there should be more, perhaps much more checking (at least in
some subsystems).

Also, would be interesting to hear security experts' opinions, against
what kind of software more CVEs are produced - one which has more
checks, or one which has less? ;-) (Btw, was Zephyr issued a CVE yet?
That would be a fun and important day - "Receiving CVEs, ergo sum").


Andrew

--
Best Regards,
Paul

Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog


Daily Gerrit Digest

donotreply@...
 

NEW within last 24 hours:
- https://gerrit.zephyrproject.org/r/10848 : tinycrypt: Fixed compilation error on Xtensa caused by bad definition of bool.
- https://gerrit.zephyrproject.org/r/10847 : net: set default shell module to 'net'
- https://gerrit.zephyrproject.org/r/10846 : net: fix style of doxygen comment
- https://gerrit.zephyrproject.org/r/10842 : Xtensa port: Set Swap function result to -EAGIN.
- https://gerrit.zephyrproject.org/r/10841 : net/dns: Fix inline documentation

UPDATED within last 24 hours:
- https://gerrit.zephyrproject.org/r/10783 : samples: bmi160: use new device name
- https://gerrit.zephyrproject.org/r/9820 : tests: add zephyr adc driver api test case
- https://gerrit.zephyrproject.org/r/10177 : samples: pwm: move pwm samples into one directory
- https://gerrit.zephyrproject.org/r/10680 : gpio: Clarify API
- https://gerrit.zephyrproject.org/r/10725 : ext: qmsi: wait for AON periodic timer alarm's being cleared
- https://gerrit.zephyrproject.org/r/10793 : tests/gpio: fix test GPIO_INT_EDGE bug

MERGED within last 24 hours:
- https://gerrit.zephyrproject.org/r/10829 : xt-run: fix sanitycheck communication
- https://gerrit.zephyrproject.org/r/9516 : drivers: Add Atmel SAM family GMAC Ethernet driver
- https://gerrit.zephyrproject.org/r/10828 : Xtensa port: Restore interrupted context the same way as premption resume.
- https://gerrit.zephyrproject.org/r/10822 : Xtensa port: Removed duplicate code and fixed update of current thread pointer.
- https://gerrit.zephyrproject.org/r/10840 : sanitycheck: Fixed displayed log when current directory is a symbolic link.
- https://gerrit.zephyrproject.org/r/9939 : samples: bmi160: reduce polling mode sampling frequency
- https://gerrit.zephyrproject.org/r/7244 : ext: qmsi: fix an incomplete type issue
- https://gerrit.zephyrproject.org/r/9981 : samples: spi_flash: remove an unnecessary config symbol
- https://gerrit.zephyrproject.org/r/9938 : drivers: bmi160: add sample ready check
- https://gerrit.zephyrproject.org/r/10782 : boards: arduino_101_sss: use new gpio device name
- https://gerrit.zephyrproject.org/r/10506 : tests: kernel: mem_heap: added api and concept tests
- https://gerrit.zephyrproject.org/r/9967 : arm: Adjust entry point of XIP kernels to the __reset function
- https://gerrit.zephyrproject.org/r/9818 : tests: add zephyr i2c driver api test
- https://gerrit.zephyrproject.org/r/10724 : kernel: k_timer_stop: remove assert when called from an ISR
- https://gerrit.zephyrproject.org/r/10827 : libc: minimal: rename private macro
- https://gerrit.zephyrproject.org/r/10768 : quark_se: sensor: use consistent device names for GPIO, SPI and I2C


Daily Gerrit Digest

donotreply@...
 

NEW within last 24 hours:
- https://gerrit.zephyrproject.org/r/10840 : sanitycheck: Fixed displayed log when current directory is a symbolic link.
- https://gerrit.zephyrproject.org/r/10835 : Xtensa port: xt*-run requires options to be passed before file to be ran.
- https://gerrit.zephyrproject.org/r/10838 : sanitycheck: do not run legacy tests
- https://gerrit.zephyrproject.org/r/10837 : net: context: keep randomly assigned port for TCP bind() calls
- https://gerrit.zephyrproject.org/r/10836 : kernel: Add OpenOCD support
- https://gerrit.zephyrproject.org/r/10826 : xtensa: fix assembling bb[cs]i.l on big-endian targets
- https://gerrit.zephyrproject.org/r/10834 : Xtensa port: Ensure tests exit when running in simulator.
- https://gerrit.zephyrproject.org/r/10832 : mbedtls: add arduino 101 configuration to ssl client sample
- https://gerrit.zephyrproject.org/r/10831 : enc28j60: Enables reception of multicast packets
- https://gerrit.zephyrproject.org/r/10830 : Xtensa port: Moved coprocessor stack area on bottom of stack, just after TCS.
- https://gerrit.zephyrproject.org/r/10828 : Xtensa port: Restore interrupted context the same way as premption resume.
- https://gerrit.zephyrproject.org/r/10827 : libc: minimal: rename private macro
- https://gerrit.zephyrproject.org/r/10818 : drivers/ieee802154: Split drivers Kconfig
- https://gerrit.zephyrproject.org/r/10822 : Xtensa port: Removed duplicate code and fixed update of current thread pointer.

UPDATED within last 24 hours:
- https://gerrit.zephyrproject.org/r/10618 : Add hashicorp's packer binary to basebuild
- https://gerrit.zephyrproject.org/r/10627 : RFC: DTS: Kinetis: Add base support for Kinetis
- https://gerrit.zephyrproject.org/r/10629 : RFC: DTS: Kinetis: Add support for Hexiwear K64
- https://gerrit.zephyrproject.org/r/7738 : RFC: Add support for Device Tree
- https://gerrit.zephyrproject.org/r/10626 : RFC: DTS: Add base DTS and YAML definitions
- https://gerrit.zephyrproject.org/r/10628 : RFC: DTS: Kinetis: Add FRDM_K64F support
- https://gerrit.zephyrproject.org/r/10782 : boards: arduino_101_sss: use new gpio device name
- https://gerrit.zephyrproject.org/r/9947 : tests/pwm: add PINMUX config for D2000 board
- https://gerrit.zephyrproject.org/r/10787 : xtensa: add recipes-devtools-xtensa
- https://gerrit.zephyrproject.org/r/10785 : xtensa: remove unneeded patches
- https://gerrit.zephyrproject.org/r/10718 : tests: dma: update dma loop transfer app
- https://gerrit.zephyrproject.org/r/9681 : quark_se: Add shared GDT in RAM memory to linker
- https://gerrit.zephyrproject.org/r/10693 : tests: dma: update dma block transfer app
- https://gerrit.zephyrproject.org/r/10690 : tests: Increased minimum stack size for tests to support Xtensa.
- https://gerrit.zephyrproject.org/r/10541 : drivers: dma_shim: update dma qmsi shim driver
- https://gerrit.zephyrproject.org/r/10808 : Bluetooth: GATT: Fix not removing subscriptions safely
- https://gerrit.zephyrproject.org/r/9682 : quark_d2000: Add shared GDT memory to linker
- https://gerrit.zephyrproject.org/r/10754 : Bluetooth: Merge HCI command and connection TX threads
- https://gerrit.zephyrproject.org/r/10670 : xtensa port: Added arch .ini file to support xt-sim
- https://gerrit.zephyrproject.org/r/10076 : verify: replace verify build steps with run_phases call
- https://gerrit.zephyrproject.org/r/3114 : DONT MERGE: - break doc
- https://gerrit.zephyrproject.org/r/10811 : BME280 added support for SPI comunication. quark_se_c1000_ss added support for third Slave Select of SS_SPI0
- https://gerrit.zephyrproject.org/r/10814 : Added sensor driver for SI1153. Added proximity sensor_channel entries in sensor.h
- https://gerrit.zephyrproject.org/r/10813 : Added sensor driver for ADXRS290
- https://gerrit.zephyrproject.org/r/10812 : Added sensor driver for ADXL362
- https://gerrit.zephyrproject.org/r/10786 : xtensa: fix endianness
- https://gerrit.zephyrproject.org/r/10797 : Bluetooth: hci_core: fixing subscription removal
- https://gerrit.zephyrproject.org/r/10798 : Bluetooth: GATT: introduce volatile subscription flag
- https://gerrit.zephyrproject.org/r/10772 : Revert "Xtensa port: Changed default core to D_233L."
- https://gerrit.zephyrproject.org/r/10788 : xtensa: apply overlay to newlib
- https://gerrit.zephyrproject.org/r/10333 : test_mbedtls: add a test for Port AES-CMAC-PRF-128[RFC 4615] encryption library
- https://gerrit.zephyrproject.org/r/10784 : pinmux: fix default pinmux driver for quark_se_ss
- https://gerrit.zephyrproject.org/r/10768 : quark_se: sensor: use consistent device names for GPIO, SPI and I2C
- https://gerrit.zephyrproject.org/r/10801 : board: add nucleo_l476rg documentation
- https://gerrit.zephyrproject.org/r/10625 : samples/net/http: Add the HTTP client sample application
- https://gerrit.zephyrproject.org/r/10804 : net/mqtt: Add support for IBM BlueMix Watson topic format
- https://gerrit.zephyrproject.org/r/10778 : kernel: retain device init return value
- https://gerrit.zephyrproject.org/r/9680 : quark_se: Fix restore info address
- https://gerrit.zephyrproject.org/r/10755 : flash/nrf5: fix invalid write access
- https://gerrit.zephyrproject.org/r/10807 : net/mqtt: Add BT support to MQTT publisher sample
- https://gerrit.zephyrproject.org/r/10741 : gpio: Add gpio_simple32 driver to access basic 32-bit i/o registers
- https://gerrit.zephyrproject.org/r/10293 : DONT MERGE drivers: Add DAC driver API

MERGED within last 24 hours:
- https://gerrit.zephyrproject.org/r/10819 : scripts: add configuration file for uncrustify
- https://gerrit.zephyrproject.org/r/10829 : xt-run: fix sanitycheck communication
- https://gerrit.zephyrproject.org/r/10833 : doc: add cross-references to hello_world sample
- https://gerrit.zephyrproject.org/r/10823 : Xtensa port: Fixed linker script for hifi_mini core.
- https://gerrit.zephyrproject.org/r/10824 : MAINTAINERS: added maintainer for riscv32
- https://gerrit.zephyrproject.org/r/10821 : MAINTAINERS: Update network applications section
- https://gerrit.zephyrproject.org/r/10820 : doc: reference bluetooth section of Arduino 101
- https://gerrit.zephyrproject.org/r/10734 : sanitcheck: add sam_e70_xplained
- https://gerrit.zephyrproject.org/r/10733 : boards: sam e70: add openocd file for SAM E70
- https://gerrit.zephyrproject.org/r/10815 : ext: Atmel ASF: remove unused and broken preprocesor macros
- https://gerrit.zephyrproject.org/r/10668 : net: context: Assign a random port number when context is created
- https://gerrit.zephyrproject.org/r/10809 : Xtensa port: Use directly assmebler and linker instead of via the compiler.
- https://gerrit.zephyrproject.org/r/10781 : sanitycheck: don't disable tryrun
- https://gerrit.zephyrproject.org/r/10777 : xt-sim: enable use of xtensa simulator with 'make run'
- https://gerrit.zephyrproject.org/r/10776 : libc: minimal: rename private macro
- https://gerrit.zephyrproject.org/r/10730 : riscv32: timer: replace riscv_qemu_driver by the generic riscv_machine_driver
- https://gerrit.zephyrproject.org/r/10731 : riscv32: added a generic linker script for the riscv32 platform
- https://gerrit.zephyrproject.org/r/10192 : x86: implement direct interrupts
- https://gerrit.zephyrproject.org/r/10792 : xt-run: delete any stale fifos to avoid mkfifo error
- https://gerrit.zephyrproject.org/r/10775 : xtensa_sim_console: remove deprecation warning
- https://gerrit.zephyrproject.org/r/10774 : xtensa: Makefile: cleanup
- https://gerrit.zephyrproject.org/r/10773 : Makefile: test for -fno-asynchronous-unwind-tables
- https://gerrit.zephyrproject.org/r/10771 : xcc: add location of C++ compiler
- https://gerrit.zephyrproject.org/r/10454 : archives: refactor archive script
- https://gerrit.zephyrproject.org/r/10753 : Bluetooth: Merge bt_conn TX threads into a single one with k_poll
- https://gerrit.zephyrproject.org/r/10817 : net: buf: Fix timeout parameter documentation
- https://gerrit.zephyrproject.org/r/9717 : Bluetooth: shell: Add support to retrieve A2SRC record


Re: Design Philosophy 'Minimal runtime error checking'...

Boie, Andrew P
 

On Fri, Feb 03, 2017 at 08:37:13PM +0000, Chuck Jordan wrote:
Hi Zephyr, Realize that something like Assert should only be compiled
in when building for a "debugging" session. The Assert macro should
evaluate to NOTHING when you build for a deployed implementation,
where all testing has been done w/o issue.
Agreed, and that is why the kernel takes the approach below: __ASSERT()s
are use to catch user errors early during development, and are meant to be
turned off before deployment.
I also concur. They should not be present in a production build.

Andrew


Re: Design Philosophy 'Minimal runtime error checking'...

Benjamin Walsh <benjamin.walsh@...>
 

On Fri, Feb 03, 2017 at 08:37:13PM +0000, Chuck Jordan wrote:
Hi Zephyr, Realize that something like Assert should only be compiled
in when building for a "debugging" session. The Assert macro should
evaluate to NOTHING when you build for a deployed implementation,
where all testing has been done w/o issue.
Agreed, and that is why the kernel takes the approach below: __ASSERT()s
are use to catch user errors early during development, and are meant to
be turned off before deployment.

The problem with leaving Asserts in is that each IF-statement is a
branch that might cause an instruction pipeline to be killed.
Especially if the branch is taken in the wrong direction. Further,
what is being tested in the assert might case a d-cache line MISS
event, as it tries to fetch that data item, or many data items. These
cache-line misses can be quite time expensive -- especially on low-end
devices with slower clock frequencies, slower memory subsystems, etc.
Further, people can add TOO many such asserts if they were trained
like me to be a very paranoid programmer who is looking for ANYTHING
that might go wrong.

My 2 cents.

-ChuckJ

-----Original Message-----
From: zephyr-devel-bounces@lists.zephyrproject.org [mailto:zephyr-devel-bounces@lists.zephyrproject.org] On Behalf Of Benjamin Walsh
Sent: Friday, February 03, 2017 8:33 AM
To: Marcus Shawcroft <marcus.shawcroft@gmail.com>
Cc: zephyr-devel@lists.zephyrproject.org
Subject: Re: [Zephyr-devel] Design Philosophy 'Minimal runtime error checking'...

Hi Marcus,

The discussion in this issue
https://urldefense.proofpoint.com/v2/url?u=https-3A__jira.zephyrprojec
t.org_browse_ZEP-2D1618&d=DwICAg&c=DPL6_X_6JkXFx7AXWqB0tg&r=Ur8eT0ZY1DOjJtZ8biKXp_CyOvecmZtsphp7SXIsE1Y&m=gpOaAU-cOMNJrt21NOBkOIUq6FX5T4Fhaiv-duTE22A&s=fj5iSI4yC35d6Fb2_uvW7TmltO9Bl91LGZ6rY_0IFOY&e= has prompted me to raise this topic in the wider forum.

Zephyr has a philosophy of "minimal runtime error checking", it says
so here
https://urldefense.proofpoint.com/v2/url?u=https-3A__www.zephyrproject
.org_doc_introduction_introducing-5Fzephyr.html&d=DwICAg&c=DPL6_X_6JkX
Fx7AXWqB0tg&r=Ur8eT0ZY1DOjJtZ8biKXp_CyOvecmZtsphp7SXIsE1Y&m=gpOaAU-cOM
NJrt21NOBkOIUq6FX5T4Fhaiv-duTE22A&s=05S5_26RFpyKsNf9RTA_30TaVTGma7uUf0
fhQUipdYE&e=
!

There is little other written material on this topic (that I can
find), but in various jira issue comments and gerrit reviews there are
occasional references to this topic.

In the absence of any more detailed written material on this topic it
is difficult to reason about what any arbitrary piece of the code base
should or should not do in order to implement this philosophy
consistently.
The approache taken by the core kernel is pretty simple:

- if an error happens because it's the user's fault, __ASSERT(): the
kernel doesn't care, you did something stupid :)

- if the error comes from proper usage and thus an internal error path,
return an -errno to user

This is why you will see almost no -EINVAL being returned from the
core kernel API. The only -EINVAL values returned come from a
parameter that should have been valid from the user's point-of-view,
but the kernel state changed and made it invalid. I did a quick grep
for the kernel, and there are only two instances of it.

You can take a look at how I used __ASSERT() and errnos in the new
k_poll() API to give you an idea, but it's pretty straightforwared.

Regards,
Ben

Looking through the code base there is a fair amount of evidence to
suggest that interpretation of the philosophy varies considerably. By
way of example, if we look in the drivers tree, and look at the
prevalence of -E* return codes and ASSERT* static trapping, we find:

133 source files use -E* return codes
53 source files use ASSERT traps

The break down the use of specific -E* return codes and ASSERT* macros
across those files we find:

443 -EIO
247 -EINVAL
179 -ENOTSUP
25 -EBUSY
20 -ENODEV
11 -ENOTCONN
10 -EPERM
7 -EALREADY
3 -ENOSYS
3 -ENOMEM
3 -EINPROGRESS
3 -EAGAIN
2 -EVAL
1 -ENODATA
1 -ENOBUFS
1 -EMSGSIZE

97 ASSERT_NO_MSG
23 ASSERT
5 ASSERT_EVAL

This suggests, that many if not most drivers detect and pass error
codes dynamically rather than ASSERTing.

The use of -EIO is extensively used for general failure in the driver,
perhaps this is one group where the design philosophy might suggest
ASSERT should be used more aggressively.

The use of -EINVAL is predominantly associated with API 'configure'
implementation, sanity checking of input parameters. This seems
reasonable to me. But it could be argued that it is a static error
for a user to attempt an illegal driver configuration.

The -ENOTSUP uses are often returned by drivers that only partially
implement the API for a specific class of driver. This is primarily
useful for an application that catches the code and then implements an
alternative strategy.... Is this consistent with the design
philosophy, it is perhaps more appropriate to rigorously apply the
philosophy and ASSERT().

The -EBUSY uses vary. At least some instances appear to be cases
where the EBUSY code is returned by the driver API (ie not caught
internally). Thus effectively mandating that all users of that API
must check the return code (and presumably busy loop). This, at least,
superficially appears to be an anti pattern of the design philosophy.
The -EAGAIN uses also appear to be examples of design philosophy
'anti pattern'

Many of the remaining less frequent -E error codes are often used
interchangeably for the same purpose between different drivers. I
don;t have a view right now as to whether most of these fit the
philosophy or not, but at a minimum we should probably more consistent
with -E code choice.

If(f) we intend to hold to the "minimal runtime error checking"
philosophy I think it would be highly desirable to elaborate some more
detailed guidance on what to assert and what to error return. Doing so
will enable more effective review of new code coming into the tree,
and provide a basis to tidy up some of the existing code.

This quick rummage through drivers/* also suggest that for a given
API, driver implementations are inconsistent w.r.t the return codes
they pass back to the caller. I think we should extended the
include/*.h driver API documentation for each API function with a
return code to indicates what E* codes are possible in what
circumstances. Doing so will help driver users to understand the
failure scenarios they must handle (-EAGAIN and -EBUSY were a surprise
to me) and, perhaps more important right now, help device driver
writers to reason about how their driver is expected to conform.

Incidentally, I've written about drivers/* here, much of this applies
across other substantial parts of the tree...

Thoughts welcome, in particular:
- does the design philosophy still stand
- where should we document/elaborate implementation guidance
- should we tighten the include/*.h driver API documentation w.r.t
expected behaviour

Cheers
/Marcus
_______________________________________________
Zephyr-devel mailing list
Zephyr-devel@lists.zephyrproject.org
https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.zephyrproje
ct.org_mailman_listinfo_zephyr-2Ddevel&d=DwICAg&c=DPL6_X_6JkXFx7AXWqB0
tg&r=Ur8eT0ZY1DOjJtZ8biKXp_CyOvecmZtsphp7SXIsE1Y&m=gpOaAU-cOMNJrt21NOB
kOIUq6FX5T4Fhaiv-duTE22A&s=LlFmZfRQXvgSFLtDKlBelnaq2tsB4zQxZJ6oqGYee2o
&e=
--
Benjamin Walsh, SMTS
WR VxWorks Virtualization Profile
www.windriver.com
Zephyr kernel maintainer
www.zephyrproject.org
_______________________________________________
Zephyr-devel mailing list
Zephyr-devel@lists.zephyrproject.org
https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.zephyrproject.org_mailman_listinfo_zephyr-2Ddevel&d=DwICAg&c=DPL6_X_6JkXFx7AXWqB0tg&r=Ur8eT0ZY1DOjJtZ8biKXp_CyOvecmZtsphp7SXIsE1Y&m=gpOaAU-cOMNJrt21NOBkOIUq6FX5T4Fhaiv-duTE22A&s=LlFmZfRQXvgSFLtDKlBelnaq2tsB4zQxZJ6oqGYee2o&e=
--
Benjamin Walsh, SMTS
WR VxWorks Virtualization Profile
www.windriver.com
Zephyr kernel maintainer
www.zephyrproject.org


Re: Design Philosophy 'Minimal runtime error checking'...

Gil Pitney
 

https://en.wikipedia.org/wiki/Design_by_contract

One can think of DBC as a structured means of using assertions.

I once used this technique in a complex software project (written in
C) that eventually shipped on over 1M mobile phones.

It was a framework for offloading signal processing algorithms
(speech, video, audio) to a DSP coprocessor over IPC, including
dynamic linking and loading.

Other components in the software stack that did not use DBC logged an
order of magnitude more bugs. And when something failed during
testing, one could find the issue immediately (i.e., a contract
violation). "Crash early".

I highly recommend it.

On 3 February 2017 at 12:37, Chuck Jordan <Chuck.Jordan@synopsys.com> wrote:
Hi Zephyr,
Realize that something like Assert should only be compiled in when building for a "debugging" session.
The Assert macro should evaluate to NOTHING when you build for a deployed implementation, where all testing has been done w/o issue.

The problem with leaving Asserts in is that each IF-statement is a branch that might cause an instruction pipeline to be killed.
Especially if the branch is taken in the wrong direction. Further, what is being tested in the assert might case a d-cache line MISS event, as it tries to fetch that data item, or many data items.
These cache-line misses can be quite time expensive -- especially on low-end devices with slower clock frequencies, slower memory subsystems, etc.
Further, people can add TOO many such asserts if they were trained like me to be a very paranoid programmer who is looking for ANYTHING that might go wrong.

My 2 cents.

-ChuckJ

-----Original Message-----
From: zephyr-devel-bounces@lists.zephyrproject.org [mailto:zephyr-devel-bounces@lists.zephyrproject.org] On Behalf Of Benjamin Walsh
Sent: Friday, February 03, 2017 8:33 AM
To: Marcus Shawcroft <marcus.shawcroft@gmail.com>
Cc: zephyr-devel@lists.zephyrproject.org
Subject: Re: [Zephyr-devel] Design Philosophy 'Minimal runtime error checking'...

Hi Marcus,

The discussion in this issue
https://urldefense.proofpoint.com/v2/url?u=https-3A__jira.zephyrprojec
t.org_browse_ZEP-2D1618&d=DwICAg&c=DPL6_X_6JkXFx7AXWqB0tg&r=Ur8eT0ZY1DOjJtZ8biKXp_CyOvecmZtsphp7SXIsE1Y&m=gpOaAU-cOMNJrt21NOBkOIUq6FX5T4Fhaiv-duTE22A&s=fj5iSI4yC35d6Fb2_uvW7TmltO9Bl91LGZ6rY_0IFOY&e= has prompted me to raise this topic in the wider forum.

Zephyr has a philosophy of "minimal runtime error checking", it says
so here
https://urldefense.proofpoint.com/v2/url?u=https-3A__www.zephyrproject
.org_doc_introduction_introducing-5Fzephyr.html&d=DwICAg&c=DPL6_X_6JkX
Fx7AXWqB0tg&r=Ur8eT0ZY1DOjJtZ8biKXp_CyOvecmZtsphp7SXIsE1Y&m=gpOaAU-cOM
NJrt21NOBkOIUq6FX5T4Fhaiv-duTE22A&s=05S5_26RFpyKsNf9RTA_30TaVTGma7uUf0
fhQUipdYE&e=
!

There is little other written material on this topic (that I can
find), but in various jira issue comments and gerrit reviews there are
occasional references to this topic.

In the absence of any more detailed written material on this topic it
is difficult to reason about what any arbitrary piece of the code base
should or should not do in order to implement this philosophy
consistently.
The approache taken by the core kernel is pretty simple:

- if an error happens because it's the user's fault, __ASSERT(): the
kernel doesn't care, you did something stupid :)

- if the error comes from proper usage and thus an internal error path,
return an -errno to user

This is why you will see almost no -EINVAL being returned from the core kernel API. The only -EINVAL values returned come from a parameter that should have been valid from the user's point-of-view, but the kernel state changed and made it invalid. I did a quick grep for the kernel, and there are only two instances of it.

You can take a look at how I used __ASSERT() and errnos in the new
k_poll() API to give you an idea, but it's pretty straightforwared.

Regards,
Ben

Looking through the code base there is a fair amount of evidence to
suggest that interpretation of the philosophy varies considerably. By
way of example, if we look in the drivers tree, and look at the
prevalence of -E* return codes and ASSERT* static trapping, we find:

133 source files use -E* return codes
53 source files use ASSERT traps

The break down the use of specific -E* return codes and ASSERT* macros
across those files we find:

443 -EIO
247 -EINVAL
179 -ENOTSUP
25 -EBUSY
20 -ENODEV
11 -ENOTCONN
10 -EPERM
7 -EALREADY
3 -ENOSYS
3 -ENOMEM
3 -EINPROGRESS
3 -EAGAIN
2 -EVAL
1 -ENODATA
1 -ENOBUFS
1 -EMSGSIZE

97 ASSERT_NO_MSG
23 ASSERT
5 ASSERT_EVAL

This suggests, that many if not most drivers detect and pass error
codes dynamically rather than ASSERTing.

The use of -EIO is extensively used for general failure in the driver,
perhaps this is one group where the design philosophy might suggest
ASSERT should be used more aggressively.

The use of -EINVAL is predominantly associated with API 'configure'
implementation, sanity checking of input parameters. This seems
reasonable to me. But it could be argued that it is a static error
for a user to attempt an illegal driver configuration.

The -ENOTSUP uses are often returned by drivers that only partially
implement the API for a specific class of driver. This is primarily
useful for an application that catches the code and then implements an
alternative strategy.... Is this consistent with the design
philosophy, it is perhaps more appropriate to rigorously apply the
philosophy and ASSERT().

The -EBUSY uses vary. At least some instances appear to be cases
where the EBUSY code is returned by the driver API (ie not caught
internally). Thus effectively mandating that all users of that API
must check the return code (and presumably busy loop). This, at least,
superficially appears to be an anti pattern of the design philosophy.
The -EAGAIN uses also appear to be examples of design philosophy
'anti pattern'

Many of the remaining less frequent -E error codes are often used
interchangeably for the same purpose between different drivers. I
don;t have a view right now as to whether most of these fit the
philosophy or not, but at a minimum we should probably more consistent
with -E code choice.

If(f) we intend to hold to the "minimal runtime error checking"
philosophy I think it would be highly desirable to elaborate some more
detailed guidance on what to assert and what to error return. Doing so
will enable more effective review of new code coming into the tree,
and provide a basis to tidy up some of the existing code.

This quick rummage through drivers/* also suggest that for a given
API, driver implementations are inconsistent w.r.t the return codes
they pass back to the caller. I think we should extended the
include/*.h driver API documentation for each API function with a
return code to indicates what E* codes are possible in what
circumstances. Doing so will help driver users to understand the
failure scenarios they must handle (-EAGAIN and -EBUSY were a surprise
to me) and, perhaps more important right now, help device driver
writers to reason about how their driver is expected to conform.

Incidentally, I've written about drivers/* here, much of this applies
across other substantial parts of the tree...

Thoughts welcome, in particular:
- does the design philosophy still stand
- where should we document/elaborate implementation guidance
- should we tighten the include/*.h driver API documentation w.r.t
expected behaviour

Cheers
/Marcus
_______________________________________________
Zephyr-devel mailing list
Zephyr-devel@lists.zephyrproject.org
https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.zephyrproje
ct.org_mailman_listinfo_zephyr-2Ddevel&d=DwICAg&c=DPL6_X_6JkXFx7AXWqB0
tg&r=Ur8eT0ZY1DOjJtZ8biKXp_CyOvecmZtsphp7SXIsE1Y&m=gpOaAU-cOMNJrt21NOB
kOIUq6FX5T4Fhaiv-duTE22A&s=LlFmZfRQXvgSFLtDKlBelnaq2tsB4zQxZJ6oqGYee2o
&e=
--
Benjamin Walsh, SMTS
WR VxWorks Virtualization Profile
www.windriver.com
Zephyr kernel maintainer
www.zephyrproject.org
_______________________________________________
Zephyr-devel mailing list
Zephyr-devel@lists.zephyrproject.org
https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.zephyrproject.org_mailman_listinfo_zephyr-2Ddevel&d=DwICAg&c=DPL6_X_6JkXFx7AXWqB0tg&r=Ur8eT0ZY1DOjJtZ8biKXp_CyOvecmZtsphp7SXIsE1Y&m=gpOaAU-cOMNJrt21NOBkOIUq6FX5T4Fhaiv-duTE22A&s=LlFmZfRQXvgSFLtDKlBelnaq2tsB4zQxZJ6oqGYee2o&e=
_______________________________________________
Zephyr-devel mailing list
Zephyr-devel@lists.zephyrproject.org
https://lists.zephyrproject.org/mailman/listinfo/zephyr-devel


Re: Design Philosophy 'Minimal runtime error checking'...

Chuck Jordan <Chuck.Jordan@...>
 

Hi Zephyr,
Realize that something like Assert should only be compiled in when building for a "debugging" session.
The Assert macro should evaluate to NOTHING when you build for a deployed implementation, where all testing has been done w/o issue.

The problem with leaving Asserts in is that each IF-statement is a branch that might cause an instruction pipeline to be killed.
Especially if the branch is taken in the wrong direction. Further, what is being tested in the assert might case a d-cache line MISS event, as it tries to fetch that data item, or many data items.
These cache-line misses can be quite time expensive -- especially on low-end devices with slower clock frequencies, slower memory subsystems, etc.
Further, people can add TOO many such asserts if they were trained like me to be a very paranoid programmer who is looking for ANYTHING that might go wrong.

My 2 cents.

-ChuckJ

-----Original Message-----
From: zephyr-devel-bounces@lists.zephyrproject.org [mailto:zephyr-devel-bounces@lists.zephyrproject.org] On Behalf Of Benjamin Walsh
Sent: Friday, February 03, 2017 8:33 AM
To: Marcus Shawcroft <marcus.shawcroft@gmail.com>
Cc: zephyr-devel@lists.zephyrproject.org
Subject: Re: [Zephyr-devel] Design Philosophy 'Minimal runtime error checking'...

Hi Marcus,

The discussion in this issue
https://urldefense.proofpoint.com/v2/url?u=https-3A__jira.zephyrprojec
t.org_browse_ZEP-2D1618&d=DwICAg&c=DPL6_X_6JkXFx7AXWqB0tg&r=Ur8eT0ZY1DOjJtZ8biKXp_CyOvecmZtsphp7SXIsE1Y&m=gpOaAU-cOMNJrt21NOBkOIUq6FX5T4Fhaiv-duTE22A&s=fj5iSI4yC35d6Fb2_uvW7TmltO9Bl91LGZ6rY_0IFOY&e= has prompted me to raise this topic in the wider forum.

Zephyr has a philosophy of "minimal runtime error checking", it says
so here
https://urldefense.proofpoint.com/v2/url?u=https-3A__www.zephyrproject
.org_doc_introduction_introducing-5Fzephyr.html&d=DwICAg&c=DPL6_X_6JkX
Fx7AXWqB0tg&r=Ur8eT0ZY1DOjJtZ8biKXp_CyOvecmZtsphp7SXIsE1Y&m=gpOaAU-cOM
NJrt21NOBkOIUq6FX5T4Fhaiv-duTE22A&s=05S5_26RFpyKsNf9RTA_30TaVTGma7uUf0
fhQUipdYE&e=
!

There is little other written material on this topic (that I can
find), but in various jira issue comments and gerrit reviews there are
occasional references to this topic.

In the absence of any more detailed written material on this topic it
is difficult to reason about what any arbitrary piece of the code base
should or should not do in order to implement this philosophy
consistently.
The approache taken by the core kernel is pretty simple:

- if an error happens because it's the user's fault, __ASSERT(): the
kernel doesn't care, you did something stupid :)

- if the error comes from proper usage and thus an internal error path,
return an -errno to user

This is why you will see almost no -EINVAL being returned from the core kernel API. The only -EINVAL values returned come from a parameter that should have been valid from the user's point-of-view, but the kernel state changed and made it invalid. I did a quick grep for the kernel, and there are only two instances of it.

You can take a look at how I used __ASSERT() and errnos in the new
k_poll() API to give you an idea, but it's pretty straightforwared.

Regards,
Ben

Looking through the code base there is a fair amount of evidence to
suggest that interpretation of the philosophy varies considerably. By
way of example, if we look in the drivers tree, and look at the
prevalence of -E* return codes and ASSERT* static trapping, we find:

133 source files use -E* return codes
53 source files use ASSERT traps

The break down the use of specific -E* return codes and ASSERT* macros
across those files we find:

443 -EIO
247 -EINVAL
179 -ENOTSUP
25 -EBUSY
20 -ENODEV
11 -ENOTCONN
10 -EPERM
7 -EALREADY
3 -ENOSYS
3 -ENOMEM
3 -EINPROGRESS
3 -EAGAIN
2 -EVAL
1 -ENODATA
1 -ENOBUFS
1 -EMSGSIZE

97 ASSERT_NO_MSG
23 ASSERT
5 ASSERT_EVAL

This suggests, that many if not most drivers detect and pass error
codes dynamically rather than ASSERTing.

The use of -EIO is extensively used for general failure in the driver,
perhaps this is one group where the design philosophy might suggest
ASSERT should be used more aggressively.

The use of -EINVAL is predominantly associated with API 'configure'
implementation, sanity checking of input parameters. This seems
reasonable to me. But it could be argued that it is a static error
for a user to attempt an illegal driver configuration.

The -ENOTSUP uses are often returned by drivers that only partially
implement the API for a specific class of driver. This is primarily
useful for an application that catches the code and then implements an
alternative strategy.... Is this consistent with the design
philosophy, it is perhaps more appropriate to rigorously apply the
philosophy and ASSERT().

The -EBUSY uses vary. At least some instances appear to be cases
where the EBUSY code is returned by the driver API (ie not caught
internally). Thus effectively mandating that all users of that API
must check the return code (and presumably busy loop). This, at least,
superficially appears to be an anti pattern of the design philosophy.
The -EAGAIN uses also appear to be examples of design philosophy
'anti pattern'

Many of the remaining less frequent -E error codes are often used
interchangeably for the same purpose between different drivers. I
don;t have a view right now as to whether most of these fit the
philosophy or not, but at a minimum we should probably more consistent
with -E code choice.

If(f) we intend to hold to the "minimal runtime error checking"
philosophy I think it would be highly desirable to elaborate some more
detailed guidance on what to assert and what to error return. Doing so
will enable more effective review of new code coming into the tree,
and provide a basis to tidy up some of the existing code.

This quick rummage through drivers/* also suggest that for a given
API, driver implementations are inconsistent w.r.t the return codes
they pass back to the caller. I think we should extended the
include/*.h driver API documentation for each API function with a
return code to indicates what E* codes are possible in what
circumstances. Doing so will help driver users to understand the
failure scenarios they must handle (-EAGAIN and -EBUSY were a surprise
to me) and, perhaps more important right now, help device driver
writers to reason about how their driver is expected to conform.

Incidentally, I've written about drivers/* here, much of this applies
across other substantial parts of the tree...

Thoughts welcome, in particular:
- does the design philosophy still stand
- where should we document/elaborate implementation guidance
- should we tighten the include/*.h driver API documentation w.r.t
expected behaviour

Cheers
/Marcus
_______________________________________________
Zephyr-devel mailing list
Zephyr-devel@lists.zephyrproject.org
https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.zephyrproje
ct.org_mailman_listinfo_zephyr-2Ddevel&d=DwICAg&c=DPL6_X_6JkXFx7AXWqB0
tg&r=Ur8eT0ZY1DOjJtZ8biKXp_CyOvecmZtsphp7SXIsE1Y&m=gpOaAU-cOMNJrt21NOB
kOIUq6FX5T4Fhaiv-duTE22A&s=LlFmZfRQXvgSFLtDKlBelnaq2tsB4zQxZJ6oqGYee2o
&e=
--
Benjamin Walsh, SMTS
WR VxWorks Virtualization Profile
www.windriver.com
Zephyr kernel maintainer
www.zephyrproject.org
_______________________________________________
Zephyr-devel mailing list
Zephyr-devel@lists.zephyrproject.org
https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.zephyrproject.org_mailman_listinfo_zephyr-2Ddevel&d=DwICAg&c=DPL6_X_6JkXFx7AXWqB0tg&r=Ur8eT0ZY1DOjJtZ8biKXp_CyOvecmZtsphp7SXIsE1Y&m=gpOaAU-cOMNJrt21NOBkOIUq6FX5T4Fhaiv-duTE22A&s=LlFmZfRQXvgSFLtDKlBelnaq2tsB4zQxZJ6oqGYee2o&e=

5621 - 5640 of 8046