Assert in sched.c


erik.johnson@...
 

I'm running v2.1.99-ncs1, which is a downstream minimal fork provided by Nordic Semiconductor. For the purposes of this bug, I can't find any difference between their code and the upstream Zephyr source.

There's no tag for it, but we're using the nRF9160 from Nordic.
 
Anyway, I'm hitting an assertion in the tick sleep code. It seems that the assert is trying to make sure the resuming context isn't marked as "suspended", but somehow that assertion is failing...
 
Unfortunately, I'm not familiar enough with the kernel yet to know what's going on, so I was hoping to get some help with it. In particular, are there any known "gotcha" cases that would cause this assertion to fail?
 
I'm happy to provide as much information as I can, with the caveat that part of the source code impacted by this is not available: we're using a binary library from Nordic Semiconductor that runs code inside of a thread we're creating in our own source code.

The function being called is k_sleep(), and it is called successfully a number of times. But as soon as I perform one particular additional operation with a download client, the assert starts to appear. That additional code runs without issues a few times, but eventually the condition gets hit. As far as I can tell, it's not a race condition but rather depends on some sort of state in the Nordic library.
 
Here's the assertion in the debug log:
ASSERTION FAIL [!z_is_thread_state_set(_kernel.current, ((1UL << (4))))] @ ZEPHYR_BASE/kernel/sched.c:1096
 
[00:05:05.932,373] <err> os: r0/a1: 0x00000004 r1/a2: 0x00000457 r2/a3: 0x00000001
[00:05:05.941,314] <err> os: r3/a4: 0x00063757 r12/ip: 0x00000030 r14/lr: 0x00054c2b
[00:05:05.950,256] <err> os: xpsr: 0x61040000
[00:05:05.955,657] <err> os: s[ 0]: 0x00000001 s[ 1]: 0x00000001 s[ 2]: 0x00000001 s[ 3]: 0x00000001
[00:05:05.966,400] <err> os: s[ 4]: 0x00000001 s[ 5]: 0x00000001 s[ 6]: 0x00000001 s[ 7]: 0x00000001
[00:05:05.977,172] <err> os: s[ 8]: 0x00000001 s[ 9]: 0x00000001 s[10]: 0x00000001 s[11]: 0x00000001
[00:05:05.987,915] <err> os: s[12]: 0x00000001 s[13]: 0x00000001 s[14]: 0x00000001 s[15]: 0x00000001
[00:05:05.998,657] <err> os: fpscr: 0x00000000
[00:05:06.004,028] <err> os: Faulting instruction address (r15/pc): 0x00058806
[00:05:06.012,176] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
[00:05:06.020,111] <err> os: Current thread: 0x200276c4 (unknown)
[00:05:06.027,099] <err> fatal_error: Resetting system
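
For readers unfamiliar with the expression in the assertion text, here is a minimal standalone sketch of a check with the same shape. It is not the Zephyr source: the struct, macro, and function names below are hypothetical, and only the bit position (1UL << (4)) is taken from the log above. It also shows that any stray write that sets bit 4 of the state byte would trip the same check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's "suspended" flag. Only the bit
 * position is taken from the log: (1UL << (4)). */
#define SKETCH_STATE_SUSPENDED (1UL << 4)

/* Hypothetical, heavily reduced thread object; not Zephyr's struct k_thread. */
struct sketch_thread {
    uint8_t thread_state;
};

/* True if any bit in mask is set in the thread's state byte. */
bool sketch_is_state_set(const struct sketch_thread *t, unsigned long mask)
{
    return (t->thread_state & mask) != 0;
}

/* Mirrors the shape of the failing condition: passes only while the
 * suspended bit is clear. */
bool sketch_sleep_check_passes(const struct sketch_thread *t)
{
    return !sketch_is_state_set(t, SKETCH_STATE_SUSPENDED);
}

/* Walks through three state values and checks the expected outcome. */
void sketch_demo(void)
{
    struct sketch_thread t = { .thread_state = 0 };

    /* No flags set: the check passes. */
    assert(sketch_sleep_check_passes(&t));

    /* Bit 4 set (0x10): the check fails, as in the log. */
    t.thread_state = 0x10;
    assert(!sketch_sleep_check_passes(&t));

    /* A state byte overwritten with 0xFF also has bit 4 set. */
    t.thread_state = 0xFF;
    assert(!sketch_sleep_check_passes(&t));
}
```

Calling sketch_demo() from a host-side test program exercises all three cases. The last case is only an illustration of how corrupted thread data could look like a "suspended" thread; it does not establish that this is what happens here.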


Boie, Andrew P
 

Check for stack overflows. Enable CONFIG_STACK_SENTINEL if you don't have an MPU.
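
As a rough sketch of what that could look like in an application's prj.conf: only CONFIG_STACK_SENTINEL comes from the advice above. The other options are assumptions about what may be useful here, and their availability should be verified against the Kconfig of the tree in use.

```
# Software stack overflow check, for targets without an MPU
CONFIG_STACK_SENTINEL=y

# Assumption: on targets that do have an MPU, the hardware-based
# stack guard may be the better alternative to the sentinel
# CONFIG_HW_STACK_PROTECTION=y

# Assumption: named threads should make the "Current thread" line
# in the fatal error dump easier to attribute
CONFIG_THREAD_NAME=y

# Assumption: pre-filled stacks make later stack usage analysis possible
CONFIG_INIT_STACKS=y
```

If the fault then changes from the assertion to a stack overflow report, the stack size of the thread that runs the binary library is the first thing to increase.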

 

From: users@... <users@...> On Behalf Of erik.johnson@...
Sent: Wednesday, April 1, 2020 4:24 PM
To: users@...
Subject: [Zephyr-users] Assert in sched.c

 
