why can’t le thread be added to the run queue?


Justin Huang
 

Hi Andy,

Thank you for replying.
Sorry for the confusion. I incorrectly described the situation.
In this case it is the ‘idle thread’ that was to be swapped out, and fails the ASSERT at 
https://github.com/zephyrproject-rtos/zephyr/blob/3f826560aaf81a444018293bd6acce3c339fe150/kernel/sched.c#L177 
(I am using 2.7.0)

I’ve attached a stack trace that hopefully shows a bit more detail.
What I see in the stack is that when a core yields from the idle thread and the scheduler tries to put the idle thread back on the run queue (via z_priq_dumb_add()), the ASSERT fails.
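
To make the failure mode concrete, here is a minimal toy model in C. It is not Zephyr's code: the type and function names (toy_thread, toy_runq, toy_runq_add, toy_next) are hypothetical, and it assumes the ASSERT exists to reject the idle thread, with the idle thread chosen only as a fallback when the run queue is empty.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy thread record. Field names are hypothetical, not struct k_thread. */
struct toy_thread {
    int prio;                 /* lower value means higher priority */
    bool is_idle;             /* marks a per-CPU idle thread */
    struct toy_thread *next;  /* link in the priority-ordered run queue */
};

struct toy_runq {
    struct toy_thread *head;  /* highest-priority runnable thread, or NULL */
};

/*
 * Insert a thread in priority order. Returns false for an idle thread
 * instead of asserting, so the rejection can be observed by the caller.
 * Assumption: this is the condition the kernel ASSERT guards against.
 */
bool toy_runq_add(struct toy_runq *q, struct toy_thread *t)
{
    if (t->is_idle) {
        return false;
    }

    /* Walk past every entry of equal or higher priority (FIFO among equals). */
    struct toy_thread **pp = &q->head;
    while (*pp != NULL && (*pp)->prio <= t->prio) {
        pp = &(*pp)->next;
    }
    t->next = *pp;
    *pp = t;
    return true;
}

/* Pick the next thread: the queue head, or the idle thread if the queue is empty. */
struct toy_thread *toy_next(const struct toy_runq *q, struct toy_thread *idle)
{
    return (q->head != NULL) ? q->head : idle;
}
```

In this model a yield from the idle thread that re-queues the current thread unconditionally ends up calling toy_runq_add() on the idle thread, which is the rejected case.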

Could you share your thoughts?

Many thanks again,
Justin

On Sep 8, 2022, 7:44 AM -0600, Andy Ross <andy@...>, wrote:

The dummy threads don't "run", they're just thread structs used in place of a real thread.  They're essentially a trick used internally during CPU initialization[1] to be able to use the regular context switch to switch into the first thread without having a proper thread context to switch "out" of.  There should be protection against situations like you're seeing where the dummy thread is being presented to the scheduler as a real thread struct, thus that assertion.
 
Basically something has broken your scheduler state somewhere.  Is your exerciser something simple you can show against upstream Zephyr?  If so, it's probably worth filing an issue in github and tagging or assigning @andyross.  If not, can you detail what you're doing so we can try to guess what's going wrong?
 
Andy

[1] And in one legacy IPC primitive, to be able to have something in a wait queue record that isn't an actual thread.  One of these days I hope to fix that, it complicates things in a bunch of areas.  But that's unlikely to be your problem unless you're using (and probably modifying the implementation of) kernel mailboxes.


Andy Ross
 

On 9/8/2022 7:29 PM, Justin Huang wrote:
> I think I am able to reproduce what I see with mainstream Zephyr, and below is what I do:
>
>     checkout HEAD of main (I was using dddb5dd6b0de9a8b8621b9ac695fd3fd980a33d6)
>     disable SCHED_IPI_SUPPORTED in the build (I just hacked arch/Kconfig and commented out “select SCHED_IPI_SUPPORTED if SMP”)
>     Run "west build -b qemu_riscv64_smp tests/kernel/smp && west build -t run"
>     Should see the ASSERT.
>
> It appears that the path of SMP test on RISCV without IPI support has not yet been tested, or maybe this configuration is invalid?
 
Ah, OK.  Indeed, that's a somewhat obscure feature, and it's not used in-tree currently I don't think (esp32 was for a long while, but they've moved off of SMP in the default configuration IIRC?).  The intent of that busy loop was to allow for bringup of SMP configurations before a working interprocessor interrupt was available.  Very few real/production systems can tolerate a spin loop in the idle thread.
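
As a rough illustration of the two idle strategies described above, the following C sketch counts what one pass of an idle loop would do with and without a working IPI. It is a stand-in model only: toy_idle_stats, toy_idle_iteration and toy_idle_run are hypothetical names, and the real idle thread calls into the architecture and scheduler rather than bumping counters.

```c
#include <stdbool.h>

/* Counters standing in for real side effects. All names are hypothetical. */
struct toy_idle_stats {
    unsigned sleeps;  /* times the CPU would halt until an interrupt arrives */
    unsigned spins;   /* times the CPU would busy-wait for a short interval */
    unsigned yields;  /* times the idle thread would yield to the scheduler */
};

/* One pass through the idle loop body. */
void toy_idle_iteration(bool ipi_supported, struct toy_idle_stats *s)
{
    if (ipi_supported) {
        /* Another CPU can wake this one with an IPI, so it is safe to sleep. */
        s->sleeps++;
    } else {
        /* No IPI: poll by spinning briefly, then yield so that a thread
         * made runnable by another CPU gets noticed. */
        s->spins++;
        s->yields++;
    }
}

/* Run the idle loop body a fixed number of times. */
void toy_idle_run(bool ipi_supported, unsigned iterations,
                  struct toy_idle_stats *s)
{
    for (unsigned i = 0; i < iterations; i++) {
        toy_idle_iteration(ipi_supported, s);
    }
}
```

The polling branch is the one that yields from inside the idle thread, which is why it is the only configuration that reaches the yield path discussed in this thread.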
 
But it looks like it's bitrotten and the default k_yield() path now doesn't work in idle?[1]  Can you submit a bug and assign it to me (@andyross on github)?
 
Also: I'm a little worried about your development setup.  To repeat: the circumstance where you'd want this feature corresponds to bringup of a new SMP platform where IPIs aren't ready.  If that's what you're doing, it would likely be better for everyone if you were working against the main branch and not a year-stale release.
 
Andy
 
[1] I'm going to place my bet that it was at or soon after this commit, which removed the last in-tree usage of k_yield() within the idle thread about a year ago: https://github.com/zephyrproject-rtos/zephyr/commit/851d14afc8941313a6f3faeb74f84ed73a33429a