Why can’t a thread be added to the run queue?

Justin Huang

I think I can reproduce what I see with mainline Zephyr. Here is what I did:
  • Check out HEAD of main (I was using dddb5dd6b0de9a8b8621b9ac695fd3fd980a33d6).
  • Disable SCHED_IPI_SUPPORTED in the build (I just hacked arch/Kconfig and commented out “select SCHED_IPI_SUPPORTED if SMP”).
  • Run "west build -b qemu_riscv64_smp tests/kernel/smp && west build -t run".
  • The ASSERT should fire.
It appears that the SMP test on RISC-V without IPI support has not been exercised yet. Or is this configuration simply invalid?

I’d appreciate your input.

On Sep 8, 2022, 5:25 PM -0600, Justin Huang <justin.y.huang@...>, wrote:

Hi Andy,

Thank you for replying.
Sorry for the confusion. I incorrectly described the situation.
In this case it is the ‘idle thread’ that was to be swapped out, and fails the ASSERT at 
(I am using 2.7.0)

I’ve attached a stack trace that hopefully shows a bit more detail.
What I see in the stack is that a core yields from the idle thread, the scheduler then tries to put the idle thread back on the run queue (via z_priq_dumb_add()), and the ASSERT fails.
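
To make the failure mode concrete, here is a minimal, self-contained sketch of a priority-ordered run queue that refuses the idle thread. It is a toy model, not Zephyr code: toy_thread, toy_runq, toy_runq_add() and toy_next() are hypothetical names, and the assumption that the idle thread is only ever chosen as a fallback when the queue is empty is my reading of the scheduler, not something confirmed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_RUNQ_MAX 8

/* Hypothetical stand-in for a kernel thread (not Zephyr's struct k_thread). */
struct toy_thread {
    int prio;      /* lower value means higher priority */
    bool is_idle;  /* true for a per-CPU idle thread */
};

/* Hypothetical run queue: an array kept sorted by priority. */
struct toy_runq {
    struct toy_thread *slots[TOY_RUNQ_MAX];
    size_t count;
};

/*
 * Insert a thread in priority order.
 * Returns false for the idle thread, which is the point where a real
 * kernel would raise an assertion instead of returning.
 */
bool toy_runq_add(struct toy_runq *q, struct toy_thread *t)
{
    assert(q->count < TOY_RUNQ_MAX);

    if (t->is_idle) {
        return false;
    }

    /* Shift lower-priority entries right to make room for t. */
    size_t i = q->count;
    while (i > 0 && q->slots[i - 1]->prio > t->prio) {
        q->slots[i] = q->slots[i - 1];
        i--;
    }
    q->slots[i] = t;
    q->count++;
    return true;
}

/* Pick the best queued thread, falling back to idle when the queue is empty. */
struct toy_thread *toy_next(struct toy_runq *q, struct toy_thread *idle)
{
    return q->count > 0 ? q->slots[0] : idle;
}
```

In this model the idle thread never needs to be queued, because toy_next() returns it whenever nothing else is runnable. A yield path that tries to requeue it anyway ends up in the rejected branch.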

Could you share your thoughts?

Many thanks again,

On Sep 8, 2022, 7:44 AM -0600, Andy Ross <andy@...>, wrote:
The dummy threads don't "run", they're just thread structs used in place of a real thread.  They're essentially a trick used internally during CPU initialization[1] to be able to use the regular context switch to switch into the first thread without having a proper thread context to switch "out" of.  There should be protection against situations like you're seeing where the dummy thread is being presented to the scheduler as a real thread struct, thus that assertion.
Basically something has broken your scheduler state somewhere.  Is your exerciser something simple you can show against upstream Zephyr?  If so, it's probably worth filing an issue on GitHub and tagging or assigning @andyross.  If not, can you detail what you're doing so we can try to guess what's going wrong?
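
As a rough illustration of the dummy-thread trick described above, the sketch below boots a CPU by pointing "current" at a placeholder struct and then running the ordinary switch path. Everything here is hypothetical (demo_thread, demo_cpu, demo_switch(), demo_cpu_boot()); it only models the idea that the switch code needs somewhere to save the outgoing context, and that the placeholder must never be handed back to the scheduler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thread record; "saved" marks that a context was stored in it. */
struct demo_thread {
    const char *name;
    bool is_dummy;
    bool saved;
};

/* Hypothetical per-CPU state. */
struct demo_cpu {
    struct demo_thread *current;
};

/*
 * Ordinary context switch: save the outgoing context, install the next thread.
 * Returns true if the outgoing thread may be requeued, false for a dummy.
 */
bool demo_switch(struct demo_cpu *cpu, struct demo_thread *next)
{
    struct demo_thread *old = cpu->current;

    /* The switch path always needs an outgoing thread struct to save into. */
    assert(old != NULL);

    old->saved = true;
    cpu->current = next;
    return !old->is_dummy;
}

/* CPU bring-up: use a dummy as the thread being switched out of. */
void demo_cpu_boot(struct demo_cpu *cpu, struct demo_thread *dummy,
                   struct demo_thread *first)
{
    dummy->is_dummy = true;
    dummy->saved = false;
    cpu->current = dummy;

    /* The result is ignored on purpose: the dummy is discarded, not requeued. */
    (void)demo_switch(cpu, first);
}
```

If a dummy ever shows up as a candidate for requeueing after boot, the scheduler state was already inconsistent before that point, which is what the assertion is there to catch.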

[1] And in one legacy IPC primitive, to be able to have something in a wait queue record that isn't an actual thread.  One of these days I hope to fix that, it complicates things in a bunch of areas.  But that's unlikely to be your problem unless you're using (and probably modifying the implementation of) kernel mailboxes.