Why can’t the idle thread be added to the run queue?


Justin Huang
 

Hi Zephyr users,

I recently looked into the Zephyr scheduler for an SMP application and saw a case where:
  • One core runs an old thread (the dummy thread in this case) and swaps to its idle thread.
  • Soon afterwards it fails the ASSERT in z_priq_dumb_add(), which is called by z_requeue_current(), which in turn is called by do_swap().
My naive understanding of the scheduler is that when a core has no other thread to run, it simply runs its idle thread, which is, I believe, what do_swap() does. So I cannot understand why the ASSERT is there.

But apparently I am missing something here.
Could someone please shed some light on this case?

Cheers,
Justin


Andy Ross
 

The dummy threads don't "run", they're just thread structs used in place of a real thread.  They're essentially a trick used internally during CPU initialization[1] to be able to use the regular context switch to switch into the first thread without having a proper thread context to switch "out" of.  There should be protection against situations like you're seeing where the dummy thread is being presented to the scheduler as a real thread struct, thus that assertion.
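To make the idea concrete, here is a minimal toy model in C of a run queue that refuses placeholder threads. It is only a sketch and not Zephyr's actual code: every name in it (toy_thread, toy_runq_add, toy_swap, the TOY_* flags) is hypothetical, and it assumes for illustration that both the dummy thread and the per-CPU idle thread are kept out of the queue, with idle used purely as a fallback. Where this sketch returns false, a real kernel would more likely assert.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; these are NOT Zephyr's real types or flags. */
enum toy_flags {
    TOY_DUMMY = 1 << 0, /* placeholder struct, never actually runs */
    TOY_IDLE  = 1 << 1, /* per-CPU idle thread, used only as a fallback */
};

struct toy_thread {
    const char *name;
    unsigned int flags;
    struct toy_thread *next;
};

struct toy_runq {
    struct toy_thread *head;
};

/* Only real, non-idle threads may be queued (an assumption of this sketch). */
static bool toy_is_queueable(const struct toy_thread *t)
{
    return (t->flags & (TOY_DUMMY | TOY_IDLE)) == 0;
}

/* Push a thread onto the queue; refuse placeholders instead of asserting. */
static bool toy_runq_add(struct toy_runq *q, struct toy_thread *t)
{
    if (!toy_is_queueable(t)) {
        return false;
    }
    t->next = q->head;
    q->head = t;
    return true;
}

/*
 * Switch away from 'outgoing': requeue it only if it is a real thread,
 * then pick the next queued thread, or fall back to 'idle' if none.
 */
static struct toy_thread *toy_swap(struct toy_runq *q,
                                   struct toy_thread *outgoing,
                                   struct toy_thread *idle)
{
    if (toy_is_queueable(outgoing)) {
        toy_runq_add(q, outgoing);
    }

    struct toy_thread *next = q->head;

    if (next == NULL) {
        return idle;
    }
    q->head = next->next;
    next->next = NULL;
    return next;
}
```

In this model, swapping "out" of the dummy on an empty queue simply yields the idle thread and leaves the queue empty. If a failure like the one described shows up, it would correspond to some path calling the equivalent of toy_runq_add() with a placeholder, which the guard rejects.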
 
Basically, something has broken your scheduler state somewhere.  Is your exerciser something simple that you can reproduce against upstream Zephyr?  If so, it's probably worth filing an issue on GitHub and tagging or assigning @andyross.  If not, can you detail what you're doing so we can try to guess what's going wrong?
 
Andy

[1] And in one legacy IPC primitive, to be able to have something in a wait queue record that isn't an actual thread.  One of these days I hope to fix that, it complicates things in a bunch of areas.  But that's unlikely to be your problem unless you're using (and probably modifying the implementation of) kernel mailboxes.