Re: [Zephyr-users] #BluetoothMesh ...about latest kernel OOPS & exception #bluetoothmesh


vikrant8051 <vikrant8051@...>
 

Hi Andy,

commit 43553da9b2bb9e537185695899fb5184c3b17ebe
Author: Andy Ross <andrew.j.ross@...>
Date:   Thu May 31 11:13:49 2018 -0700

    kernel/sched: Fix preemption logic
   
    The should_preempt() code was catching some of the "unrunnable" cases
    but not all of them, opening the possibility of failing to preempt a
    just-pended thread and thus waking it up synchronously.  There are
    reports of this causing spin loops over k_poll() in the network stack
    work queues (see #8049).
   
    Note that the previous _is_dummy() call is folded into (the somewhat
    verbosely named) _is_thread_prevented_from_running(), and that the
    order of tests has been changed/optimized to hopefully catch common
    cases earlier.
   
    Suggested-by: Michael Scott <michael@...>
    Signed-off-by: Andy Ross <andrew.j.ross@...>
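For readers following the scheduler side, here is a simplified, hypothetical model of the decision this commit describes. A single "prevented from running" predicate, which also covers the dummy-thread case, is checked before any cooperative or priority test. The flag names, `struct thread`, `prevented_from_running()` and `should_preempt()` below are invented for illustration. They are not Zephyr's actual `should_preempt()` or `_is_thread_prevented_from_running()` code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state bits for illustration only; these are NOT the
 * real Zephyr thread-state flag values. */
enum {
    T_DUMMY     = 1u << 0, /* placeholder context, e.g. during early boot */
    T_PENDING   = 1u << 1, /* blocked on a kernel object */
    T_SUSPENDED = 1u << 2, /* explicitly suspended */
    T_DEAD      = 1u << 3, /* exited or aborted */
};

/* Simplified thread model (hypothetical). */
struct thread {
    unsigned int state;
    int prio;         /* lower number = higher priority, as in Zephyr */
    bool cooperative; /* in this model, cooperative threads are never preempted */
};

/* One predicate that covers every "cannot run" case, including the
 * dummy-thread case that the commit says used to be tested separately. */
static bool prevented_from_running(const struct thread *t)
{
    assert(t != NULL);
    return (t->state & (T_DUMMY | T_PENDING | T_SUSPENDED | T_DEAD)) != 0;
}

/* Decide whether the scheduler must switch from 'cur' to 'next'. */
static bool should_preempt(const struct thread *cur, const struct thread *next)
{
    /* Unrunnable case first: a thread that has just pended must always
     * be switched out, even if it is cooperative. Missing this case lets
     * it continue synchronously as if it had been woken. */
    if (prevented_from_running(cur)) {
        return true;
    }
    /* A runnable cooperative thread keeps the CPU until it yields. */
    if (cur->cooperative) {
        return false;
    }
    /* A runnable preemptible thread yields only to a strictly
     * higher-priority thread. */
    return next->prio < cur->prio;
}
```

In this model, a thread that has just pended is switched out even when it is cooperative; without that first test it would keep running and appear to return from its own wait.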

Is this the commit you are pointing to?
It is already in my local repository.
May I now comment out line no. 3318 of subsys/bluetooth/host/hci_core.c?

--------------------------------------------------------------------------------------------------------------------------------------

Could anybody help me find serious bugs in my PR #8101?


It is very simple and should help with testing. I am not interested in getting it merged,
but I need a complete, bug-free, easy-to-test app that will help in scenarios like this one.

Thank you!



On Sat, Jun 2, 2018 at 4:10 AM, Andy Ross <andrew.j.ross@...> wrote:
Vikrant8051 <vikrant8051@...> wrote:
> I commented line no. 3318 of subsys/bluetooth/host/hci_core.c i.e. //
> BT_ASSERT(buf);
>
> And after that everything is working perfectly normal.

I was pointed to this on IRC.  The symptom you've got there sounds a
lot like a scheduler mistake that got fixed yesterday, where
ostensibly-pended threads could be incorrectly swapped back in.  They
would then wake up (generally with an -EAGAIN return value from
_Swap()) and take whatever default action is appropriate (like
returning a NULL value from the empty list), so it's a little subtle
to recognize as a bug and often "recoverable" by existing handling.
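The failure mode described above can be reproduced in miniature. The sketch below is a hypothetical, single-threaded simulation, not Zephyr code: `queue_get_forever()` stands in for a blocking "wait forever" get, `fake_swap()` stands in for `_Swap()` and lets the caller inject either a real wakeup (0) or the spurious `-EAGAIN` return, and `consumer()` models a caller that requires a non-NULL buffer, as `BT_ASSERT(buf)` does. All of these names are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical single-slot queue; NOT Zephyr's k_queue or k_fifo. */
struct queue {
    void *item; /* NULL when the queue is empty */
};

/* Stand-in for _Swap(). A real kernel would block here until another
 * context wakes the thread; this simulation lets the caller inject the
 * outcome instead:
 *   0        -> woken because an item was delivered
 *   -EAGAIN  -> swapped back in without being woken (the bug) */
static int fake_swap(struct queue *q, void *delivered, int injected_rc)
{
    if (injected_rc == 0) {
        q->item = delivered;
    }
    return injected_rc;
}

/* Blocking get in the style of a "wait forever" call. */
static void *queue_get_forever(struct queue *q, void *delivered, int injected_rc)
{
    assert(q != NULL);
    if (q->item == NULL) {
        int rc = fake_swap(q, delivered, injected_rc); /* "pend" here */
        if (rc != 0) {
            /* Default handling: report "no item". A caller that waits
             * forever never expects to see this path. */
            return NULL;
        }
    }
    void *item = q->item;
    q->item = NULL;
    return item;
}

/* Caller modeled on code that asserts its buffer is non-NULL.
 * Returns 0 on success, -1 where an assertion like BT_ASSERT(buf)
 * would fire. */
static int consumer(struct queue *q, void *delivered, int injected_rc)
{
    void *buf = queue_get_forever(q, delivered, injected_rc);
    if (buf == NULL) {
        return -1;
    }
    return 0;
}
```

With the injected `-EAGAIN`, the get takes its default "no item" path and returns NULL, which is exactly where an assertion on the buffer trips. In this model, removing the check would only hide the spurious wakeup, not prevent it.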

But it's a real bug.  Can you verify that your tree contains commit
43553da9b2bb9e5 and see if that was the same root cause?

There's at least one other report of instability after that merged,
but this definitely sounds like it might be your local problem.  I'm
trying to track this down locally, but alas our test suite passes, so
I'm hoping to get more information.

Andy
