Issue with k_poll


Venkatesh Sukumaran
 

Hi Zephyr developers,

I was looking at the k_poll mechanism in Zephyr and found that it can potentially miss events from the producer in SMP when the following happens:

1) Set up a bunch of events with K_POLL_TYPE_SIGNAL and K_POLL_MODE_NOTIFY_ONLY.
2) A producer thread calls k_poll_signal_raise() to set the "signalled" field to 1 for event X in the list above.
3) The destination/consumer thread is already in the middle of processing of events and finds the same event X "raised" (from a previous signal_raise() call maybe), runs the handler for the event to completion and sets the "signalled" field to zero.
4) Destination thread goes back to k_poll to pend on events.

Now depending how 2) and 3) are ordered - step 3) could set "signalled" back to zero and lose the update from the producer thread in step 2).

Am I missing something here? Shouldn't we take a snapshot of the events/signals when exiting k_poll in the critical section and pass that to the destination thread? Once, a thread calls k_poll again then "OR" the pending events with events that the thread has not handled so far (if any)?

Thanks,
- Venkatesh.


Venkatesh Sukumaran
 

Anyone familiar with k_poll architecture here? Let me know if you need any other info.

Thanks,
- Venkatesh.


On Wed, Oct 27, 2021 at 12:47 PM Venkatesh Sukumaran <venkatesh.sukumaran@...> wrote:
Hi Zephyr developers,

I was looking at the k_poll mechanism in Zephyr and found that it can potentially miss events from the producer in SMP when the following happens:

1) Set up a bunch of events with K_POLL_TYPE_SIGNAL and K_POLL_MODE_NOTIFY_ONLY.
2) A producer thread calls k_poll_signal_raise() to set the "signalled" field to 1 for event X in the list above.
3) The destination/consumer thread is already in the middle of processing of events and finds the same event X "raised" (from a previous signal_raise() call maybe), runs the handler for the event to completion and sets the "signalled" field to zero.
4) Destination thread goes back to k_poll to pend on events.

Now depending how 2) and 3) are ordered - step 3) could set "signalled" back to zero and lose the update from the producer thread in step 2).

Am I missing something here? Shouldn't we take a snapshot of the events/signals when exiting k_poll in the critical section and pass that to the destination thread? Once, a thread calls k_poll again then "OR" the pending events with events that the thread has not handled so far (if any)?

Thanks,
- Venkatesh.


Carles Cufi
 

+ Andy, Daniel

 

From: devel@... <devel@...> On Behalf Of Venkatesh Sukumaran via lists.zephyrproject.org
Sent: 08 November 2021 08:08
To: devel@...
Subject: Re: [Zephyr-devel] Issue with k_poll

 

Anyone familiar with k_poll architecture here? Let me know if you need any other info.


Thanks,

- Venkatesh.

 

 

On Wed, Oct 27, 2021 at 12:47 PM Venkatesh Sukumaran <venkatesh.sukumaran@...> wrote:

Hi Zephyr developers,

 

I was looking at the k_poll mechanism in Zephyr and found that it can potentially miss events from the producer in SMP when the following happens:

 

1) Set up a bunch of events with K_POLL_TYPE_SIGNAL and K_POLL_MODE_NOTIFY_ONLY.

2) A producer thread calls k_poll_signal_raise() to set the "signalled" field to 1 for event X in the list above.

3) The destination/consumer thread is already in the middle of processing of events and finds the same event X "raised" (from a previous signal_raise() call maybe), runs the handler for the event to completion and sets the "signalled" field to zero.

4) Destination thread goes back to k_poll to pend on events.

 

Now depending how 2) and 3) are ordered - step 3) could set "signalled" back to zero and lose the update from the producer thread in step 2).

 

Am I missing something here? Shouldn't we take a snapshot of the events/signals when exiting k_poll in the critical section and pass that to the destination thread? Once, a thread calls k_poll again then "OR" the pending events with events that the thread has not handled so far (if any)?


Thanks,

- Venkatesh.


Andy Ross
 

That's right:  k_poll() is level triggered, it's not a queue.  The number of wakeups is not guaranteed to be equal to the number of events signalled.  The idea is that you go to sleep waiting on any of a number of events/things to occur and are woken up when at least one of them does.  But then you need to do the accounting yourself to figure out who was signaled (and how many times) and what should happen.

In this particular use case it sounds like you should be polling on a semaphore and not a signal.  Semaphores do provide proper counting of how many give/take calls are made.

Andy


From: Cufi, Carles <Carles.Cufi@...>
Sent: Monday, November 8, 2021 1:12 AM
To: venkatesh.sukumaran@... <venkatesh.sukumaran@...>; devel@... <devel@...>; Ross, Andrew J <andrew.j.ross@...>; Leung, Daniel <daniel.leung@...>
Subject: RE: [Zephyr-devel] Issue with k_poll
 

+ Andy, Daniel

 

From: devel@... <devel@...> On Behalf Of Venkatesh Sukumaran via lists.zephyrproject.org
Sent: 08 November 2021 08:08
To: devel@...
Subject: Re: [Zephyr-devel] Issue with k_poll

 

Anyone familiar with k_poll architecture here? Let me know if you need any other info.


Thanks,

- Venkatesh.

 

 

On Wed, Oct 27, 2021 at 12:47 PM Venkatesh Sukumaran <venkatesh.sukumaran@...> wrote:

Hi Zephyr developers,

 

I was looking at the k_poll mechanism in Zephyr and found that it can potentially miss events from the producer in SMP when the following happens:

 

1) Set up a bunch of events with K_POLL_TYPE_SIGNAL and K_POLL_MODE_NOTIFY_ONLY.

2) A producer thread calls k_poll_signal_raise() to set the "signalled" field to 1 for event X in the list above.

3) The destination/consumer thread is already in the middle of processing of events and finds the same event X "raised" (from a previous signal_raise() call maybe), runs the handler for the event to completion and sets the "signalled" field to zero.

4) Destination thread goes back to k_poll to pend on events.

 

Now depending how 2) and 3) are ordered - step 3) could set "signalled" back to zero and lose the update from the producer thread in step 2).

 

Am I missing something here? Shouldn't we take a snapshot of the events/signals when exiting k_poll in the critical section and pass that to the destination thread? Once, a thread calls k_poll again then "OR" the pending events with events that the thread has not handled so far (if any)?


Thanks,

- Venkatesh.


Venkatesh Sukumaran
 

I understand that the number of wakeups is not equal to the number of events signalled but the problem here is that the pending thread can *miss* a signal. The "signalled" field is being reset to zero by the pending thread after servicing the signal handler and at that time, the same signal could be set again.

Thanks,
- Venkatesh.


On Mon, Nov 8, 2021 at 6:54 PM Andy Ross <andrew.j.ross@...> wrote:
That's right:  k_poll() is level triggered, it's not a queue.  The number of wakeups is not guaranteed to be equal to the number of events signalled.  The idea is that you go to sleep waiting on any of a number of events/things to occur and are woken up when at least one of them does.  But then you need to do the accounting yourself to figure out who was signaled (and how many times) and what should happen.

In this particular use case it sounds like you should be polling on a semaphore and not a signal.  Semaphores do provide proper counting of how many give/take calls are made.

Andy

From: Cufi, Carles <Carles.Cufi@...>
Sent: Monday, November 8, 2021 1:12 AM
To: venkatesh.sukumaran@... <venkatesh.sukumaran@...>; devel@... <devel@...>; Ross, Andrew J <andrew.j.ross@...>; Leung, Daniel <daniel.leung@...>
Subject: RE: [Zephyr-devel] Issue with k_poll
 

+ Andy, Daniel

 

From: devel@... <devel@...> On Behalf Of Venkatesh Sukumaran via lists.zephyrproject.org
Sent: 08 November 2021 08:08
To: devel@...
Subject: Re: [Zephyr-devel] Issue with k_poll

 

Anyone familiar with k_poll architecture here? Let me know if you need any other info.


Thanks,

- Venkatesh.

 

 

On Wed, Oct 27, 2021 at 12:47 PM Venkatesh Sukumaran <venkatesh.sukumaran@...> wrote:

Hi Zephyr developers,

 

I was looking at the k_poll mechanism in Zephyr and found that it can potentially miss events from the producer in SMP when the following happens:

 

1) Set up a bunch of events with K_POLL_TYPE_SIGNAL and K_POLL_MODE_NOTIFY_ONLY.

2) A producer thread calls k_poll_signal_raise() to set the "signalled" field to 1 for event X in the list above.

3) The destination/consumer thread is already in the middle of processing of events and finds the same event X "raised" (from a previous signal_raise() call maybe), runs the handler for the event to completion and sets the "signalled" field to zero.

4) Destination thread goes back to k_poll to pend on events.

 

Now depending how 2) and 3) are ordered - step 3) could set "signalled" back to zero and lose the update from the producer thread in step 2).

 

Am I missing something here? Shouldn't we take a snapshot of the events/signals when exiting k_poll in the critical section and pass that to the destination thread? Once, a thread calls k_poll again then "OR" the pending events with events that the thread has not handled so far (if any)?


Thanks,

- Venkatesh.


Andy Ross
 

Can you submit that as a github bug?  I don't know that I see it, but if you're sure that it can happen it's something we need to fix.

Andy


From: Venkatesh Sukumaran <venkatesh.sukumaran@...>
Sent: Monday, November 8, 2021 9:39 AM
To: Ross, Andrew J <andrew.j.ross@...>
Cc: Cufi, Carles <Carles.Cufi@...>; devel@... <devel@...>; Leung, Daniel <daniel.leung@...>
Subject: Re: [Zephyr-devel] Issue with k_poll
 
I understand that the number of wakeups is not equal to the number of events signalled but the problem here is that the pending thread can *miss* a signal. The "signalled" field is being reset to zero by the pending thread after servicing the signal handler and at that time, the same signal could be set again.

Thanks,
- Venkatesh.


On Mon, Nov 8, 2021 at 6:54 PM Andy Ross <andrew.j.ross@...> wrote:
That's right:  k_poll() is level triggered, it's not a queue.  The number of wakeups is not guaranteed to be equal to the number of events signalled.  The idea is that you go to sleep waiting on any of a number of events/things to occur and are woken up when at least one of them does.  But then you need to do the accounting yourself to figure out who was signaled (and how many times) and what should happen.

In this particular use case it sounds like you should be polling on a semaphore and not a signal.  Semaphores do provide proper counting of how many give/take calls are made.

Andy

From: Cufi, Carles <Carles.Cufi@...>
Sent: Monday, November 8, 2021 1:12 AM
To: venkatesh.sukumaran@... <venkatesh.sukumaran@...>; devel@... <devel@...>; Ross, Andrew J <andrew.j.ross@...>; Leung, Daniel <daniel.leung@...>
Subject: RE: [Zephyr-devel] Issue with k_poll
 

+ Andy, Daniel

 

From: devel@... <devel@...> On Behalf Of Venkatesh Sukumaran via lists.zephyrproject.org
Sent: 08 November 2021 08:08
To: devel@...
Subject: Re: [Zephyr-devel] Issue with k_poll

 

Anyone familiar with k_poll architecture here? Let me know if you need any other info.


Thanks,

- Venkatesh.

 

 

On Wed, Oct 27, 2021 at 12:47 PM Venkatesh Sukumaran <venkatesh.sukumaran@...> wrote:

Hi Zephyr developers,

 

I was looking at the k_poll mechanism in Zephyr and found that it can potentially miss events from the producer in SMP when the following happens:

 

1) Set up a bunch of events with K_POLL_TYPE_SIGNAL and K_POLL_MODE_NOTIFY_ONLY.

2) A producer thread calls k_poll_signal_raise() to set the "signalled" field to 1 for event X in the list above.

3) The destination/consumer thread is already in the middle of processing of events and finds the same event X "raised" (from a previous signal_raise() call maybe), runs the handler for the event to completion and sets the "signalled" field to zero.

4) Destination thread goes back to k_poll to pend on events.

 

Now depending how 2) and 3) are ordered - step 3) could set "signalled" back to zero and lose the update from the producer thread in step 2).

 

Am I missing something here? Shouldn't we take a snapshot of the events/signals when exiting k_poll in the critical section and pass that to the destination thread? Once, a thread calls k_poll again then "OR" the pending events with events that the thread has not handled so far (if any)?


Thanks,

- Venkatesh.


Venkatesh Sukumaran
 


On Mon, Nov 8, 2021 at 11:27 PM Andy Ross <andrew.j.ross@...> wrote:
Can you submit that as a github bug?  I don't know that I see it, but if you're sure that it can happen it's something we need to fix.

Andy

From: Venkatesh Sukumaran <venkatesh.sukumaran@...>
Sent: Monday, November 8, 2021 9:39 AM
To: Ross, Andrew J <andrew.j.ross@...>
Cc: Cufi, Carles <Carles.Cufi@...>; devel@... <devel@...>; Leung, Daniel <daniel.leung@...>
Subject: Re: [Zephyr-devel] Issue with k_poll
 
I understand that the number of wakeups is not equal to the number of events signalled but the problem here is that the pending thread can *miss* a signal. The "signalled" field is being reset to zero by the pending thread after servicing the signal handler and at that time, the same signal could be set again.

Thanks,
- Venkatesh.


On Mon, Nov 8, 2021 at 6:54 PM Andy Ross <andrew.j.ross@...> wrote:
That's right:  k_poll() is level triggered, it's not a queue.  The number of wakeups is not guaranteed to be equal to the number of events signalled.  The idea is that you go to sleep waiting on any of a number of events/things to occur and are woken up when at least one of them does.  But then you need to do the accounting yourself to figure out who was signaled (and how many times) and what should happen.

In this particular use case it sounds like you should be polling on a semaphore and not a signal.  Semaphores do provide proper counting of how many give/take calls are made.

Andy

From: Cufi, Carles <Carles.Cufi@...>
Sent: Monday, November 8, 2021 1:12 AM
To: venkatesh.sukumaran@... <venkatesh.sukumaran@...>; devel@... <devel@...>; Ross, Andrew J <andrew.j.ross@...>; Leung, Daniel <daniel.leung@...>
Subject: RE: [Zephyr-devel] Issue with k_poll
 

+ Andy, Daniel

 

From: devel@... <devel@...> On Behalf Of Venkatesh Sukumaran via lists.zephyrproject.org
Sent: 08 November 2021 08:08
To: devel@...
Subject: Re: [Zephyr-devel] Issue with k_poll

 

Anyone familiar with k_poll architecture here? Let me know if you need any other info.


Thanks,

- Venkatesh.

 

 

On Wed, Oct 27, 2021 at 12:47 PM Venkatesh Sukumaran <venkatesh.sukumaran@...> wrote:

Hi Zephyr developers,

 

I was looking at the k_poll mechanism in Zephyr and found that it can potentially miss events from the producer in SMP when the following happens:

 

1) Set up a bunch of events with K_POLL_TYPE_SIGNAL and K_POLL_MODE_NOTIFY_ONLY.

2) A producer thread calls k_poll_signal_raise() to set the "signalled" field to 1 for event X in the list above.

3) The destination/consumer thread is already in the middle of processing of events and finds the same event X "raised" (from a previous signal_raise() call maybe), runs the handler for the event to completion and sets the "signalled" field to zero.

4) Destination thread goes back to k_poll to pend on events.

 

Now depending how 2) and 3) are ordered - step 3) could set "signalled" back to zero and lose the update from the producer thread in step 2).

 

Am I missing something here? Shouldn't we take a snapshot of the events/signals when exiting k_poll in the critical section and pass that to the destination thread? Once, a thread calls k_poll again then "OR" the pending events with events that the thread has not handled so far (if any)?


Thanks,

- Venkatesh.