Using/misusing k_sem_init()


Piotr Mieńkowski <piotr.mienkowski at gmail.com...>
 

Hi all,

I'm reviewing my new code for Atmel SAM low level Ethernet driver and
there is one place where I have a troubling use of k_sem_init()
function. Very shortly my situation:

As is typically the case Atmel's Ethernet MAC module is using a so
called descriptor list to define a linked list of transmission and
reception buffers. These shared buffers are then used to pass data
between MAC module and low level Ethernet driver.

Let's focus on the transmit path. Assuming we have two transmission
buffers and each buffer can store one full Ethernet frame the Ethernet
driver can send up to two frames to the MAC module but to send a third
frame it has to wait until a free buffer becomes available. This is a
perfect case for using counting semaphores.

During driver initialization phase we would call

k_sem_init(&tx_sem, 2, 2);

to initialize semaphores and then k_sem_take() in transmit thread every
time we send a frame and k_sem_give() in IRQ handler every time the MAC
module has read all the data from the transmit buffer. So far so good.
However, now we can have a situation where the driver (transmit thread)
has sent two frames, stopped on k_sem_take() call when trying to send a
third one and at that moment an unrecoverable transmission error
happens. At this point I would like to reinitialize the descriptor list
and also reinitialize the semaphores, i.e. I would like to call
k_sem_init() while there is a thread waiting for the very semaphore to
become available.

Would that be OK, is there a better solution?

Thanks and regards,
Piotr


Benjamin Walsh <benjamin.walsh@...>
 

Hi Piotr,

Sorry for the late response.

I'm reviewing my new code for Atmel SAM low level Ethernet driver and
there is one place where I have a troubling use of k_sem_init()
function. Very shortly my situation:

As is typically the case Atmel's Ethernet MAC module is using a so
called descriptor list to define a linked list of transmission and
reception buffers. These shared buffers are then used to pass data
between MAC module and low level Ethernet driver.

Let's focus on the transmit path. Assuming we have two transmission
buffers and each buffer can store one full Ethernet frame the Ethernet
driver can send up to two frames to the MAC module but to send a third
frame it has to wait until a free buffer becomes available. This is a
perfect case for using counting semaphores.

During driver initialization phase we would call

k_sem_init(&tx_sem, 2, 2);

to initialize semaphores and then k_sem_take() in transmit thread every
time we send a frame and k_sem_give() in IRQ handler every time the MAC
module has read all the data from the transmit buffer. So far so good.
However, now we can have a situation where the driver (transmit thread)
has sent two frames, stopped on k_sem_take() call when trying to send a
third one and at that moment an unrecoverable transmission error
happens. At this point I would like to reinitialize the descriptor list
and also reinitialize the semaphores, i.e. I would like to call
k_sem_init() while there is a thread waiting for the very semaphore to
become available.
What is the behaviour you expect exactly ? What would a semaphore
re-init signal to the thread pending on the semaphore in your
application ?

I can tell you that with the current implementation of k_sem_init(), the
thread would not be awakened, but the wait queue would be reinitialized,
so that thread would be "lost".

You could give the semaphore twice (since you initialize it with '2') to
flush the threads pending on it, but that would probably just cause them
to pend again on the semaphore. Worst, the thread would probably just
think it got the semaphore and just try to do the operation expected of
it when it is able to get the semaphore.

Would that be OK, is there a better solution?
One thing you could do would be to abort the thread, then re-init the
semaphore, then respawn the thread.

We could maybe add a k_sem_flush() API, that unpend all threads waiting
on the semaphore, but returning them an errno.

Cheers,
Ben


Thanks and regards,
Piotr
--
Benjamin Walsh, SMTS
Wind River Rocket
www.windriver.com
Zephyr kernel maintainer
www.zephyrproject.org


Daniel Thompson <daniel.thompson@...>
 

On 12/12/16 18:10, Benjamin Walsh wrote:
Let's focus on the transmit path. Assuming we have two transmission
buffers and each buffer can store one full Ethernet frame the Ethernet
driver can send up to two frames to the MAC module but to send a third
frame it has to wait until a free buffer becomes available. This is a
perfect case for using counting semaphores.

<snip>
>>
However, now we can have a situation where the driver (transmit thread)
has sent two frames, stopped on k_sem_take() call when trying to send a
third one and at that moment an unrecoverable transmission error
happens. At this point I would like to reinitialize the descriptor list
and also reinitialize the semaphores,
To reinitialize things here implies that the owner of the buffer has
lost so much state that it can no longer release it as normal. Why does
an unrecoverable error cause such a catastrophic loss of state?


Would that be OK, is there a better solution?
One thing you could do would be to abort the thread, then re-init the
semaphore, then respawn the thread.

We could maybe add a k_sem_flush() API, that unpend all threads waiting
on the semaphore, but returning them an errno.
To avoid race conditions its likely a flush would have to be sticky,
causing everything to return an error until the next init.

Basically in a situation where you resource tracking is borked to the
point that you no longer know how to resolve the situation with the
"give" API then its likely you can't know, when calling k_sem_flush(),
that all the threads are nicely blocked waiting for the flush.

On that basis, perhaps any proposed API should be more explicit,
k_sem_set_error().

I admit a degree of skepticism here. The code to handle the new
semaphore error paths may end up more complex than the code to correctly
track buffer ownership within the IP stack error paths.


Daniel.


Piotr Mieńkowski <piotr.mienkowski at gmail.com...>
 

Hi Ben,

What is the behaviour you expect exactly ? What would a semaphore
re-init signal to the thread pending on the semaphore in your
application ?
Sorry, I wrote a somehow longish description of the issue but didn't
mentioned the expected behavior.

When I clean up the descriptor list and re-initialize the semaphores I
would expect the pending thread to be awaken and continue as normal. At
that point the resources (transmission buffers) are available again so
the transmit thread can send the data (copy them to transmission buffers
and let the MAC module know there is new data).
You could give the semaphore twice (since you initialize it with '2') to
flush the threads pending on it, but that would probably just cause them
to pend again on the semaphore. Worst, the thread would probably just
think it got the semaphore and just try to do the operation expected of
it when it is able to get the semaphore.
I'm not sure I understand what is meant by flushing the threads. I
believe that using k_sem_give() in a loop instead of k_sem_init() would
do exactly the job I need, the transmit thread would be awaken and
continue as normal. It's only that this solution seemed not very elegant
especially in cases where the initial semaphore count is high.

That said, I redesigned my Ethernet driver code not to copy the data.
The network buffers passed to the driver by the higher layer are
released directly by the IRQ handler with net_buf_unref() after the data
are sent by the MAC module. At the moment my code is not using any
semaphores and for me the issue is resolved.

Thanks and regards,
Piotr


Benjamin Walsh <benjamin.walsh@...>
 

On Tue, Dec 13, 2016 at 03:57:38PM +0100, Piotr Mienkowski wrote:
Hi Ben,

What is the behaviour you expect exactly ? What would a semaphore
re-init signal to the thread pending on the semaphore in your
application ?
Sorry, I wrote a somehow longish description of the issue but didn't
mentioned the expected behavior.

When I clean up the descriptor list and re-initialize the semaphores I
would expect the pending thread to be awaken and continue as normal. At
that point the resources (transmission buffers) are available again so
the transmit thread can send the data (copy them to transmission buffers
and let the MAC module know there is new data).

You could give the semaphore twice (since you initialize it with '2') to
flush the threads pending on it, but that would probably just cause them
to pend again on the semaphore. Worst, the thread would probably just
think it got the semaphore and just try to do the operation expected of
it when it is able to get the semaphore.
I'm not sure I understand what is meant by flushing the threads. I
I meant that would wake up the thread waiting on the semaphore, since
reinitializing it would not do that.

believe that using k_sem_give() in a loop instead of k_sem_init() would
do exactly the job I need, the transmit thread would be awaken and
continue as normal. It's only that this solution seemed not very elegant
especially in cases where the initial semaphore count is high.

That said, I redesigned my Ethernet driver code not to copy the data.
The network buffers passed to the driver by the higher layer are
released directly by the IRQ handler with net_buf_unref() after the data
are sent by the MAC module. At the moment my code is not using any
semaphores and for me the issue is resolved.
OK, cool. No need to add a feature to the semaphore API that does not
have at least one real use-case. :)

Cheers,
Ben


Benjamin Walsh <benjamin.walsh@...>
 

On Tue, Dec 13, 2016 at 09:59:31AM +0000, Daniel Thompson wrote:
On 12/12/16 18:10, Benjamin Walsh wrote:
Let's focus on the transmit path. Assuming we have two transmission
buffers and each buffer can store one full Ethernet frame the Ethernet
driver can send up to two frames to the MAC module but to send a third
frame it has to wait until a free buffer becomes available. This is a
perfect case for using counting semaphores.

<snip>

However, now we can have a situation where the driver (transmit thread)
has sent two frames, stopped on k_sem_take() call when trying to send a
third one and at that moment an unrecoverable transmission error
happens. At this point I would like to reinitialize the descriptor list
and also reinitialize the semaphores,
To reinitialize things here implies that the owner of the buffer has
lost so much state that it can no longer release it as normal. Why
does an unrecoverable error cause such a catastrophic loss of state?


Would that be OK, is there a better solution?
One thing you could do would be to abort the thread, then re-init the
semaphore, then respawn the thread.

We could maybe add a k_sem_flush() API, that unpend all threads waiting
on the semaphore, but returning them an errno.
To avoid race conditions its likely a flush would have to be sticky,
causing everything to return an error until the next init.
Yeah, I guess, and there has to be some protocol between the thread that
does the flush which then has to wait for the threads that take the
semaphore which then all have to reply to it before it can do the
re-init.

Basically in a situation where you resource tracking is borked to
the point that you no longer know how to resolve the situation with
the "give" API then its likely you can't know, when calling
k_sem_flush(), that all the threads are nicely blocked waiting for
the flush.

On that basis, perhaps any proposed API should be more explicit,
k_sem_set_error().

I admit a degree of skepticism here. The code to handle the new
semaphore error paths may end up more complex than the code to
correctly track buffer ownership within the IP stack error paths.


Luiz Augusto von Dentz
 

Hi,

On Tue, Dec 13, 2016 at 9:14 PM, Benjamin Walsh
<benjamin.walsh(a)windriver.com> wrote:
On Tue, Dec 13, 2016 at 09:59:31AM +0000, Daniel Thompson wrote:
On 12/12/16 18:10, Benjamin Walsh wrote:
Let's focus on the transmit path. Assuming we have two transmission
buffers and each buffer can store one full Ethernet frame the Ethernet
driver can send up to two frames to the MAC module but to send a third
frame it has to wait until a free buffer becomes available. This is a
perfect case for using counting semaphores.

<snip>

However, now we can have a situation where the driver (transmit thread)
has sent two frames, stopped on k_sem_take() call when trying to send a
third one and at that moment an unrecoverable transmission error
happens. At this point I would like to reinitialize the descriptor list
and also reinitialize the semaphores,
To reinitialize things here implies that the owner of the buffer has
lost so much state that it can no longer release it as normal. Why
does an unrecoverable error cause such a catastrophic loss of state?


Would that be OK, is there a better solution?
One thing you could do would be to abort the thread, then re-init the
semaphore, then respawn the thread.

We could maybe add a k_sem_flush() API, that unpend all threads waiting
on the semaphore, but returning them an errno.
To avoid race conditions its likely a flush would have to be sticky,
causing everything to return an error until the next init.
Yeah, I guess, and there has to be some protocol between the thread that
does the flush which then has to wait for the threads that take the
semaphore which then all have to reply to it before it can do the
re-init.

Basically in a situation where you resource tracking is borked to
the point that you no longer know how to resolve the situation with
the "give" API then its likely you can't know, when calling
k_sem_flush(), that all the threads are nicely blocked waiting for
the flush.

On that basis, perhaps any proposed API should be more explicit,
k_sem_set_error().

I admit a degree of skepticism here. The code to handle the new
semaphore error paths may end up more complex than the code to
correctly track buffer ownership within the IP stack error paths.
Btw, this design looks completely different than what the IP stack
expect, the so called NET_BUF_POOL uses k_fifo so it doesn't really
use k_sem API, furthermore there doesn't seem to be any mention to
net_buf which probably means that there is another type of buffer
being used so the driver will end up having to copying data.



--
Luiz Augusto von Dentz