Closing an accepting BSD socket from a different thread


Stephan Gatzka
 

Hello!

I have a thread blocking on a zsock_accept(). After a certain time, another thread decides that this socket is no longer required and calls zsock_close() on that socket. The thread blocking on zsock_accept() then crashes horribly deep down in Zephyr's socket implementation.

My question is how I can safely "unblock" the thread waiting in the zsock_accept()?

Thanks,
Stephan


Paul Sokolovsky
 

Hello Stephan,

On Wed, 22 May 2019 16:10:19 +0200
"Stephan Gatzka" <stephan.gatzka@...> wrote:

Hello!

I have a thread blocking on a zsock_accept(). After a certain time
another thread decides that this socket is no longer required and
calls zsock_close() on that socket.
Paradigmatically correct approach to this situation is:

1. Avoid sharing I/O resources (not just sockets) across different
threads.
2. If/when you can't avoid it, you need to synchronize access to those
resources from different threads using synchronization primitives
(mutexes, semaphores, etc.)

Now the thread blocking on
zsock_accept() crashes horribly deep down in Zephyr's socket
implementation.
Eventually, we'll need to catch and fix such cases. But the only
visible effect for well-behaving applications following the guidelines
above will be bloating code size in the Zephyr network stack/socket
implementation (so hopefully, we won't get bad community stereotypes
due to that). If you have a small reproduction testcase for the issue,
definitely please submit it at
https://github.com/zephyrproject-rtos/zephyr/issues

My question is how I can safely "unblock" the thread waiting in the
zsock_accept()?
A way to not block forever in an accept() call is to use a timed poll() on
that socket. The thread issuing the poll() call would be the best
party to know when to close this socket (e.g., if there's no
activity during some period of time). Other threads could signal the
owner thread that they want something to be done to the socket via flag
variables. E.g., the following is a well-known pattern:

=== main loop thread ===
while (!should_exit) {
...
poll(..., MAIN_LOOP_PERIOD);
...
}
close(...);
exit();

=== other threads ===
should_exit = true;
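
A minimal sketch of the pattern above in C, written against the POSIX calls poll(), accept() and close() so it can be tried on a desktop system. On Zephyr the corresponding calls would be zsock_poll(), zsock_accept() and zsock_close(). The names request_exit(), accept_loop() and the 100 ms value of MAIN_LOOP_PERIOD_MS are illustrative assumptions, not part of any Zephyr API.

```c
#include <errno.h>
#include <poll.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAIN_LOOP_PERIOD_MS 100 /* hypothetical wake-up period */

static atomic_bool should_exit; /* set by other threads, read by the owner */

/* Any thread may call this; it never touches the socket itself. */
void request_exit(void)
{
    atomic_store(&should_exit, true);
}

/* Owner thread: the only place where the listening socket is used and
 * closed. Returns the number of connections accepted before shutdown. */
int accept_loop(int listen_fd)
{
    struct pollfd pfd = { .fd = listen_fd, .events = POLLIN };
    int accepted = 0;

    while (!atomic_load(&should_exit)) {
        int ret = poll(&pfd, 1, MAIN_LOOP_PERIOD_MS);

        if (ret < 0 && errno != EINTR) {
            break;              /* unrecoverable poll() error */
        }
        if (ret <= 0) {
            continue;           /* timeout or EINTR: re-check the flag */
        }
        if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL)) {
            break;              /* socket is no longer usable */
        }
        if (pfd.revents & POLLIN) {
            int client = accept(listen_fd, NULL, NULL);

            if (client >= 0) {
                accepted++;
                close(client);  /* a real server would hand this off */
            }
        }
    }
    close(listen_fd);           /* closed by the thread that owns it */
    return accepted;
}
```

The point of the design is that only the owner thread ever calls accept() or close() on the socket, so no other thread can pull the descriptor out from under a blocked call. The cost is a shutdown latency of up to one MAIN_LOOP_PERIOD_MS.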


Thanks,
Stephan
--
Best Regards,
Paul

Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog


Stephan Gatzka
 

Hello Paul!

Thanks for the answer.

Paradigmatically correct approach to this situation is:
1. Avoid sharing I/O resources (not just sockets) across different
threads.
2. If/when you can't avoid it, you need to synchronize access to those
resources from different threads using synchronization primitives
(mutexes, semaphores, etc.)
Sure, no doubt about that. The problem is that I need a mechanism to "unblock" the accept.
Yes, I could use non-blocking sockets with poll(), but I also found no easy-to-use mechanism to "unblock" poll(). I can't just send a signal to the thread that called poll(), as would work in Linux.



Eventually, we'll need to catch and fix such cases. But the only
visible effect for well-behaving applications following the guidelines
above will be bloating code size in the Zephyr network stack/socket
implementation (so hopefully, we won't get bad community stereotypes
due to that). If you have a small reproduction testcase for the issue,
definitely please submit it at
https://github.com/zephyrproject-rtos/zephyr/issues
Will do.

My question is how I can safely "unblock" the thread waiting in the
zsock_accept()?
A way to not block forever in accept() call is to use timed poll() on
that socket. The thread issuing the poll() call would be the best
party to know when to close this socket (e.g., if there's no
activity during some period of time). Other threads could signal the
owner thread that they want something to be done to the socket via flag
variables. E.g., the following is a well-known pattern:
=== main loop thread ===
while (!should_exit) {
...
poll(..., MAIN_LOOP_PERIOD);
...
}
close(...);
exit();
=== other threads ===
should_exit = true;
Yeah sure, but this is polling and a waste of resources. I really don't like that, especially on small battery-powered systems.

No, the only possible solution I see is an additional socket connection via localhost which "signals" poll() and afterwards I can see what needs to be done (e.g. calling close()).
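
A minimal sketch of that wake-up idea in C. It uses the POSIX socketpair() call as a stand-in for the extra localhost connection, purely so the example is short and runs on a desktop system; whether an equivalent is available on a given Zephyr version is an assumption to check, and a loopback TCP or UDP socket would play the same role. The names struct waker, waker_init(), waker_signal() and wait_for_accept_or_wake() are hypothetical.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* One end is polled by the owner thread, the other is written by wakers. */
struct waker {
    int rx; /* polled together with the listening socket */
    int tx; /* written by any thread that wants to wake the poller */
};

int waker_init(struct waker *w)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    w->rx = sv[0];
    w->tx = sv[1];
    return 0;
}

/* Called from another thread: makes the poller's poll() return. */
int waker_signal(struct waker *w)
{
    char byte = 1;

    return send(w->tx, &byte, 1, 0) == 1 ? 0 : -1;
}

/* Blocks without a timeout until either socket becomes readable.
 * Returns 1 if woken by waker_signal(), 0 if listen_fd is ready for
 * accept(), and -1 on error. */
int wait_for_accept_or_wake(int listen_fd, struct waker *w)
{
    struct pollfd fds[2] = {
        { .fd = listen_fd, .events = POLLIN },
        { .fd = w->rx, .events = POLLIN },
    };

    if (poll(fds, 2, -1) < 0) {
        return -1;
    }
    if (fds[1].revents & POLLIN) {
        char byte;

        (void)recv(w->rx, &byte, 1, 0); /* drain the wake-up byte */
        return 1;
    }
    return (fds[0].revents & POLLIN) ? 0 : -1;
}
```

When the function returns 1, the owner thread can decide what to do next, for example closing the listening socket itself, so the close never races with a blocked accept().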

The reason for my question is that I need to implement an event loop based system. I need events for sockets, timers, DNS.
The idea is to use a Zephyr message queue with a thread reading from the queue and calling the callback functions.
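
A rough sketch of that dispatch idea in plain C. A fixed-size ring buffer stands in for the Zephyr message queue, and it is deliberately single-threaded with no locking; on Zephyr, k_msgq_put() and a blocking k_msgq_get() would presumably take the place of event_post() and the dequeue step. All names here (struct event, event_post(), event_dispatch(), count_event(), EVENT_QUEUE_LEN) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define EVENT_QUEUE_LEN 8 /* hypothetical capacity */

typedef void (*event_cb)(void *arg);

/* One queued event: the callback to run and its argument. */
struct event {
    event_cb cb;
    void *arg;
};

struct event_queue {
    struct event slots[EVENT_QUEUE_LEN];
    size_t head;  /* index of the oldest queued event */
    size_t count; /* number of queued events */
};

/* Producer side (socket, timer or DNS code): returns false when full. */
bool event_post(struct event_queue *q, event_cb cb, void *arg)
{
    if (q->count == EVENT_QUEUE_LEN) {
        return false;
    }
    size_t tail = (q->head + q->count) % EVENT_QUEUE_LEN;

    q->slots[tail] = (struct event){ .cb = cb, .arg = arg };
    q->count++;
    return true;
}

/* Event loop side: runs every queued callback in FIFO order and
 * returns how many were handled. */
size_t event_dispatch(struct event_queue *q)
{
    size_t handled = 0;

    while (q->count > 0) {
        struct event ev = q->slots[q->head];

        q->head = (q->head + 1) % EVENT_QUEUE_LEN;
        q->count--;
        ev.cb(ev.arg);
        handled++;
    }
    return handled;
}

/* Example callback: increments the int counter passed as its argument. */
void count_event(void *arg)
{
    (*(int *)arg)++;
}
```

Because the slot is freed before the callback runs, a callback may safely post follow-up events to the same queue.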

Regards,
Stephan


Marc Herbert
 

On 23 May 2019, at 00:10, Stephan Gatzka <stephan.gatzka@...> wrote:

I can't just send a signal to that thread which called poll() like it would work in Linux.

You're assuming Unix signals work...

https://lwn.net/Articles/414618/ Unfixable designs



variables. E.g., the following is a well-known pattern:
=== main loop thread ===
while (!should_exit) {
...
poll(..., MAIN_LOOP_PERIOD);
...
}
close(...);
exit();
=== other threads ===
should_exit = true;
Yeah sure, but this is polling and a waste of resources.
Even if MAIN_LOOP_PERIOD is somewhat longer than the network protocol timeout(s) after which the socket should be closed anyway if the other end disappears?


No, the only possible solution I see is an additional socket connection via localhost which "signals" poll() and afterwards I can see what needs to be done (e.g. calling close()).
Nice.


Stephan Gatzka
 

You're assuming Unix signals work...
https://lwn.net/Articles/414618/ Unfixable designs
Well, not really. I know the drawbacks of dealing with signals, and in Linux, with epoll() and nearly everything being a file descriptor you can put into epoll(), there is no need to rely on signals.

But because poll() in Zephyr only accepts socket fds, I have to build another mechanism for asynchronous I/O (for sockets, files, timers, DNS resolution etc.).

variables. E.g., the following is a well-known pattern:
=== main loop thread ===
while (!should_exit) {
...
poll(..., MAIN_LOOP_PERIOD);
...
}
close(...);
exit();
=== other threads ===
should_exit = true;
Yeah sure, but this is polling and a waste of resources.
Even if MAIN_LOOP_PERIOD is somewhat longer than the network protocol timeout(s) after which the socket should be closed anyway if the other end disappears?
Well, for a normal socket connection this is probably OK, but what about a server socket blocking on an accept()? There is no such thing as a network timeout there.


Paul Sokolovsky
 

Hello,

On Thu, 23 May 2019 09:10:38 +0200
Stephan Gatzka <stephan.gatzka@...> wrote:

Hello Paul!

Thanks for the answer.


Paradigmatically correct approach to this situation is:

1. Avoid sharing I/O resources (not just sockets) across different
threads.
2. If/when you can't avoid it, you need to synchronize access to
those resources from different threads using synchronization
primitives (mutexes, semaphores, etc.)
Sure, no doubt on that. The problem is, that I need a mechanism to
"unblock" the accept.
Yes, I could use non-blocking sockets with poll(),
For completeness of covering the topic, the sockets for poll() don't
have to be non-blocking.

but I also found
no easy to use mechanism to "unblock" poll().
You can pass a timeout after which poll() will "unblock".

I can't just send a
signal to that thread which called poll() like it would work in Linux.
Yes, Unix signals aren't implemented in Zephyr, and I personally don't
see them coming anytime soon due to reasons hinted by Marc in another
mail. (But then someone who contributes a high-quality
implementation of them may think otherwise.)

Eventually, we'll need to catch and fix such cases. But the only
visible effect for well-behaving applications following the
guidelines above will be bloating code size in the Zephyr network
stack/socket implementation (so hopefully, we won't get bad
community stereotypes due to that). If you have a small
reproduction testcase for the issue, definitely please submit it at
https://github.com/zephyrproject-rtos/zephyr/issues
Will do.

My question is how I can safely "unblock" the thread waiting in the
zsock_accept()?
A way to not block forever in accept() call is to use timed poll()
on that socket. The thread issuing the poll() call would be the best
party to know when to close this socket (e.g., if there's no
activity during some period of time). Other threads could signal the
owner thread that they want something to be done to the socket via
flag variables. E.g., the following is a well-known pattern:

=== main loop thread ===
while (!should_exit) {
...
poll(..., MAIN_LOOP_PERIOD);
...
}
close(...);
exit();

=== other threads ===
should_exit = true;
Yeah sure, but this is polling and a waste of resources.
No, it's not, in the sense of "busy-wait polling". It's a well-known low
duty cycle design pattern. If you poll() with a timeout of 100ms and then
wake up for 1ms, you're sleeping 99% of the time, a 1% duty cycle. But 1ms
is a huge period of time: it's 100,000 cycles of a 100MHz CPU. If you
optimize that to some 1000s of cycles on event-free wakeups, you can
achieve a <0.1% duty cycle.
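
The duty-cycle arithmetic above can be checked with a small C helper. This is only a sketch: the function name duty_cycle_percent() is hypothetical, and it treats the 100ms poll() timeout as the whole wake-up period, which is the same rounding the figures above use.

```c
/* Percentage of time the CPU is awake when it wakes once per period
 * and spends awake_cycles clock cycles before going back to sleep. */
double duty_cycle_percent(double awake_cycles, double cpu_hz, double period_s)
{
    double awake_s = awake_cycles / cpu_hz; /* time awake per wake-up */

    return 100.0 * awake_s / period_s;
}
```

With a 100MHz clock and a 0.1s period, 100,000 awake cycles (1ms) give a 1% duty cycle, while an event-free wakeup trimmed to 5,000 cycles gives 0.05%.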

That I
really don't like, especially on small battery-powered systems.
Sure, Zephyr needs a lot of optimizations for low-power usage; any
contribution is welcome.

No, the only possible solution I see is an additional socket
connection via localhost which "signals" poll() and afterwards I can
see what needs to be done (e.g. calling close()).
There was actually a patch submitted to Zephyr mainline which used that
technique. I dissuaded the author from following that approach, as I
consider it to be definitely too heavyweight to be seriously used in
mainline. But you definitely can use it at the prototyping stage on your
app's side.

The reason for my question is that I need to implement an event loop
based system. I need events for sockets, timers, DNS.
The idea is to use a Zephyr message queue with a thread reading from
the queue and calling the callback functions.
Right, so everyone wants Zephyr to be able to do advanced things, and
everyone agrees that it's too young yet and missing a lot of such
advanced functionality. My response tried to outline ways to get
started right away, albeit with some compromises here and there.
Hopefully, such a smooth start-up curve would motivate you to
contribute to resolving the issues you write about.

An alternative is to wait until someone implements it all. As an
example, I have wanted to implement epoll() for 2 years now. And I can't even
bootstrap a good discussion of it (at least I shot off a brain-dump message
once: https://lists.zephyrproject.org/g/devel/topic/25004178) - we have
much more mundane things to do so far :-I.


Regards,
Stephan

--
Best Regards,
Paul



Paul Sokolovsky
 

On Fri, 24 May 2019 08:29:27 +0200
Stephan Gatzka <stephan.gatzka@...> wrote:

You're assuming Unix signals work...

https://lwn.net/Articles/414618/ Unfixable designs
Well, not really. I know the drawbacks of dealing with signals and in
Linux with epoll() and nearly everything being a file descriptor you
can put into epoll() there is no need to rely on signals.

But because poll() in Zephyr only accepts socket fds, I have to
build another mechanism for asynchronous I/O (for sockets, files,
timers, DNS resolution etc.).
Well, Linux and Zephyr aren't made of stone and dropped from the
sky ;-). Linux has it because man-centuries of effort were put into it.
Zephyr doesn't have it *yet*, because there's no critical mass gathered
yet to implement those features. They are on the roadmap (though I must
add that first we need to resolve some paradigmatic matters of how
Zephyr vs POSIX are laid out).

[]

--
Best Regards,
Paul



Paul Sokolovsky
 

On Fri, 24 May 2019 08:29:27 +0200
Stephan Gatzka <stephan.gatzka@...> wrote:

[]

I have to
build another mechanism for asynchronous I/O (for sockets, files,
timers, DNS resolution etc.).
To "get forward" with this, I submitted
https://github.com/zephyrproject-rtos/zephyr/issues/16376 .

Note that recently, I stopped submitting such things, because if you
look at the current list of known issues/missing features for sockets,
https://github.com/zephyrproject-rtos/zephyr/issues?q=is%3Aopen+is%3Aissue+label%3A%22area%3A+Sockets%22 ,
you'll see that the majority of them were submitted by me. Recently, I have
actually had my issues closed "by timeout", e.g.
https://github.com/zephyrproject-rtos/zephyr/issues/3547 .

So, instead, I encourage community members facing issues/missing
features to submit requests and "lobby" for them (the best way to
"lobby" is, of course, to prepare patches ;-) ).

[]

--
Best Regards,
Paul



Marc Herbert
 

On 23 May 2019, at 23:29, Stephan Gatzka <stephan.gatzka@...> wrote:

variables. E.g., the following is a well-known pattern:
=== main loop thread ===
while (!should_exit) {
...
poll(..., MAIN_LOOP_PERIOD);
...
}
close(...);
exit();
=== other threads ===
should_exit = true;
Yeah sure, but this is polling and a waste of resources.
Even if MAIN_LOOP_PERIOD is somewhat longer than the network protocol timeout(s) after which the socket should be closed anyway if the other end disappears?
Well, for a normal socket connection this is probably OK, but what about a server socket blocking on an accept()? There is no such thing as a network timeout there.
You could have a second poll() thread for all accepts with a much bigger ACCEPTS_LOOP_PERIOD; hours or even days.