Re: BSD Sockets in mainline, and how that affects design decisions for the rest of IP stack (e.g. send MTU handling)

Jukka Rissanen


On Tue, 2017-10-10 at 21:50 +0300, Paul Sokolovsky wrote:

A solution originally proposed was that the mentioned API functions
should take an MTU into account, and not allow a user to add more
than MTU allows (accounting also for protocol headers). This solution
is rooted in the well-known POSIX semantics of "short writes" - an
application can request an arbitrary amount of data to be written,
but the system is free to process less data, based on system resource
availability. The amount of processed data is returned, and the
application is expected to retry the operation for the remaining data.
It was posted as .
At that time, there was no consensus about the way to solve it, so it
was implemented only for the BSD Sockets API.

We can certainly implement something like this for the net_context
APIs. There is at least one issue with this: it is currently not easy
to tell the application how much data we were able to send, so for now
we could either send all of the data or none of it.
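
To illustrate the MTU-limited append idea discussed above, here is a
minimal sketch in C. The struct and function names (demo_pkt,
demo_pkt_append) are hypothetical and are not the net_pkt or
net_context API. The sketch only shows an append call that reports how
many bytes it accepted, which is the piece of information that is
currently hard to pass back to the application.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packet buffer; this is NOT Zephyr's net_pkt layout. */
struct demo_pkt {
    uint8_t buf[1500];
    size_t  hdr_len;   /* bytes reserved for protocol headers */
    size_t  data_len;  /* payload bytes appended so far */
    size_t  mtu;       /* link MTU, headers included */
};

/* Append at most as much payload as still fits within the MTU and
 * return the number of bytes actually taken (possibly fewer than len).
 */
static size_t demo_pkt_append(struct demo_pkt *pkt, const void *data,
                              size_t len)
{
    /* Never exceed the backing buffer, even if mtu is set too high. */
    size_t limit = pkt->mtu < sizeof(pkt->buf) ? pkt->mtu
                                               : sizeof(pkt->buf);
    size_t used = pkt->hdr_len + pkt->data_len;
    size_t room = (used < limit) ? limit - used : 0;

    if (len > room) {
        len = room; /* short write: caller retries with the remainder */
    }

    memcpy(&pkt->buf[used], data, len);
    pkt->data_len += len;

    return len;
}
```

With mtu = 1280 and hdr_len = 60 (a 40-byte IPv6 header plus a 20-byte
TCP header without options), a 2000-byte append would accept 1220
bytes, and the caller would have to send the remaining 780 bytes in a
later packet.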

Much later, was posted. It
works in the following way: it allows an application to create an
oversized packet, but the stack does a separate pass over it and
splits this packet into several packets with a valid length. A comment
immediately received (not by me) was that this patch just duplicates,
in an ad hoc way, IP fragmentation support as required by TCP/IP.

Note that currently we do not have IPv4 fragmentation support
implemented, and IPv6 fragmentation is also disabled by default. The
reason is that fragmentation requires a lot of extra memory, which
might not be necessary in the usual cases. Splitting TCP segments
needs much less memory.

I would like to raise an additional argument why the POSIX-inspired
approach may be better.

I would say there is no better or worse approach here, just a
different point of view.

Consider a case when an application wants to
send a large amount of constant data, e.g. 900KB. It can be a system
with e.g. 1MB of flash and 64KB of RAM, an app sitting in ~100KB
of flash, the rest containing constant data to send. Following a
"split oversized packet" approach wouldn't help - an app wouldn't be
able to create an oversized packet of 900KB - there's simply not
enough RAM for it. So, it would need to handle such a case differently.

Of course your application is constrained by the available memory and
the other limits of your hardware.

But the POSIX-based approach would handle it right away - any
application needs to be prepared to retry the operation until
completion anyway, so the amount of data is not important.
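
As a rough illustration of the retry loop described above, a
POSIX-style sender could look like the sketch below. send_all is a
hypothetical helper name, not an existing Zephyr or POSIX function; it
assumes a blocking, connected socket and uses only the standard send()
call.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: keep calling send() until all `len` bytes have
 * been accepted by the stack or a real error occurs.
 * Returns 0 on success, -1 on error (errno is set by send()).
 */
static int send_all(int sock, const void *data, size_t len)
{
    const char *p = data;

    while (len > 0) {
        ssize_t sent = send(sock, p, len, 0);

        if (sent < 0) {
            if (errno == EINTR) {
                continue; /* interrupted before sending; retry */
            }
            return -1;
        }

        /* Short write: the stack took only part of the buffer,
         * so advance past it and retry with the remainder.
         */
        p += sent;
        len -= (size_t)sent;
    }

    return 0;
}
```

Because the loop only advances a pointer, the source buffer can be
constant data read directly from flash; no 900KB packet ever has to
exist in RAM.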

That's the essence of the question this RFC poses: given that the
POSIX-based approach is already in the mainline, does it make sense
to go for Zephyr-special, ad hoc solutions for a problem (and as
mentioned at the beginning, there can be more issues with a similar

Please note that the BSD socket API is fully optional and not always
available. You cannot rely on it being present, especially if you want
to minimize memory consumption. We need a more general solution
instead of something that is only available for BSD sockets.

Answering "yes" may have interesting implications. For example, the
code in is
not needed for applications using BSD Sockets. There's at least
issue solved on BSD Sockets level, but not on the native API. There's
an ongoing effort to separate kernel and userspace, and BSD Sockets
offer an automagic solution for that, while the native API allows a
user app to access the kernel networking buffers directly, so there's
a lot to solve there yet. Going like that, it may turn out that the
native ad hoc API, which initially was intended to be small and
efficient, will grow bigger and more complex (== harder to stabilize,
containing more bugs) than something based on a well tried and tested
approach like

There has not been any public discussion on the mailing list about
userspace/kernel separation and how it affects the IP stack, so it is
a bit difficult to say anything about this.

So, it would be nice if the networking stack and overall Zephyr
architecture stakeholders considered both the particular issue and
its implications for the design/implementation. There are many more
details than presented above, and the devil is definitely in the
details; there's no absolutely "right" solution, it's a compromise. I
hope Jukka and Tomasz, who are proponents of the second (GH-1330)
approach, can correct me on the benefits of it.

You are unnecessarily framing this as being for or against a
solution. I have an example application in
ject-rtos/zephyr/pull/980 that needs to send a large (several KB) file
to the outside world using HTTP, and I am trying to solve it
efficiently. The application will not use BSD sockets.


