Re: BSD Sockets in mainline, and how that affects design decisions for the rest of IP stack (e.g. send MTU handling)

Nashif, Anas


You gave very detailed background information and listed issues we had in the past, but it was not clear what you are proposing. We do have sockets already; are you suggesting we should move everything to use sockets? Is the socket interface ready for this?
Then there are the usual comments about memory usage and footprint that are made whenever we discuss the IP stack (here made by Jukka). Can we please quantify this and provide more data and context? For example, I would be interested in numbers showing how much more memory/flash we consume when sockets are used vs. the same implementation using low-level APIs. What is the penalty, and is it justifiable, given that using sockets would give us a more portable solution and would allow the random user/developer to implement protocols more easily?

So my request is to have a more detailed proposal, going into the history of this, how we can move forward from here, and what such a proposal would mean to existing code and protocols not using sockets...


-----Original Message-----
From: Jukka Rissanen [mailto:jukka.rissanen@...]
Sent: Wednesday, October 11, 2017 6:06 AM
To: Paul Sokolovsky <paul.sokolovsky@...>; devel@...; Tomasz Bursztyka <tomasz.bursztyka@...>; David Brown <david.brown@...>; Kumar Gala <kumar.gala@...>; Nashif, Anas <anas.nashif@...>
Subject: Re: BSD Sockets in mainline, and how that affects design decisions for the rest of IP stack (e.g. send MTU handling)


On Tue, 2017-10-10 at 21:50 +0300, Paul Sokolovsky wrote:

A solution originally proposed was that the mentioned API functions
should take an MTU into account, and not allow a user to add more data
than MTU allows (accounting also for protocol headers). This solution
is rooted in the well-known POSIX semantics of "short writes" - an
application can request an arbitrary amount of data to be written, but
a system is free to process less data, based on system resource
availability. Amount of processed data is returned, and an application
is expected to retry the operation for the remaining data. It was
posted as .
at that time, there was no consensus about way to solve it, so it was
implemented only for BSD Sockets API.
We can certainly implement something like this for the net_context APIs. There is at least one issue with this: it is currently not easy to tell the application how much data we are able to send, so for now we could either send all of the data or none of it.
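
For illustration, the "short write" pattern described above can be sketched as follows. This is a minimal sketch that uses the standard POSIX send() call rather than the net_context API; send_all is a hypothetical helper name, not an existing function in the tree.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: call send() repeatedly until all len bytes have
 * been accepted by the stack. Returns 0 on success, or -1 on error with
 * errno set by send().
 */
static int send_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);

	while (len > 0) {
		ssize_t n = send(fd, p, len, 0);

		if (n < 0) {
			if (errno == EINTR) {
				continue; /* interrupted before sending; retry */
			}
			return -1;
		}

		/* Short write: only n bytes were taken. Advance and retry. */
		p += n;
		len -= (size_t)n;
	}

	return 0;
}
```

The caller does not need to know the MTU: however much data the stack accepts per call, the loop continues with the remainder.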

Much later, was posted. It
works in following way: it allows an application to create an
oversized packet, but a stack does a separate pass over it and splits
this packet into several packets with a valid length. A comment
immediately received (not by me) was that this patch just duplicates
in an adhoc way IP fragmentation support as required by TCP/IP
Note that currently we do not have IPv4 fragmentation support implemented, and IPv6 fragmentation is also disabled by default. The reason for this is that fragmentation requires a lot of extra memory, which might not be necessary in the usual cases. Splitting TCP segments needs much less memory.
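
To make the arithmetic behind splitting an oversized packet concrete, here is a minimal sketch. It is not the code from the patch discussed above; the function names are hypothetical, and the figures in the usage note (IPv6 minimum MTU of 1280 bytes, 40-byte IPv6 header, 20-byte TCP header without options) are just one example.

```c
#include <assert.h>
#include <stddef.h>

/* Largest payload that fits in one packet after protocol headers. */
static size_t max_payload(size_t mtu, size_t hdr_len)
{
	assert(mtu > hdr_len);
	return mtu - hdr_len;
}

/* Number of packets needed to carry data_len bytes of payload. */
static size_t segment_count(size_t data_len, size_t mtu, size_t hdr_len)
{
	size_t payload = max_payload(mtu, hdr_len);

	return (data_len + payload - 1) / payload; /* round up */
}

/* Payload length of segment number index (0-based), or 0 if index is
 * past the end of the data.
 */
static size_t segment_len(size_t data_len, size_t mtu, size_t hdr_len,
			  size_t index)
{
	size_t payload = max_payload(mtu, hdr_len);
	size_t offset = index * payload;

	if (offset >= data_len) {
		return 0;
	}

	return (data_len - offset < payload) ? data_len - offset : payload;
}
```

With an MTU of 1280 and 60 bytes of headers, each packet carries at most 1220 bytes, so 3000 bytes of data would go out as three segments of 1220, 1220 and 560 bytes.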

I would like to raise an additional argument why the POSIX-inspired
approach may be better.
I would say there is no better or worse approach here. Just a different point of view.

Consider a case when an application wants to send a big amount of
constant data, e.g. 900KB. It can be a system with e.g. 1MB of flash
and 64KB of RAM, an app sitting in ~100KB of flash, the rest
containing constant data to send. Following a "split oversized
packet" approach wouldn't help - an app wouldn't be able to create an
oversized packet of 900K - there's simply not enough RAM for it. So,
it would need to handle such a case differently anyway.
Of course, your application is constrained by available memory and other limits of your hardware.

But the POSIX-based approach would allow handling it right away - any
application needs to be prepared to retry the operation until completion
anyway, so the amount of data is not important.

That's the essence of the question this RFC poses: given that the
POSIX-based approach is already in the mainline, does it make sense to
go for Zephyr-special, adhoc solutions for a problem (and as
mentioned at the beginning, there can be more issues with a similar
Please note that the BSD socket API is fully optional and not always available. You cannot rely on it being present, especially if you want to minimize memory consumption. We need a more general solution instead of something that is only available for BSD sockets.

Answering "yes" may have interesting implications. For example, the
code in is not
needed for applications using BSD Sockets. There's at least another
issue solved on BSD Sockets level, but not on the native API. There's
an ongoing effort to separate kernel and userspace, and BSD Sockets
offer an automagic solution for that, while the native API allows a user
app direct access to the kernel networking buffers, so there's a
lot to solve there yet. Going like that, it may turn out that the native
adhoc API, which initially was intended to be small and efficient, will
grow bigger and more complex (== harder to stabilize, containing more
bugs) than something based on well tried and tested approach like
There has not been any public discussion on the mailing list about userspace/kernel separation and how it affects the IP stack etc., so it is a bit difficult to say anything about this.

So, it would be nice if the networking stack, and overall Zephyr
architecture stakeholders consider both a particular issue and overall
implications on the design/implementation. There're many more details
than presented above, and the devil is definitely in details, there's
no absolutely "right" solution, it's a compromise. I hope that Jukka
and Tomasz, who are proponents of the second (GH-1330) approach can
correct me on the benefits of it.
You are unnecessarily framing this as being for or against a solution. I have an example application in
ject-rtos/zephyr/pull/980 that needs to send a large (several KB) file to the outside world using HTTP, and I am trying to solve it efficiently. The application will not use BSD sockets.


