Re: BSD Sockets in mainline, and how that affects design decisions for the rest of IP stack (e.g. send MTU handling)

Paul Sokolovsky


On Wed, 11 Oct 2017 14:56:02 +0000
"Nashif, Anas" <anas.nashif@...> wrote:


You gave very detailed background information and listed issues we
had in the past, but it was not clear what you are proposing.

Yes, looking at Jukka's response, I must have failed miserably to
convey what I propose. I propose:

1. To reject the approach to send MTU handling in

2. To adopt the approach from , which, however,
may need further work to address concerns raised against it.

We do have sockets already; are you suggesting we should move
everything to use sockets?
No, I don't suggest that (here).

Is the socket interface ready for this? Then there are
the usual comments made whenever we discuss the IP stack
with regard to memory usage and footprint (here made by Jukka); can we
please quantify this and provide more data and context? For example, I
would be interested in numbers showing how much more memory/flash
we consume when sockets are used vs. the same implementation using
low-level APIs.

To have such numbers, socket-based implementations of various
application-level protocols would first need to exist. They currently
don't, and I personally don't think that's a worthwhile investment of
effort, at least in the current state of affairs, when there are still
known issues in the underlying stack.

So I'm left with just speculating that it's better to cross-adopt
approaches between the native API and the sockets API, instead of
letting them diverge. That was the point of my post.

What is the penalty, and is it justifiable, given that
using sockets would give us a more portable solution and would allow
the random user/developer to implement protocols more easily?

So my request is to have a more detailed proposal, going into the
history of this, how we can move forward from here, and what such a
proposal would mean to existing code and protocols not using sockets.

That's exactly what I tried to do: go through the history of the
question, with the relevant links. Hopefully the summary above
clarifies the essence of the proposal.



-----Original Message-----
From: Jukka Rissanen [mailto:jukka.rissanen@...]
Sent: Wednesday, October 11, 2017 6:06 AM
To: Paul Sokolovsky <paul.sokolovsky@...>; devel@...;
Tomasz Bursztyka <tomasz.bursztyka@...>; David Brown <david.brown@...>;
Kumar Gala <kumar.gala@...>; Nashif, Anas <anas.nashif@...>
Subject: Re: BSD Sockets in mainline, and how that affects design
decisions for the rest of IP stack (e.g. send MTU handling)


On Tue, 2017-10-10 at 21:50 +0300, Paul Sokolovsky wrote:

A solution originally proposed was that the mentioned API functions
should take the MTU into account and not allow a user to add more
data than the MTU allows (accounting also for protocol headers). This
solution is rooted in the well-known POSIX semantics of "short
writes": an application can request an arbitrary amount of data to
be written, but the system is free to process less data, based on
system resource availability. The amount of processed data is
returned, and the application is expected to retry the operation for
the remaining data. It was posted as . Again, at that time, there was
no consensus about the way to solve it, so it was implemented only
for the BSD Sockets API.

We can certainly implement something like this for the net_context
APIs. There is at least one issue with this: it is currently not
easy to tell the application how much data we were able to send, so
for now we could either send all the data or none of it.
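A minimal sketch of the MTU-bounded "short write" rule described
above. It assumes IPv6 with its 40-byte fixed header and TCP with a
20-byte header and no options; `payload_that_fits()` is a hypothetical
helper for illustration, not a Zephyr or net_context API, and the real
header overhead depends on the stack configuration.

```c
#include <stddef.h>

/* Assumed header sizes (hypothetical for this sketch): IPv6 without
 * extension headers, TCP without options. */
#define IPV6_HDR_LEN 40u
#define TCP_HDR_LEN  20u

/* Return how many of the `requested` payload bytes fit into one packet
 * for the given link MTU. A "short write" style API would report this
 * count to the caller instead of rejecting the whole request. */
static size_t payload_that_fits(size_t mtu, size_t requested)
{
    size_t overhead = IPV6_HDR_LEN + TCP_HDR_LEN;

    if (mtu <= overhead) {
        return 0; /* no room left for any payload */
    }

    size_t max_payload = mtu - overhead;
    return requested < max_payload ? requested : max_payload;
}
```

With the IPv6 minimum link MTU of 1280 bytes, a request to send 5000
bytes would be accepted only up to 1220 bytes, and the caller would
retry with the remainder.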

Much later, was posted.
It works in the following way: it allows an application to create an
oversized packet, but the stack does a separate pass over it and
splits this packet into several packets of valid length. A comment
it immediately received (not from me) was that this patch just
duplicates, in an ad hoc way, the IP fragmentation support required
by the TCP/IP protocols.

Note that we currently do not have IPv4 fragmentation support
implemented, and IPv6 fragmentation is also disabled by default.
The reason for this is that fragmentation requires a lot of extra
memory, which might not be necessary in usual cases. Splitting TCP
segments needs much less memory.
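For comparison, a rough sketch of the "split oversized packet" idea:
the stack plans how to cut one oversized payload into pieces of at
most `max_payload` bytes. `plan_split()` and `struct chunk` are
hypothetical names for illustration; this is not the code from the
patch referenced above. Note that the whole oversized payload must
already exist before it can be split.

```c
#include <stddef.h>

/* Describes one valid-length piece of the oversized payload. */
struct chunk {
    size_t offset; /* start of the piece within the original buffer */
    size_t len;    /* piece length, never more than max_payload */
};

/* Plan how a payload of `total` bytes is cut into pieces of at most
 * `max_payload` bytes. Writes up to `max_chunks` descriptors into `out`
 * and returns the number of pieces needed, which may exceed max_chunks
 * (the caller must check). */
static size_t plan_split(size_t total, size_t max_payload,
                         struct chunk *out, size_t max_chunks)
{
    size_t needed = 0;

    if (max_payload == 0) {
        return 0; /* nothing can be sent with a zero-sized payload */
    }

    for (size_t off = 0; off < total; off += max_payload) {
        size_t remaining = total - off;

        if (needed < max_chunks) {
            out[needed].offset = off;
            out[needed].len = remaining < max_payload ? remaining
                                                      : max_payload;
        }
        needed++;
    }
    return needed;
}
```

For a 3000-byte payload and a 1220-byte limit this yields pieces of
1220, 1220 and 560 bytes.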

I would like to raise an additional argument for why the
POSIX-inspired approach may be better.
I would say there is no better or worse approach here. Just a
different point of view.

Consider a case when an application wants to send a large amount of
constant data, e.g. 900KB. It can be a system with e.g. 1MB of
flash and 64KB of RAM, with the app occupying ~100KB of flash and the
rest containing the constant data to send. Following the "split
oversized packet" approach wouldn't help: the app wouldn't be able to
create an oversized packet of 900KB, as there's simply not enough RAM
for it. So it would need to handle such a case differently anyway.

Of course your application is constrained by available memory and
other limits of your hardware.

But the POSIX-based approach would allow handling it right away: any
application needs to be prepared to retry the operation until
completion anyway, and the amount of data does not matter.
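A minimal sketch of the application side of the POSIX approach,
assuming a POSIX-compatible `send()` on a connected stream socket.
`send_all()` is a hypothetical helper: it retries until every byte is
accepted, so the same loop works whether the data is 100 bytes or
900KB of constant data in flash, and the application never needs its
own RAM copy of the whole buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send all `len` bytes of `buf` over a connected stream socket,
 * following POSIX short-write semantics: send() may accept fewer bytes
 * than requested, so keep retrying with the remainder.
 * `buf` may point to read-only data; it is only read, never copied by
 * this function. Returns 0 on success, -1 on error (errno set by send()). */
static int send_all(int sock, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(sock, p, len, 0);

        if (n < 0) {
            if (errno == EINTR) {
                continue; /* interrupted before sending anything; retry */
            }
            return -1; /* real error, report it to the caller */
        }
        p += n;            /* skip the bytes the stack accepted */
        len -= (size_t)n;  /* and retry with what is left */
    }
    return 0;
}
```

The retry loop is the only extra logic the application carries; how
much the stack accepts per call (for example, one MTU's worth) stays
an internal decision of the stack.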

That's the essence of the question this RFC poses: given that the
POSIX-based approach is already in mainline, does it make sense
to go for Zephyr-specific, ad hoc solutions to a problem (and as
mentioned at the beginning, there can be more issues with a similar

Please note that the BSD socket API is fully optional and not always
available. You cannot rely on it being present, especially if you
want to minimize memory consumption. We need a more general solution
instead of something that is only available for BSD sockets.

Answering "yes" may have interesting implications. For example, the
code in is not needed for applications using BSD Sockets. There's at
least one more issue solved at the BSD Sockets level, but not in the
native API. There's an ongoing effort to separate kernel and
userspace, and BSD Sockets offer an automagic solution for that,
while the native API gives a user app direct access to the kernel
networking buffers, so there's a lot to solve there yet. Going like
that, it may turn out that the native ad hoc API, which was initially
intended to be small and efficient, will grow bigger and more complex
(== harder to stabilize, containing more bugs) than something based
on a well-tried and tested approach like POSIX.

There has not been any public discussion on the mailing list about
userspace/kernel separation and how it affects the IP stack etc., so
it is a bit difficult to say anything about this.

So, it would be nice if the networking stack and overall Zephyr
architecture stakeholders considered both the particular issue and
the overall implications for the design/implementation. There are
many more details than presented above, and the devil is definitely
in the details; there's no absolutely "right" solution, it's a
compromise. I hope that Jukka and Tomasz, who are proponents of the
second (GH-1330) approach, can correct me on its benefits.

You are unnecessarily framing this as being for or against a
solution. I have an example application in ject-rtos/zephyr/pull/980
that needs to send a large (several KB) file to the outside world
using HTTP, and I am trying to solve it efficiently. The application
will not use BSD


Best Regards,
Paul | Open source software for ARM SoCs
