Re: BSD Sockets in mainline, and how that affects design decisions for the rest of IP stack (e.g. send MTU handling)

Paul Sokolovsky

Hello Tomasz,

Thanks for responding and bringing up this discussion - it got
backlogged (so I'm doing homework on it in the background).

On Wed, 25 Oct 2017 18:13:18 +0200
Tomasz Bursztyka <tomasz.bursztyka@...> wrote:

Hi guys,

It was
posted as .
Again, at that time, there was no consensus about the way to solve it,
so it was implemented only for the BSD Sockets API.

Much later, was posted.
It works in the following way: it allows an application to create an
oversized packet

There are many more
details than presented above, and the devil is definitely in the
details. There's no absolutely "right" solution, it's a compromise.
I hope that Jukka and Tomasz, who are proponents of the second
(GH-1330) approach, can correct me on the benefits of it.
Actually I missed the fact PR 1330 was about MTU handling. Does not
sound generic enough.

In the end I don't approve of either of the proposed solutions.
That sounds fresh, thanks ;-)

Let me
explain why:

First, let's not rush on this MTU handling just yet, though it is
much needed. We first need this:

Ack, that's a good thing to do...

it will simplify a lot how packets are allocated. I haven't touched
MTU stuff since I did the net_pkt move because of this feature we'll

I foresee a lot of possible improvements with this issue resolved:
certainly MTU handling, better memory management than current frag
model, but also better response against low memory
... but I don't see how it directly relates to the topic of this RFC,
which is selecting a paradigm to deal with the fact that we have finite
units of buffering, and how that should affect user-facing API design.

There's definitely a lot to improve and optimize in our IP stack, and
the issue you mention is one of those things. But it's going to be just
that - an optimization. What we discuss is how to structure the API:

1. Accept that the amount of buffering we can do is very finite, and
make applications aware of that and ready to handle it - the
POSIX-inspired way. If done that way, we can just use a network packet
as a buffering unit and further optimize that handling.

2. Keep pretending that we can buffer a mini-infinite amount of data.
It's mini-infinite because we still won't be able to buffer more than
RAM allows (actually, more than the TX slab allows), and that's still
too little, so it won't work for "real" amounts of data, which will
still need to fall back to the handling in point 1 above. Packet buffers
are still used for buffering, but looking at Jukka's implementation,
they are used as generic data buffers, and require pretty heavy
post-processing - first splitting oversized buffers into
packet-friendly sizes (#1330), then stuffing protocol headers in front
(we already do that, and that's pretty awful and not zero-copy at all),
etc. Again, all that happens with no free memory available - it was
already spent to buffer that "mini-infinite" amount of data.
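
To make the contrast between the two options concrete, here is a minimal sketch in plain C. It is not Zephyr code and not a rendering of either PR: the pool, its sizes, and the names `PKT_PAYLOAD`, `TX_POOL_PKTS`, `queue_short()` and `queue_oversized()` are all made up for illustration. It only shows the difference in contract: option 1 takes what fits into one packet and reports the count, while option 2 has to find room for the whole buffer up front and is still capped by the pool.

```c
#include <stddef.h>
#include <string.h>

#define PKT_PAYLOAD  64  /* hypothetical payload bytes per packet */
#define TX_POOL_PKTS 4   /* hypothetical number of packets in the TX pool */

static unsigned char tx_pool[TX_POOL_PKTS][PKT_PAYLOAD];
static size_t tx_used;   /* packets currently queued */

/* Option 1: fill at most one packet and tell the caller how much was
 * taken. Returns bytes accepted, or -1 if no packet is free right now. */
static long queue_short(const void *data, size_t len)
{
    if (tx_used == TX_POOL_PKTS) {
        return -1;
    }
    size_t n = len < PKT_PAYLOAD ? len : PKT_PAYLOAD;
    memcpy(tx_pool[tx_used++], data, n);
    return (long)n;
}

/* Option 2: accept the whole buffer or nothing, splitting it into
 * packet-sized pieces. Fails once the data exceeds what the pool holds. */
static long queue_oversized(const void *data, size_t len)
{
    size_t need = (len + PKT_PAYLOAD - 1) / PKT_PAYLOAD;
    if (need > TX_POOL_PKTS - tx_used) {
        return -1;
    }
    const unsigned char *p = data;
    size_t left = len;
    while (left > 0) {
        size_t n = left < PKT_PAYLOAD ? left : PKT_PAYLOAD;
        memcpy(tx_pool[tx_used++], p, n);
        p += n;
        left -= n;
    }
    return (long)len;
}

/* Pretend everything queued so far has been transmitted. */
static void tx_reset(void)
{
    tx_used = 0;
}
```

With these made-up sizes, a 300-byte buffer can never be queued by `queue_oversized()`, so the caller still needs the retry logic of option 1 for larger data.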

You also say that you don't like any of these choices. Well, there are
only so many ways to do it. What do you have in mind?

(we could after
all send a smaller-than-MTU TCP segment asap if there was only a small
amount of memory available, and continue with the rest etc...).
That's how sockets work already - they ask for the user's data to be
added to a packet, and if less is added, they pass that info back to the
app (for it to retry). The whole talk is about making that available to
the native API too (governed also by other constraints like MTU size).
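
For reference, the caller-side pattern this implies looks roughly like the sketch below. `mock_send()` is a hypothetical stand-in for a real send call, and `MOCK_MTU` and the `wire` buffer exist only to make the example self-contained. The point is just that the send call may accept less than it was given, and the application loops on the remainder.

```c
#include <stddef.h>
#include <string.h>

#define MOCK_MTU 100             /* hypothetical per-call limit */

static unsigned char wire[1024]; /* stands in for the network */
static size_t wire_len;

/* Hypothetical send: accepts at most MOCK_MTU bytes per call and returns
 * how many bytes it took, or -1 if nothing more can be accepted. */
static long mock_send(const void *buf, size_t len)
{
    size_t n = len < MOCK_MTU ? len : MOCK_MTU;
    if (wire_len + n > sizeof(wire)) {
        return -1;
    }
    memcpy(wire + wire_len, buf, n);
    wire_len += n;
    return (long)n;
}

/* Application side: keep calling until everything has been accepted. */
static long send_all(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t sent = 0;

    while (sent < len) {
        long n = mock_send(p + sent, len - sent);
        if (n < 0) {
            return -1;           /* give up on error */
        }
        sent += (size_t)n;       /* short write: retry with the rest */
    }
    return (long)sent;
}
```

A real application would usually block or wait for a "writable" event between retries instead of spinning; that part is left out here.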


Best Regards,
Paul
