Re: RFC: TCP receive/send window handling in Zephyr IP stack


Paul Sokolovsky
 

Hello Jukka,

On Tue, 02 May 2017 17:32:30 +0300
Jukka Rissanen <jukka.rissanen@...> wrote:

[]

> > https://jira.zephyrproject.org/browse/ZEP-1999 .

> > So, as the issue is confirmed, I would like to proceed with
> > resolving it, and this RFC is about ways to do it.

> Thanks for pushing this, we definitely have an issue with the rcv wnd
> handling.

Thanks for supporting it. Given that I won't have time to prepare BSD
Sockets API patches for 1.8, I'm switching instead to resolving as many
of the issues spotted during its prototyping work as I can within the
1.8 timeframe.

> > lwIP offers a very simple and easy to understand model of that: the
> > IP stack only decreases the available receive window, never
> > increases it; increasing it is the task of application code. When an
> > application actually processes received data (vs just buffering it),
> > it calls a special function (tcp_recved()) to advance the window.
[]

> > way to do that would be to change recv callback signature from:
> >
> > typedef void (*net_context_recv_cb_t)(struct net_context *context,
> >                                       struct net_pkt *pkt,
> >                                       int status,
> >                                       void *user_data);
> >
> > to one returning u32_t, the size by which to advance the receive
> > window. For each existing recv callback, that would be
> > net_pkt_appdatalen(pkt), so we can try to optimize that one step
> > further by defining some constant and allowing to return it:
> >
> > return NET_ADVANCE_WHOLE_PKT;

> I like this better than an application manually calling a special
> function to do the trick, like what was found in the Gerrit patch.

I see. And on my side, I've been pondering these days the comment you
gave in the Gerrit review: "Why cannot we call this directly inside the
stack when we receive data?"

> Is it likely that the application would not move the window for a full
> packet? I mean, what would happen if the core stack always moved the
> window by the received data amount automatically after we return from
> the recv cb?

So, that would be almost the same as it is now, except that currently
the receive window is moved (i.e. the ack value updated) before calling
the recv cb. It's definitely more correct to move it after, but the
difference in behavior will probably be negligible in practice, and it
won't help applications which *buffer* incoming packets, like the BSD
Sockets API does.
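
To make that concrete, here is a rough sketch of how the core stack
side could look with the changed signature. This is illustrative only:
net_tcp_update_recv_wnd() is a made-up helper name, and the field names
on net_context are approximate.

#include <net/net_context.h>
#include <net/net_pkt.h>

/* Sketch only: deliver data to the app, then advance the window
 * by however much the app says it consumed.
 */
static void tcp_deliver_to_app(struct net_context *context,
                               struct net_pkt *pkt)
{
        /* Snapshot the payload length before the callback runs,
         * as the callback may unref the packet.
         */
        u32_t appdatalen = net_pkt_appdatalen(pkt);
        u32_t advance;

        advance = context->recv_cb(context, pkt, 0,
                                   context->user_data);
        if (advance == NET_ADVANCE_WHOLE_PKT) {
                advance = appdatalen;
        }

        /* Only now move the receive window, i.e. after the app
         * had a chance to say how much it really processed.
         */
        net_tcp_update_recv_wnd(context, advance); /* made-up name */
}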

> The net_bufs should be released by the application asap anyway,
> otherwise we have a possible memory leak.

Not a memory leak, but buffering ;-). So, returning to your idea of
doing it completely automatically, like a grown-up IP stack: the whole
idea is that the receive window should be moved not when a packet is
*received* (which is the event the recv_cb signals), but when the data
in the packet is *processed* (which may be much later due to
buffering).

How can we know that the application has finished processing? Again,
simple stacks like lwIP require explicit notification of that. But how
to do it otherwise? Well, we know that data processing is 100% finished
when the network buffer holding it is freed. Hopefully that gives
enough insight into the idea, and you can comment: how do you like it
that net_buf_frag_del() and friends would now update TCP structures
related to a fragment - at the minimum; at the maximum, they may
trigger an outgoing packet to be sent (to notify the peer of the window
update).
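
Purely as an illustration of the kind of coupling I mean - the real
net_buf_frag_del() does none of this today, and
net_tcp_credit_recv_wnd() is a made-up name:

#include <net/buf.h>

/* Illustration only: teach fragment deletion to credit the freed
 * payload space back to the owning TCP connection.
 */
struct net_buf *net_buf_frag_del(struct net_buf *parent,
                                 struct net_buf *frag)
{
        struct net_buf *next = frag->frags;

        if (parent) {
                parent->frags = next;
        }

        frag->frags = NULL;

        /* New, hypothetical step: account the freed bytes against
         * the receive window of whatever connection owned this
         * fragment; this may in turn trigger a window-update
         * segment to be sent to the peer.
         */
        net_tcp_credit_recv_wnd(frag); /* made-up name */

        net_buf_unref(frag);

        return next;
}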

Unfortunately, that's inevitable - for an efficient implementation, TCP
requires close enough coupling with IP and buffer management.
Apparently, it was exactly to simplify this that lwIP put the window
update step in the user app instead. I'll soon submit a patch for MTU
handling of outgoing packets, which already exposes this need to
intermingle different "layers" of the API (so we would need to stop
thinking about them as different layers, and treat them just as one).
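
For reference, the lwIP pattern (raw API) looks roughly like this - the
stack shrinks the advertised window as data arrives, and the app grows
it back with tcp_recved() once it has really consumed the data:

#include "lwip/tcp.h"

/* Rough sketch of an lwIP raw-API receive callback, for comparison. */
static err_t my_recv(void *arg, struct tcp_pcb *tpcb,
                     struct pbuf *p, err_t err)
{
        if (p == NULL) {
                /* The peer closed the connection. */
                tcp_close(tpcb);
                return ERR_OK;
        }

        /* ... actually process p->payload here, possibly much
         * later if the app buffers the pbuf ...
         */

        /* Tell lwIP the data was consumed; only now may it grow
         * the advertised receive window back by p->tot_len bytes.
         */
        tcp_recved(tpcb, p->tot_len);
        pbuf_free(p);

        return ERR_OK;
}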

> Is it likely that the application would not move the window for a full
> packet?

Fairly speaking, we should move the window as soon as we're able to
receive a new chunk of data. As we allocate data in fragment buffers,
freeing one would already be good grounds to update the recv window.
But: the recv window is measured in data bytes, while we receive data
as packets, which carry protocol header overhead. So, if we get one
fragment buffer (128 bytes) freed and announce that to the peer, we'll
get 128 + 40 (TCP/IPv4 headers) bytes back. If we had only that one
fragment buffer, this packet will be dropped, and everything will be as
bad as it is now (exponential backoff triggering, etc.)
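
One classic rule of this kind is receiver-side silly window syndrome
avoidance (RFC 1122, 4.2.3.3): don't advertise a window increase until
it amounts to at least one MSS, or half the receive buffer. A sketch of
what such a check could look like for us (all names below are made up):

#include <stdbool.h>

#define MIN(a, b) (((a) < (b)) ? (a) : (b))

/* Hypothetical check: suppress tiny window updates which would
 * invite segments we can't actually buffer.
 */
static bool worth_announcing_wnd_update(u32_t buf_size,
                                        u32_t free_space,
                                        u32_t advertised_wnd,
                                        u32_t recv_mss)
{
        if (free_space <= advertised_wnd) {
                /* Nothing was reclaimed; no update to announce. */
                return false;
        }

        /* Announce only if the window can grow by at least one
         * MSS, or by half of the total receive buffer, whichever
         * is smaller (per RFC 1122, 4.2.3.3).
         */
        return (free_space - advertised_wnd) >=
               MIN(recv_mss, buf_size / 2);
}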

So, it's not that easy, but I'm excited about the idea of trying to do
it fully automatically, and as I said, I've been pondering it all these
days and hope to proceed to some experimenting a bit later. Overall,
recv/send window handling will require various rules, the majority of
which will be heuristics. lwIP, for one, even though it receives window
updates from the app, does quite a lot of magic with them before they
go into real packets.

[]

--
Best Regards,
Paul

Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog
