Re: BSD Sockets in mainline, and how that affects design decisions for the rest of IP stack (e.g. send MTU handling)


Paul Sokolovsky
 

Hello,

On Wed, 11 Oct 2017 13:06:25 +0300
Jukka Rissanen <jukka.rissanen@linux.intel.com> wrote:

[]

> You are unnecessarily creating this scenario about pro or against
> solution. I have an example application in
> https://github.com/zephyrproject-rtos/zephyr/pull/980 that needs to
> send large (several kb) file to outside world using HTTP, and I am
> trying to solve it efficiently. The application will not use BSD
> sockets.

So, this thread got backlogged somehow (bumped up by Tomasz yesterday),
so I decided to approach it from the other side - to look into your
use case (#980) and see how easy it would be to convert it to rely on
https://github.com/zephyrproject-rtos/zephyr/pull/119 instead.

Before going into that, I'd like to mention another thing which happened
in the meantime: https://github.com/zephyrproject-rtos/zephyr/pull/4402
got merged, which actually uses the technique I proposed to send
largish files (more than 1 network packet), and can pass a more or less
non-trivial load test (10000 iterations with Apache Bench). That's an
improvement from a few months ago, when it was easy to deadlock it with
far fewer iterations. So, the solution can be called "tried and tested"
now, in a sense. (There are still deadlocks happening (e.g. #4216),
which we need to investigate.)


Anyway, back to your
https://github.com/zephyrproject-rtos/zephyr/pull/980. Specifically, I
reviewed its commit
https://github.com/zephyrproject-rtos/zephyr/pull/980/commits/2092924326dc59eea16eb3327a385666431c39e7#diff-1118253502f0844ef016460a07db48df
"samples: net: rpl: Simple RPL border router application".

Looking through it, I got an idea why the socket sample is subject to
deadlocks (#4216), while other samples maybe aren't. That's because they
have guards like:

+#if defined(CONFIG_NET_L2_BLUETOOTH)
+#error "TCP connections over Bluetooth need CONFIG_NET_CONTEXT_NET_PKT_POOL "\
+ "defined."
+#endif /* CONFIG_NET_L2_BLUETOOTH */

Instead of investigating the cause of the deadlocks, they work around
it with:

+#if defined(CONFIG_NET_CONTEXT_NET_PKT_POOL)
+NET_PKT_TX_SLAB_DEFINE(http_srv_tx, 64);
+NET_PKT_DATA_POOL_DEFINE(http_srv_data, 64);

That's 8K with our default fragment size of 128 bytes.

But that's actually not what your sample app does: it doesn't
define CONFIG_NET_CONTEXT_NET_PKT_POOL. Instead it sets:

+CONFIG_NET_BUF_RX_COUNT=128
+CONFIG_NET_BUF_TX_COUNT=128

That's 16KB each for the RX and TX buffers (128 fragments of 128 bytes).


So, let's summarize:

Your application, with a 16KB send buffer and the patch
https://github.com/zephyrproject-rtos/zephyr/pull/1330, can send files
of several KB in size. A few simple questions:

1. What happens if your app needs to send a file of 17KB?
2. What happens if there aren't 16KB for send buffers, but only 1-2KB?

The answer is obvious: it won't work.
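
To spell out the arithmetic behind those two answers, here is a rough
sketch. It assumes 128-byte fragments (the default quoted above),
ignores protocol header overhead, and assumes the whole payload has to
sit in the pool before it is sent. FRAG_SIZE, pool_bytes() and
fits_in_pool() are names made up for illustration, not Zephyr APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 128-byte data fragments, the default size quoted above. */
#define FRAG_SIZE 128u

/* Upper bound on the payload bytes a pool of frag_count fragments holds. */
static size_t pool_bytes(size_t frag_count)
{
	assert(frag_count <= SIZE_MAX / FRAG_SIZE); /* guard against overflow */
	return frag_count * FRAG_SIZE;
}

/*
 * True if the payload fits when it must be buffered in full before
 * sending (header overhead is ignored in this sketch).
 */
static bool fits_in_pool(size_t payload, size_t frag_count)
{
	return payload <= pool_bytes(frag_count);
}
```

With these, pool_bytes(64) is 8192 and pool_bytes(128) is 16384, so
fits_in_pool(17 * 1024, 128) is false, and so is
fits_in_pool(4 * 1024, 16) for a 2KB pool.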


At the same time, my proposal is all about making an API which will
allow any app to send 1MB (or more) files with 1KB (or less) buffers.
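
To illustrate the short-write-and-retry idea, here is a minimal sketch
in plain C. It is not the Zephyr API and not the code from any of the
PRs above: send_fn, send_all(), send_stream() and tiny_sink are
hypothetical names, and the sink only stands in for a stack that takes
at most 100 bytes per call. The point is that the sender never needs
more than a 1KB buffer, whatever the total size.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical transport hook: takes up to len bytes and returns how
 * many it actually accepted (possibly fewer), or a negative value on
 * error.
 */
typedef long (*send_fn)(void *ctx, const unsigned char *buf, size_t len);

/* Push all len bytes through a transport that may do short writes. */
static int send_all(send_fn send, void *ctx, const unsigned char *buf,
		    size_t len)
{
	while (len > 0) {
		long n = send(ctx, buf, len);

		/*
		 * No progress counts as failure in this sketch; a real
		 * stack would block or ask the caller to retry later.
		 */
		if (n <= 0) {
			return -1;
		}
		assert((size_t)n <= len);
		/* Short write: skip what was taken, retry the rest. */
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Stream total bytes of generated "file" data through a 1KB buffer. */
static int send_stream(send_fn send, void *ctx, size_t total)
{
	unsigned char chunk[1024];
	size_t off = 0;

	while (off < total) {
		size_t left = total - off;
		size_t n = left < sizeof(chunk) ? left : sizeof(chunk);

		/* Stand-in for reading the next piece of a file. */
		for (size_t i = 0; i < n; i++) {
			chunk[i] = (unsigned char)((off + i) & 0xff);
		}
		if (send_all(send, ctx, chunk, n) < 0) {
			return -1;
		}
		off += n;
	}
	return 0;
}

/* Stand-in sink with very little room: at most 100 bytes per call. */
struct tiny_sink {
	size_t accepted;   /* total bytes taken so far */
	size_t calls;      /* number of send calls seen */
	unsigned long sum; /* byte sum, to check nothing was lost */
};

static long tiny_sink_send(void *ctx, const unsigned char *buf, size_t len)
{
	struct tiny_sink *s = ctx;
	size_t n = len < 100 ? len : 100;

	for (size_t i = 0; i < n; i++) {
		s->sum += buf[i];
	}
	s->accepted += n;
	s->calls++;
	return (long)n;
}
```

Calling send_stream(tiny_sink_send, &sink, 1024 * 1024) on a
zero-initialized struct tiny_sink delivers the full 1MB (sink.accepted
ends up at 1048576) even though the sink never takes more than 100
bytes at once and the sender holds only 1KB.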


I agree with what you wrote - there are different ways to approach
problems and many ways of implementation. But we are designing an
embedded IP stack, and are constrained by hardware resources ("Zephyr
runs in 8K"). It doesn't make sense to implement two solutions. We
should choose the one which covers more use cases with fewer resources.


As a reminder, I started out with the idea of seeing how the sample
app could be converted to use the short-write-and-retry approach. I
found that it's not directly possible at the app level, due to
peculiarities of the HTTP API used:
https://github.com/zephyrproject-rtos/zephyr/pull/980#pullrequestreview-71843524

I have shared my concerns about the existing HTTP API (e.g.
https://github.com/zephyrproject-rtos/zephyr/issues/3796), and concerns
that its rewrite doesn't solve enough issues. But all this time, I
treated the matter of the HTTP API as "there are different ways to do
it, and one way shouldn't be much worse than another". But I'm afraid
we've reached a point where the design of the HTTP API affects the
design of the IP stack, and not in the right direction. I suggest we
pause and try to rework it (the HTTP API), even if from the basics,
using ground requirements like "relying on more buffering than the
absolute bare minimum is a bad thing".


I'm also rather sad to come out with such a suggestion, because you
have a pretty cool and useful app on your hands, and I just have
some useless demo which barely started to work. But I explained the
problem with it - your app works because it requires more resources
than needed, and thus it won't work so well on other hardware. And as
experience shows, every app so far has had various problems, so by
taking the time to rebase it on a more generic, simpler API, we can
solve many yet-to-be-exposed problems.


Thanks for your consideration.

[]

--
Best Regards,
Paul

Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog
