Re: BSD Sockets in mainline, and how that affects design decisions for the rest of IP stack (e.g. send MTU handling)
Paul Sokolovsky
Hello,
On Wed, 11 Oct 2017 13:06:25 +0300
Jukka Rissanen <jukka.rissanen@...> wrote:

> You are unnecessarily creating this scenario about pro or against

So, this thread got backlogged somehow (bumped up by Tomasz yesterday), so I decided to approach it from the other side - to look into your usecase (#980) and see how easy it would be to convert it to rely on https://github.com/zephyrproject-rtos/zephyr/pull/119 instead.

Before going to that, I'd like to mention another thing which happened in the meantime: https://github.com/zephyrproject-rtos/zephyr/pull/4402 got merged, which actually uses the technique I proposed to send largish files (more than 1 network packet), and can pass more or less non-trivial load tests (10000 iterations with Apache Bench). That's an improvement from a few months ago, when it was easy to deadlock it with far fewer iterations. So, the solution can be called "tested and tried" now, in a sense. (There are still deadlocks happening (e.g. #4216) which we need to investigate.)

Anyway, back to your https://github.com/zephyrproject-rtos/zephyr/pull/980. Specifically, I reviewed its commit https://github.com/zephyrproject-rtos/zephyr/pull/980/commits/2092924326dc59eea16eb3327a385666431c39e7#diff-1118253502f0844ef016460a07db48df "samples: net: rpl: Simple RPL border router application".

Looking through it, I got an idea why a socket sample is subject to deadlocks (#4216), while other samples maybe are not. That's because they have comments like:

+#if defined(CONFIG_NET_L2_BLUETOOTH)
+#error "TCP connections over Bluetooth need CONFIG_NET_CONTEXT_NET_PKT_POOL "\
+	"defined."
+#endif /* CONFIG_NET_L2_BLUETOOTH */

Instead of investigating the cause of the deadlocks, they work around it with:

+#if defined(CONFIG_NET_CONTEXT_NET_PKT_POOL)
+NET_PKT_TX_SLAB_DEFINE(http_srv_tx, 64);
+NET_PKT_DATA_POOL_DEFINE(http_srv_data, 64);

That's 8K (64 fragments at our default fragment size of 128 bytes).

But that's actually not what your sample app does: it doesn't define CONFIG_NET_CONTEXT_NET_PKT_POOL.
Instead, it defines:

+CONFIG_NET_BUF_RX_COUNT=128
+CONFIG_NET_BUF_TX_COUNT=128

That's 16KB each for the RX and TX buffers (128 fragments of 128 bytes).

So, let's summarize: your application, with a 16KB send buffer and the patch https://github.com/zephyrproject-rtos/zephyr/pull/1330, can send files of several KB in size. A few simple questions:

1. What happens if your app needs to send a file of 17KB?
2. What happens if there isn't 16KB available for send buffers, but only 1-2K?

The answer is obvious: it won't work. At the same time, my proposal is all about making an API which will allow any app to send 1MB (or larger) files with 1KB (or smaller) buffers.

I agree with what you wrote - there are different ways to approach problems and many ways to implement them. But we are designing an embedded IP stack, and we are constrained by hardware resources ("Zephyr runs in 8K"). It doesn't make sense to implement two solutions. We should choose the one which covers more usecases with fewer resources.

Now, to remind: I started looking with the idea of seeing how the sample app could be converted to use the short-write-and-retry approach. I found that it's not directly possible at the level of the app, due to peculiarities of the HTTP API used: https://github.com/zephyrproject-rtos/zephyr/pull/980#pullrequestreview-71843524

I have shared my concerns about the existing HTTP API (e.g. https://github.com/zephyrproject-rtos/zephyr/issues/3796), and my concerns that its rewrite doesn't solve enough issues. But all this time, I treated the matter of the HTTP API exactly as "there are different ways to do it, and one way shouldn't be much worse than another". But I'm afraid we have reached a point where the design of the HTTP API affects the design of the IP stack, and not quite in the right direction. I suggest we pause and try to rework it (the HTTP API), even if from the basics, using ground requirements like "relying on more buffering than the absolute bare minimum is a bad thing".
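To make the short-write-and-retry idea concrete, here is a minimal sketch in C. It is an illustration under stated assumptions, not code from Zephyr or from any of the PRs referenced above: `TX_BUF_SIZE`, `short_send()` and `send_all()` are hypothetical names, and `short_send()` merely simulates a stack which accepts at most 1KB per call, the way a send() with a small TX buffer would return a short count.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical TX buffer size of the simulated stack (assumption). */
#define TX_BUF_SIZE 1024

static size_t total_accepted; /* bytes the simulated stack has taken */
static size_t send_calls;     /* how many times it was called */

/* Simulated send(): accepts at most TX_BUF_SIZE bytes per call and
 * returns how many bytes it took, i.e. a "short write".
 */
static ssize_t short_send(const void *buf, size_t len)
{
	size_t n = len < TX_BUF_SIZE ? len : TX_BUF_SIZE;

	(void)buf;
	total_accepted += n;
	send_calls++;
	return (ssize_t)n;
}

/* Short-write-and-retry: keep offering the unsent tail of the data
 * until everything has been accepted. Returns 0 on success, -1 if the
 * sender reports an error or accepts nothing (treated as failure in
 * this sketch to avoid spinning).
 */
static int send_all(ssize_t (*send_fn)(const void *, size_t),
		    const void *data, size_t len)
{
	const char *p = data;

	while (len > 0) {
		ssize_t n = send_fn(p, len);

		if (n <= 0) {
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

With a real socket, `send_fn` would be a thin wrapper around send() on a connected descriptor. The point of the sketch is that the amount of data an app can send is bounded by the loop, not by the size of the buffer underneath it.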
I'm also rather sad to come out with such a suggestion, because you have a pretty cool and useful app on your hands, and I just have some useless demo which has barely started to work. But I explained the problem with it - your app works because it requires more resources than needed, and thus it won't work so well on other hardware. And as experience shows, every app so far has had various problems, so by taking the time to rebase it on a more generic, simpler API, we can solve many yet-to-be-exposed problems.

Thanks for your consideration.

--
Best Regards,
 Paul

Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog