Re: Problems managing NBUF DATA pool in the networking stack


Hi Marcus,

On Wed, Feb 8, 2017 at 12:37 PM, Marcus Shawcroft
<marcus.shawcroft@gmail.com> wrote:
On 8 February 2017 at 07:04, Jukka Rissanen
<jukka.rissanen@linux.intel.com> wrote:

One option would be to split the DATA pool in two, one pool for
sending and one for receiving. Then again, this does not solve much,
as you might still get into a situation where all the buffers are
exhausted.
Running out of resources is bad; deadlock, especially undetected
deadlock, is worse. Avoiding the deadlock where the RX path starves
the rest of the system of resources requires that the resources the RX
path can consume are separate from the resources available to the TX
path(s). Limiting resource consumption by the RX path is
straightforward: buffers come from a fixed-size pool, and when the
pool is empty we drop packets. Now we have a situation where RX cannot
starve TX; we just need to ensure that multiple TX paths cannot
deadlock each other.
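
As a minimal sketch of that separation, assuming Zephyr's net_buf pool
API (the pool names, sizes and the rx_handler function are made up for
illustration):

#include <net/buf.h>

/* Two fixed-size pools: RX exhaustion cannot take buffers from TX. */
NET_BUF_POOL_DEFINE(rx_data_pool, 16, 128, 0, NULL);
NET_BUF_POOL_DEFINE(tx_data_pool, 16, 128, 0, NULL);

static void rx_handler(void)
{
	/* RX never blocks: if its pool is empty, drop the packet. */
	struct net_buf *buf = net_buf_alloc(&rx_data_pool, K_NO_WAIT);

	if (!buf) {
		/* RX pool exhausted: drop and move on, TX is unaffected. */
		return;
	}

	/* ... fill and process buf ... then return it to the RX pool. */
	net_buf_unref(buf);
}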

Dealing with resource exhaustion on the TX side is harder. In a
system with multiple TX paths, either there need to be sufficient TX
resources that all TX paths can acquire enough to proceed in
parallel, or there need to be sufficient resources for any one path
to make progress, along with a mechanism to serialize those paths.
The former solution is probably a non-starter for a small system,
because the number of buffers required is likely to be unreasonably
large. The latter solution, I think, implies that no TX path can
block waiting for resources unless it currently holds no
resources... i.e. blocking to get a buffer is ok; blocking to extend
a buffer or to get a second buffer is not ok.
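
Sketched with Zephyr's net_buf calls (reusing the hypothetical
tx_data_pool from above; the function itself is made up), the rule
could look like:

#include <errno.h>

static int tx_send_two_frags(void)
{
	/* Holding no buffers yet, so blocking here cannot deadlock. */
	struct net_buf *first = net_buf_alloc(&tx_data_pool, K_FOREVER);
	struct net_buf *frag;

	/* We now hold 'first': blocking for a second buffer could
	 * deadlock against another TX path doing the same, so fail
	 * fast instead.
	 */
	frag = net_buf_alloc(&tx_data_pool, K_NO_WAIT);
	if (!frag) {
		net_buf_unref(first);
		return -ENOMEM; /* caller may retry from scratch later */
	}

	net_buf_frag_add(first, frag);
	/* ... fill both fragments and hand the chain to the driver ... */
	return 0;
}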
While I agree we should prevent the remote from consuming all the
buffers and possibly starving TX, this is probably due to the
echo_server design, which deep-copies the buffers from RX to TX; in a
normal application the RX would be processed and unrefed, causing the
data buffers to return to the pool immediately. Even if we split RX
into a separate pool, any context can just ref the buffer, causing RX
to starve again, so at least in this aspect it seems to be a bug in
the application; otherwise we will end up having each and every
context keep its own exclusive pool.
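
For illustration, that pattern looks roughly like this (the signature
follows the current net_context receive callback; process_payload is a
hypothetical application function):

static void recv_cb(struct net_context *context, struct net_buf *buf,
		    int status, void *user_data)
{
	if (!buf) {
		return;
	}

	/* Consume the payload in place rather than deep-copying it
	 * into TX buffers ...
	 */
	process_payload(buf);

	/* ... and unref it, so the data buffers return to the pool
	 * immediately.
	 */
	net_buf_unref(buf);
}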

That said, it is perhaps not a bad idea to design an optional callback
for the net_context to provide its own pool; we have something like
that for L2CAP channels:

/** Channel alloc_buf callback
 *
 *  If this callback is provided the channel will use it to allocate
 *  buffers to store incoming data.
 *
 *  @param chan The channel requesting a buffer.
 *
 *  @return Allocated buffer.
 */
struct net_buf *(*alloc_buf)(struct bt_l2cap_chan *chan);
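
For example, a channel can direct its allocations to a dedicated pool
along these lines (the pool name, its sizing and the use of K_FOREVER
are illustrative choices, not taken from the actual IPSP code):

NET_BUF_POOL_DEFINE(chan_rx_pool, 8, 1280, 0, NULL);

static struct net_buf *chan_alloc_buf(struct bt_l2cap_chan *chan)
{
	/* Allocate from this channel's own pool, so reassembly here
	 * cannot exhaust the global Bluetooth RX pool.
	 */
	return net_buf_alloc(&chan_rx_pool, K_FOREVER);
}

static struct bt_l2cap_chan_ops chan_ops = {
	.alloc_buf = chan_alloc_buf,
	/* .recv, .connected, .disconnected, ... as usual */
};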

This is how we allocate net_bufs from the IP stack, which has a much
bigger MTU than Bluetooth; that way we also avoid starving the
Bluetooth RX pool when reassembling the segments. This will most
likely be necessary anyway for protocols that need to implement their
own fragmentation and reassembly, because in that case the lifetime
of the buffers cannot be controlled directly by the stack.

The timeout parameter in the buffer API helps a bit, but we might
still run out of buffers.
For incremental acquisition of further resources this doesn't help:
it cannot guarantee to prevent deadlock, and its use in the software
stack makes reasoning about deadlock harder.
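
To illustrate: two TX paths that each hold one buffer and wait with a
timeout for a second one no longer hang forever, but neither makes
progress, and the caller cannot tell that livelock apart from genuine
exhaustion (timeout value arbitrary):

struct net_buf *frag = net_buf_alloc(&tx_data_pool, K_MSEC(100));

if (!frag) {
	/* Timed out: a transient shortage, or two TX paths starving
	 * each other? The failure looks the same either way.
	 */
}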

Cheers
/Marcus


--
Luiz Augusto von Dentz
