Re: Problems managing NBUF DATA pool in the networking stack


Jukka Rissanen
 

Hi Piotr,

On Tue, 2017-02-07 at 17:56 +0100, Piotr Mienkowski wrote:
Hi,
There seems to be a conceptual issue in a way networking buffers are
currently set up. I was thinking about entering Jira bug report but
maybe it's just me missing some information or otherwise
misunderstanding how the networking stack is supposed to be used.
I'll shortly describe the problem here based on Zephyr echo_server
sample application.
Currently if the echo_server application receives a large amount of
data, e.g. when a large file is sent via ncat the application will
lock up and stop responding. The only way out is to reset the device.
This problem is very easily observed with eth_sam_gmac Ethernet
driver and should be just as easy to spot with eth_mcux. Due to a
different driver architecture it may be more difficult to observe
with eth_enc28j60.
The problem is as follows. Via Kconfig we define RX, TX and data
buffers pool. Let's say like this:
CONFIG_NET_NBUF_RX_COUNT=14
CONFIG_NET_NBUF_TX_COUNT=14
CONFIG_NET_NBUF_DATA_COUNT=72
The number of RX and TX buffers corresponds to the number of RX/TX
frames which may be simultaneously received/send. The data buffers
count tells us how much storage we reserve for the actual data. This
pool is shared between RX and TX path. If we receive a large amount
of data the RX path will consume all available data buffers leaving
none for the TX path. If an application then tries to reserve data
buffers for the TX path, e.g. echo_server does it in
build_reply_buf() function, it will get stuck waiting forever for a
free data buffer. echo_server application gets stuck on the following
line
frag = net_nbuf_get_data(context);
The simplified sequence of events in the echo_server application is
as follows: receive RX frame -> reserve data buffers for TX frame ->
copy data from RX frame to TX frame -> free resources associated with
RX frame -> send TX frame.
One way to avoid it is to define number of data buffers large enough
so the RX path cannot exhaust available data pool. Taking into
account that data buffer size is 128 bytes, this is defined by the
following Kconfig parameter,
CONFIG_NET_NBUF_DATA_SIZE=128
and maximum frame size is 1518 or 1536 bytes one RX frame can use up
to 12 data buffers. In our example we would need to reserve more than
12*14 data buffers to ensure correct behavior. In case of
eth_sam_gmac Ethernet driver even more. 
After recent updates to the networking stack the functions reserving
RX/TX/DATA buffers have a timeout parameter. That would prevent lock
up but it still does not really solve the issue.
Is there a better way to manage this?
One option would be to split the DATA pool to two so one pool for
sending and receiving. Then again this does not solve much as you might
still get to a situation where all the buffers are exhausted.

The timeout to buffer API helps a bit but still we might run out of
buffers.

One should allocate as many buffers to DATA pool as possible but this
really depends on hw and applications of course.


Thanks and regards,
Piotr
Cheers,
Jukka

Join devel@lists.zephyrproject.org to automatically receive all group messages.