Hi,
There seems to be a conceptual issue in the way networking buffers
are currently set up. I was thinking about filing a Jira bug
report, but maybe I am just missing some information or
otherwise misunderstanding how the networking stack is supposed to
be used. I'll briefly describe the problem here, using the Zephyr
echo_server sample application as an example.
Currently, if the echo_server application receives a large amount
of data, e.g. when a large file is sent via ncat, the application
locks up and stops responding. The only way out is to reset the
device. The problem is very easy to observe with the eth_sam_gmac
Ethernet driver and should be just as easy to spot with eth_mcux.
Due to its different driver architecture, it may be more difficult
to observe with eth_enc28j60.
The problem is as follows. Via Kconfig we define the sizes of the
RX, TX and data buffer pools, for example:
CONFIG_NET_NBUF_RX_COUNT=14
CONFIG_NET_NBUF_TX_COUNT=14
CONFIG_NET_NBUF_DATA_COUNT=72
The number of RX and TX buffers corresponds to the number of
RX/TX frames which may be received/sent simultaneously. The data
buffer count tells us how much storage we reserve for the actual
data. This pool is shared between the RX and TX paths. If we receive
a large amount of data, the RX path will consume all available data
buffers, leaving none for the TX path. If an application then tries
to reserve data buffers for the TX path, as echo_server does in its
build_reply_buf() function, it will get stuck waiting forever
for a free data buffer. The echo_server application gets stuck on
the following line:
frag = net_nbuf_get_data(context);
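The simplified sequence of events in the echo_server application
is as follows: receive RX frame -> reserve data buffers for TX
frame -> copy data from RX frame to TX frame -> free
resources associated with RX frame -> send TX frame. Since the TX
data buffers are requested before the RX frame is released, and the
remaining queued RX frames cannot be processed while the application
is blocked, no data buffer is ever returned to the pool.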
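
To make the sequence above concrete, here is a minimal toy model of a
shared data pool in plain C. It is only a sketch and not the Zephyr
API: struct data_pool, pool_try_get() and
reply_possible_after_rx_burst() are hypothetical names, and the
allocation fails immediately instead of blocking so that the outcome
can be observed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the shared data pool; hypothetical, not the Zephyr API. */
struct data_pool {
	size_t free_count; /* data buffers currently available */
};

/* Take n data buffers, or fail without blocking if fewer than n are free. */
static bool pool_try_get(struct data_pool *pool, size_t n)
{
	if (pool->free_count < n) {
		return false;
	}
	pool->free_count -= n;
	return true;
}

/*
 * Queue full-size RX frames until either the RX buffer count or the
 * data pool is exhausted, then try to reserve data buffers for one
 * reply. Returns true if the reply could be allocated.
 */
static bool reply_possible_after_rx_burst(size_t data_count, size_t rx_count,
					  size_t frags_per_frame)
{
	struct data_pool pool = { .free_count = data_count };
	size_t queued = 0;

	while (queued < rx_count && pool_try_get(&pool, frags_per_frame)) {
		queued++;
	}

	/* The reply is reserved before any RX frame has been freed. */
	return pool_try_get(&pool, frags_per_frame);
}
```

With the example configuration (72 data buffers, 14 RX buffers, 12
data buffers per frame) reply_possible_after_rx_burst(72, 14, 12)
returns false: the RX path alone has drained the pool, which is where
a blocking allocation would wait forever.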
One way to avoid this is to make the number of data buffers large
enough that the RX path cannot exhaust the data pool. The data
buffer size is 128 bytes, as defined by the following Kconfig
parameter:
CONFIG_NET_NBUF_DATA_SIZE=128
With a maximum frame size of 1518 or 1536 bytes, one RX frame can
use up to 12 data buffers. In our example we would need to reserve
more than 12*14 = 168 data buffers to ensure correct behavior, and
even more in the case of the eth_sam_gmac Ethernet driver.
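
The sizing arithmetic can be written down as a small helper. This is
just a sketch of the calculation: the function names are made up for
illustration, and any per-buffer header reserve is ignored.

```c
#include <stddef.h>

/* Values taken from the Kconfig example in this mail. */
#define NBUF_RX_COUNT   14
#define NBUF_DATA_SIZE  128

/* Data buffers needed to hold one frame (ceiling division). */
static size_t frags_per_frame(size_t frame_size, size_t frag_size)
{
	return (frame_size + frag_size - 1) / frag_size;
}

/* Data buffers the RX path holds if every RX buffer carries a full frame. */
static size_t rx_worst_case_frags(size_t rx_count, size_t frame_size,
				  size_t frag_size)
{
	return rx_count * frags_per_frame(frame_size, frag_size);
}
```

For both 1518 and 1536 byte frames frags_per_frame() gives 12, and
rx_worst_case_frags(NBUF_RX_COUNT, 1518, NBUF_DATA_SIZE) gives 168,
so CONFIG_NET_NBUF_DATA_COUNT would have to be larger than that to
leave room for the TX path.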
After recent updates to the networking stack, the functions
reserving RX/TX/DATA buffers have a timeout parameter. That would
prevent the lock-up, but it still does not really solve the issue.
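
A rough sketch of what I mean, again as a toy model rather than the
real stack: toy_pool_get_timeout() is a hypothetical stand-in for a
buffer allocation with a timeout, modeled here as a bounded number of
polls. The application no longer hangs, but when the pool is empty
the reply is simply dropped.

```c
#include <stdbool.h>
#include <stddef.h>

enum echo_result { ECHO_SENT, ECHO_DROPPED };

/* Toy data pool; hypothetical, not the Zephyr API. */
struct toy_pool {
	size_t free_count; /* data buffers currently available */
};

/* Bounded allocation: check up to max_polls + 1 times, then give up. */
static bool toy_pool_get_timeout(struct toy_pool *pool, size_t n,
				 unsigned int max_polls)
{
	for (unsigned int i = 0; i <= max_polls; i++) {
		if (pool->free_count >= n) {
			pool->free_count -= n;
			return true;
		}
		/* Real code would sleep or wait on a semaphore here. */
	}
	return false;
}

/* Echo one RX frame that occupies rx_frags data buffers. */
static enum echo_result echo_frame(struct toy_pool *pool, size_t rx_frags,
				   unsigned int max_polls)
{
	bool got_tx = toy_pool_get_timeout(pool, rx_frags, max_polls);

	/* The RX frame is freed whether or not the reply was allocated. */
	pool->free_count += rx_frags;

	return got_tx ? ECHO_SENT : ECHO_DROPPED;
}
```

Starting from an empty pool, echo_frame() returns ECHO_DROPPED and
only then gives the RX frame's data buffers back, so the device keeps
running but the received data is lost.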
Is there a better way to manage this?
Thanks and regards,
Piotr