Problems managing NBUF DATA pool in the networking stack


Luiz Augusto von Dentz
 

Hi Geoff,


While it is probably a good idea to look for good research solutions,
we actually need to work out what does, and what doesn't, make sense
for Zephyr. Pretty much any layer that requires a lot more memory, and
that includes threads that require a dedicated stack, buffer pools and
complexity in general, is IMO a big no-no for Zephyr. That said,
net_buf, which is what nbuf uses, is based on the skb concept from
Linux, although the pools work a bit differently since we don't use
dynamic memory allocation. So it is not that we haven't looked at any
prior art; it is just that we don't have any plans for the queuing
disciplines/network schedulers that you perhaps have in mind.
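
For reference, net_buf pools are reserved statically at build time
rather than allocated from a heap. A minimal sketch of the concept
(header paths and timeout types vary between Zephyr versions, so treat
the details as illustrative rather than exact):

#include <zephyr.h>
#include <net/buf.h>

/* 16 buffers of 128 bytes each, reserved at build time; nothing is
 * heap-allocated, so "running out" means net_buf_alloc() either fails
 * or blocks, depending on the timeout the caller passes.
 */
NET_BUF_POOL_DEFINE(example_pool, 16, 128, 0, NULL);

static struct net_buf *get_example_buf(void)
{
        /* Fail fast instead of blocking when the pool is empty. */
        return net_buf_alloc(&example_pool, K_NO_WAIT);
}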

On Tue, Feb 14, 2017 at 5:24 PM, Geoff Thorpe <geoff.thorpe@...> wrote:
> While I don't personally have answers to these buffer-management questions, I am certain they are well-studied, because they are intermingled with lots of other well-studied questions and use-cases that influence buffer handling, like flow-control, QoS, order-restoration, order-preservation, bridging, forwarding, tunneling, VLANs, and so on. If I recall, the "obvious solutions" usually aren't - i.e. they're either not obvious or not (general) solutions. The buffer-handling change to remediate one problematic use-case usually causes some other equally valid use-case to degenerate.
>
> I guess I'm just saying that we should find prior art and best practice, rather than trying to derive it from first principles and experimentation.
>
> Do we already have in our midst anyone who has familiarity with NPUs, OpenDataPlane, etc? If not, I can put out some feelers.
>
> Cheers
> Geoff


--
Luiz Augusto von Dentz


Geoff Thorpe <geoff.thorpe@...>
 

While I don't personally have answers to these buffer-management questions, I am certain they are well-studied, because they are intermingled with lots of other well-studied questions and use-cases that influence buffer handling, like flow-control, QoS, order-restoration, order-preservation, bridging, forwarding, tunneling, VLANs, and so on. If I recall, the "obvious solutions" usually aren't - i.e. they're either not obvious or not (general) solutions. The buffer-handling change to remediate one problematic use-case usually causes some other equally valid use-case to degenerate.

I guess I'm just saying that we should find prior art and best practice, rather than trying to derive it from first principles and experimentation.

Do we already have in our midst anyone who has familiarity with NPUs, OpenDataPlane, etc? If not, I can put out some feelers.

Cheers
Geoff



Marcus Shawcroft <marcus.shawcroft@...>
 

On 14 February 2017 at 13:46, Jukka Rissanen
<jukka.rissanen@...> wrote:

> I agree that having too fine-grained a setup for the buffers is bad
> and should be avoided. The current setup of RX, TX and shared DATA
> buffers has worked quite well for UDP.

We do, however, still need to figure out, for UDP, how to prevent:
- the RX path starving the TX path into deadlock
- multiple TX paths deadlocking each other (by attempting to acquire buffers incrementally)

Cheers
/Marcus


Jukka Rissanen
 

Hi Piotr,

On Tue, 2017-02-14 at 02:26 +0100, Piotr Mieńkowski wrote:
> So, what should be the final solution to the NBUF DATA issue? Do we
> want to redesign the echo_server sample application to use shallow
> copy, should we introduce an NBUF DATA pool per context, or a
> separate NBUF DATA pool for TX and RX? Something else?
>
> In my opinion, enforcing too much granularity on the allocation of
> data buffers, i.e. having a separate nbuf data pool per context, and
> maybe another one for the networking stack, would not be optimal.
>
> In principle, having shared resources is not a bad design approach.
> However, we probably should have a way to guarantee a minimum number
> of buffers for the TX path.
I agree that having too fine-grained a setup for the buffers is bad and
should be avoided. The current setup of RX, TX and shared DATA buffers
has worked quite well for UDP. For TCP the situation gets much more
difficult, as TCP might hold on to an nbuf for a while until an ack is
received for the pending packets. The TCP code should not affect the
other parts of the IP stack and starve them of buffers.

One option is to have a separate pool for TCP data nbufs that could be
shared by all the TCP contexts. The TCP code could allocate the buffers
that need to wait for an ack from this pool instead of the global data
pool. This would avoid allocating a separate pool for each context,
which is sub-optimal for memory consumption.
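
Roughly, such a shared pool could be just another statically defined
pool that the TCP code clones pending segments into. This is a sketch
only; the pool name, its size and the helper are made up for
illustration, and it uses the generic net_buf API rather than the nbuf
helpers:

#include <string.h>
#include <zephyr.h>
#include <net/buf.h>

/* One pool shared by all TCP contexts for data that must stay around
 * until it is acked; the count would presumably come from Kconfig.
 * The fragment size matches the 128-byte DATA buffers discussed here.
 */
NET_BUF_POOL_DEFINE(tcp_pending_pool, 32, 128, 0, NULL);

/* Clone a fragment into the TCP pool so the global DATA pool is not
 * held hostage while waiting for the ack.
 */
static struct net_buf *tcp_clone_pending(struct net_buf *frag)
{
        struct net_buf *clone = net_buf_alloc(&tcp_pending_pool, K_NO_WAIT);

        if (clone) {
                memcpy(net_buf_add(clone, frag->len), frag->data, frag->len);
        }

        return clone;
}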

Cheers,
Jukka


Piotr Mienkowski
 

Hi,

>>> While I agree we should prevent the remote end from consuming all
>>> the buffers and possibly starving TX, this is probably due to the
>>> echo_server design, which deep-copies the buffers from RX to TX; in
>>> a normal application
>>
>> Indeed the echo server could perhaps be optimized not to deep copy,
>> thus removing the issue. The wider question here is whether or not we
>> want a design rule that effectively states that all applications
>> should consume and unref their rx buffers before attempting to
>> allocate tx buffers. This may be convenient for some applications,
>> but I'm not convinced that is always the case. Such a design rule
>> effectively means that an application that needs to retain or process
>> information from request to response must have somewhere to store all
>> of that information between buffers, and it rules out any form of
>> incremental processing of an rx buffer interleaved with the
>> construction of the tx message.
>
> If you read the entire email it would be clearer that I did not
> suggest it was fine to rule out incremental processing; in fact, I
> suggested adding pools per net_context, so that the stack itself does
> not have to drop its own packets and stop working because some context
> is taking all its buffers just to create clones.

So, what should be the final solution to the NBUF DATA issue? Do we want to redesign the echo_server sample application to use shallow copy, should we introduce an NBUF DATA pool per context, or a separate NBUF DATA pool for TX and RX? Something else?

In my opinion, enforcing too much granularity on the allocation of data buffers, i.e. having a separate nbuf data pool per context, and maybe another one for the networking stack, would not be optimal. Firstly, Kconfig would become even more complex and users would have a hard time figuring out a safe set of options. What if we know one context will not use many data buffers and another one a lot? Should we still assign the same number of data buffers per context? Secondly, every separate data pool adds some spare buffers as a 'margin of error'. Thirdly, an Ethernet driver which reserves data buffers for the RX path has no notion of context; it doesn't know which packets are meant for the networking stack and which for the application, so it would not know from which data pool to take the buffers. It can only distinguish between the RX and TX paths.

In principle, having shared resources is not a bad design approach. However, we probably should have a way to guarantee a minimum number of buffers for the TX path. As a software engineer, if I need to design the TX path in my networking application and I know that I have some fixed number of data buffers available, I can manage that. The same task becomes much more difficult if my fixed number of data buffers can at any given moment drop to zero for reasons beyond my control. This is the case currently.

Regards,
Piotr


Luiz Augusto von Dentz
 

Hi Marcus,

On Thu, Feb 9, 2017 at 1:17 PM, Marcus Shawcroft
<marcus.shawcroft@...> wrote:
> On 8 February 2017 at 13:18, Luiz Augusto von Dentz
> <luiz.dentz@...> wrote:
>
>> While I agree we should prevent the remote end from consuming all
>> the buffers and possibly starving TX, this is probably due to the
>> echo_server design, which deep-copies the buffers from RX to TX; in
>> a normal application
>
> Indeed the echo server could perhaps be optimized not to deep copy,
> thus removing the issue. The wider question here is whether or not we
> want a design rule that effectively states that all applications
> should consume and unref their rx buffers before attempting to
> allocate tx buffers. This may be convenient for some applications,
> but I'm not convinced that is always the case. Such a design rule
> effectively means that an application that needs to retain or process
> information from request to response must have somewhere to store all
> of that information between buffers, and it rules out any form of
> incremental processing of an rx buffer interleaved with the
> construction of the tx message.

If you read the entire email it would be clearer that I did not
suggest it was fine to rule out incremental processing; in fact, I
suggested adding pools per net_context, so that the stack itself does
not have to drop its own packets and stop working because some context
is taking all its buffers just to create clones.

>> the RX buffers would be processed and unrefed, returning the data
>> buffers to the pool immediately. Even if we split RX into a separate
>> pool, any context can just ref the buffers, causing RX to starve
>
> Maybe I missed the point here, but dropped packets due to rx buffer
> starvation are no different from network packet loss; while
> undesirable, higher levels of the stack are already designed to cope
> appropriately.

Indeed it might be considered the same as packet loss, and while it is
handled properly it also causes a drop in throughput; but I wasn't
arguing against dropping packets, rather pointing out that sharing
data buffers with TX doesn't work, because the application may hold
its own references.

--
Luiz Augusto von Dentz


Marcus Shawcroft <marcus.shawcroft@...>
 

On 8 February 2017 at 13:18, Luiz Augusto von Dentz
<luiz.dentz@...> wrote:

> While I agree we should prevent the remote end from consuming all the
> buffers and possibly starving TX, this is probably due to the
> echo_server design, which deep-copies the buffers from RX to TX; in a
> normal application

Indeed the echo server could perhaps be optimized not to deep copy,
thus removing the issue. The wider question here is whether or not we
want a design rule that effectively states that all applications
should consume and unref their rx buffers before attempting to
allocate tx buffers. This may be convenient for some applications,
but I'm not convinced that is always the case. Such a design rule
effectively means that an application that needs to retain or process
information from request to response must have somewhere to store all
of that information between buffers, and it rules out any form of
incremental processing of an rx buffer interleaved with the
construction of the tx message.

> the RX buffers would be processed and unrefed, returning the data
> buffers to the pool immediately. Even if we split RX into a separate
> pool, any context can just ref the buffers, causing RX to starve

Maybe I missed the point here, but dropped packets due to rx buffer
starvation are no different from network packet loss; while
undesirable, higher levels of the stack are already designed to cope
appropriately.

Cheers
/Marcus


Luiz Augusto von Dentz
 

Hi Marcus,

On Wed, Feb 8, 2017 at 12:37 PM, Marcus Shawcroft
<marcus.shawcroft@...> wrote:
> On 8 February 2017 at 07:04, Jukka Rissanen
> <jukka.rissanen@...> wrote:
>
>> One option would be to split the DATA pool in two, one pool for
>> sending and one for receiving. Then again, this does not solve much,
>> as you might still get into a situation where all the buffers are
>> exhausted.
>
> Running out of resources is bad; deadlock, especially undetected
> deadlock, is worse. Avoiding the deadlock where the RX path starves
> the rest of the system of resources requires that the resources the
> RX path can consume are separate from the resources available to the
> TX path(s). Limiting resource consumption by the RX path is
> straightforward: buffers come from a fixed-size pool, and when the
> pool is empty we drop packets. Now we have a situation where RX
> cannot starve TX; we just need to ensure that multiple TX paths
> cannot deadlock each other.
>
> Dealing with resource exhaustion on the TX side is harder. In a
> system with multiple TX paths, either there need to be sufficient TX
> resources for all TX paths to acquire what they need and proceed in
> parallel, or there need to be sufficient resources for any one path
> to make progress along with a mechanism to serialize those paths. The
> former is probably a non-starter for a small system, because the
> number of buffers required is likely to be unreasonably large. The
> latter, I think, implies that no TX path can block waiting for
> resources unless it currently holds no resources... i.e. blocking to
> get a buffer is OK, blocking to extend a buffer or to get a second
> buffer is not.

While I agree we should prevent the remote end from consuming all the
buffers and possibly starving TX, this is probably due to the
echo_server design, which deep-copies the buffers from RX to TX; in a
normal application the RX buffers would be processed and unrefed,
returning the data buffers to the pool immediately. Even if we split
RX into a separate pool, any context can just ref the buffers, causing
RX to starve again, so at least in this respect it seems to be a bug
in the application; otherwise we will end up with each and every
context having its own exclusive pool.

That said, it is perhaps not a bad idea to design an optional callback
for net_context so that applications can provide their own pools; we
have something like that for L2CAP channels:

/** Channel alloc_buf callback
 *
 *  If this callback is provided the channel will use it to allocate
 *  buffers to store incoming data.
 *
 *  @param chan The channel requesting a buffer.
 *
 *  @return Allocated buffer.
 */
struct net_buf *(*alloc_buf)(struct bt_l2cap_chan *chan);

This is how we allocate net_buf for the IP stack, which has a much
bigger MTU than Bluetooth; that way we also avoid starving the
Bluetooth RX pool when reassembling the segments. This will most
likely be necessary anyway for protocols that need to implement their
own fragmentation and reassembly, because in that case the lifetime of
the buffers cannot be controlled directly by the stack.
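
A net_context equivalent does not exist today; purely as a sketch of
the idea, with the callback member, the pool and the names being
hypothetical mirrors of the L2CAP hook above:

/* Hypothetical member on struct net_context, mirroring the L2CAP hook: */
struct net_buf *(*alloc_buf)(struct net_context *context);

/* An application that wants its own RX data pool would then provide: */
NET_BUF_POOL_DEFINE(my_ctx_pool, 24, 128, 0, NULL);

static struct net_buf *my_ctx_alloc_buf(struct net_context *context)
{
        ARG_UNUSED(context);

        /* Incoming data for this context comes from its private pool,
         * so a busy peer cannot starve the stack's shared DATA pool.
         */
        return net_buf_alloc(&my_ctx_pool, K_NO_WAIT);
}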

>> The timeout added to the buffer API helps a bit, but we might still
>> run out of buffers.
>
> For incremental acquisition of further resources this doesn't help;
> it can't guarantee to prevent deadlock, and its use in the software
> stack makes reasoning about deadlock harder.
>
> Cheers
> /Marcus


--
Luiz Augusto von Dentz


Marcus Shawcroft <marcus.shawcroft@...>
 

On 8 February 2017 at 07:04, Jukka Rissanen
<jukka.rissanen@...> wrote:

> One option would be to split the DATA pool in two, one pool for
> sending and one for receiving. Then again, this does not solve much,
> as you might still get into a situation where all the buffers are
> exhausted.

Running out of resources is bad; deadlock, especially undetected
deadlock, is worse. Avoiding the deadlock where the RX path starves
the rest of the system of resources requires that the resources the RX
path can consume are separate from the resources available to the TX
path(s). Limiting resource consumption by the RX path is
straightforward: buffers come from a fixed-size pool, and when the
pool is empty we drop packets. Now we have a situation where RX cannot
starve TX; we just need to ensure that multiple TX paths cannot
deadlock each other.

Dealing with resource exhaustion on the TX side is harder. In a system
with multiple TX paths, either there need to be sufficient TX
resources for all TX paths to acquire what they need and proceed in
parallel, or there need to be sufficient resources for any one path to
make progress along with a mechanism to serialize those paths. The
former is probably a non-starter for a small system, because the
number of buffers required is likely to be unreasonably large. The
latter, I think, implies that no TX path can block waiting for
resources unless it currently holds no resources... i.e. blocking to
get a buffer is OK, blocking to extend a buffer or to get a second
buffer is not.
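
In code, that rule might look something like the following sketch. It
uses the generic net_buf API with an assumed, dedicated TX data pool,
not the actual nbuf helpers, so take the names as illustrative:

#include <zephyr.h>
#include <net/buf.h>

/* Allocate a whole chain of TX fragments, or nothing. Blocking (with a
 * bounded timeout) is only allowed for the first buffer, i.e. while we
 * hold no resources; every further fragment is K_NO_WAIT, and on
 * failure everything is released so another TX path can make progress.
 */
static struct net_buf *alloc_tx_chain(struct net_buf_pool *pool, size_t frags)
{
        struct net_buf *head = net_buf_alloc(pool, K_MSEC(100));

        if (!head) {
                return NULL;
        }

        for (size_t i = 1; i < frags; i++) {
                struct net_buf *frag = net_buf_alloc(pool, K_NO_WAIT);

                if (!frag) {
                        net_buf_unref(head); /* frees the whole chain */
                        return NULL;
                }

                net_buf_frag_add(head, frag);
        }

        return head;
}

The caller retries from scratch, or drops the packet, when NULL comes
back, instead of sleeping while already holding part of the pool.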

> The timeout added to the buffer API helps a bit, but we might still
> run out of buffers.

For incremental acquisition of further resources this doesn't help; it
can't guarantee to prevent deadlock, and its use in the software stack
makes reasoning about deadlock harder.

Cheers
/Marcus


Jukka Rissanen
 

Hi Piotr,

On Tue, 2017-02-07 at 17:56 +0100, Piotr Mienkowski wrote:
> After recent updates to the networking stack, the functions reserving
> RX/TX/DATA buffers have a timeout parameter. That would prevent the
> lock-up, but it still does not really solve the issue.
>
> Is there a better way to manage this?
One option would be to split the DATA pool in two, one pool for sending
and one for receiving. Then again, this does not solve much, as you
might still get into a situation where all the buffers are exhausted.

The timeout added to the buffer API helps a bit, but we might still run
out of buffers.

One should allocate as many buffers to the DATA pool as possible, but
of course this really depends on the hardware and the application.


Cheers,
Jukka


Piotr Mienkowski
 

Hi,

There seems to be a conceptual issue in the way networking buffers are currently set up. I was thinking about entering a Jira bug report, but maybe it's just me missing some information or otherwise misunderstanding how the networking stack is supposed to be used. I'll shortly describe the problem here based on the Zephyr echo_server sample application.

Currently, if the echo_server application receives a large amount of data, e.g. when a large file is sent via ncat, the application will lock up and stop responding. The only way out is to reset the device. This problem is very easily observed with the eth_sam_gmac Ethernet driver and should be just as easy to spot with eth_mcux. Due to a different driver architecture it may be more difficult to observe with eth_enc28j60.

The problem is as follows. Via Kconfig we define the RX, TX and data buffer pools. Let's say like this:

CONFIG_NET_NBUF_RX_COUNT=14
CONFIG_NET_NBUF_TX_COUNT=14
CONFIG_NET_NBUF_DATA_COUNT=72

The number of RX and TX buffers corresponds to the number of RX/TX frames which may be simultaneously received/sent. The data buffer count tells us how much storage we reserve for the actual data. This pool is shared between the RX and TX paths. If we receive a large amount of data, the RX path will consume all available data buffers, leaving none for the TX path. If an application then tries to reserve data buffers for the TX path, as echo_server does in its build_reply_buf() function, it will get stuck waiting forever for a free data buffer. The echo_server application gets stuck on the following line:

frag = net_nbuf_get_data(context);

The simplified sequence of events in the echo_server application is as follows: receive RX frame -> reserve data buffers for TX frame -> copy data from RX frame to TX frame -> free resources associated with RX frame -> send TX frame.

One way to avoid this is to make the number of data buffers large enough that the RX path cannot exhaust the available data pool. Taking into account that the data buffer size is 128 bytes, as defined by the following Kconfig parameter,

CONFIG_NET_NBUF_DATA_SIZE=128

and the maximum frame size is 1518 or 1536 bytes, one RX frame can use up to 12 data buffers. In our example we would need to reserve more than 12*14 = 168 data buffers to ensure correct behavior; in the case of the eth_sam_gmac Ethernet driver, even more.

After recent updates to the networking stack, the functions reserving RX/TX/DATA buffers have a timeout parameter. That would prevent the lock-up, but it still does not really solve the issue.
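
As an illustration of that timeout-based pattern (a sketch using the generic net_buf API with an assumed data pool and helper name, not the exact nbuf calls), the reply construction can bound its wait and drop the reply instead of blocking forever:

#include <stdbool.h>
#include <zephyr.h>
#include <net/buf.h>

NET_BUF_POOL_DEFINE(data_pool, 72, 128, 0, NULL);

/* Append one data fragment to a reply, treating pool exhaustion like
 * packet loss instead of waiting forever on an empty pool.
 */
static bool append_reply_frag(struct net_buf *reply)
{
        struct net_buf *frag = net_buf_alloc(&data_pool, K_MSEC(100));

        if (!frag) {
                return false; /* caller unrefs the reply and gives up */
        }

        net_buf_frag_add(reply, frag);
        return true;
}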

Is there a better way to manage this?

Thanks and regards,
Piotr