A possible case of TCP server no-response, was: Re: bufs lost in TCP connection establishment
Hello Rohit, et al,
Thanks for the information below. I didn't yet have chance to test
multiple TCP/IPv4 connections in row, because I have issue with
reliably handling one. Just as before, my setup is a scripting language
(MicroPython), using combination of Zephyr and uIP APIs to emulate BSD
sockets. I however proceeded to play with mbedTLS on top of sockets
implemented in such way. That means multiple bidirectional packet
exchanges which exposes quite a number of edge conditions. There're
packets can be dropped anytime.
One particularly naughty conditions I experienced is that while my
usual testcase usually proceeded beyond connect() call, sometimes I got
time "strips" when connect() call just hanged (I don't use timeouts,
like BSD sockets by default) - with consecutive dozen or so of QEMU
restarts (I'm using QEMU virtual networking still so far). Then it
suddenly started working again, to repeat soon again. Looking with
Wireshark, Zephyr app just was sending SYN packets, with zero reply
from a Linux side. At this time, it's easy to start wonder what gets
wrong - Linux handling of TUN interface, tunslip's setting it up, socat
which is involved in connecting tunslip and QEMU, Wireshark, or
I set to debug just that weird case, with initial hypothesis that Linux
SYN flood protection may be involved. With some googling and netstat
grepping, I found that when I get those can't-connect time strips,
there's actually ESTABLISHED connection on the host. That led me to
which finished the picture.
So, uIP/Zephyr don't use source port randomization, and always open
connection from port 1025. While code snippets I ran included closing
a socket, if something went wrong, like a packet was lost during
initial connection or TLS exchange, the connection wasn't closed
explicitly, but left open on server. On QEMU restart, it started to
send SYN to establish new connection from the same source port, which,
as explained in the link above, gets zero response from a Linux host.
Eventually Apache on host side times out that client connection, then
the situation recovers.
The solution is thus to check whether there's ESTABLISHED connection on
host side with netstat and restart a server process which owns it
(apache in my case).
The same situation may surely happen with a real device too, so I hope
this mail may save debugging time to some readers.
On Wed, 7 Sep 2016 15:26:23 +0000
Rohit Grover <Rohit.Grover(a)arm.com> wrote:
Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog