having trouble getting the echo client to work using TCP/uIP on K64F


Rohit Grover
 

Hello Community,

I've still not been able to get a TCP stream going over uIP using the echo-client sample.
I've unearthed problems along the way (including data-integrity issues in Zephyr's port of uIP; refer to https://gerrit.zephyrproject.org/r/#/c/4226/ ), which suggests that this kind of thing isn't actively tested. With my recent changes, I manage to get the first packet bounced back successfully to the echo-client, but successive packets aren't getting processed on their way out from the client.

I find that uip_process() isn't getting called for successive packets. It gets called for the first packet because of a retransmission timer. A recent change to eventhandler() in tcpip.c (https://gerrit.zephyrproject.org/r/#/c/4050/) makes things worse because the code meant to refresh certain periodic timers gets skipped.

These issues seem to arise from uIP itself. I would appreciate some help from the people who ported uIP to Zephyr. I believe the problems will surface for anyone who tries a TCP/IPv4 stream.

rohit
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.


Morgaine
 

This state of play with FRDM-K64F is really quite unexpected. Freescale
and their new owners NXP are after all *founding members* of the Zephyr
Project, and FRDM-K64F is not only their leading FRDM-series board but it
is also the only NXP board which is explicitly listed as Supported.

Given the above, it would be very reasonable for any reader to assume that
at least this one board would be excellently supported by Zephyr (as a
result of NXP's support), especially in regard to its key features like
built-in networking. Evidently that would be a false assumption.

Do founding members of the project not offer to provide manpower support or
running code, at least for their own boards?

I think it would be useful to better understand the relationship
between sponsoring corporations and the project. Despite operating
under the umbrella of the Linux Foundation, code contributions in
Zephyr may differ from the Linux model, in which companies commonly
contribute kernel code for their own hardware.

If this is written down somewhere already, a link would be awesome.
Thanks! :-)

Morgaine.


Rohit Grover
 

Hello Community,

In my attempts to get a TCP stream to flow in and out of a K64F, I've had to make a few changes to the core of the uIP port: https://gerrit.zephyrproject.org/r/#/q/rohit+grover+status:open.
I now have some success in getting data to flow continuously; nevertheless, I haven't fixed all the issues, and I'm not sure my changes take all possible cases into account.
These changes address errors in how uIP handles outbound packets.
I'm not an expert in IP stacks, and in one particular case, https://gerrit.zephyrproject.org/r/#/c/4225/, I've not been able to provide a proper fix--my patch simply raises a certain retry limit, which results in the expected behaviour.

I would appreciate assistance from the people who authored the port of uIP to Zephyr. It seems to me that these issues should have surfaced for any application attempting to set up a TCP/IPv4 stream over uIP; it is a surprise that they have gone undetected so far.

Rohit.

-----Original Message-----
> From: Rohit Grover
> Sent: 22 August 2016 15:58
> To: devel(a)lists.zephyrproject.org
> Cc: Jukka Rissanen; 'Paul Sokolovsky'
> Subject: having trouble getting the echo client to work using TCP/uIP on K64F
>
> Hello Community,
>
> I've still not been able to get a TCP stream going over uIP using the echo-
> client sample.
> I've unearthed problems along the way (including data-integrity issues in
> Zephyr's port of uIP; refer to https://gerrit.zephyrproject.org/r/#/c/4226/ )
> which suggest that this kind of thing isn't actively tested. With my recent
> changes, I manage to get the first packet bounce back successfully to the
> echo-client. Successive packets aren't getting processed on their way out
> from the client.

> rohit


Paul Sokolovsky
 

Hello Rohit,

On Mon, 22 Aug 2016 14:58:11 +0000
Rohit Grover <Rohit.Grover(a)arm.com> wrote:

I've still not been able to get a TCP stream going over uIP using the
echo-client sample. I've unearthed problems along the way (including
data-integrity issues in Zephyr's port of uIP; refer to
https://gerrit.zephyrproject.org/r/#/c/4226/ ) which suggest that
this kind of thing isn't actively tested.
I can now confirm this issue, running via QEMU virtual networking
(actually, even doing this from the MicroPython Zephyr port running in
QEMU, this being an initial step towards adding Z's networking
bindings to MicroPython).

Well, as you know, TCP was added to Z relatively recently, so some
blind spots aren't surprising. In this particular case, with the
"echo_server" sample app available, I guess most people (of those who
enable TCP support in it at all) test the other direction, host ->
Z-device.

What's more concerning is that the code in which you found the issue
(apparently) belongs to uIP, and, you know, uIP has been used in many
tens of thousands of devices over the last decade, so getting to the
bottom of the issue would be really important.

With my recent changes, I
manage to get the first packet bounce back successfully to the
echo-client. Successive packets aren't getting processed on their way
out from the client.
Can't say much about this yet. The reverse case (multiple packets,
host -> QEMU Z device) works well. Did you notice that networking
needs to be used differently on the micro-kernel and the nano-kernel?
The microkernel can issue networking calls directly, while the
nanokernel needs them wrapped in a fiber. Anyway, that's what
echo_server/echo_client do, so if you're using them already, it
shouldn't be a concern.


--
Best Regards,
Paul

Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog


Paul Sokolovsky
 

On Mon, 22 Aug 2016 14:58:11 +0000
Rohit Grover <Rohit.Grover(a)arm.com> wrote:

With my recent changes, I
manage to get the first packet bounce back successfully to the
echo-client. Successive packets aren't getting processed on their way
out from the client.
I'm happy to report that I now see this too. Specifically, if using
net_context_get()+net_send() (implicit connection establishment), data
in first net_send() gets delivered (I see it in netcat), but not in
subsequent calls to net_send().

I'm, however, interested in emulating the standard socket API paradigm,
where you get all the standard and important calls like connect(),
listen(), etc. For this, I explicitly call
net_context_tcp_init(ctx, NULL, NET_TCP_TYPE_CLIENT) to emulate
connect(). As the netbuf is NULL in this call, only the SYN/ACK
handshake happens (with retries, just as you describe), and then the
first call to net_send() doesn't send anything on the wire. With
logging it looks like:

>>> s.connect(("192.0.2.1", 8083))
resolved: 192.0.2.1
net: net_driver_slip_send (001349e0): Sending 40 bytes, application data 0 bytes
net: net_rx_fiber (001363e0): Received buf 0012c600
ret=0
net: net_rx_fiber (001363e0): Received buf 0012d0f0
net: net_rx_fiber (001363e0): Received buf 0012b598
net: net_driver_slip_send (001363e0): Sending 40 bytes, application data 0 bytes

>>> s.send("GET / HTTP/1.0\r\n")
net: net_send (001349e0): context 0012dfa0 buf 00129a38 status 0
net: net_tx_fiber (00135fe0): Sending (buf 00129a38, len 56) to IP stack
0
So the difference is clear: the SYN/ACK traffic goes to
net_driver_slip_send(), but real app data gets stuck in
net_tx_fiber().

I've applied all three of your patches, and they don't help.



I find that uip_process() isn't getting called on successive packets.
It gets called for the first packet due to some retransmission timer.
The recently made change to eventhandler() in tcpip.c
(https://gerrit.zephyrproject.org/r/#/c/4050/) makes things worse
because the code meant to refresh certain periodic timers gets
skipped.

These issues seem to arise from uIP. I would appreciate some help
from people who ported uIP to zephyr. I believe the problems will
surface if someone tries a TCP/IPV4 stream.

rohit
--
Best Regards,
Paul

Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog


Paul Sokolovsky
 

On Fri, 26 Aug 2016 01:07:29 +0300
Paul Sokolovsky <Paul.Sokolovsky(a)linaro.org> wrote:

Some more info: issues with IPv4/TCP can be seen by building
echo_server with the microkernel - making a TCP connection in this
case hard locks up the QEMU VM; not even pings work. Nothing special
in the logging. However, with the nanokernel, everything works, and
the send (reply) path doesn't involve net_tx_fiber:

Received packet 0x0010a4d0 len 7
receive: tcp received 7 bytes data
net: tcp_prepare_and_send (0010d27c): Packet output len 47
net: net_driver_slip_send (0010d27c): Sending 47 bytes, application data 7 bytes



On Mon, 22 Aug 2016 14:58:11 +0000
Rohit Grover <Rohit.Grover(a)arm.com> wrote:

With my recent changes, I
manage to get the first packet bounce back successfully to the
echo-client. Successive packets aren't getting processed on their
way out from the client.
I'm happy to report that I now see this too. Specifically, if using
net_context_get()+net_send() (implicit connection establishment), data
in first net_send() gets delivered (I see it in netcat), but not in
subsequent calls to net_send().

I'm however interested to emulate standard socket API paradigm, where
you get all the standard and important calls like connect(), listen(),
etc. For this, I explicitly call
net_context_tcp_init(ctx, NULL, NET_TCP_TYPE_CLIENT) to emulate
connect(). As netbuf is NULL in this call, only SYN/ACK handshake
happens (with retries, just as you describe), and then first call
net_send() doesn't send anything on wire. With logging it looks like:

>>> s.connect(("192.0.2.1", 8083))
resolved: 192.0.2.1
net: net_driver_slip_send (001349e0): Sending 40 bytes, application data 0 bytes
net: net_rx_fiber (001363e0): Received buf 0012c600
ret=0
net: net_rx_fiber (001363e0): Received buf 0012d0f0
net: net_rx_fiber (001363e0): Received buf 0012b598
net: net_driver_slip_send (001363e0): Sending 40 bytes, application data 0 bytes

>>> s.send("GET / HTTP/1.0\r\n")
net: net_send (001349e0): context 0012dfa0 buf 00129a38 status 0
net: net_tx_fiber (00135fe0): Sending (buf 00129a38, len 56) to IP stack
0
So the difference is clear: the SYN/ACK traffic goes to
net_driver_slip_send(), but real app data gets stuck in
net_tx_fiber().

I've applied all three of your patches, and they don't help.



I find that uip_process() isn't getting called on successive
packets. It gets called for the first packet due to some
retransmission timer. The recently made change to eventhandler() in
tcpip.c (https://gerrit.zephyrproject.org/r/#/c/4050/) makes things
worse because the code meant to refresh certain periodic timers gets
skipped.

These issues seem to arise from uIP. I would appreciate some help
from people who ported uIP to zephyr. I believe the problems will
surface if someone tries a TCP/IPV4 stream.

rohit


--
Best Regards,
Paul

Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog


Rohit Grover
 

Paul,

It is useful to work our way through the core BSD socket API in the context of Z-uIP.

Let's take connect(). Assuming that a net_context has been set up, connect() can be implemented using the snippet:

net_context_tcp_init(net_context, NULL /* bp */, NET_TCP_TYPE_CLIENT);
while ((status = net_context_get_connection_status(net_context)) != 0) {
    fiber_sleep(SOME_SMALL_DURATION);
}

We're passing `bp' as NULL, but it seems unnecessary for net_context_tcp_init() to require a bp. This leads us to commit 61edc68c, by Jukka, where many of the APIs suddenly acquired an additional 'bp' parameter. In some cases, this led to code which I'm having trouble understanding. I've attached the diff, but I'll highlight a snippet. Take the case of tcpip_poll_tcp(), which is responsible for sending out the initial SYN packet.

@@ -837,12 +837,18 @@ tcpip_poll_udp(struct uip_udp_conn *conn)
/*---------------------------------------------------------------------------*/
#if UIP_TCP
void
-tcpip_poll_tcp(struct uip_conn *conn)
+tcpip_poll_tcp(struct uip_conn *conn, struct net_buf *data_buf)
{
/* We are sending here the initial SYN */
struct net_buf *buf = ip_buf_get_tx(conn->appstate.state);
- uip_set_conn(buf) = conn;
- conn->buf = ip_buf_ref(buf);
+ uip_set_conn(data_buf) = conn;
+
+ /* The conn->buf will be freed after we have established the connection,
+ * sent the message and received an ack to it. This will happen in
+ * net_core.c:net_send().
+ */
+ conn->buf = ip_buf_ref(data_buf);
+
process_post_synch(&tcpip_process, TCP_POLL, conn, buf);
}

Here we see that we acquire a buf to send out the SYN (refer to ip_buf_get_tx(...) above), but whereas previously we would associate it with the new connection and have its ref-count bumped, we now do those things for 'data_buf' instead. This is confusing; I don't see why the handling of the SYN buf needed to be changed.

Rohit.

-----Original Message-----
> From: Paul Sokolovsky [mailto:paul.sokolovsky(a)linaro.org]
> Sent: 26 August 2016 00:57
> To: Rohit Grover
> Cc: devel(a)lists.zephyrproject.org; Jukka Rissanen
> Subject: Re: having trouble getting the echo client to work using TCP/uIP on
> K64F
>
> On Fri, 26 Aug 2016 01:07:29 +0300
> Paul Sokolovsky <Paul.Sokolovsky(a)linaro.org> wrote:
>
> Some more info: Issues with IPv4/TCP can be seen building echo_server with
> microkernel - making a TCP connection in this case hard locks-up QEMU VM,
> not even pings work. Nothing special in the logging. However with
> nanokernel, everything works, and send (reply) path doesn't involve
> net_tx_fiber:
>
> Received packet 0x0010a4d0 len 7
> receive: tcp received 7 bytes data
> net: tcp_prepare_and_send (0010d27c): Packet output len 47
> net: net_driver_slip_send (0010d27c): Sending 47 bytes, application data 7 bytes
>
>
>
> > On Mon, 22 Aug 2016 14:58:11 +0000
> > Rohit Grover <Rohit.Grover(a)arm.com> wrote:
> >
> > > With my recent changes, I
> > > manage to get the first packet bounce back successfully to the
> > > echo-client. Successive packets aren't getting processed on their
> > > way out from the client.
> >
> > I'm happy to report that I now see this too. Specifically, if using
> > net_context_get()+net_send() (implicit connection establishment), data
> > in first net_send() gets delivered (I see it in netcat), but not in
> > subsequent calls to net_send().
> >
> > I'm however interested to emulate standard socket API paradigm, where
> > you get all the standard and important calls like connect(), listen(),
> > etc. For this, I explicitly call net_context_tcp_init(ctx, NULL,
> > NET_TCP_TYPE_CLIENT) to emulate connect(). As netbuf is NULL in this
> > call, only SYN/ACK handshake happens (with retries, just as you
> > describe), and then first call
> > net_send() doesn't send anything on wire. With logging it looks like:
> >
> > >>> s.connect(("192.0.2.1", 8083))
> > resolved: 192.0.2.1
> > net: net_driver_slip_send (001349e0): Sending 40 bytes, application data 0 bytes
> > net: net_rx_fiber (001363e0): Received buf 0012c600
> > ret=0
> > >>> net: net_rx_fiber (001363e0): Received buf 0012d0f0
> > net: net_rx_fiber (001363e0): Received buf 0012b598
> > net: net_driver_slip_send (001363e0): Sending 40 bytes, application data 0 bytes
> >
> > >>> s.send("GET / HTTP/1.0\r\n")
> > net: net_send (001349e0): context 0012dfa0 buf 00129a38 status 0
> > net: net_tx_fiber (00135fe0): Sending (buf 00129a38, len 56) to IP stack
> > 0
> > >>>
> >
> > So, the difference is clear: SYN/ACK stuff goes to
> > net_driver_slip_send(), but real app data get stuck in net_tx_fiber().
> >
> > I apply all your 3 patches, and they don't help.
> >
> >
> > >
> > > I find that uip_process() isn't getting called on successive
> > > packets. It gets called for the first packet due to some
> > > retransmission timer. The recently made change to eventhandler() in
> > > tcpip.c (https://gerrit.zephyrproject.org/r/#/c/4050/) makes things
> > > worse because the code meant to refresh certain periodic timers gets
> > > skipped.
> > >
> > > These issues seem to arise from uIP. I would appreciate some help
> > > from people who ported uIP to zephyr. I believe the problems will
> > > surface if someone tries a TCP/IPV4 stream.
> > >
> > > rohit
> >
>
>
>
> --
> Best Regards,
> Paul
>
> Linaro.org | Open source software for ARM SoCs Follow Linaro:
> http://www.facebook.com/pages/Linaro
> http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog



Rohit Grover
 

Paul,

With the body of tcpip_poll_tcp() reverted to its original version (the new signature is retained, but data_buf now goes unused):

void
tcpip_poll_tcp(struct uip_conn *conn, struct net_buf *data_buf)
{
/* We are sending here the initial SYN */
struct net_buf *buf = ip_buf_get_tx(conn->appstate.state);
uip_set_conn(buf) = conn;
conn->buf = ip_buf_ref(buf);

process_post_synch(&tcpip_process, TCP_POLL, conn, buf);
}

I can now connect and send a packet using the following code run either as a fiber or a task:

void activity(void)
{
    printk("starting activity\n");

    // int expecting_len = 0;
    int status;

    /* CONNECT */
    net_context_tcp_init(unicast, NULL, NET_TCP_TYPE_CLIENT);
    printk("activity: returned from net_context_tcp_init; waiting for status to change\n");

    while ((status = net_context_get_connection_status(unicast)) != 0) {
#if defined(CONFIG_MICROKERNEL)
        task_sleep(100);
#else
        fiber_sleep(100);
#endif
    }
    printk("activity: status = %d\n", status);

    /* SEND */
    struct net_buf *buf = ip_buf_get_tx(unicast);
    if (buf) {
        uint8_t *ptr;
        int sending_len = 64;

        printk("activity: filling in net_buf with %u bytes from ipsum+%u\n",
               sending_len, ipsum_len - sending_len);
        ptr = net_buf_add(buf, sending_len);
        memcpy(ptr, lorem_ipsum + ipsum_len - sending_len, sending_len);

        ip_buf_appdatalen(buf) = sending_len;

        while ((status = net_send(buf)) < 0) {
            printk("activity: got status %d; retrying in a while\n", status);
#if defined(CONFIG_MICROKERNEL)
            task_sleep(100);
#else
            fiber_sleep(100);
#endif
        }
        printk("activity: net_send returned %d\n", status);
    }
}

This is already an improvement over the state where I could not send using the microkernel.
Sending out a second packet before the receipt of the first has been processed hasn't worked yet, but I'll leave that investigation for next week.

Regards,
Rohit.

-----Original Message-----
> From: Rohit Grover
> Sent: 26 August 2016 12:07
> To: 'Paul Sokolovsky'
> Cc: devel(a)lists.zephyrproject.org; Jukka Rissanen
> Subject: RE: having trouble getting the echo client to work using TCP/uIP on
> K64F
>
> Paul,
>
> It is useful to work our way through the core BSD socket API in the context of
> Z-uIP.
>
> Let's take connect(). Assuming that a net_context has been setup, connect()
> can be implemented using the snippet:
>
> net_context_tcp_init(net_context, NULL /* bp */,
> NET_TCP_TYPE_CLIENT);
> while ((status = net_context_get_connection_status(unicast)) != 0) {
> fiber_sleep(SOME_SMALL_DURATION);
> }
>
> We're passing `bp' as NULL, but it seems un-necessary for net_context_init()
> to require a bp. This leads us to commit 61edc68c, by Jukka, where many of
> the APIs suddenly acquired an additional 'bp' parameter. In some cases, this
> led to code which I'm having trouble understanding. I've attached the diff,
> but I'll highlight a snippet. Take the case of tcpip_poll_tcp(), which is
> responsible for sending out the initial SYN packet.
>
> @@ -837,12 +837,18 @@ tcpip_poll_udp(struct uip_udp_conn *conn) /*-----
> ----------------------------------------------------------------------*/
> #if UIP_TCP
> void
> -tcpip_poll_tcp(struct uip_conn *conn)
> +tcpip_poll_tcp(struct uip_conn *conn, struct net_buf *data_buf)
> {
> /* We are sending here the initial SYN */
> struct net_buf *buf = ip_buf_get_tx(conn->appstate.state);
> - uip_set_conn(buf) = conn;
> - conn->buf = ip_buf_ref(buf);
> + uip_set_conn(data_buf) = conn;
> +
> + /* The conn->buf will be freed after we have established the connection,
> + * sent the message and received an ack to it. This will happen in
> + * net_core.c:net_send().
> + */
> + conn->buf = ip_buf_ref(data_buf);
> +
> process_post_synch(&tcpip_process, TCP_POLL, conn, buf); }
>
> Here we see that we acquire a buf to send out a 'SYN' (refer to
> ip_buf_get_tx(...) above), but whereas previously we would associate it with
> the new connection and have its ref-count bumped, we now do those things
> for the 'data_buf' instead. This is a bit confusing. I don't see why the handling
> of the SYN buf needed to be changed.
>
> Rohit.
>
>
> > -----Original Message-----
> > From: Paul Sokolovsky [mailto:paul.sokolovsky(a)linaro.org]
> > Sent: 26 August 2016 00:57
> > To: Rohit Grover
> > Cc: devel(a)lists.zephyrproject.org; Jukka Rissanen
> > Subject: Re: having trouble getting the echo client to work using TCP/uIP
> on
> > K64F
> >
> > On Fri, 26 Aug 2016 01:07:29 +0300
> > Paul Sokolovsky <Paul.Sokolovsky(a)linaro.org> wrote:
> >
> > Some more info: Issues with IPv4/TCP can be seen building echo_server
> with
> > microkernel - making a TCP connection in this case hard locks-up QEMU
> VM,
> > not even pings work. Nothing special in the logging. However with
> > nanokernel, everything works, and send (reply) path doesn't involve
> > net_tx_fiber:
> >
> > Received packet 0x0010a4d0 len 7
> > receive: tcp received 7 bytes data
> > net: tcp_prepare_and_send (0010d27c): Packet output len 47
> > net: net_driver_slip_send (0010d27c): Sending 47 bytes, application data 7 bytes
> >
> >
> >
> > > On Mon, 22 Aug 2016 14:58:11 +0000
> > > Rohit Grover <Rohit.Grover(a)arm.com> wrote:
> > >
> > > > With my recent changes, I
> > > > manage to get the first packet bounce back successfully to the
> > > > echo-client. Successive packets aren't getting processed on their
> > > > way out from the client.
> > >
> > > I'm happy to report that I now see this too. Specifically, if using
> > > net_context_get()+net_send() (implicit connection establishment), data
> > > in first net_send() gets delivered (I see it in netcat), but not in
> > > subsequent calls to net_send().
> > >
> > > I'm however interested to emulate standard socket API paradigm, where
> > > you get all the standard and important calls like connect(), listen(),
> > > etc. For this, I explicitly call net_context_tcp_init(ctx, NULL,
> > > NET_TCP_TYPE_CLIENT) to emulate connect(). As netbuf is NULL in this
> > > call, only SYN/ACK handshake happens (with retries, just as you
> > > describe), and then first call
> > > net_send() doesn't send anything on wire. With logging it looks like:
> > >
> > > >>> s.connect(("192.0.2.1", 8083))
> > > resolved: 192.0.2.1
> > > net: net_driver_slip_send (001349e0): Sending 40 bytes, application data 0 bytes
> > > net: net_rx_fiber (001363e0): Received buf 0012c600
> > > ret=0
> > > >>> net: net_rx_fiber (001363e0): Received buf 0012d0f0
> > > net: net_rx_fiber (001363e0): Received buf 0012b598
> > > net: net_driver_slip_send (001363e0): Sending 40 bytes, application data 0 bytes
> > >
> > > >>> s.send("GET / HTTP/1.0\r\n")
> > > net: net_send (001349e0): context 0012dfa0 buf 00129a38 status 0
> > > net: net_tx_fiber (00135fe0): Sending (buf 00129a38, len 56) to IP stack
> > > 0
> > > >>>
> > >
> > > So, the difference is clear: SYN/ACK stuff goes to
> > > net_driver_slip_send(), but real app data get stuck in net_tx_fiber().
> > >
> > > I apply all your 3 patches, and they don't help.
> > >
> > >
> > > >
> > > > I find that uip_process() isn't getting called on successive
> > > > packets. It gets called for the first packet due to some
> > > > retransmission timer. The recently made change to eventhandler() in
> > > > tcpip.c (https://gerrit.zephyrproject.org/r/#/c/4050/) makes things
> > > > worse because the code meant to refresh certain periodic timers gets
> > > > skipped.
> > > >
> > > > These issues seem to arise from uIP. I would appreciate some help
> > > > from people who ported uIP to zephyr. I believe the problems will
> > > > surface if someone tries a TCP/IPV4 stream.
> > > >
> > > > rohit
> > >
> >
> >
> >
> > --
> > Best Regards,
> > Paul
> >
> > Linaro.org | Open source software for ARM SoCs Follow Linaro:
> > http://www.facebook.com/pages/Linaro
> > http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog



Paul Sokolovsky
 

Hello Rohit,

On Fri, 26 Aug 2016 11:06:33 +0000
Rohit Grover <Rohit.Grover(a)arm.com> wrote:

Paul,

It is useful to work our way through the core BSD socket API in the
context of Z-uIP.
I'm glad to know you care about this use case too, and thanks for
sharing this on the list; I hope it will be useful feedback for Z's
network stack designers. I also dumped some additional thoughts
on this matter at https://jira.zephyrproject.org/browse/ZEP-741 .


Let's take connect(). Assuming that a net_context has been setup,
connect() can be implemented using the snippet:

net_context_tcp_init(net_context, NULL /* bp */, NET_TCP_TYPE_CLIENT);
while ((status = net_context_get_connection_status(unicast)) != 0) {
fiber_sleep(SOME_SMALL_DURATION);
}

We're passing `bp' as NULL, but it seems unnecessary for
net_context_tcp_init() to require a bp. This leads us to commit 61edc68c,
by Jukka, where many of the APIs suddenly acquired an additional 'bp'
parameter. In some cases, this led to code which I'm having trouble
understanding.
Right, I also had a look at that commit. Well, it says it's related
to IPv6 support, and I'm not familiar enough with uIP's IPv6 support
to assess those changes, but it indeed looks like TCP/IPv4 support may
have regressed as a result. Going to keep looking into that...

I've attached the diff, but I'll highlight a
snippet. Take the case of tcpip_poll_tcp(), which is responsible for
sending out the initial SYN packet.

@@ -837,12 +837,18 @@ tcpip_poll_udp(struct uip_udp_conn *conn)
/*---------------------------------------------------------------------------*/
#if UIP_TCP
void
-tcpip_poll_tcp(struct uip_conn *conn)
+tcpip_poll_tcp(struct uip_conn *conn, struct net_buf *data_buf)
{
/* We are sending here the initial SYN */
struct net_buf *buf = ip_buf_get_tx(conn->appstate.state);
- uip_set_conn(buf) = conn;
- conn->buf = ip_buf_ref(buf);
+ uip_set_conn(data_buf) = conn;
+
+ /* The conn->buf will be freed after we have established the
connection,
+ * sent the message and received an ack to it. This will happen in
+ * net_core.c:net_send().
+ */
+ conn->buf = ip_buf_ref(data_buf);
+
process_post_synch(&tcpip_process, TCP_POLL, conn, buf);
}

Here we see that we acquire a buf to send out a 'SYN' (refer to
ip_buf_get_tx(...) above), but whereas previously we would associate
it with the new connection and have its ref-count bumped, we now do
those things for the 'data_buf' instead. This is a bit confusing. I
don't see why the handling of the SYN buf needed to be changed.

Rohit.
[]

--
Best Regards,
Paul

Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog


Paul Sokolovsky
 

Hello Rohit,

On Fri, 26 Aug 2016 14:36:07 +0000
Rohit Grover <Rohit.Grover(a)arm.com> wrote:

Paul,

With tcpip_poll_tcp() reverted to its original version:

void
tcpip_poll_tcp(struct uip_conn *conn, struct net_buf *data_buf)
{
/* We are sending here the initial SYN */
struct net_buf *buf = ip_buf_get_tx(conn->appstate.state);
uip_set_conn(buf) = conn;
conn->buf = ip_buf_ref(buf);

process_post_synch(&tcpip_process, TCP_POLL, conn, buf);
}

I can now connect and send a packet using the following code run
either as a fiber or a task:
[]

This is already an improvement over the state where I could not send
using a microkernel. Sending out a second packet before processing
the receipt of the first hasn't worked yet, but I'll leave that
investigation for next week.
Thanks for the above patch, Rohit. So far, that's the most definitive
patch for resolving many different issues. With it, and only it, I get
mostly working client-side, BSD-socket-API-style communication. In
particular, it appears to supersede the need for
https://gerrit.zephyrproject.org/r/#/c/4226 - with just the patch
above, I couldn't reproduce the overwriting of the 4 initial data
bytes with the MSS option.

Still, that change was made in
https://gerrit.zephyrproject.org/r/gitweb?p=zephyr.git;a=commitdiff;h=61edc68c9ecf221d18acc8037d541f5d79eee3c7 ,
with the description
"net: tcp: Supporting TCP client functionality when using IPv6". I
don't know if something in IPv6 warrants the changes to
tcpip_poll_tcp() made in that commit; I guess only Jukka can tell. But
the issue in https://gerrit.zephyrproject.org/r/#/c/4226 could be
explained by it pretty well too: the comment says "We are sending here
the initial SYN" and allocates a packet to host that SYN (and then
options like MSS), but tcpip_poll_tcp() from 61edc68c9 records the
actual data packet, instead of this SYN packet, in structures where a
SYN packet is expected, so the data packet gets patched with MSS
options, etc.


Note that I don't get 100% perfect networking with the patch above,
so there may be more issues to tackle (or, alternatively, I don't
handle all conditions well in my code). But figuring out whether
tcpip_poll_tcp() should be reverted is the most productive next step,
IMHO.



Regards,
Rohit.

--
Best Regards,
Paul

Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog


Jukka Rissanen
 

Hi Rohit and Paul,

On Thu, 2016-09-01 at 03:52 +0300, Paul Sokolovsky wrote:
> Hello Rohit,
>
> On Fri, 26 Aug 2016 14:36:07 +0000
> Rohit Grover <Rohit.Grover(a)arm.com> wrote:
>
> > Paul,
> >
> > With tcpip_poll_tcp() reverted to its original version:
> >
> > void
> > tcpip_poll_tcp(struct uip_conn *conn, struct net_buf *data_buf)
> > {
> >   /* We are sending here the initial SYN */
> >   struct net_buf *buf = ip_buf_get_tx(conn->appstate.state);
> >   uip_set_conn(buf) = conn;
> >   conn->buf = ip_buf_ref(buf);
> >
> >   process_post_synch(&tcpip_process, TCP_POLL, conn, buf);
> > }
> >
> > I can now connect and send a packet using the following code run
> > either as a fiber or a task:
>
> []
>
> > This is already an improvement over the state where I could not
> > send using a microkernel. Sending out a second packet before
> > processing the receipt of the first hasn't worked yet, but I'll
> > leave that investigation for next week.
>
> Thanks for the above patch, Rohit. So far, that's the most definitive
> patch to resolve many different issues. With it and only it I get
> pretty working client-side BSD socket API like communication. In
> particular, it appears to supersede need for
> https://gerrit.zephyrproject.org/r/#/c/4226 - with just a patch
> above, I couldn't reproduce overwriting 4 initial data bytes with MSS
> option.
>
> Still, that commit was made in
> https://gerrit.zephyrproject.org/r/gitweb?p=zephyr.git;a=commitdiff;h=61edc68c9ecf221d18acc8037d541f5d79eee3c7 ,
> with description
> "net: tcp: Supporting TCP client functionality when using IPv6". I
> don't know if something in IPv6 warrants changes to tcpip_poll_tcp()
> done in that commit, I guess only Jukka can tell. But the issue in
Unfortunately I do not remember why I changed tcpip_poll_tcp(), the uIP
code (and especially TCP support) is very cryptic and difficult to
follow. One basically needs to run the stack and see how it behaves.


> https://gerrit.zephyrproject.org/r/#/c/4226 could be explained by it
> pretty well too: comment says "We are sending here the initial SYN",
> and allocates a packet to host that SYN (and then other options, like
> MSS), but tcpip_poll_tcp() from 61edc68c9 instead of this SYN packet
> records actual data packet in some structures where a SYN packet is
> expected, so the data packet gets patched with MSS, etc.
>
> Note that I don't get 100% perfect networking with the patch above,
> so there may be more issues to tackle (or alternatively, I don't
> handle all conditions well in my code). But figuring out if the
> tcpip_poll_tcp() should be reverted or not is the most productive
> next step IMHO.
>
> > Regards,
> > Rohit.
Thanks for your hard efforts with the TCP, it is very much appreciated.


Cheers,
Jukka


Rohit Grover
 

Jukka, Paul,

I believe we should go ahead and fix tcpip_poll_tcp() according to the patch I submitted previously.
This fix can supersede https://gerrit.zephyrproject.org/r/#/c/4226, and we can pause on that particular pull request for now. We should also avoid regressing on any IPv6 functionality, so if there are tests we could use to evaluate IPv6 then we should employ them to validate changes to tcpip_poll_tcp().

Fixing tcpip_poll_tcp() isn't the complete answer. With changes to tcpip_poll_tcp(), I can get simple request-response transactions, but sustained/multiple transactions don't work; so there is still reason to suspect that TCP support is unreliable.
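[A small host-side harness along these lines (a sketch with illustrative names; the in-process echo thread stands in for the board, but the same loop can be pointed at the K64F's address) can check whether sustained request-response rounds survive on a single connection:]

```python
import socket
import threading

def echo_server(listener):
    """Accept one connection and echo bytes back (stand-in for the board)."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:
                break
            conn.sendall(data)

def sustained_rounds(host, port, rounds):
    """Run `rounds` sequential request-response transactions on one TCP
    connection and return how many echoed back intact.  With the stall
    described above, this would stop after the first round."""
    ok = 0
    with socket.create_connection((host, port), timeout=5) as s:
        for i in range(rounds):
            msg = ("round %d" % i).encode()
            s.sendall(msg)
            buf = b""
            while len(buf) < len(msg):       # reassemble a possibly split echo
                chunk = s.recv(len(msg) - len(buf))
                if not chunk:
                    return ok
                buf += chunk
            if buf == msg:
                ok += 1
    return ok

if __name__ == "__main__":
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    threading.Thread(target=echo_server, args=(listener,), daemon=True).start()
    print(sustained_rounds("127.0.0.1", port, 5))  # a healthy echo peer gives 5
```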

The bigger problem appears to be that the uIP port for Z has grown unmaintainable. I can sympathize with Jukka when he says that TCP support on uIP "is very cryptic and difficult to follow."
It is not clear to me whether uIP is tricky to port in general, or whether Z poses a special challenge.
Does Zephyr intend to switch from uIP to YAIP? If so, then perhaps one strategy would be to wait for YAIP to catch up.
If Zephyr wishes to maintain support for TCP/uIP, then someone needs to take a fresh look at the current uIP port, possibly from the point of view of a porter of uIP.

Regards,
Rohit.

-----Original Message-----
> From: Jukka Rissanen [mailto:jukka.rissanen(a)linux.intel.com]
> Sent: 01 September 2016 09:11
> To: Paul Sokolovsky; Rohit Grover
> Cc: devel(a)lists.zephyrproject.org
> Subject: Re: having trouble getting the echo client to work using TCP/uIP on
> K64F
>
> Hi Rohit and Paul,
>
> On Thu, 2016-09-01 at 03:52 +0300, Paul Sokolovsky wrote:
> > Hello Rohit,
> >
> > On Fri, 26 Aug 2016 14:36:07 +0000
> > Rohit Grover <Rohit.Grover(a)arm.com> wrote:
> >
> > >
> > > Paul,
> > >
> > > With tcpip_poll_tcp() reverted to its original version:
> > >
> > > void
> > > tcpip_poll_tcp(struct uip_conn *conn, struct net_buf *data_buf)
> > > {
> > >   /* We are sending here the initial SYN */
> > >   struct net_buf *buf = ip_buf_get_tx(conn->appstate.state);
> > >   uip_set_conn(buf) = conn;
> > >   conn->buf = ip_buf_ref(buf);
> > >
> > >   process_post_synch(&tcpip_process, TCP_POLL, conn, buf);
> > > }
> > >
> > > I can now connect and send a packet using the following code run
> > > either as a fiber or a task:
> >
> > []
> >
> > >
> > > This is already an improvement over the state where I could not send
> > > using a microkernel. Sending out a second packet before processing
> > > the receipt of the first hasn't worked yet, but I'll leave that
> > > investigation for next week.
> >
> > Thanks for the above patch, Rohit. So far, that's the most definitive
> > patch to resolve many different issues. With it and only it I get
> > pretty working client-side BSD socket API like communication. In
> > particular, it appears to supersede need for
> > https://gerrit.zephyrproject.org/r/#/c/4226 - with just a patch above,
> > I couldn't reproduce overwriting 4 initial data bytes with MSS option.
> >
> > Still, that commit was made in
> > https://gerrit.zephyrproject.org/r/gitweb?p=zephyr.git;a=commitdiff;h
> > =61edc68c9ecf221d18acc8037d541f5d79eee3c7 , with description
> > "net: tcp: Supporting TCP client functionality when using IPv6". I
> > don't know if something in IPv6 warrants changes to tcpip_poll_tcp()
> > done in that commit, I guess only Jukka can tell. But the issue in
>
> Unfortunately I do not remember why I changed tcpip_poll_tcp(), the uIP
> code (and especially TCP support) is very cryptic and difficult to follow. One
> basically needs to run the stack and see how it behaves.
>
>
> > https://gerrit.zephyrproject.org/r/#/c/4226 could be explained by it
> > pretty well too: comment says "We are sending here the initial SYN",
> > and allocates a packet to host that SYN (and then other options, like
> > MSS), but tcpip_poll_tcp() from 61edc68c9 instead of this SYN packet
> > records actual data packet in some structures where a SYN packet is
> > expected, so the data packet gets patched with MSS, etc.
> >
> >
> > Note that I don't get 100% perfect networking with the patch above, so
> > there may be more issues to tackle (or alternatively, I don't handle
> > all conditions well in my code). But figuring out if the
> > tcpip_poll_tcp() should be reverted or not is the most productive next
> > step IMHO.
> >
> >
> > >
> > >
> > > Regards,
> > > Rohit.
> >
> >
>
> Thanks for your hard efforts with the TCP, it is very much appreciated.
>
>
> Cheers,
> Jukka
>
>

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.


Paul Sokolovsky
 

Hello,

On Thu, 01 Sep 2016 11:11:15 +0300
Jukka Rissanen <jukka.rissanen(a)linux.intel.com> wrote:

[]

> > Still, that commit was made in
> > https://gerrit.zephyrproject.org/r/gitweb?p=zephyr.git;a=commitdiff;h=61edc68c9ecf221d18acc8037d541f5d79eee3c7 ,
> > with description
> > "net: tcp: Supporting TCP client functionality when using IPv6". I
> > don't know if something in IPv6 warrants changes to tcpip_poll_tcp()
> > done in that commit, I guess only Jukka can tell. But the issue in
>
> Unfortunately I do not remember why I changed tcpip_poll_tcp(), the
> uIP code (and especially TCP support) is very cryptic and difficult to
> follow. One basically needs to run the stack and see how it behaves.
I see. It seems that one of the biggest changes made in the port was
converting uIP's single static packet buffer design to support multiple
buffers. Ironically, that's what another Contiki-originated IP stack (or
at least an IP stack from the same author, Adam Dunkels) - lwIP - does:
it supports multiple in-flight packet buffers, a clearer API, and better
performance. I wonder whether using it might have been a better choice
for Zephyr. Then again, all the above comes at the expense of doubling
(or more) the code size, so maybe not.
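[To make the design difference concrete, here's a toy sketch - illustrative only, not actual uIP, lwIP, or Zephyr code - of a single static packet buffer versus a small pool of in-flight buffers:]

```python
from collections import deque

# Classic uIP style: one global packet buffer, so the stack can hold at
# most one packet at a time and must fully process it before the next.
uip_buf = bytearray(1514)

# Multi-buffer style (roughly the direction of the Zephyr port and lwIP):
# a pool of net_buf-like objects, so several packets can be in flight.
class BufPool:
    def __init__(self, count, size):
        self._free = deque(bytearray(size) for _ in range(count))

    def get(self):
        """Take a buffer from the pool, or None if the pool ran dry."""
        return self._free.popleft() if self._free else None

    def put(self, buf):
        """Return a buffer to the pool once its packet is processed."""
        self._free.append(buf)

pool = BufPool(count=2, size=1514)
a, b = pool.get(), pool.get()
print(pool.get())  # None: only two packets may be in flight at once
pool.put(a)        # freeing one makes room for the next packet
```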

Anyway, with Z's own new IP stack in progress, that's mostly
speculation; I guess the focus should be on stabilizing the current
uIP-based implementation until the new stack reaches the needed
functionality level.


--
Best Regards,
Paul

Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog


Paul Sokolovsky
 

On Thu, 1 Sep 2016 14:47:41 +0000
Rohit Grover <Rohit.Grover(a)arm.com> wrote:

> Jukka, Paul,
>
> I believe we should go ahead and fix tcpip_poll_tcp() according to
> the patch I submitted previously. This fix can supersede
> https://gerrit.zephyrproject.org/r/#/c/4226, and we can pause on that
> particular pull request for now.
+1. Would you submit it to Gerrit?

> We should also avoid regressing on
> any IPv6 functionality, so if there are tests we could use to
> evaluate IPv6 then we should employ them to validate changes to
> tcpip_poll_tcp().
I believe running echo_server/echo_client should be a good first step,
I'll look into that.


> Fixing tcpip_poll_tcp() isn't the complete answer. With changes to
> tcpip_poll_tcp(), I can get simple request-response transactions, but
> sustained/multiple transactions don't work; so there is still reason
> to suspect that TCP support is unreliable.
As I mentioned, multiple (well, up to 5-6) packets work for me with
QEMU. I'll try it with FRDM-K64F Ethernet driver soon.

> The bigger problem appears to be that the uIP port for Z has grown
> unmaintainable. I can sympathize with Jukka when he says that TCP
> support on uIP "is very cryptic and difficult to follow." It is not
> clear to me if uIP is tricky to port in general, or that Z poses a
> special challenge. Does Zephyr intend to switch from uIP to YAIP? If
> so, then perhaps one strategy would be to wait for YAIP to catch up.
> If Zephyr wishes to maintain support for TCP/uIP, then someone needs
> to take a fresh look at the current uIP port, possibly from the point
> of view of a porter of uIP.
My understanding is that yes, YAIP is intended to replace the IP stack
implementation in Z, though maybe the exact timeline isn't known yet. My
current task is to prepare a proof-of-concept demo of a scripting
language running on top of Zephyr, so I'd be interested to follow the
route of trying to fix up the current uIP implementation, though
probably not go too deep with that.


> Regards,
> Rohit.

--
Best Regards,
Paul

Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog


Rohit Grover
 

Paul, Jukka,

> On Thu, 1 Sep 2016 14:47:41 +0000
> Rohit Grover <Rohit.Grover(a)arm.com> wrote:
>
> > Jukka, Paul,
> >
> > I believe we should go ahead and fix tcpip_poll_tcp() according to the
> > patch I submitted previously. This fix can supersede
> > https://gerrit.zephyrproject.org/r/#/c/4226, and we can pause on that
> > particular pull request for now.
>
> +1. Would you submit it to Gerrit?
>

Submitted https://gerrit.zephyrproject.org/r/#/c/4491/

Regards,
Rohit.