Re: Situation with net APIs testing


Jukka Rissanen
 

Hi Paul,

On Wed, 2017-06-14 at 18:10 +0300, Paul Sokolovsky wrote:
Hello,

As I'm approaching final steps in preparing BSD Sockets patchset for
submission, I'm looking into a way to add some tests for it. Testing of
networking functionality is by default hard, because in general,
networking hardware would be required for that, even if "virtual", like
tunslip6, etc. tools from net-tools repo running, to support QEMU
networking. During prototyping work, I learnt there're loopback
capabilities when binding and connecting to the same netif, but it
still requires net-tools running just to get QEMU start up with
networking support.

Well, I took a look at tests/net and saw the whole bunch of tests,
whoa! I gave a try to tests/net/tcp; some cases passed, some failed,
hmm. But then I killed net-tools/loop-slip-tab.sh script and the test
ran in the same manner. Whoa, so we have means to run networking
without any requirements on the host side, which means we can run them
as part of sanitycheck testsuite! But, 8 tests of tests/net/ have
build_only=true, any wonder they're broken?
When we had Gerrit and Jenkins, some of the net tests ran slightly
longer than was desired, so they were marked as build only. Now that
the situation is different with GitHub and Shippable, we can change
this. So I will prepare a patch that activates those tests that can be
activated.

I looked through tests/net to see the current status of the tests:

ieee802154/crypto
* This cannot be run on qemu as it requires suitable hw

tcp
* Test does not pass, needs fixing

mld
* Test does not pass, needs fixing

ipv6
* Test does not pass, needs fixing

lib/mqtt_publisher
* Test requires real qemu to run. This needs to be converted 

lib/mqtt_subscriber
* Test requires real qemu to run. This needs to be converted 

buf
* This test runs ok so build_only=true can be removed.

all
* This is an intentional compile test that activates all network config
options and tries to compile the binary. The resulting binary cannot be
run, mostly because of memory requirements and the lack of a suitable
test environment. The only issue with this test is that we should
remember to add and enable new net config options in this test case.

All other test programs (24 of them), which consist of quite many
individual tests, are run automatically by CI, so the situation is not
as bleak as you indicated here.

I will fix the relevant failing tests, as they have bit-rotted since
they were written. Converting the two mqtt tests to not use real qemu
requires a bit more work.
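
To keep track of which tests are still marked build only, a small
script along these lines could be used. This is only a sketch: it
assumes each test directory carries a testcase.ini in INI format with a
build_only key (the file name and layout are assumptions here, as is
the helper name find_build_only).

```python
import configparser
import sys
from pathlib import Path


def find_build_only(tests_root):
    """Return test directories (relative to tests_root) marked build_only.

    Assumes every test has a testcase.ini whose sections may contain
    a "build_only = true" entry (assumed layout, not verified here).
    """
    root = Path(tests_root)
    found = []
    for ini in sorted(root.rglob("testcase.ini")):
        # Disable interpolation so stray '%' characters do not raise.
        parser = configparser.ConfigParser(interpolation=None)
        parser.read(ini)
        for section in parser.sections():
            if parser.getboolean(section, "build_only", fallback=False):
                found.append(ini.parent.relative_to(root).as_posix())
                break
    return found


if __name__ == "__main__":
    # Default to tests/net when no directory is given on the command line.
    target = sys.argv[1] if len(sys.argv) > 1 else "tests/net"
    for name in find_build_only(target):
        print(name)
```

Run from the top of the tree, it prints one directory per line, which
makes it easy to see what is left to convert.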



Anyway, I looked at what's involved in net-tools free running, and
figured it's CONFIG_NET_L2_DUMMY. Added it to my sockets test, and got
only segfault in return. After debugging it, turned out it's the same
issue as already faced by me and other folks: if there're no netifs
defined, networking code is going to crash (instead of printing clear
error to the user): https://jira.zephyrproject.org/browse/ZEP-2105

But how do the tests in tests/net/ run then and not crash? Here's the answer:

zephyr/tests/net$ grep NET_DEVICE -r * | wc
     22      42    1532

So, almost each and every test defines its own test interface.

One would think that if we have 22 repetitive implementations of test
interfaces, whose main purpose is to be just loopback interfaces,
No, the interface is not a loopback interface although it might look
like that. The purpose of the interface that is created in each of the
tests is to simulate a real network, so that we do not have to connect
to the outside world and the test is self-contained. So it kind of
looks like a loopback interface, but in this case the source and
destination IP addresses are not the same (as would be the case with a
loopback interface), as typically we want to test some real behavior of
the system, so src/dest addresses should differ.

The loopback support has limited use cases actually, and we probably
need to make it optional (behind a Kconfig option) in the code, as
normally there should be no need to send anything back to itself in the
real world.
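
The difference can be pictured with a toy model. The sketch below is
plain Python, not Zephyr code; the class names, the Packet type and the
addresses are all made up for illustration. The loopback interface
hands the packet back unchanged, while the simulated interface plays
the role of a remote peer, so what comes back has a source address that
differs from the local one.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src: str
    dst: str
    payload: bytes


class LoopbackIface:
    """Toy loopback: whatever is sent is received unchanged."""

    def __init__(self):
        self.rx_queue = []

    def send(self, pkt):
        self.rx_queue.append(pkt)


class SimulatedPeerIface:
    """Toy test interface that pretends to be a remote host.

    The reply is built as if it came from the peer, so source and
    destination are swapped relative to the packet that was sent.
    """

    def __init__(self):
        self.rx_queue = []

    def send(self, pkt):
        reply = Packet(src=pkt.dst, dst=pkt.src, payload=pkt.payload)
        self.rx_queue.append(reply)


if __name__ == "__main__":
    local, peer = "2001:db8::1", "2001:db8::2"  # documentation prefix

    lo = LoopbackIface()
    lo.send(Packet(src=local, dst=local, payload=b"ping"))
    print("loopback :", lo.rx_queue[0])

    sim = SimulatedPeerIface()
    sim.send(Packet(src=local, dst=peer, payload=b"ping"))
    print("simulated:", sim.rx_queue[0])
```

In the simulated case the receive path sees traffic from a different
address, which is what exercises the stack as a real network would.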

then
we'd have a loopback interface in the main codebase. But nope, as
confirmed by Jukka on IRC, we don't.

Summary:

1. Writing networking tests is hard, but in Zephyr, it takes
extraordinary, agonizing effort. The most annoying is that all needed
pieces are there, but instead of presenting a nice picture, they form
a mess which greets you with crashes if you try to change anything.
I am not sure what kind of mess you mean here but patches are welcome
as always to rectify this.


2. There're existing big (~20K each) tests which fail. Apparently,
because they aren't run, so bitrot. Why do we need these huge, detailed
tests if we don't run them? (An alternative explanation is that
Some explanation given above.

there's something wrong with my system, and yep, I'd be glad to know
what I still don't do right with Zephyr after working on it for a year.)
Hmm, I missed the point of your last sentence.



I'd be glad if more experienced developers could confirm if it's really
like the above, or if I'm missing something. And I'll be happy to work
on the above issues, but in the meantime, I'll need to submit BSD
Sockets with rather bare and hard to run (not automated) tests due to
the situation above.
Cheers,
Jukka
