Main thread (and stack) isn't the (only) main


Paul Sokolovsky
 

Hello,

This post should probably be treated as user feedback in
justification of the unified kernel *and* of getting rid of .mdef files:
https://lists.zephyrproject.org/archives/list/devel(a)lists.zephyrproject.org/thread/P3JQWLBJPZ37NDAXRV54DRXMXSEM35CK/#P3JQWLBJPZ37NDAXRV54DRXMXSEM35CK

So, I started a port of a scripting language (MicroPython), initially
low-profile, starting with the nanokernel. Knowing that a scripting
language would require extended stack space, I bumped
CONFIG_MAIN_STACK_SIZE to 4K. Proceeding to add networking
capabilities, I switched to the microkernel (as the nanokernel
demonstrated some weird behavior). Now I understand that this is where
the trouble started: CONFIG_MAIN_STACK_SIZE appears to apply (mostly)
to the nanokernel. But I would need to be very proficient in Zephyr to
know that beforehand, because the description of that option is:

config MAIN_STACK_SIZE
        int
        prompt "Background task stack size (in bytes)"
        default 1024
        help
          This option specifies the size of the stack used by the kernel's
          background task, whose entry point is main().

So, there is no clear warning that it applies only to the nanokernel's
main() and not to the microkernel's main().

Going next, I enabled mbedTLS, and that's when I started to get
mysterious crashes: connecting to some sites worked, but not to others.
It clearly looked stack-related, so I bumped CONFIG_MAIN_STACK_SIZE
to 16K and, as that didn't help, the stacks of the network RX and TX
threads too. That didn't help either. I tried to reach for GDB
(https://lists.zephyrproject.org/archives/list/devel(a)lists.zephyrproject.org/thread/TOEAZ6AM7JNIZMR4IUZG5SKQGHTBRDVC/),
which didn't help, so I resorted to printf() and confirmed that a
stack was smashed whenever I got a crash. The only mystery was why it
kept being smashed while I kept increasing it. Reaching for the .map
file, I saw that besides "main_task_stack", which was 16K as I had set
it, there was also a "__MAIN_stack" with a lowly 4K, and the smashing
occurred inside it.
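
For anyone who ends up in the same spot, one common way to confirm this
kind of overflow with nothing but printf() is to "paint" the stack
buffer with a known byte and later check how much of the paint
survived. The sketch below is only an illustration of that idea, not
the check I actually used and not a Zephyr API: the names
stack_paint(), stack_unused_bytes() and STACK_FILL_BYTE are
hypothetical, and it assumes a stack that grows downward, so untouched
bytes sit at the low end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical paint pattern; any value unlikely to occur by chance works. */
#define STACK_FILL_BYTE 0xAA

/* Paint the whole stack buffer before the thread that owns it starts. */
static void stack_paint(uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, STACK_FILL_BYTE, size);
}

/*
 * Count the bytes that still hold the paint pattern, scanning from the
 * lowest address. Assumes a downward-growing stack: the result is the
 * headroom that was never used. A result of 0 means the stack was used
 * up completely and has most likely overflowed.
 */
static size_t stack_unused_bytes(const uint8_t *stack, size_t size)
{
    size_t unused = 0;

    assert(stack != NULL);
    while (unused < size && stack[unused] == STACK_FILL_BYTE) {
        unused++;
    }
    return unused;
}
```

Printing the result of stack_unused_bytes() for every stack symbol
listed in the .map file (not just the one you think is in use) would
show right away which one is running out.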

It took just a grep over the entire tree to figure out that the "MAIN"
stack comes from an .mdef file (which I had of course just copy-pasted),
and bumping the size there resolved the issue.


So, this is clearly a user error, but the whole matter is pretty
confusing, as I experienced first-hand. There are some things which
could be improved, like the wording of nanokernel/MAIN_STACK_SIZE and
perhaps the variable naming (if it's the *default task*, maybe its stack
should be default_task_stack).

But of course, even better would be to switch to the unified kernel and
get rid of multiple ways to specify app params (I'm talking about .mdef).
So, my firm +1 on that. And grepping through the tree, I see that
unified/ has its own Kconfig with an updated MAIN_STACK_SIZE
description. Is it ready for testing by "end users"? Has .mdef been
eliminated there?


Thanks,
Paul

Linaro.org | Open source software for ARM SoCs
