Re: Main thread (and stack) isn't the (only) main


Benjamin Walsh <benjamin.walsh@...>
 

Hi Paul,

This post should probably be treated as user feedback in
justification of the unified kernel *and* of getting rid of .mdef files
https://lists.zephyrproject.org/archives/list/devel(a)lists.zephyrproject.org/thread/P3JQWLBJPZ37NDAXRV54DRXMXSEM35CK/#P3JQWLBJPZ37NDAXRV54DRXMXSEM35CK

So, I started a port of a scripting language (MicroPython), initially
low-profile, starting with the nanokernel. Knowing that a scripting
language would require extended stack space, I bumped
CONFIG_MAIN_STACK_SIZE to 4K. Proceeding to add networking
capabilities, I switched to the microkernel (as the nanokernel
demonstrated some weird behavior). Now I understand that's where the
problem spot was: CONFIG_MAIN_STACK_SIZE appears to apply (mostly) to
the nanokernel. But I would need to be very proficient in Zephyr to
know that beforehand, because the description of that option is:

config MAIN_STACK_SIZE
	int
	prompt "Background task stack size (in bytes)"
	default 1024
	help
	  This option specifies the size of the stack used by the kernel's
	  background task, whose entry point is main().

So, no clear warning that it applies only to nanokernel's main() and
not microkernel main().
There is a historical reason for this, that is no longer valid: main(),
in the microkernel, used to be the kernel's entry point, and not a
possible task's entry point. The main() function used to do some kernel
initialization, and when done, turn into the idle task. All of this is
still true, except the fact that the function is not main() anymore, but
_main(). :-)

Going next, I enabled mbedTLS, and that's when I started to get
mysterious crashes - connecting to some sites, but not the others.
It clearly looked stack related, so I bumped CONFIG_MAIN_STACK_SIZE
to 16K, and as that didn't help, the network RX and TX threads too.
That still didn't help, so I tried to reach for GDB
(https://lists.zephyrproject.org/archives/list/devel(a)lists.zephyrproject.org/thread/TOEAZ6AM7JNIZMR4IUZG5SKQGHTBRDVC/);
that didn't help either, so I resorted to printf(), and confirmed that
a stack was smashed when I got a crash. The only mystery was why it
kept being smashed while I kept increasing it.

What you ended up doing was bumping the stack of the init/idle task.

And reaching for a .map file, I saw that besides "main_task_stack",
which was 16K as I set it, there's some "__MAIN_stack" with a lowly 4K,
and the smashing occurs inside it.

It took just a grep over the entire tree to figure out that the "MAIN"
stack comes from an .mdef file (which I of course just copy-pasted), and
bumping it resolved the issue.
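
For anyone hitting the same kind of crash, one common way to confirm
which stack overflows using nothing but printf() is "stack painting":
fill the stack buffer with a known byte before the task starts, then
count how many bytes still hold that byte later on. The sketch below is
plain C and not a Zephyr API; the names stack_paint(), stack_unused()
and STACK_FILL are made up for illustration, and it assumes the stack
grows downward, so the lowest addresses are the last to be touched.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fill byte; any value the task is unlikely to write works. */
#define STACK_FILL 0xAA

/* Paint the whole stack buffer before the task that owns it starts. */
void stack_paint(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL, size);
}

/*
 * Return the number of bytes at the low end of the buffer that still
 * hold the fill byte. Assumes a downward-growing stack: usage starts at
 * the high end, so a result of 0 means the task reached the bottom of
 * its stack and has very likely overflowed it.
 */
size_t stack_unused(const uint8_t *stack, size_t size)
{
    size_t n = 0;

    while (n < size && stack[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

Printing stack_unused() for each candidate buffer (here that would have
been both main_task_stack and __MAIN_stack) shows which one is actually
running out, independent of which Kconfig option or .mdef entry sized it.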


So, this is clearly a user error, but the whole matter is pretty
confusing, as I experienced first-hand. There're some things which could
be improved, like wording of nanokernel/MAIN_STACK_SIZE and perhaps var
naming (if it's *default task*, maybe its stack should be
default_task_stack).
The __MAIN_stack symbol is generated based on the name of the task in
the mdef file. You don't have to have a task with a main() entry point.
There is nothing special about main() in the microkernel. In fact, as
noted before, that symbol used to be reserved for the kernel. There is
no default task in the microkernel.
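
To illustrate how a task name ends up in a symbol such as __MAIN_stack,
here is a small sketch using C token pasting. It is not the actual
build-time generator that processes .mdef files; the TASK_STACK_DEFINE
macro and the NET_RX task are hypothetical, and only the __MAIN_stack
name and its 4K size come from the discussion above.

```c
#include <stddef.h>

/*
 * Hypothetical macro: derive the stack symbol from the task name, the
 * way a generated source file might. Identifiers with a leading double
 * underscore are reserved for the implementation in standard C; kernel
 * code uses them on purpose.
 */
#define TASK_STACK_DEFINE(name, size) char __##name##_stack[size]

/* A task named MAIN gets __MAIN_stack, whatever its entry point is. */
TASK_STACK_DEFINE(MAIN, 4096);

/* A task with any other name gets its own, independently sized stack. */
TASK_STACK_DEFINE(NET_RX, 2048);
```

The point of the sketch is that the symbol follows the task's name, not
its entry point, so a task called MAIN is sized by its own definition
and not by CONFIG_MAIN_STACK_SIZE.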

But of course, even better would be to switch to unified kernel and get
rid of multiple ways to specify app params (I'm talking about .mdef).
Yup. MDEF files are only kept in the unified kernel for legacy reasons,
for one or two releases. We will not publicize them and should in fact
have a note against using them.

Basically, all the issues you faced here are taken care of in the
unified kernel.

So, my firm +1 on that. And grepping thru the tree, I see that
unified/ has its own Kconfig with MAIN_STACK_SIZE description
updated. Is it ready for testing by "end users"? Does it have .mdef
eliminated?
See above.

API-wise, I think we are pretty much on par with the microkernel, with a
couple of obscure ones missing and/or in review.

Regards,
Ben
