Re: Fibers Become Unrunnable in Nanokernel


Michael Rosen
 

As far as I can tell, the "timer" has expired and the struct tcs's
for the fibers are not in the runnable list. All other fibers in the
system on ARC seem to be in the runnable list as expected. Also,
from some basic stack analysis, it appears that the unrunnable
fibers are still in the nano_timer_test function. One thing worth
noting is that while most fibers are just doing some math and
storing the results in memory, two of them are accessing an SPI
device and an I2C device. When those two fibers are prevented from
accessing their devices, the system seems to run smoothly; otherwise
it doesn't. Has anything like this been encountered before?

Note also that moving to Zephyr 1.6 would be a significant effort,
as we have implemented a number of custom drivers and other features
that would take a long time to port.
This does not really solve your problem, but Zephyr 1.6 contains a legacy layer that provides all the APIs of the old kernels on top of the new kernel. It's not a NOP to move to 1.6, since you might run into some issues with, e.g., stack sizes or other idiosyncrasies, but it might be less painful than you think.

About your issue: the first thing I always suspect with weird behaviour like this is stack smashing. There is a Kconfig option for ARC that enables stack overflow/underflow detection. Do you have that option enabled?
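If a hardware check is not available or not enabled, a common software fallback is "stack painting": fill each fiber's stack with a known pattern before it starts, then later count how much of the pattern survived. The sketch below is a generic C illustration of that technique, not Zephyr's implementation and not what the ARC Kconfig option does. The names `paint_stack`, `stack_unused_words` and `STACK_PAINT` are hypothetical, and it assumes a descending stack (growing toward index 0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fill pattern, chosen for illustration only. */
#define STACK_PAINT 0xAAAAAAAAu

/* Fill a fiber's stack area with the pattern before the fiber starts. */
static void paint_stack(uint32_t *stack, size_t words)
{
    assert(stack != NULL);
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_PAINT;
    }
}

/*
 * Count how many words at the low end still hold the pattern. This assumes
 * a descending stack, so the lowest words are the last to be touched.
 * A result of 0 means the whole area was used, which strongly suggests the
 * fiber ran past the end of its stack.
 */
static size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    size_t n = 0;
    assert(stack != NULL);
    while (n < words && stack[n] == STACK_PAINT) {
        n++;
    }
    return n;
}
```

This only detects an overflow after the fact, and a stack word that legitimately holds the pattern value is counted as unused, so a small margin of remaining words is a warning sign rather than proof of safety.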
Ben,

Just to update you and the mailing list: I think the issue is one you solved for 1.6. However, it's not tracked on JIRA or in the release notes, so I didn't realize such a critical bug was not fixed in 1.5. The commit in question is 5986ec040b. As this is a very specific timing bug, we are still validating our code to be 100% sure it's fixed, but it's looking good so far.

Thanks,
Mike
