CPU hangs in NMI handler at ISR exit


Antoine Zen-Ruffinen
 

Greetings to all Zephyr users & developers!


I'm facing an issue that I'm having trouble debugging. We are using Zephyr v2.1 (currently migrating to v2.2, but the issue seems to persist) on NXP's i.MXRT 1062, which is an ARM Cortex-M7 (ARMv7-M architecture). The first symptom is that the system freezes.


When I use "west attach" to see where the CPU is, I always end up in the NMI handler "z_SysNmiOnReset", which contains a short loop. My investigations so far:


 - Reading the NMIPENDSET bit of the Interrupt Control and State Register (ICSR) shows that the NMI is not pending.

 - Reading the "ISR_NUMBER" field of the IPSR register shows that the active vector is not the NMI but a peripheral interrupt (ENET, to be precise).

 - Following the program flow with GDB shows that after the IRQ, the CPU jumps to the "_isr_wrapper()" function, then executes the driver's handler code by dereferencing "_sw_isr_table". If I then try to step into "z_arm_int_exit", GDB gets stuck. Hitting "CTRL+C" shows that we are in "z_SysNmiOnReset".

 - Setting a breakpoint inside "z_SysNmiOnReset" shows normal flow until the "bx lr", where the same thing as above happens.


I suspected a stack overflow on the interrupt stack. Raising CONFIG_ISR_STACK_SIZE seems to increase the time before the issue arises. Currently the issue happens on the 4th to 8th ISR call, depending on the Kconfig configuration, but with the same settings it always happens on the same call. Enabling CONFIG_STACK_SENTINEL neither changes the behavior nor gives better debug information.


Has anyone seen this before? Any suggestions on where to look? I'm currently out of ideas.


Antoine Zen-Ruffinen

Riedo Networks Ltd
Route de la Fonderie 6, 1700 Fribourg, Switzerland
Tel: +41 26 505 50 03, Fax: +41 26 505 50 01 www.riedonetworks.com