Random fault exception.
Antoine Zen-Ruffinen
Hi,
I started using Zephyr a few weeks ago on the MIMXRT1050-EVKB and I am enjoying it. However, I am currently facing an issue that I am having trouble sorting out. This might be a bug, but I am not sure at this point. From time to time, my code runs into a hard fault or another exception. For instance, I get the following report on the console:
***** USAGE FAULT *****
Illegal use of the EPSR
***** Hardware exception *****
Current thread ID = 0x80000f44
Faulting instruction address = 0x4346
Fatal fault in thread 0x80000f44! Aborting.
In this report, the "faulting instruction address" is likely wrong, as the application is linked to flash at 0x60000000 and RAM at 0x80000000. Also, I was not able to find any instruction using the xPSR register in the code area where the fault happens.
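For reference, the fault messages in these reports appear to correspond to individual bits of the Cortex-M Configurable Fault Status Register (CFSR, at 0xE000ED28 on ARMv7-M). "Illegal use of the EPSR" is the INVSTATE bit, which is typically set when the core tries to execute with the Thumb (T) bit cleared, for example after a branch or exception return to an address whose bit 0 is 0. So it usually points at a bad branch target rather than at an instruction that touches xPSR. The sketch below decodes a raw CFSR value; the bit positions follow the ARMv7-M architecture, the strings for bits seen in this thread copy the console output, and the other strings and all names (`cfsr_bits`, `decode_cfsr`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One entry per CFSR bit of interest. On ARMv7-M the UFSR occupies
 * CFSR bits 31..16 and the BFSR bits 15..8. */
struct cfsr_bit {
    uint32_t mask;
    const char *msg;
};

static const struct cfsr_bit cfsr_bits[] = {
    { 1u << 8,  "Instruction bus error" },              /* IBUSERR     */
    { 1u << 9,  "Precise data bus error" },             /* PRECISERR   */
    { 1u << 10, "Imprecise data bus error" },           /* IMPRECISERR */
    { 1u << 11, "Unstacking error" },                   /* UNSTKERR    */
    { 1u << 12, "Stacking error" },                     /* STKERR      */
    { 1u << 16, "Undefined instruction" },              /* UNDEFINSTR  */
    { 1u << 17, "Illegal use of the EPSR" },            /* INVSTATE    */
    { 1u << 18, "Illegal load of EXC_RETURN into PC" }, /* INVPC       */
    { 1u << 19, "No coprocessor" },                     /* NOCP        */
    { 1u << 24, "Unaligned memory access" },            /* UNALIGNED   */
    { 1u << 25, "Division by zero" },                   /* DIVBYZERO   */
};

/* Writes one line per recognized bit set in `cfsr` into `buf` (always
 * NUL-terminated, truncated if too small). Returns the number of bits found. */
static unsigned decode_cfsr(uint32_t cfsr, char *buf, size_t len)
{
    unsigned hits = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(cfsr_bits) / sizeof(cfsr_bits[0]); i++) {
        if (cfsr & cfsr_bits[i].mask) {
            size_t used = strlen(buf);
            snprintf(buf + used, len - used, "%s\n", cfsr_bits[i].msg);
            hits++;
        }
    }
    return hits;
}
```

On the target, the raw value can be read in gdb with `x/wx 0xE000ED28` while stopped at a breakpoint in the fault handler; several bits set at once can hint at a cascade of faults rather than a single root cause.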
The strange thing is that if I modify the code in any way (adding a printk(), adding a variable, changing a configuration option, and so on, even in code sections that do not run before the crash), the error goes away for a while and then suddenly reappears after some later code change. Also, if I debug my code and step through instructions, the error does not occur.
I have tried investigating many things, such as comparing the disassembly of working and crashing builds, debugging to find the faulty instruction, increasing the stack size, etc., with no luck.
I'm not 100% sure this is related to Zephyr itself, but I have been developing on ARM Cortex-M for a while now and have never seen anything like this. It is worth mentioning that I use the CONFIG_CODE_DATA_RELOCATION option.
Has anybody seen this before? Any leads I could follow would be greatly appreciated!
Thanks! Regards,
Antoine
|
|
Andrei Gansari
Hello Antoine,
It looks similar to: https://github.com/zephyrproject-rtos/zephyr/issues/12849
I suspect it’s related to CONFIG_CODE_DATA_RELOCATION; also, the fact that adding a printk(), adding a variable or changing a configuration option changes the behavior sounds like a caching issue. What does your application look like? Please provide more details. Are you running ‘zephyr/samples/application_development/code_relocation/’? Are you using a different medium to store code (other than the onboard flash/ITCM, e.g. an SD card)?
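To make the caching hypothesis concrete: the i.MX RT1050 is a Cortex-M7 with instruction and data caches, so code or data can be served from a cache line that no longer matches the memory behind it, for example after the flash is reprogrammed through a path that does not go through the cache. The toy model below is an assumption-laden illustration, not a model of the real cache hierarchy: a single cached line over a small backing array, with hypothetical names, showing why a stale value persists until an explicit invalidate and why the outcome can depend on what happened to be cached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEM_WORDS  64u
#define LINE_WORDS 8u /* hypothetical line size, in words */

/* Backing store (e.g. flash contents) plus a single cached line. */
struct toy_mem {
    uint32_t backing[MEM_WORDS];
    uint32_t line[LINE_WORDS];
    unsigned line_base; /* index of the first word held in `line` */
    bool line_valid;
};

/* Cached read: refills the line on a miss, otherwise serves the cached copy. */
static uint32_t toy_read(struct toy_mem *m, unsigned idx)
{
    assert(idx < MEM_WORDS);
    unsigned base = idx - (idx % LINE_WORDS);

    if (!m->line_valid || m->line_base != base) {
        for (unsigned i = 0; i < LINE_WORDS; i++) {
            m->line[i] = m->backing[base + i];
        }
        m->line_base = base;
        m->line_valid = true;
    }
    return m->line[idx - base];
}

/* Write that bypasses the cache, standing in for programming flash through
 * a command interface instead of the memory-mapped read path (assumption). */
static void toy_program(struct toy_mem *m, unsigned idx, uint32_t value)
{
    assert(idx < MEM_WORDS);
    m->backing[idx] = value;
}

/* Drops the cached line so the next read refetches from the backing store. */
static void toy_invalidate(struct toy_mem *m)
{
    m->line_valid = false;
}
```

In the model, a read after `toy_program()` still returns the old value until `toy_invalidate()` is called. On real hardware the equivalent step would be the appropriate cache maintenance for the affected range after relocation or programming.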
Try building with -DCONFIG_NO_OPTIMIZATIONS=y to generate -O0 code; you may then get a call stack that leads to the source of the crash (if the issue still reproduces at -O0).
Regards, Andrei
From: users@... <users@...>
On Behalf Of Antoine Zen-Ruffinen via Lists.Zephyrproject.Org
Sent: Wednesday, March 27, 2019 1:14 PM To: users@... Cc: users@... Subject: [Zephyr-users] Random fault exception.
|
|
Antoine Zen-Ruffinen
Hello Andrei,
Thanks for helping!
The code that I am using is "samples/drivers/flash_shell". The reason is that I am writing a flash driver for the HyperFlash on the MIMXRT1050-EVKB. That's also why I use code relocation: the HyperFlash device does not support read-while-write, so the code must run from RAM while a write is in progress (this was suggested by another NXP engineer, Igor, see https://community.nxp.com/thread/486654). The driver itself works when I am not getting the fault exception. I will make a pull request once the code is polished. The code runs XIP from the HyperFlash, except for the init, erase and write flash functions, which run from RAM, but the issue happens while running from flash. The final goal of having the flash driver is to use mcuboot.
With the mcuboot code, I ran into exactly the same issue (no threads are used in mcuboot).
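One detail that may matter with this split: while the HyperFlash is busy, everything the RAM-resident erase/write path touches must also be outside the flash, including C library helpers such as memcpy()/memset(), which stay in flash under XIP unless they are relocated too. The sketch below assumes GCC or Clang on an ELF target. The `.ramfunc` section name, the `RAMFUNC` macro, `hf_copy_words()` and `hf_program()` are hypothetical and would need a matching linker or relocation rule to actually land in RAM, and the copy loop only stands in for whatever register sequence really programs the device. The loop stores through a volatile pointer because compilers can recognize a plain copy loop and replace it with a memcpy() call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placement macro: on ELF targets, put the function in its own
 * section so a linker/relocation rule can map it to RAM. No effect elsewhere. */
#if defined(__GNUC__) && defined(__ELF__)
#define RAMFUNC __attribute__((section(".ramfunc"), noinline))
#else
#define RAMFUNC
#endif

/* Runs while the flash is busy: no library calls, no asserts, no printk().
 * The volatile destination keeps the compiler from turning the loop into
 * a memcpy() call, which would be fetched from the busy flash. */
static RAMFUNC void hf_copy_words(volatile uint32_t *dst, const uint32_t *src,
                                  size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}

/* Ordinary (XIP) wrapper: argument checks happen here, before the flash is
 * made busy, so the RAM routine itself stays free of outgoing calls. */
static void hf_program(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    assert(((uintptr_t)dst & 3u) == 0 && ((uintptr_t)src & 3u) == 0);
    hf_copy_words(dst, src, n);
}
```

Checking the disassembly of the relocated functions for `bl` instructions that target the 0x60000000 range would be one way to confirm nothing in the write path calls back into flash.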
I have tried running with CONFIG_NO_OPTIMIZATIONS=y as you suggested. Now the application crashes even before it finishes booting. I also get new error messages:
***** USAGE FAULT *****
Unaligned memory access
***** Hardware exception *****
Current thread ID = 0x8000360c
Faulting instruction address = 0x60014854
Fatal fault in thread 0x8000360c! Aborting.
***** USAGE FAULT *****
Illegal load of EXC_RETURN into PC
***** Hardware exception *****
Current thread ID = 0x80001e5c
Faulting instruction address = 0x2e7f7133
Fatal fault in thread 0x80001e5c! Aborting.
***** BUS FAULT *****
Unstacking error
***** Hardware exception *****
***** HARD FAULT *****
Fault escalation (see below)
***** BUS FAULT *****
Precise data bus error
BFAR Address: 0xa79c2e80
***** Hardware exception *****
Current thread ID = 0x00000000
Faulting instruction address = 0x6000a182
Fatal fault in ISR! Spinning...
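A quick way to sanity-check the reported addresses is to compare them against the memory map given at the start of the thread (flash linked at 0x60000000, RAM at 0x80000000). The region sizes below are assumptions for illustration and should be checked against the board's actual configuration. Addresses such as 0x4346, 0x2e7f7133, or the 0xaaaaaaaa in the backtrace fall in neither region, which suggests a PC loaded from corrupted or never-written stack memory rather than a real code location. Zephyr can fill thread stacks with 0xaa when CONFIG_INIT_STACKS is enabled, so that pattern may be worth noticing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bases as stated in the thread; sizes are assumptions for illustration. */
#define FLASH_BASE 0x60000000u
#define FLASH_SIZE (64u * 1024u * 1024u)
#define RAM_BASE   0x80000000u
#define RAM_SIZE   (32u * 1024u * 1024u)

static_assert(FLASH_BASE + FLASH_SIZE <= RAM_BASE, "regions must not overlap");

struct region {
    const char *name;
    uint32_t base;
    uint32_t size;
};

static const struct region regions[] = {
    { "flash (XIP)", FLASH_BASE, FLASH_SIZE },
    { "RAM",         RAM_BASE,   RAM_SIZE },
};

/* Returns the name of the region containing `addr`, or NULL if none does.
 * For addr < base the unsigned subtraction wraps to a large value, so one
 * comparison checks both bounds. */
static const char *classify_addr(uint32_t addr)
{
    for (size_t i = 0; i < sizeof(regions) / sizeof(regions[0]); i++) {
        if (addr - regions[i].base < regions[i].size) {
            return regions[i].name;
        }
    }
    return NULL;
}
```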
At least the faulting instruction addresses are in the flash range now. The first "usage fault: unaligned memory access" is in "zephyr/lib/libc/minimal/source/string/string.c:310". That's memset()!? And again, if I debug and step through instructions (which takes quite some time in memset), no error occurs... What really bothers me is that every time I change "something", the behavior is different. If it were an unaligned memory access, it should always happen at the same place, no?
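On the question of whether an unaligned access should always fault in the same place: on ARMv7-M, plain LDR/STR/LDRH/STRH accesses to unaligned addresses in normal memory are handled by the hardware unless CCR.UNALIGN_TRP is set, while multi-word accesses such as LDM/STM/LDRD/STRD always fault when unaligned. A word-at-a-time memset like the simplified sketch below (an illustration with hypothetical names, not Zephyr's minimal libc code) first advances byte by byte to a 4-byte boundary, so its word stores should only ever see aligned addresses. If the fault really comes from such a loop, one possible reading is that the pointer or register state was corrupted underneath it, for example by an exception returning with a damaged context, which would also explain why the location moves around between builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word type allowed to alias other objects (GCC/Clang extension), as real
 * libc implementations need when storing words into arbitrary buffers. */
#if defined(__GNUC__)
typedef uint32_t __attribute__((may_alias)) word_t;
#else
typedef uint32_t word_t;
#endif

/* Simplified word-at-a-time memset (illustration only). */
static void *memset_words(void *buf, int c, size_t n)
{
    unsigned char *d = buf;
    unsigned char byte = (unsigned char)c;
    word_t pattern = byte * 0x01010101u;

    /* Head: byte stores until d is 4-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = byte;
        n--;
    }

    /* Body: by construction, every word store here is aligned. */
    word_t *w = (word_t *)(void *)d;
    while (n >= 4) {
        assert(((uintptr_t)w & 3u) == 0); /* sanity check for this sketch */
        *w++ = pattern;
        n -= 4;
    }

    /* Tail: remaining bytes. */
    d = (unsigned char *)w;
    while (n > 0) {
        *d++ = byte;
        n--;
    }
    return buf;
}
```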
I have put a breakpoint in _UsageFault(), but I still don't get a backtrace 😞:
(gdb) bt
#0 _UsageFault (esf=0x80003500 <_main_stack+600>)
at /home/antoine/tests/zephyr-flash/zephyr/arch/arm/core/fault.c:497
#1 0x60009f36 in _FaultHandle (esf=0x80003500 <_main_stack+600>, fault=6)
at /home/antoine/tests/zephyr-flash/zephyr/arch/arm/core/fault.c:680
#2 0x60009f88 in _Fault (esf=0x80003500 <_main_stack+600>, exc_return=170)
at /home/antoine/tests/zephyr-flash/zephyr/arch/arm/core/fault.c:862
#3 0x6000a116 in __usage_fault () at /home/antoine/tests/zephyr-flash/zephyr/arch/arm/core/fault_s.S:147
#4 <signal handler called>
#5 0xaaaaaaaa in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
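When the backtrace stops at a bogus frame like 0xaaaaaaaa, the stacked exception frame can still be read by hand. On ARMv7-M the hardware pushes eight words on exception entry (r0-r3, r12, lr, pc, xPSR, in that order, for the basic frame without floating-point state), so starting from the esf pointer shown in the backtrace (0x80003500 here), `x/8wx 0x80003500` in gdb dumps the stacked registers. The sketch below decodes such a dump; the struct and function names are hypothetical rather than Zephyr's own types. It checks the xPSR T bit (bit 24), whose clearing is what the "Illegal use of the EPSR" fault reports, and bit 0 of the stacked lr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Basic ARMv7-M exception frame, lowest address first (no FP state).
 * The struct name is hypothetical, not Zephyr's own ESF type. */
struct hw_frame {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
};

#define XPSR_T_BIT (1u << 24) /* Thumb state bit (EPSR.T) */

/* Builds a frame from 8 words, e.g. as dumped with `x/8wx <esf>` in gdb. */
static struct hw_frame frame_from_words(const uint32_t w[8])
{
    assert(w != NULL);
    struct hw_frame f = { w[0], w[1], w[2], w[3], w[4], w[5], w[6], w[7] };
    return f;
}

/* True if the stacked xPSR has T cleared: the core was not in Thumb state,
 * the condition reported as INVSTATE ("Illegal use of the EPSR"). */
static bool frame_thumb_state_lost(const struct hw_frame *f)
{
    return (f->xpsr & XPSR_T_BIT) == 0;
}

/* Return addresses in lr carry bit 0 = 1 on Cortex-M; an even value such
 * as 0xaaaaaaaa would clear the T bit if used by `bx lr` or `pop {pc}`. */
static bool frame_lr_looks_valid(const struct hw_frame *f)
{
    return (f->lr & 1u) != 0;
}
```

The stacked pc and lr can then be looked up in the disassembly, which often says more than a backtrace through a corrupted stack.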
From: Andrei Gansari <andrei.gansari@...>
Sent: Wednesday, March 27, 2019 2:16 PM To: Antoine Zen-Ruffinen; users@... Subject: RE: Random fault exception.
|
|
Maureen Helm
Hi Antoine, It could be that you’re getting an interrupt (e.g., systick) while writing to flash, and the interrupt handler is located in flash. It’s not ideal to disable interrupts for such a long time, but you may need to do it to avoid a read-while-write scenario.
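The scenario described above can be illustrated with a toy model. It uses hypothetical names and does not describe the HyperFlash's real behavior: while a program/erase operation is in progress, the model flash cannot serve reads, so anything fetched from it (here, the "code" of an interrupt handler) comes back as a fixed busy value instead of the stored contents. Masking interrupts for the duration defers the handler until the flash is readable again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_BUSY_VALUE 0xFFFFFFFFu /* what this model returns while busy */

struct toy_flash {
    uint32_t words[8];
    bool busy;
};

/* Reads a word; while busy, the stored data is not available. */
static uint32_t toy_flash_fetch(const struct toy_flash *f, unsigned idx)
{
    assert(idx < 8);
    return f->busy ? TOY_BUSY_VALUE : f->words[idx];
}

/* The "handler" code lives at word 0; returns what was fetched for it. */
static uint32_t toy_isr(const struct toy_flash *f)
{
    return toy_flash_fetch(f, 0);
}

/* Programs word idx with a pending interrupt. If interrupts are not masked,
 * the handler runs inside the busy window; otherwise it runs afterwards.
 * Returns the value the handler fetched as its "code". */
static uint32_t toy_program_with_irq(struct toy_flash *f, unsigned idx,
                                     uint32_t value, bool irqs_masked)
{
    uint32_t seen = 0;

    assert(idx < 8);
    f->busy = true;
    if (!irqs_masked) {
        seen = toy_isr(f); /* fetch during the busy window */
    }
    f->words[idx] = value;
    f->busy = false;
    if (irqs_masked) {
        seen = toy_isr(f); /* deferred until interrupts are unmasked */
    }
    return seen;
}
```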
I’m looking forward to your pull request! This will be a nice addition.
Maureen
From: users@... [mailto:users@...]
On Behalf Of Antoine Zen-Ruffinen via Lists.Zephyrproject.Org
Sent: Wednesday, March 27, 2019 8:58 AM To: Andrei Gansari <andrei.gansari@...>; users@... Cc: users@... Subject: Re: [Zephyr-users] Random fault exception.
|
|
Antoine Zen-Ruffinen
Hi Maureen,
Thanks for helping me! I use irq_lock() and irq_unlock() around the write/erase code sections to avoid this.
Antoine
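For reference, the locking pattern described above looks roughly like the sketch below. It assumes Zephyr's irq_lock(), which returns a key, and irq_unlock(), which takes that key back, and it assumes the build defines `__ZEPHYR__`; the host stand-ins under `#else` only model nesting so the sketch compiles off-target, and `ram_erase_sector()` / `hf_erase_sector()` are hypothetical. The irq_lock() call and the code after irq_unlock() can stay in flash, since the flash is idle at those points, but the RAM routine in between should not return until the operation has completed. One caveat worth checking: as far as I know, Zephyr implements irq_lock() with BASEPRI on ARMv7-M, so faults, NMI and any zero-latency interrupts are not masked by it.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __ZEPHYR__
#include <zephyr.h> /* kernel header name in 2019-era Zephyr trees */
#else
/* Host stand-ins (hypothetical) so the sketch compiles off-target: the key
 * records the previous nesting depth, similar in spirit to a lock key. */
static unsigned int lock_depth;

static unsigned int irq_lock(void)
{
    return lock_depth++;
}

static void irq_unlock(unsigned int key)
{
    assert(key < lock_depth);
    lock_depth = key;
}
#endif

/* Hypothetical RAM-resident routine: would issue the erase and poll until
 * the HyperFlash is idle again. Stubbed here; returns 0 on success. */
static int ram_erase_sector(uint32_t offset)
{
    (void)offset;
    return 0;
}

/* Interrupts stay masked for the whole busy window, so no flash-resident
 * handler (SysTick, UART, ...) runs while the flash cannot be read. */
static int hf_erase_sector(uint32_t offset)
{
    unsigned int key = irq_lock();
    int ret = ram_erase_sector(offset);

    irq_unlock(key);
    return ret;
}
```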
From: Maureen Helm <maureen.helm@...>
Sent: Wednesday, March 27, 2019 5:44:52 PM To: Antoine Zen-Ruffinen; Andrei Gansari; users@... Subject: RE: [Zephyr-users] Random fault exception.
|
|