Exception debugging with qemu_x86/gdb


Paul Sokolovsky
 

Hello,

I have a crash ("CPU exception 13") somewhere in the networking code. My
next step would be to run the app (BOARD=qemu_x86) under GDB, wait for
the crash, and type "backtrace". I am following
https://www.zephyrproject.org/doc/reference/kbuild/kbuild_project.html#application-debugging
but when the exception occurs I don't end up in GDB. Instead, Zephyr's
own exception handler keeps running, e.g.:

***** CPU exception 13
***** Exception code: 0x00004074
Current thread ID = 0x00177f60
Faulting segment:address = 0x00000008:0x001782da
eax: 0x0000ff0e, ebx: 0x00178350, ecx: 0x00177f60, edx: 0x00177f60
esi: 0x00000000, edi: 0x00178400, ebp: 000169398, esp: 0x0017830c
eflags: 0x00004046
Fatal essential fiber error! Spinning...
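
For reference, the error code in a dump like the one above can be split
into fields. Exception 13 is the x86 general protection fault, and when
its error code is nonzero it follows the selector error code layout
described in the Intel SDM. The sketch below assumes that the
"Exception code" line is the error code the CPU pushed;
decode_selector_error() is a hypothetical helper written for
illustration, not a Zephyr API.

```c
#include <stdint.h>

/* Fields of an x86 selector error code (layout per the Intel SDM). */
struct selector_error {
    unsigned ext;   /* bit 0: event was external to the program */
    unsigned idt;   /* bit 1: index refers to a gate in the IDT */
    unsigned ti;    /* bit 2: 0 = GDT, 1 = LDT (only meaningful if idt == 0) */
    unsigned index; /* bits 3..15: index into the descriptor table */
};

/* Hypothetical helper: split a raw error code into its fields. */
static struct selector_error decode_selector_error(uint32_t code)
{
    struct selector_error e;

    e.ext = code & 1u;
    e.idt = (code >> 1) & 1u;
    e.ti = (code >> 2) & 1u;
    e.index = (code >> 3) & 0x1FFFu;
    return e;
}
```

For the value in the dump, decode_selector_error(0x00004074) gives
ext = 0, idt = 0, ti = 1 and index = 2062 (0x80e). What that implies
about the faulting instruction depends on the descriptor tables the
application actually sets up.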

I looked for Kconfig options, but the only relevant one I found was
CONFIG_EXCEPTION_DEBUG, and setting it to "n" from the default "y"
doesn't help. Another option is CONFIG_GDB_SERVER, but that embeds an
actual GDB debug stub into the *application*. Here we use QEMU's debug
stub at the meta-level, so CONFIG_GDB_SERVER shouldn't be needed (and
enabling it just garbles the console, as it tries to communicate over
serial).

So, I would naively think that QEMU's GDB stub would override any
relevant guest exception handling, but that apparently does not happen.
What am I missing? I looked for other related QEMU options (besides
-s -S), but don't see anything relevant. The only doc I found
is http://wiki.qemu.org/Documentation/Debugging which is pretty short
at best.


Thanks,
Paul

Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog


Boie, Andrew P
 

On Sat, 2016-09-17 at 23:53 +0300, Paul Sokolovsky wrote:
> I would naively think that QEMU's GDB stub would override any
> relevant guest exception handling, but that apparently does not happen.

To be honest, I'm not sure how we could get QEMU's stubs into Zephyr's
IDT.

It may be the case that we could do some work on the Zephyr x86
exception handling stubs to be more GDB friendly. I don't know off the
top of my head what it would take to get 'backtrace' to work the way
you describe. We might just need to massage the stack a bit and issue a
debugger 'break' in the error handler.

Currently, for x86 exceptions with CONFIG_EXCEPTION_DEBUG turned on, we
install a bunch of handler stubs; you can see the code in
arch/x86/core/fatal.c. That is what prints out the message. It then
calls _SysFatalErrorHandler, which either aborts the thread or spins
forever.

> Faulting segment:address = 0x00000008:0x001782da
This is where your code is generating an exception. I'd set a
breakpoint there.


Andrew


Paul Sokolovsky
 

Hello Andrew,

On Mon, 19 Sep 2016 16:14:38 +0000
"Boie, Andrew P" <andrew.p.boie(a)intel.com> wrote:

> On Sat, 2016-09-17 at 23:53 +0300, Paul Sokolovsky wrote:
> > I would naively think that QEMU's GDB stub would override any
> > relevant guest exception handling, but that apparently does not happen.
> I'm not sure how we could get QEMU's stubs in Zephyr's IDT to be
> honest.
Thanks for the reply. To clarify, my thinking was that QEMU *might*
work in the following manner: if it is run in GDB stub mode and, during
emulation, it detects a typical CPU exception (access violation,
invalid opcode), then instead of dispatching to the exception vector
set up by the guest, it could just halt the guest and have its own GDB
stub communicate the required state to GDB. As I mentioned, I had no
idea whether it works like that or not, so I wondered what other
people's experience was.

So, I assume there's no special support in QEMU for that, and any
follow-up should be brought to the QEMU mailing list.

> It may be the case that we could do some work on the Zephyr x86
> exception handling stubs to be more GDB friendly. I don't know off the
> top of my head what it would take to get 'backtrace' to work the way
> you describe. We might just need to massage the stack a bit and issue
> a debugger 'break' in the error handler.
Sounds like a good idea. I hope to try it the next time I face similar
debugging needs.

> Currently, for x86 exceptions with CONFIG_EXCEPTION_DEBUG turned on
> we install a bunch of handler stubs, you can see the code in
> arch/x86/core/fatal.c. That is what prints out the message. It then
> calls _SysFatalErrorHandler which either aborts the thread or spins
> forever.
>
> > Faulting segment:address = 0x00000008:0x001782da
> This is where your code is generating an exception. I'd set a
> breakpoint there.
Thanks, yes, I looked at those places. To be fair, the issue looked
like stack smashing from the beginning, and trying to catch it in GDB
with breakpoints and stepping didn't give any result. So, I had to
resort to printf debugging, which rarely fails (but takes an enormous
amount of time). It indeed turned out to be stack smashing, but not
where I expected it (because I had already increased the stack sizes of
the threads I knew about). I'll send details in a separate email.
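
As an aside on stack smashing in threads one does not know about: a
common way to spot an undersized stack is to pre-fill it with a known
pattern and later count how much of the pattern survives. This is not
how the bug in this thread was found (that was printf debugging); it is
only a minimal sketch. The names stack_paint() and stack_unused() are
hypothetical, and the code assumes a stack that grows downward, as on
x86, so the untouched bytes sit at the low end of the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL 0xAA /* arbitrary fill pattern (assumption) */

/* Fill a stack buffer with the pattern before the thread starts. */
static void stack_paint(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL, size);
}

/*
 * Count the bytes at the low end of the buffer that still hold the
 * pattern. A result of 0 means the thread has used (or overrun) the
 * whole stack at some point.
 */
static size_t stack_unused(const uint8_t *stack, size_t size)
{
    size_t n = 0;

    while (n < size && stack[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

Calling stack_unused() on every thread stack from a periodic check or
from the fatal error path gives a rough headroom figure per thread. It
can undercount if the thread happens to write the fill value itself.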



> Andrew

Thanks,
Paul
