Bluetooth mesh sample (mesh) - Hard Fault


laczenJMS
 

Hi,

Running the Bluetooth mesh sample (samples/bluetooth/mesh/) on an
nRF51822 results in a hard fault. The provisioner is meshctl (BlueZ).
After successful provisioning I disconnect from the mesh; when I
connect again, the hard fault appears:

Kernel stacks:
main (real size 512): unused 228 usage 284 / 512 (55 %)
idle (real size 256): unused 200 usage 56 / 256 (21 %)
interrupt (real size 2048): unused 1656 usage 392 / 2048 (19 %)
workqueue (real size 1024): unused 676 usage 348 / 1024 (33 %)
prio recv thread stack (real size 448): unused 144 usage 304 / 448 (67 %)
recv thread stack (real size 1396): unused 32 usage 1364 / 1396 (97 %)
***** HARD FAULT *****
Executing thread ID (thread): 0x20001f04
Faulting instruction address: 0xf63c
Fatal fault in ISR! Spinning...

If needed I can provide more info/logging...

Kind regards,

Jehudi


Johan Hedberg
 

Hi Jehudi,

I'd bet the recv thread stack overflowed. It's already at 97% with only
32 unused bytes based on the above log. Try increasing its size to
something bigger. The Kconfig variable is called CONFIG_BT_RX_STACK_SIZE.

Johan
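
The reasoning above can be checked numerically. The following is a minimal sketch in plain C, not Zephyr API: the function names and the choice of threshold are assumptions made for illustration. It recomputes the usage percentage from the "real size" and "unused" figures in the kernel stack report and flags stacks that are close to full.

```c
#include <stdbool.h>
#include <stddef.h>

/* Percentage of the stack in use, rounded down, given the "real size" and
 * "unused" byte counts printed in the kernel stack report. */
static unsigned int stack_usage_percent(size_t real_size, size_t unused)
{
    if (real_size == 0 || unused > real_size) {
        return 0; /* malformed input: report nothing rather than guess */
    }
    return (unsigned int)(((real_size - unused) * 100U) / real_size);
}

/* Hypothetical helper: true when usage is at or above threshold_percent. */
static bool stack_near_overflow(size_t real_size, size_t unused,
                                unsigned int threshold_percent)
{
    return stack_usage_percent(real_size, unused) >= threshold_percent;
}
```

With the figures from the log, the recv thread stack (1396 bytes, 32 unused) evaluates to 97 %, matching the report, while the prio recv thread stack (448 bytes, 144 unused) evaluates to 67 %.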


laczenJMS
 

Hi Johan,

I increased the stack size to CONFIG_BT_RX_STACK_SIZE=2048 (I don't
understand why the real stack size is reported as 2348). Anyhow, it
now gives a different hard fault (this time with assert: '0' failed):

Kernel stacks:
main (real size 512): unused 220 usage 292 / 512 (57 %)
idle (real size 256): unused 200 usage 56 / 256 (21 %)
interrupt (real size 2048): unused 1640 usage 408 / 2048 (19 %)
workqueue (real size 2048): unused 1668 usage 380 / 2048 (18 %)
prio recv thread stack (real size 748): unused 440 usage 308 / 748 (41 %)
recv thread stack (real size 2348): unused 308 usage 2040 / 2348 (86 %)
[bt] [ERR] isr_rx_conn_pkt_ctrl: assert: '0' failed
***** HARD FAULT *****
Executing thread ID (thread): 0x200023dc
Faulting instruction address: 0x12d50
Fatal fault in ISR! Spinning...

Kind regards,

Jehudi



Carles Cufi
 

Hi Jehudi,

This is a BLE Controller assert being hit. Can you give us a couple of additional tidbits of info to help us diagnose it?

* What exact Zephyr version are you running? (if master please give us the commit SHA)
* What board are you using? Is this a combined (Host + Controller single-chip) build or are you using 2 chips?
* What configuration are you using? (your .conf file)

Thanks,

Carles


laczenJMS
 

Hi Carles,



* What exact Zephyr version are you running? (if master please give us the commit SHA)
I am using the latest Zephyr version (pulled yesterday), commit
d3862d7b39349b079b0d125de0c27e54a41b8aa4

* What board are you using? Is this a combined (Host + Controller single-chip) build or are you using 2 chips?
I am using an andtcl board with an nRF51822, 256 kB flash and 32 kB RAM

* What configuration are you using? (your .conf file)
Included are the conf file from the mesh source directory, as well as
the .config files from the output directory


Kind regards,

Jehudi


Carles Cufi
 

Hi Jehudi,

So that's the latest, excludes some fixes that came in some days ago. Thanks.


Not sure what board that is, can you send a link?


Got your config files, so this is a combined build with Host + Controller on a single chip. What would be interesting to know in this case is more about the peer devices you are interacting with. Can you tell us which dongles/chips/devices you are connecting to, what brand they are and what version? Also the number of simultaneous connections you have at any one time, and a general description of your setup in terms of chips and stacks.

Thanks,

Carles


laczenJMS
 

Hi Carles,

It is a module from aliexpress:
https://nl.aliexpress.com/store/product/1pc-Wireless-Module-32K-RAM-256K-FLASH-NRF51822-Nordic-51822-core-with-PCB-antenna-intergrated-NRF51822/605000_32801747596.html


I am running meshctl from BlueZ (development version) as the
provisioner; there are no other devices connected. The dongle I am
using with meshctl is a Trust BT dongle (dmesg reports Product:
CSR8510 A10). I have also enabled debugging in Zephyr
(CONFIG_BT_DEBUG_HCI_DRIVER=y); the log is included.

Kind regards,

Jehudi


Carles Cufi
 

Hi Jehudi,

Thanks for all the info.

Before we investigate whether the CSR8510 is doing something it should not, can we double-check additional stack sizes?

Can you set:
CONFIG_BT_HCI_TX_STACK_SIZE to 1024
and
CONFIG_BT_RX_STACK_SIZE to as much as you can, 4096 ideally

I see that your receive thread stack (CONFIG_BT_RX_STACK_SIZE) is getting near the 2048 limit, which is huge. Are you doing a lot of complex operations in BLE callbacks?

Thanks,

Carles
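
For reference, the two settings requested above could be placed in the sample's prj.conf as follows. This is a sketch only: the values are the ones suggested in this message, and on a part with 32 kB of RAM the 4096-byte RX stack may have to be reduced if the build no longer fits.

```conf
CONFIG_BT_HCI_TX_STACK_SIZE=1024
CONFIG_BT_RX_STACK_SIZE=4096
```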


Chettimada, Vinayak Kariappa
 

Hi Jehudi,

To further debug the conditions around the assert, please apply the attached diff
and let me know which assert is hit.

Regards,
Vinayak

_______________________________________________
Zephyr-devel mailing list
Zephyr-devel@...
https://lists.zephyrproject.org/mailman/listinfo/zephyr-devel


laczenJMS
 

Hi Vinayak and Carles,

I applied the patch and increased the stack sizes as requested. It took
some time before I got the error again. The result:

Kernel stacks:
main (real size 512): unused 220 usage 292 / 512 (57 %)
idle (real size 256): unused 200 usage 56 / 256 (21 %)
interrupt (real size 2048): unused 1640 usage 408 / 2048 (19 %)
workqueue (real size 2048): unused 1668 usage 380 / 2048 (18 %)
[bt] [WRN] proxy_ccc_write: Client wrote 0x0000 instead enabling notify
prio recv thread stack (real size 748): unused 440 usage 308 / 748 (41 %)
recv thread stack (real size 4396): unused 2968 usage 1428 / 4396 (32 %)
Kernel stacks:
main (real size 512): unused 220 usage 292 / 512 (57 %)
idle (real size 256): unused 200 usage 56 / 256 (21 %)
interrupt (real size 2048): unused 1640 usage 408 / 2048 (19 %)
workqueue (real size 2048): unused 1668 usage 380 / 2048 (18 %)
tx stack (real size 940): unused 512 usage 428 / 940 (45 %)
Unknown Rsp to 2.
[bt] [ERR] isr_rx_conn_pkt_ctrl: assert: '0' failed
***** HARD FAULT *****
Executing thread ID (thread): 0x200023dc
Faulting instruction address: 0x12d50
Fatal fault in ISR! Spinning...

Kind regards,

Jehudi



Chettimada, Vinayak Kariappa
 

Hi Jehudi,

This is very helpful.

The local controller (in the slave role, if I assume correctly) was in the middle of a channel map update, waiting for the instant, when it received an unknown response (LL_UNKNOWN_RSP) PDU.

Please apply the attached modified diff, which will tell me which procedure the local controller may have initiated (if any) that caused the peer to send the unknown response.

Thanks in advance.

-Vinayak
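
To illustrate the situation described above, here is a minimal, self-contained sketch in C of a control-PDU handler that records which local procedure was pending when an LL_UNKNOWN_RSP arrives. It is not the Zephyr controller code and not the attached diff: the enum, struct, and function names are hypothetical, and only the opcode value 0x07 for LL_UNKNOWN_RSP and its one-byte UnknownType payload are taken from the Bluetooth Core Specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode of the LL_UNKNOWN_RSP control PDU (Bluetooth Core Specification). */
#define LL_UNKNOWN_RSP 0x07

/* Hypothetical bookkeeping of the locally pending procedure. */
enum pending_proc {
    PROC_NONE,
    PROC_CHANNEL_MAP_UPDATE, /* waiting for the instant */
    PROC_OTHER,
};

struct unknown_rsp_info {
    enum pending_proc pending; /* procedure active when the PDU arrived */
    uint8_t unknown_type;      /* opcode the peer did not recognise */
};

/* Parse a control PDU (opcode byte followed by CtrData). Returns 0 and
 * fills *out when the PDU is an LL_UNKNOWN_RSP, -1 otherwise. */
static int handle_ctrl_pdu(const uint8_t *pdu, size_t len,
                           enum pending_proc pending,
                           struct unknown_rsp_info *out)
{
    if (pdu == NULL || out == NULL || len < 2 || pdu[0] != LL_UNKNOWN_RSP) {
        return -1;
    }
    out->pending = pending;
    out->unknown_type = pdu[1];
    return 0;
}
```

Logging both fields shows whether the peer rejected a PDU that the local side sent as part of the pending procedure or one that belongs to something else, which is the question the debug diff is meant to answer.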

-----Original Message-----
From: Laczen JMS [mailto:laczenjms@...]
Sent: Wednesday, September 06, 2017 10:30 PM
To: Chettimada, Vinayak Kariappa
<vinayak.kariappa.chettimada@...>
Cc: zephyr-devel@...; Cufi, Carles
<Carles.Cufi@...>
Subject: Re: [Zephyr-devel] Bluetooth mesh sample (mesh) - Hard Fault

Hi Vinayak and Carles,

I applied the patch and increased stack sizes as required, it took some time
before I got the error again. The result:

Kernel stacks:
main (real size 512): unused 220 usage 292 / 512 (57 %)
idle (real size 256): unused 200 usage 56 / 256 (21 %)
interrupt (real size 2048): unused 1640 usage 408 / 2048 (19 %)
workqueue (real size 2048): unused 1668 usage 380 / 2048 (18 %)
[bt] [WRN] proxy_ccc_write: Client wrote 0x0000 instead enabling notify
prio recv thread stack (real size 748): unused 440 usage 308 / 748 (41 %)
recv thread stack (real size 4396): unused 2968 usage 1428 / 4396 (32 %)
Kernel stacks:
main (real size 512): unused 220 usage 292 / 512 (57 %)
idle (real size 256): unused 200 usage 56 / 256 (21 %)
interrupt (real size 2048): unused 1640 usage 408 / 2048 (19 %)
workqueue (real size 2048): unused 1668 usage 380 / 2048 (18 %)
tx stack (real size 940): unused 512 usage 428 / 940 (45 %)
Unknown Rsp to 2.
[bt] [ERR] isr_rx_conn_pkt_ctrl: assert: '0' failed
***** HARD FAULT *****
Executing thread ID (thread): 0x200023dc
Faulting instruction address: 0x12d50 Fatal fault in ISR! Spinning...

Kind regards,

Jehudi

2017-09-06 17:43 GMT+02:00 Chettimada, Vinayak Kariappa
<vinayak.kariappa.chettimada@...>:
Hi Jehudi,

To further debug the conditions around the assert, please apply the
attached diff.
And send me which assert is hit.

Regards,
Vinayak

On 6 Sep 2017, at 17:05, Cufi, Carles <Carles.Cufi@...> wrote:

Hi Jehudi,

-----Original Message-----
From: Laczen JMS [mailto:laczenjms@...]
Sent: 06 September 2017 16:58
To: Cufi, Carles <Carles.Cufi@...>
Cc: Johan Hedberg <johan.hedberg@...>; zephyr-
devel@...
Subject: Re: [Zephyr-devel] Bluetooth mesh sample (mesh) - Hard Fault

Hi Carles,

2017-09-06 16:30 GMT+02:00 Cufi, Carles <Carles.Cufi@...>:

Hi Jehudi,

-----Original Message-----
From: Laczen JMS [mailto:laczenjms@...]
Sent: 06 September 2017 15:46
To: Cufi, Carles <Carles.Cufi@...>
Cc: Johan Hedberg <johan.hedberg@...>; zephyr-
devel@...
Subject: Re: [Zephyr-devel] Bluetooth mesh sample (mesh) - Hard Fault

Hi Carles,


2017-09-06 15:26 GMT+02:00 Cufi, Carles <Carles.Cufi@...>:

Hi Jehudi,

-----Original Message-----
From: zephyr-devel-bounces@...
[mailto:zephyr-devel- bounces@...] On Behalf Of
Laczen JMS
Sent: 06 September 2017 15:20
To: Johan Hedberg <johan.hedberg@...>
Cc: zephyr-devel@...
Subject: Re: [Zephyr-devel] Bluetooth mesh sample (mesh) - Hard Fault

Hi Johan,

I increased the stack size to CONFIG_BT_RX_STACK_SIZE=2048 (I don't
understand why the real stack size is reported to be 2348), anyhow it
gives a new hard fault (but now with assert: '0'

failed):


Kernel stacks:
main (real size 512): unused 220 usage 292 / 512

(57

%)

idle (real size 256): unused 200 usage 56 / 256 (21

%)

interrupt (real size 2048): unused 1640 usage 408 / 2048

(19

%)

workqueue (real size 2048): unused 1668 usage 380 / 2048

(18

%)

prio recv thread stack (real size 748): unused 440 usage 308

/

748

(41 %)
recv thread stack (real size 2348): unused 308 usage 2040

/

2348 (86 %)
[bt] [ERR] isr_rx_conn_pkt_ctrl: assert: '0' failed
***** HARD FAULT *****
Executing thread ID (thread): 0x200023dc Faulting instruction
address: 0x12d50 Fatal fault in ISR!

Spinning...


This is a BLE Controller assert that hit. Can you give us a couple of

additional tidbits of info to help us diagnose?


* What exact Zephyr version are you running? (if master please give us
the commit SHA)


I am using the latest zephyr version (pulled yesterday), commit
d3862d7b39349b079b0d125de0c27e54a41b8aa4


So that's the latest, excludes some fixes that came in some days ago.

Thanks.



* What board are you using? Is this a combined (Host + Controller

single-chip) build or are you using 2 chips?

I am using a andtcl board with nrf51822 256kb flash and 32kb ram


Not sure what board that is, can you send a link?


It is a module from aliexpress:
https://nl.aliexpress.com/store/product/1pc-Wireless-Module-32K-RAM-256K-FLASH-NRF51822-Nordic-51822-core-with-PCB-antenna-intergrated-NRF51822/605000_32801747596.html



* What configuration are you using? (your .conf file)


Included are the conf file from the mesh source directory, as well as
the .config files from the output directory


Got your config files, so this is a combined build with Host +
Controller on a single chip. What would be interesting to know in this
case is more info about the peer devices you are interacting with. Can
you give us a clue of what dongles/chips/devices you are connecting
to, what brand they are and what version? Also the number of
simultaneous connections you have at any one time, and the general
description of the setup you are running in terms of chips and stacks.



I am running meshctl from bluez (development version) as a
provisioner; there are no other devices connected. The dongle I am
using with meshctl is a Trust BT dongle (dmesg reports Product:
CSR8510 A10). I have also enabled debugging in zephyr
(CONFIG_BT_DEBUG_HCI_DRIVER=y); the log is included.


Thanks for all the info.

Before we investigate whether the CSR8510 is doing something it should
not, can we double-check additional stack sizes?

Can you set:
CONFIG_BT_HCI_TX_STACK_SIZE to 1024
and
CONFIG_BT_RX_STACK_SIZE to as much as you can, 4096 ideally

I see that your receive thread stack (CONFIG_BT_RX_STACK_SIZE) is
getting near the 2048 limit, which is huge. Are you doing a lot of
complex operations in BLE callbacks?
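
As an aside to the stack-size check discussed here, the "Kernel stacks:" report lines can be screened automatically. The following is only a minimal sketch, not something posted in the thread: the function names and the 80 % cut-off are hypothetical choices, and the sample lines are copied from the report quoted in this message. It also makes it easy to see that the reported "real size" sits a constant 300 bytes above the configured value in the logs of this thread (2348 for 2048, and later 4396 for 4096); the thread itself does not say where that offset comes from.

```python
import re

# Matches one line of the "Kernel stacks:" report quoted in this thread, e.g.
#   recv thread stack (real size 2348): unused 308 usage 2040 / 2348 (86 %)
STACK_LINE = re.compile(
    r"^(?P<name>.+?) \(real size (?P<real>\d+)\): "
    r"unused (?P<unused>\d+) usage (?P<usage>\d+) / (?P<total>\d+)"
)


def parse_stack_report(text):
    """Return {stack name: (usage, total, unused)} for every matching line."""
    stacks = {}
    for line in text.splitlines():
        match = STACK_LINE.match(line.strip())
        if match:
            stacks[match["name"]] = (
                int(match["usage"]),
                int(match["total"]),
                int(match["unused"]),
            )
    return stacks


def tight_stacks(stacks, threshold_pct=80):
    """Names of stacks whose usage is at or above threshold_pct.

    The default of 80 is an arbitrary, hypothetical cut-off.
    """
    return sorted(
        name
        for name, (usage, total, _unused) in stacks.items()
        if usage * 100 >= threshold_pct * total
    )


# Three lines taken from the report quoted in this message.
REPORT = """\
main (real size 512): unused 220 usage 292 / 512 (57 %)
prio recv thread stack (real size 748): unused 440 usage 308 / 748 (41 %)
recv thread stack (real size 2348): unused 308 usage 2040 / 2348 (86 %)
"""

if __name__ == "__main__":
    for name in tight_stacks(parse_stack_report(REPORT)):
        print(name)
```

With the three sample lines, only the recv thread stack is at or above the cut-off. Keep in mind that the report shows a high-water mark, so a stack that has already overflowed may not be flagged reliably.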

Thanks,

Carles
_______________________________________________
Zephyr-devel mailing list
Zephyr-devel@...
https://lists.zephyrproject.org/mailman/listinfo/zephyr-devel


laczenJMS
 

Hi Vinayak,

With the diff applied I got the following:

Kernel stacks:
main (real size 512): unused 220 usage 292 / 512 (57 %)
idle (real size 256): unused 200 usage 56 / 256 (21 %)
interrupt (real size 2048): unused 1640 usage 408 / 2048 (19 %)
workqueue (real size 2048): unused 1668 usage 380 / 2048 (18 %)
prio recv thread stack (real size 748): unused 440 usage 308 / 748 (41 %)
recv thread stack (real size 4396): unused 2356 usage 2040 / 4396 (46 %)
Unknown Rsp to 14, in 2 state
[bt] [ERR] isr_rx_conn_pkt_ctrl: assert: '0' failed
***** HARD FAULT *****
Executing thread ID (thread): 0x200023dc
Faulting instruction address: 0x12d50
Fatal fault in ISR! Spinning...

Kind regards,

Jehudi

2017-09-07 8:56 GMT+02:00 Chettimada, Vinayak Kariappa
<vinayak.kariappa.chettimada@...>:

Hi Jehudi,

This is very helpful.

The local controller (in slave role, if I assume correctly) was in the middle of a channel map update, waiting for the instant, when it received an unknown rsp PDU.

Please apply the attached modified diff, which will tell me which procedure the local controller might have initiated (if any) that caused the peer to send an unknown rsp.

Thanks in advance.

-Vinayak

-----Original Message-----
From: Laczen JMS [mailto:laczenjms@...]
Sent: Wednesday, September 06, 2017 10:30 PM
To: Chettimada, Vinayak Kariappa
<vinayak.kariappa.chettimada@...>
Cc: zephyr-devel@...; Cufi, Carles
<Carles.Cufi@...>
Subject: Re: [Zephyr-devel] Bluetooth mesh sample (mesh) - Hard Fault

Hi Vinayak and Carles,

I applied the patch and increased the stack sizes as requested; it took some
time before I got the error again. The result:

Kernel stacks:
main (real size 512): unused 220 usage 292 / 512 (57 %)
idle (real size 256): unused 200 usage 56 / 256 (21 %)
interrupt (real size 2048): unused 1640 usage 408 / 2048 (19 %)
workqueue (real size 2048): unused 1668 usage 380 / 2048 (18 %)
[bt] [WRN] proxy_ccc_write: Client wrote 0x0000 instead enabling notify
prio recv thread stack (real size 748): unused 440 usage 308 / 748 (41 %)
recv thread stack (real size 4396): unused 2968 usage 1428 / 4396 (32 %)
Kernel stacks:
main (real size 512): unused 220 usage 292 / 512 (57 %)
idle (real size 256): unused 200 usage 56 / 256 (21 %)
interrupt (real size 2048): unused 1640 usage 408 / 2048 (19 %)
workqueue (real size 2048): unused 1668 usage 380 / 2048 (18 %)
tx stack (real size 940): unused 512 usage 428 / 940 (45 %)
Unknown Rsp to 2.
[bt] [ERR] isr_rx_conn_pkt_ctrl: assert: '0' failed
***** HARD FAULT *****
Executing thread ID (thread): 0x200023dc
Faulting instruction address: 0x12d50
Fatal fault in ISR! Spinning...

Kind regards,

Jehudi

2017-09-06 17:43 GMT+02:00 Chettimada, Vinayak Kariappa
<vinayak.kariappa.chettimada@...>:
Hi Jehudi,

To further debug the conditions around the assert, please apply the
attached diff.
And send me which assert is hit.

Regards,
Vinayak


Chettimada, Vinayak Kariappa
 

Hi Jehudi,

Thank you for the quick response. (btw, I am on #zephyrproject with nickname vich, if that’s faster to communicate)

It was a locally initiated slave feature req, which the peer's BT 4.0 implementation does not support, sent while the peer had initiated a channel map update.

I will send a fix soon to github and email you the patch.

-Vinayak
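
A note on reading the debug line: the value 14 in "Unknown Rsp to 14" corresponds to Link Layer control opcode 0x0E, LL_SLAVE_FEATURE_REQ, a PDU added in Bluetooth 4.1, which fits a 4.0 peer answering with LL_UNKNOWN_RSP. The sketch below is not the actual fix and does not use Zephyr code; it only models the idea, under the assumption that the controller remembers which request it sent itself, so that an unknown rsp is matched against that request and not against a procedure the peer started. All names in it are hypothetical.

```python
# A few Link Layer control PDU opcodes, numbered as in the Bluetooth Core
# Specification. The table and function names are hypothetical; they are
# not Zephyr identifiers.
LL_OPCODES = {
    0x07: "LL_UNKNOWN_RSP",
    0x08: "LL_FEATURE_REQ",
    0x09: "LL_FEATURE_RSP",
    0x0E: "LL_SLAVE_FEATURE_REQ",
}


def handle_unknown_rsp(pending_local_opcode, unknown_type):
    """Model a tolerant reaction to an incoming LL_UNKNOWN_RSP.

    pending_local_opcode: opcode of the request this device sent and is
        still waiting on, or None if nothing is pending.
    unknown_type: the opcode the peer reported as unknown.
    Returns a short string describing the action taken (no assert).
    """
    if pending_local_opcode is not None and unknown_type == pending_local_opcode:
        name = LL_OPCODES.get(unknown_type, hex(unknown_type))
        return f"peer does not support {name}; complete local procedure"
    return "unexpected LL_UNKNOWN_RSP; ignore"


if __name__ == "__main__":
    # The case from this thread: a slave feature request is pending and the
    # peer reports type 14 as unknown.
    print(handle_unknown_rsp(0x0E, 14))
```

The point of the model is only that the unknown rsp is resolved against the locally sent request, independent of any channel map update the peer has in progress at the same time.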



Chettimada, Vinayak Kariappa
 

Find attached the diff to handle the unknown rsp.

-Vinayak



Chettimada, Vinayak Kariappa
 

Hi Jehudi,

This is the PR: https://github.com/zephyrproject-rtos/zephyr/pull/1390

Please add yourself as a reviewer and approve it once you have confirmed the fix by testing.

Regards,
Vinayak
