Topics

Race Condition in ATT #ble #nrf52480

ben.byers@...
 
Edited

In version 1.14.1, there appears to be a race condition when updating an indication value. It results in a BUS FAULT, which occurs because a pointer becomes null.
Here are the Zephyr ATT print messages for what it looks like when it breaks:
 
Updating indication value to: 0x02 -- This is an assertion from our code for clarity
[00:00:15.622,406] <dbg> bt_att.bt_att_req_send: conn 0x2000076c req 0x200025d4
[00:00:15.622,406] <dbg> bt_att.att_send_req: req 0x200025d4
[00:00:15.666,259] <dbg> bt_att.att_rsp_sent: conn 0x2000076c att 0x200008a0
[00:00:15.666,351] <dbg> bt_att.bt_att_recv: Received ATT code 0x12 len 13
[00:00:15.666,351] <dbg> bt_att.att_write_req: handle 0x0017
[00:00:15.666,381] <dbg> bt_att.write_cb: handle 0x0017 offset 0

These are the Zephyr messages when it is successful:
Updating indication value to: 0x02
[00:00:20.676,361] <dbg> bt_att.bt_att_req_send: conn 0x2000076c req 0x200025d4
[00:00:20.676,361] <dbg> bt_att.att_send_req: req 0x200025d4
[00:00:20.720,703] <dbg> bt_att.att_rsp_sent: conn 0x2000076c att 0x200008a0
[00:00:20.765,014] <dbg> bt_att.att_req_sent: conn 0x2000076c att 0x200008a0 att->req 0x200025d4
[00:00:20.765,655] <dbg> bt_att.bt_att_recv: Received ATT code 0x1e len 0
[00:00:20.765,686] <dbg> bt_att.att_confirm: 
[00:00:20.765,686] <dbg> bt_att.att_handle_rsp: err 0 len 0: 

It would appear that att_req_sent doesn't get called when the BUS FAULT occurs. This appears consistent when I have repeated this. The ATT code shows that a Write_Request is occurring which looks like it is interrupting the indication update before it finishes. I took some BLE sniffer logs and in both the case of failure and success, the logs look identical, so externally the indication has been sent, but something internally hasn't resolved.

I've tried enabling kernel messages to see if there is a task switch here (including ISR), but I end up with lots of missed messages, so they aren't useful.

Any ideas as to how to fix this or other messages I can enable that will help continue to debug this?

UPDATE: Some more info, I enabled the __ASSERT() and this is what I see:

ASSERTION FAIL [req->func] @ ../ZEPHYR/subsys/bluetooth/host/att.c:267