HCI Host Not enough space in buffer #hci

@abaska
 

Hi,

I have an HCI host and controller setup. If the controller is already running when the host starts, the host sometimes begins reading in the middle of an HCI message from the controller, so it reads the wrong HCI message type and length. The buffer then fills completely and the driver errors out in h4.c, read_payload(), with BT_ERR("Not enough space in buffer"). After this the host application does not recover and a power cycle is needed.

I found that a quick fix is to remove a line in h4.c read_payload(). I think the line is meant to discard the rest of a message's data if the buffer is full, but it looks like it's assuming rx.remaining is valid. In this case rx.remaining is invalid, because the driver read the wrong byte as the length when it jumped into the middle of an HCI message.
    if (rx.remaining > net_buf_tailroom(rx.buf)) {
        BT_ERR("Not enough space in buffer");
        /* Commenting out the next line fixes the issue; it was discarding thousands of bytes. */
        /* rx.discard = rx.remaining; */
        reset_rx();
        return;
    }
I don't know what other consequences this quick fix will have. Is there a better way to fix this issue? Maybe check the message validity before it fills up the buffer?

Thanks
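
To illustrate how a mid-message start can produce a length in the thousands, here is a minimal sketch. It is not code from h4.c: h4_claimed_len() and example_stream are hypothetical names, and the parameter bytes are arbitrary illustrative data. It only assumes the standard H4 framing (packet type 0x02 for ACL with a 2-byte little-endian length, 0x04 for events with a 1-byte length).

```c
#include <stddef.h>
#include <stdint.h>

/* H4 packet type indicators used in the controller-to-host direction. */
#define H4_ACL 0x02
#define H4_EVT 0x04

/*
 * Hypothetical helper (not part of h4.c): return the payload length a parser
 * would expect if it treated buf[off] as the start of an H4 packet.
 * Returns -1 if the type is unknown or the header is incomplete.
 */
static long h4_claimed_len(const uint8_t *buf, size_t len, size_t off)
{
    if (off >= len) {
        return -1;
    }

    switch (buf[off]) {
    case H4_EVT:
        /* type, event code, 1-byte parameter length */
        if (len - off < 3) {
            return -1;
        }
        return buf[off + 2];
    case H4_ACL:
        /* type, 2-byte handle/flags, 2-byte little-endian data length */
        if (len - off < 5) {
            return -1;
        }
        return (long)buf[off + 3] | ((long)buf[off + 4] << 8);
    default:
        return -1;
    }
}

/* One event whose parameters happen to contain a 0x02 byte. */
static const uint8_t example_stream[] = {
    0x04, 0x3e, 0x06,                   /* event header: 6 parameter bytes */
    0x02, 0x01, 0x00, 0x34, 0x12, 0xff, /* parameters (arbitrary data) */
};
```

Parsing from offset 0 gives the real parameter length, 6. Parsing from offset 3 treats the 0x02 parameter byte as an ACL packet type and reads 0x34, 0x12 as the length, giving 0x1234 = 4660 bytes, which would then be assigned to rx.discard.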

Johan Hedberg
 

Hi,


On 23 Nov 2018, at 20.37, @abaska wrote:
I don't know what other consequences this quick fix will have. Is there a better way to fix this issue? Maybe check the message validity before it fills up the buffer?

Basically the code is assuming that the packet is valid but that the buffer sizes have simply been defined too small to fit it. In that case discarding rx.remaining bytes is the correct thing to do, since the driver then skips a valid but too-long packet. If you’re getting corrupt data (starting in the middle of a packet), it’s all a guessing game. You could discard everything indicated, as the code does now, or try again with the next byte (in which case you’ll be repeating this until you get something that makes sense). Since the code needs to read (i.e. jump forward over) at least the H4 + ACL/event headers, you don’t have any guarantee that you’ll ever hit a clean packet boundary.
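
As a rough sketch of the "try again with the next byte" option, the scan below walks a captured byte stream and stops at the first offset whose header looks plausible. This is an illustration only, not the h4.c implementation: h4_packet_size(), h4_resync() and MAX_PAYLOAD are hypothetical names, and the 255-byte payload limit is an assumed stand-in for whatever the real buffer size is.

```c
#include <stddef.h>
#include <stdint.h>

/* H4 packet type indicators used in the controller-to-host direction. */
#define H4_ACL 0x02
#define H4_EVT 0x04

/* Assumed payload limit; a real driver would use its own buffer size. */
#define MAX_PAYLOAD 255

/*
 * Hypothetical helper: total size (H4 type byte + header + payload) of the
 * packet starting at p. Returns 0 if more bytes are needed to decide, or -1
 * if the type is unknown or the claimed payload exceeds MAX_PAYLOAD.
 */
static long h4_packet_size(const uint8_t *p, size_t avail)
{
    size_t hdr, payload;

    if (avail == 0) {
        return 0;
    }

    switch (p[0]) {
    case H4_EVT:
        hdr = 3; /* type, event code, 1-byte length */
        if (avail < hdr) {
            return 0;
        }
        payload = p[2];
        break;
    case H4_ACL:
        hdr = 5; /* type, 2-byte handle/flags, 2-byte LE length */
        if (avail < hdr) {
            return 0;
        }
        payload = (size_t)p[3] | ((size_t)p[4] << 8);
        break;
    default:
        return -1;
    }

    if (payload > MAX_PAYLOAD) {
        return -1;
    }
    return (long)(hdr + payload);
}

/*
 * Return the offset of the first byte that starts a plausible packet that is
 * completely contained in buf, or len if no such offset exists.
 */
static size_t h4_resync(const uint8_t *buf, size_t len)
{
    for (size_t off = 0; off < len; off++) {
        long size = h4_packet_size(buf + off, len - off);

        if (size > 0 && (size_t)size <= len - off) {
            return off;
        }
    }
    return len;
}
```

For example, given the bytes 0x34 0x12 0xff followed by a complete event 0x04 0x0e 0x04 0x01 0x03 0x0c 0x00, h4_resync() skips the three leftover bytes and returns offset 3. The check is only a plausibility test: a payload byte that looks like a type followed by a small length still passes, which is exactly the lack of guarantee described above.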

I’m not sure there’s any correct answer in how the host should behave, since either way this is an unreliable setup: you should ideally design your device so that you can force a reset of the controller (e.g. by power-cycling it) so that you know you start off in a known state.

Johan