Zephyr DFU protocol


Carles Cufi
 

Hi all,

As you might already know, we've been working on the introduction of DFU (Device Firmware Upgrade) to Zephyr. Several Pull Requests have been posted dealing with the low-level flash and image access modules required to store a received image and then boot into it, but that leaves out one of the key items in the system: the update protocol that allows an existing running image to obtain an updated one over a transport mechanism.

There are several fundamental requirements for such a protocol if we want it to be future-proof, extensible and practical for embedded devices:

- Must be packet-based and transport-agnostic
- Must be extensible and flexible
- The server-side implementation (assuming a request/response model) must be relatively simple and require little resources
- Must be compatible with the mcuboot project and model
- At the very least the following transports must be supported: BLE, UART, IP, USB
- A client-side tool (assuming a request/response model) must either exist already or be easily implementable

With that in mind we proceeded to analyze a few of the existing protocols out there (the ones we knew about), in order to consider whether reusing an existing effort was a better approach than designing and implementing a new protocol from scratch:

1) USB DFU specification[1]
2) Nordic Secure DFU protocol (included in the Nordic SDK)[2]
3) Newt Manager Protocol (part of Mynewt)[3]
4) Distributed DFU over CoAP used in Nordic's Thread SDK[4]

Note: I will use the word "source" to identify the device that contains the new image, and "target" to identify the one that receives it, flashes it and then boots into it.

The USB DFU specification does not seem to be a good fit since it maps specifically to particular USB endpoints and classes, making it not suitable for other transports without extensive modification. Using a standard USB class such as CDC ACM as transport, we could instead map the chosen protocol over a USB physical link.
The Nordic Secure DFU protocol is also very tightly mapped to the Nordic software architecture, including assumptions that the Bluetooth Protocol Stack is decoupled from the bootloader and application images and is permanently available through a set of system calls.

We also see two very different image distribution models. In protocols 1, 2 and 3 the source (client) "pushes" an image to the target (server) after checking that it's applicable based on version checking and other verifications. In protocol 4 however, the source acts instead as a server and the targets act as clients that "pull" images from the source (server) whenever they are available. I believe that the Linaro DFU implementation also follows the "pull" paradigm of protocol 4.
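
To make the "push" flow concrete, here is a minimal Python sketch of the source side: it checks that the candidate image is newer than the one running on the target, and only then sends it in chunks. The names `is_applicable` and `push_image`, the tuple version format and the 128-byte chunk size are all illustrative assumptions, not taken from any of the protocols listed above.

```python
from typing import Callable, Tuple

Version = Tuple[int, int, int]


def is_applicable(running: Version, candidate: Version) -> bool:
    """Hypothetical check: only offer images newer than the running one."""
    return candidate > running


def push_image(running: Version, candidate: Version, image: bytes,
               send_chunk: Callable[[int, bytes], None],
               chunk_size: int = 128) -> int:
    """Push `image` to the target in chunks; return the number of chunks sent."""
    if not is_applicable(running, candidate):
        return 0  # nothing is sent, so no traffic is generated
    sent = 0
    for offset in range(0, len(image), chunk_size):
        # send_chunk stands in for whatever the transport provides
        send_chunk(offset, image[offset:offset + chunk_size])
        sent += 1
    return sent
```

In this sketch the target never initiates anything; it only reacts to what the source sends, which is the property that makes "push" usable on links where the target cannot reach a server by itself.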

We believe that the right approach for the sort of ecosystem that Zephyr targets is the "push" approach, to minimize traffic, reduce power consumption and also make it possible to use with all transports. That said, it is important to note that although we are trying to decide on a default DFU mechanism for Zephyr, all layers (including the image management) will be independent of it, and it should therefore be entirely possible to implement an additional protocol for our users. Furthermore we don't exclude the possibility of extending the chosen protocol to support a "pull" model as well, something that should be entirely feasible as long as the protocol of choice is flexible.

After analyzing the different options available, we believe the Newt Manager Protocol (NMP) to be the better suited option for our current needs, for reasons outlined below:

- It is proven to work with mcuboot, the default bootloader for Zephyr
- The current mcuboot repository already contains an implementation of NMP for serial recovery
- It uses a "push" model
- It is very simple but also easily extensible
- Uses a simple packet format combining an 8-byte header followed by CBOR[5]-encoded data
- Supports additional functionality on top of basic DFU: stats, filesystem access, date and time setting, etc.
- Already supports the BLE and serial transports
- A command-line tool exists to send images over both BLE and Serial (both Go and JS/Node versions are available)
- It is open source and licensed under the Apache License 2.0 (APLv2)
- There are commercial products using it already [6]
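
As an illustration of the "8-byte header followed by CBOR-encoded data" packet format, here is a minimal Python sketch. The thread only states the header size, so the field names, their order and the big-endian encoding below are assumptions loosely modelled on the Mynewt implementation and should be checked against its source code. The payload is treated as opaque bytes that are assumed to be CBOR-encoded already.

```python
import struct
from typing import NamedTuple, Tuple

# Assumed layout: op, flags, payload length, group, sequence number, command id.
# ">BBHHBB" is big-endian with sizes 1 + 1 + 2 + 2 + 1 + 1 = 8 bytes.
HEADER_FORMAT = ">BBHHBB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


class Header(NamedTuple):
    op: int
    flags: int
    length: int      # size in bytes of the CBOR payload that follows
    group: int
    seq: int
    command_id: int


def build_packet(op: int, group: int, seq: int, command_id: int,
                 payload: bytes) -> bytes:
    """Prepend the (assumed) 8-byte header to an already-encoded payload."""
    header = struct.pack(HEADER_FORMAT, op, 0, len(payload), group, seq, command_id)
    return header + payload


def parse_packet(packet: bytes) -> Tuple[Header, bytes]:
    """Split a packet into its header fields and its payload bytes."""
    if len(packet) < HEADER_SIZE:
        raise ValueError("packet shorter than header")
    header = Header(*struct.unpack(HEADER_FORMAT, packet[:HEADER_SIZE]))
    payload = packet[HEADER_SIZE:HEADER_SIZE + header.length]
    if len(payload) != header.length:
        raise ValueError("truncated payload")
    return header, payload
```

For example, `build_packet(2, 1, 7, 1, b"\xa0")` wraps an empty CBOR map (the single byte 0xa0) and yields a 9-byte packet. The op, group and command id values here are arbitrary placeholders.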

The protocol itself consists of two different entities, the client sending requests and the server replying with responses.
The client side is typically a higher specced device running a full operating system (computer or portable device), whereas the server is the target of the DFU procedure and receives the image, stores it and then boots into it.
Additionally, the protocol also supports an OIC (now OCF) variant where the target/server exposes a discoverable server resource through the OCF framework over IPv6 and CoAP, making it possible to use it in a "distributed push" model where a single client can discover multiple servers and push an image to them.[7] This is an interesting feature since it enables DFU over IPv6 and CoAP out of the box, even without having to switch to a "pull" model.

Unfortunately the protocol itself is not documented in a specification, and instead the source code of the different implementations must currently be used to examine and understand the protocol. In terms of currently available implementations, there are the following:

- client/source side:
- newtmgr: Written in Go, this is the official Newt Manager Protocol client. Supports both the standard (over BLE and serial) and OIC (over IP) variants and all additional features [8]
- node-newtmgr: Unofficial NodeJS reimplementation of newtmgr, supports the standard variant over BLE and serial [9]
- Adafruit Mynewt Manager iOS application [10]
- server/target side:
- Mynewt Newt Manager Protocol implementation. Supports both variants and all transports [11]

There's also the choice, not discussed so far, to implement a brand new protocol completely tailored for Zephyr and designed from scratch. This has some advantages, such as being able to define it completely, adapt it to the particularities of Zephyr, and let everybody contribute to the choice of protocol, format and standards. That said, given that a protocol already exists that has been proven to work with an operating system similar to Zephyr, that clients are already available for both desktop and iOS, and that reusing an existing component could save a lot of development time, as it did with mcuboot, we have not pursued this option further for now.

We are eager to hear from everybody regarding the preliminary choice, including whether you know other, alternative protocols that are not known to us, whether there are requirements that are not met by our proposal or in general opinions and questions.

Regards,

Nordic Team

[1] http://www.usb.org/developers/docs/devclass_docs/DFU_1.1.pdf
[2] http://infocenter.nordicsemi.com/index.jsp?topic=%2Fcom.nordic.infocenter.sdk5.v14.0.0%2Flib_bootloader_dfu.html&cp=4_0_0_3_5_1
[3] http://mynewt.apache.org/latest/os/modules/devmgmt/newtmgr/
[4] http://infocenter.nordicsemi.com/index.jsp?topic=%2Fcom.nordic.infocenter.threadsdk.v0.10.0%2Fthread_example_dfu.html&cp=4_2_0_2_3
[5] https://tools.ietf.org/html/rfc7049
[6] https://www.adafruit.com/product/3574
[7] http://mynewt.apache.org/latest/os/modules/devmgmt/oicmgr/
[8] https://github.com/apache/mynewt-newtmgr
[9] https://github.com/jacobrosenthal/node-newtmgr
[10] https://learn.adafruit.com/adafruit-nrf52-pro-feather/adafruit-mynewt-manager
[11] https://github.com/apache/mynewt-core/tree/master/mgmt


Johann Fischer
 

Hi,

On 28.08.2017 14:45, Cufi, Carles wrote:
The USB DFU specification does not seem to be a good fit since it maps specifically to particular USB endpoints and classes, making it not suitable for other transports without extensive modification. Using a standard USB class such as CDC ACM as transport, we could instead map the chosen protocol over a USB physical link.
That surprised me a little. Can you describe in more detail what you mean by "it maps specifically to particular USB endpoints and classes"? I think if you have USB, then USB DFU is the most elegant solution for updates. Or is it about using the same update tool for UART and USB?

--
Best Regards,
Johann Fischer


Carles Cufi
 

Hi Johann,

Thanks for the feedback.

-----Original Message-----
From: zephyr-devel-bounces@... [mailto:zephyr-devel-
bounces@...] On Behalf Of Johann Fischer
Sent: 28 August 2017 15:35
To: zephyr-devel@...
Subject: Re: [Zephyr-devel] Zephyr DFU protocol

Hi,

On 28.08.2017 14:45, Cufi, Carles wrote:

The USB DFU specification does not seem to be a good fit since it maps
specifically to particular USB endpoints and classes, making it not
suitable for other transports without extensive modification. Using a
standard USB class such as CDC ACM as transport, we could instead map
the chosen protocol over a USB physical link.

That surprised me a little, can you describe it in more detail what you
mean with "it maps specifically to particular USB endpoints and
classes". I think if you have USB, then USB DFU is the most elegant
solution for update. Or is it about using the same update tool for UART
and USB?
Yes, the whole point here is to find a single protocol, and therefore a single set of update command-line tools, for all transports, so that the only difference among them is an adaptation layer. That however does *not* prevent Zephyr from also supporting USB DFU or any other DFU mechanism which is widely used and already has a well-established toolset. It is just that I would not recommend using the USB DFU protocol over any other transport as a "universal default protocol".

Regards,

Carles


David Brown
 

On Mon, Aug 28, 2017 at 12:45:37PM +0000, Cufi, Carles wrote:

As you might already know, we've been working on the introduction of
DFU (Device Firmware Upgrade) to Zephyr. Several Pull Requests have
been posted dealing with the low-level flash and image access modules
required to store a received image and then boot into it, but that
leaves out one of the key items in the system: the update protocol
that allows an existing running image to obtain an updated one over a
transport mechanism.
My first suggestion. Unless we are strictly implementing the USB DFU
protocol, we really should call this something else. DFU is defined
by USB standards, and is a very specific protocol with a very specific
purpose. If what we're looking for is something general across other
transports, we should call it a different name to avoid confusion.

There are several fundamental requirements for such a protocol if we
want it to be future-proof, extensible and practical for embedded
devices:

- Must be packet-based and transport-agnostic
Although this makes sense for non-usb, it also precludes using
existing tools for the update when we do have USB as our transport.

My suggestion would be to support DFU for USB, and devise another
protocol for the other transports.

- Must be extensible and flexible
- The server-side implementation (assuming a request/response model) must be relatively simple and require little resources
- Must be compatible with the mcuboot project and model
- At the very least the following transports must be supported: BLE, UART, IP, USB
- A client-side tool (assuming a request/response model) must either exist already or be easily implementable
So this is solved for USB DFU. We would probably have to create tools
for other transports.

With that in mind we proceeded to analyze a few of the existing
protocols out there (the ones we knew about), in order to consider
whether reusing an existing effort was a better approach than
designing and implementing a new protocol from scratch:

1) USB DFU specification[1]
2) Nordic Secure DFU protocol (included in the Nordic SDK)[2]
3) Newt Manager Protocol (part of Mynewt)[3]
4) Distributed DFU over CoAP used in Nordic's Thread SDK[4]

Note: I will use the word "source" to identify the device that
contains the new image, and "target" to identify the one that
receives it, flashes it and then boots into it.

The USB DFU specification does not seem to be a good fit since it
maps specifically to particular USB endpoints and classes, making it
not suitable for other transports without extensive modification.
Using a standard USB class such as CDC ACM as transport, we could
instead map the chosen protocol over a USB physical link.
This is fairly intentional. As I mention above, I would suggest
implementing DFU regardless of other protocols used.

We also see 2 very different image distribution models. In protocols
1, 2 and 3 the source (client) "pushes" an image to the target
(server) after checking that it's applicable based on version
checking and other verifications. In protocol 4 however, the source
acts instead as a server and the targets act as clients that "pull"
images from the source (server) whenever they are available. I
believe that the Linaro DFU implementation also follows the "pull"
paradigm of protocol 4.
They also serve different purposes. DFU (the real one on USB) works
similar to a recovery mode. You put the target into DFU mode, and the
USB endpoint is a different kind of device than it usually is. The
other upgrade protocols are intended to upgrade live devices. This is
also a good reason to support USB's DFU in addition to whatever other
protocol we come up with.

We believe that the right approach for the sort of ecosystem that
Zephyr targets is the "push" approach, to minimize traffic, reduce
power consumption and also make it possible to use with all
transports. That said, it is important to note that although we are
trying to decide on a default DFU mechanism for Zephyr, all layers
(including the image management) will be independent of it, and it
should therefore be entirely possible to implement an additional
protocol for our users. Furthermore we don't exclude the possibility
of extending the chosen protocol to support a "pull" model as well,
something that should be entirely feasible as long as the protocol of
choice is flexible.
The approach that was demoed at the last Linaro Connect was pull
based, and essentially had the firmware living on an http server. It
had the advantage of being fairly easy to implement with existing code
in Zephyr.

After analyzing the different options available, we believe the Newt
Manager Protocol (NMP) to be the better suited option for our current
needs, for reasons outlined below:

- It is proven to work with mcuboot, the default bootloader for Zephyr
- The current mcuboot repository already contains an implementation of NMP for serial recovery
- It uses a "push" model
- It is very simple but also easily extensible
- Uses a simple packet format combining an 8-byte header followed by CBOR[5]-encoded data
- Supports additional functionality on top of basic DFU: stats, filesystem access, date and time setting, etc.
- Already supports the BLE and serial transports
- A command-line tool exists to send images over both BLE and Serial (both Go and JS/Node versions are available)
- It is open source and licensed under the APLv2
- There are commercial products using it already [6]
I agree that newtmgr protocol seems to be the best fit for us. Its
serial model would even fit in fairly well with the Zephyr shell,
since it wraps the packets with a control-character + base-64 packet +
control character, which the shell seems to have partial support for
already.
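
The framing described above (control character, base-64 packet, control character) could look roughly like the Python sketch below. The concrete byte values of `FRAME_START` and `FRAME_END` are placeholders chosen for illustration; the thread does not say which control characters newtmgr actually uses, and any length or checksum fields the real framing may carry are left out.

```python
import base64

# Hypothetical framing bytes, NOT taken from the newtmgr source.
FRAME_START = b"\x06"
FRAME_END = b"\n"


def frame(packet: bytes) -> bytes:
    """Wrap a binary packet so it can travel over a text-oriented serial line."""
    return FRAME_START + base64.b64encode(packet) + FRAME_END


def is_framed(line: bytes) -> bool:
    """A shell could use this to tell management packets from typed commands."""
    return line.startswith(FRAME_START) and line.endswith(FRAME_END)


def unframe(line: bytes) -> bytes:
    """Recover the binary packet from a framed line."""
    if not is_framed(line):
        raise ValueError("not a framed packet")
    body = line[len(FRAME_START):-len(FRAME_END)]
    return base64.b64decode(body, validate=True)
```

Because the base-64 alphabet contains neither of the two framing bytes, a line-based shell can pass ordinary lines to its command parser and hand only lines that satisfy `is_framed` to the management code.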

David


Carles Cufi
 

Hi David,

Thanks for the feedback.

-----Original Message-----
From: David Brown [mailto:david.brown@...]
Sent: 28 August 2017 16:22
To: Cufi, Carles <Carles.Cufi@...>
Cc: zephyr-devel@...
Subject: Re: [Zephyr-devel] Zephyr DFU protocol

On Mon, Aug 28, 2017 at 12:45:37PM +0000, Cufi, Carles wrote:

As you might already know, we've been working on the introduction of
DFU (Device Firmware Upgrade) to Zephyr. Several Pull Requests have
been posted dealing with the low-level flash and image access modules
required to store a received image and then boot into it, but that
leaves out one of the key items in the system: the update protocol that
allows an existing running image to obtain an updated one over a
transport mechanism.
My first suggestion. Unless we are strictly implementing the USB DFU
protocol, we really should call this something else. DFU is defined by
USB standards, and is a very specific protocol with a very specific
purpose. If what we're looking for is something general across other
transports, we should call it a different name to avoid confusion.
I have no problem changing the name from "DFU protocol" to something else. In fact the one we recommend is called a "Management Protocol" because it does much more than just DFU. If people agree with reusing that moniker then we could go with that. Which reminds me that I've spoken to Mynewt developers and they don't have anything against renaming "Newt Manager Protocol" to something less tied to Mynewt. Perhaps something akin to mcuboot would be in order, like "mcumgmt" or similar?


There are several fundamental requirements for such a protocol if we
want it to be future-proof, extensible and practical for embedded
devices:

- Must be packet-based and transport-agnostic
Although this makes sense for non-usb, it also precludes using existing
tools for the update when we do have USB as our transport.

My suggestion would be to support DFU for USB, and devise another
protocol for the other transports.
As I already stated, this is not about choosing the only protocol available for updating images in the target device, so implementing standard USB DFU is definitely something that we want as well. That said, I would also be in favour of having the future management protocol run over CDC ACM, for 2 reasons:

1) Being able to benefit from the additional "management" functionality on top of updating images with a single tool
2) Being able to update devices that only offer a USB connection (with no debugger IC bridging) before we actually implement USB DFU, since our efforts would be initially concentrated on the "management" protocol


- Must be extensible and flexible
- The server-side implementation (assuming a request/response model)
must be relatively simple and require little resources
- Must be compatible with the mcuboot project and model
- At the very least the following transports must be supported: BLE,
UART, IP, USB
- A client-side tool (assuming a request/response model) must either
exist already or be easily implementable
So this is solved for USB DFU. We would probably have to create tools
for other transports.
Well, let me clarify a point here: The "protocol" as I use this word is the sequence of packets that allow you to update images (and perform other operations) on the target device. The "transport" is just a thin layer that is capable of transmitting and receiving those protocol packets over a physical medium. The tool I'm talking about for the management protocol would support multiple transports with a single protocol.
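
The split between "protocol" and "transport" can be sketched in Python as follows. This is only an illustration of the layering: `Transport`, `LoopbackTransport` and `ManagementClient` are hypothetical names, and the loopback class is a test stand-in that echoes requests rather than a real BLE, UART or USB adaptation layer.

```python
from abc import ABC, abstractmethod
from typing import List


class Transport(ABC):
    """Thin layer that moves opaque protocol packets over some medium."""

    @abstractmethod
    def send(self, packet: bytes) -> None:
        ...

    @abstractmethod
    def receive(self) -> bytes:
        ...


class LoopbackTransport(Transport):
    """Stand-in transport: every packet sent comes straight back."""

    def __init__(self) -> None:
        self._queue: List[bytes] = []

    def send(self, packet: bytes) -> None:
        self._queue.append(packet)

    def receive(self) -> bytes:
        return self._queue.pop(0)


class ManagementClient:
    """Protocol logic; knows nothing about the medium underneath."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def request(self, packet: bytes) -> bytes:
        self._transport.send(packet)
        return self._transport.receive()
```

Supporting a new medium would then mean writing one more `Transport` subclass, while `ManagementClient` and the tool built on it stay unchanged.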


With that in mind we proceeded to analyze a few of the existing
protocols out there (the ones we knew about), in order to consider
whether reusing an existing effort was a better approach than designing
and implementing a new protocol from scratch:

1) USB DFU specification[1]
2) Nordic Secure DFU protocol (included in the Nordic SDK)[2]
3) Newt Manager Protocol (part of Mynewt)[3]
4) Distributed DFU over CoAP used in Nordic's Thread SDK[4]

Note: I will use the word "source" to identify the device that contains
the new image, and "target" to identify the one that receives it,
flashes it and then boots into it.

The USB DFU specification does not seem to be a good fit since it maps
specifically to particular USB endpoints and classes, making it not
suitable for other transports without extensive modification.
Using a standard USB class such as CDC ACM as transport, we could
instead map the chosen protocol over a USB physical link.
This is fairly intentional. As I mention above, I would suggest
implementing DFU regardless of other protocols used.
Agree, see my comment above.


We also see 2 very different image distribution models. In protocols 1,
2 and 3 the source (client) "pushes" an image to the target
(server) after checking that it's applicable based on version checking
and other verifications. In protocol 4 however, the source acts instead
as a server and the targets act as clients that "pull"
images from the source (server) whenever they are available. I believe
that the Linaro DFU implementation also follows the "pull"
paradigm of protocol 4.
They also serve different purposes. DFU (the real one on USB) works
similar to a recovery mode. You put the target into DFU mode, and the
USB endpoint is a different kind of device than it usually is. The
other upgrade protocols are intended to upgrade live devices. This is
also a good reason to support USB's DFU in addition to whatever other
protocol we come up with.

We believe that the right approach for the sort of ecosystem that
Zephyr targets is the "push" approach, to minimize traffic, reduce
power consumption and also make it possible to use with all transports.
That said, it is important to note that although we are trying to
decide on a default DFU mechanism for Zephyr, all layers (including the
image management) will be independent of it, and it should therefore be
entirely possible to implement an additional protocol for our users.
Furthermore we don't exclude the possibility of extending the chosen
protocol to support a "pull" model as well, something that should be
entirely feasible as long as the protocol of choice is flexible.
The approach that was demoed at the last Linaro Connect was pull based,
and essentially had the firmware living on an http server. It had the
advantage of being fairly easy to implement with existing code in
Zephyr.
The problem with using a pull-based protocol is that it is less portable to non TCP/IP transports as I see it, since it requires the target device to initiate the transaction. Are you implying that you'd prefer the Zephyr "management" protocol to be pull-based? We're definitely open to discuss that at length. Also see the section about the Newt Manager Protocol over OIC/OCF, which implements "push" over TCP/IP, which could be an alternative for the Linaro use case.


After analyzing the different options available, we believe the Newt
Manager Protocol (NMP) to be the better suited option for our current
needs, for reasons outlined below:

- It is proven to work with mcuboot, the default bootloader for Zephyr
- The current mcuboot repository already contains an implementation of
NMP for serial recovery
- It uses a "push" model
- It is very simple but also easily extensible
- Uses a simple packet format combining an 8-byte header followed by
CBOR[5]-encoded data
- Supports additional functionality on top of basic DFU: stats,
filesystem access, date and time setting, etc.
- Already supports the BLE and serial transports
- A command-line tool exists to send images over both BLE and Serial
(both Go and JS/Node versions are available)
- It is open source and licensed under the APLv2
- There are commercial products using it already [6]
I agree that newtmgr protocol seems to be the best fit for us. Its
serial model would even fit in fairly well with the Zephyr shell, since
it wraps the packets with a control-character + base-64 packet + control
character, which the shell seems to have partial support for already.
It does, I didn't mention that to avoid extending myself too much, but it's also a very nice feature I find.

Thanks again for the feedback, it seems that we are pretty much in line.

Regards,

Carles


David Brown
 

On Mon, Aug 28, 2017 at 02:43:27PM +0000, Cufi, Carles wrote:

As I already stated, this is not about choosing the only protocol
available for updating images in the target device, so implementing
standard USB DFU is definitely something that we want as well. That
said, I would also be in favour of having the future management
protocol run over CDC ACM, for 2 reasons:
One other protocol I just realized is already out there is lwm2m.
There is starting to be some support for it in Zephyr, it works over
other transports, supports device management, and has support for
firmware update.

The eclipse foundation has a couple of implementations (wakaama in C,
and Leshan in Java).

The approach that was demoed at the last Linaro Connect was pull based,
and essentially had the firmware living on an http server. It had the
advantage of being fairly easy to implement with existing code in
Zephyr.
The problem with using a pull-based protocol is that it is less
portable to non TCP/IP transports as I see it, since it requires the
target device to initiate the transaction. Are you implying that
you'd prefer the Zephyr "management" protocol to be pull-based? We're
definitely open to discuss that at length. Also see the section about
the Newt Manager Protocol over OIC/OCF, which implements "push"
over TCP/IP, which could be an alternative for the Linaro use case.
I agree that the pull approach isn't really all that great, but it was
easy to implement. It does make management of the upgrade server a
little easier, though. I'm not sure how well a push-based protocol
scales to a large number of devices.

Any idea how Android or iOS handle this? I would guess that both are
pull based, since that would otherwise require the vendor to have a
server that keeps track of every device out there.

I agree that newtmgr protocol seems to be the best fit for us. Its
serial model would even fit in fairly well with the Zephyr shell, since
it wraps the packets with a control-character + base-64 packet + control
character, which the shell seems to have partial support for already.
It does, I didn't mention that to avoid extending myself too much,
but it's also a very nice feature I find.
One other thing we should consider is the security of the upgrade
protocol. Mcuboot has signatures to validate images, so that would
prevent rogue upgrades, but if we have a management protocol, that
should probably also be secured via some means.

It is likely that something like lwm2m is going to be implemented for
Zephyr (code is there, and work seems to be happening on it), so we
should decide if we want to push the newt management protocol as well.

David


Carles Cufi
 

Hi David,

-----Original Message-----
From: David Brown [mailto:david.brown@...]
Sent: 28 August 2017 17:59
To: Cufi, Carles <Carles.Cufi@...>
Cc: zephyr-devel@...
Subject: Re: [Zephyr-devel] Zephyr DFU protocol

On Mon, Aug 28, 2017 at 02:43:27PM +0000, Cufi, Carles wrote:

As I already stated, this is not about choosing the only protocol
available for updating images in the target device, so implementing
standard USB DFU is definitely something that we want as well. That
said, I would also be in favour of having the future management
protocol run over CDC ACM, for 2 reasons:
One other protocol I just realized is already out there is lwm2m.
There is starting to be some support for it in Zephyr, it works over
other transports, supports device management, and has support for
firmware update.

The eclipse foundation has a couple of implementations (wakaama in C,
and Leshan in Java).
I just read through the highlights of the spec and indeed this matches relatively closely the concept we are trying to push here with a "management" protocol. After looking through it a bit here are the problems I see:

a) Complexity: By reading through the specification[i] this looks like a pretty complex protocol to me, which in many cases might be a drawback to users wanting to reduce ROM and RAM size. This is particularly important for very constrained devices that only need to send some sensor data over BLE for example
b) Suitability for other transports: The specification clearly states 2 main transports: UDP and SMS. While adapting this to other transports would likely be feasible, the protocol doesn't look designed for it
c) Model: the protocol seems to rest on the basis of a "pull" model, where clients are the target devices. For the reasons stated before, this might not be suitable to simple UART, BLE or USB CDC ACM use cases.

That said, the protocol does match the Newt Manager Protocol quite closely when it comes to supported functionality and purpose. My vote here would be to have support for both, because I do not think running LWM2M over UART or BLE is a good match for tiny constrained applications that only require simple firmware updates.


The approach that was demoed at the last Linaro Connect was pull
based, and essentially had the firmware living on an http server. It
had the advantage of being fairly easy to implement with existing
code in Zephyr.
The problem with using a pull-based protocol is that it is less
portable to non TCP/IP transports as I see it, since it requires the
target device to initiate the transaction. Are you implying that you'd
prefer the Zephyr "management" protocol to be pull-based? We're
definitely open to discuss that at length. Also see the section about
the Newt Manager Protocol over OIC/OCF, which implements "push"
over TCP/IP, which could be an alternative for the Linaro use case.
I agree that the pull approach isn't really all that great, but it was
easy to implement. It does make management of the upgrade server a
little easier, though. I'm not sure how well a push-based protocol
scales to a large number of devices.
I honestly have no idea whether the push model would scale for large deployments, but this brings up an interesting question: what sort of device are we targeting here?

1) The simple device which is never connected to the internet directly and does not even have a TCP/IP stack, but rather only a GATT-based BLE connection to a mobile phone, tablet or computer. Among those there are mice, wearables, sensors and monitors, etc.
2) The slightly more complex device with (almost) always-on TCP/IP connection to the outside world, perhaps over 15.4, Thread, BLE over IPSP or any other technology

I think the Newt Manager Protocol was designed for devices closer to the 1) model, whereas LWM2M and similar protocols rather target 2). Those are quite different in nature because 1) requires a device to specifically connect to it and send an image with that purpose (say for example a sports band such as Fitbit or similar), whereas devices of the 2) kind can keep polling regularly to determine whether a firmware update is available. I do not think we can realistically cover both with a single protocol unless we "force" one of the 2 models to work in the other circumstances.
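
For contrast with the push case, the polling behaviour of an always-connected device could be sketched in Python like this. `poll_once`, `fetch_latest` and `download` are hypothetical names, and the (version, location) pair returned by the server is an assumed format; neither LWM2M nor any other protocol mentioned in this thread is being reproduced here.

```python
from typing import Callable, Optional, Tuple

Version = Tuple[int, int, int]


def poll_once(running: Version,
              fetch_latest: Callable[[], Tuple[Version, str]],
              download: Callable[[str], bytes]) -> Optional[bytes]:
    """One polling round: ask the server what is available, fetch it if newer."""
    latest, location = fetch_latest()
    if latest <= running:
        return None  # already up to date; try again at the next interval
    return download(location)
```

Here the target initiates every round, so it needs its own route to the server. A device that only has a GATT connection to a phone does not have one, which is the difference between the two device classes described above.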


Any idea how Android or iOS handle this? I would guess that both are
pull based, since that would otherwise require the vendor to have a
server that keeps track of every device out there.
Not sure about Android, but I am pretty sure that iOS devices keep a TCP connection permanently connected to an Apple server, through which all "push" notifications are sent, be it software updates or messaging notifications.


I agree that newtmgr protocol seems to be the best fit for us. Its
serial model would even fit in fairly well with the Zephyr shell,
since it wraps the packets with a control-character + base-64 packet
+ control character, which the shell seems to have partial support
for already.

It does, I didn't mention that to avoid extending myself too much, but
it's also a very nice feature I find.
One other thing we should consider is the security of the upgrade
protocol. Mcuboot has signatures to validate images, so that would
prevent rogue upgrades, but if we have a management protocol, that
should probably also be secured via some means.

It is likely that something like lwm2m is going to be implemented for
Zephyr (code is there, and work seems to be happening on it), so we
should decide if we want to push the newt management protocol as well.
Agreed, and it's not an easy call. The two options I see after your remarks and looking a little bit more into LWM2M are:

- Use LWM2M for everything, including DTLS for security and adapt it somehow to the simple "push" model
- Use LWM2M for the "pull" model, and then Newt Manager Protocol for the simple "push" one, with security in the latter being provided by the transport itself (SMP in BLE, and the simple fact that you need to manipulate the device physically for UART).

While having one single protocol would definitely be a boon, I am not sure LWM2M will fit the bill in terms of RAM and ROM requirements, and we still need something for the UART recovery mode in the bootloader, which will probably end up being the Newt Manager Protocol since I don't think we can fit LWM2M into a bootloader.

Additional thoughts welcome.

[i] http://www.openmobilealliance.org/release/LightweightM2M/V1_0-20170208-A/OMA-TS-LightweightM2M-V1_0-20170208-A.pdf


Richard Peters <mail@...>
 

Hi,

I am looking for a solution like that and just want to contribute my
requirements.

I would like to use Zephyr with an external bootloader like the Nordic
DFU, where I can update firmware via Bluetooth.
Unfortunately I doubt this is easy to achieve, due to the way the Nordic DFU bootloader expects the SoftDevice to be present in flash, something that is not the case when using Zephyr instead. The Nordic DFU procedure is also closely tied to the image format of the Nordic SDK (and SoftDevice).
However, Zephyr is indeed compatible with a bootloader, mcuboot, and we are currently discussing adding support for a DFU (over BLE and other transports) to Zephyr in another thread in this mailing list. You are welcome to contribute to that thread with your requirements and comments, and as soon as we've chosen a protocol we'll start working towards implementing the DFU procedure.
My devices are in a BLE mesh network with no direct internet
connectivity to the outer world.
The user can connect with a smartphone or tablet to one of the devices
in the mesh over BLE.
There is an App, which downloads the latest firmware for the devices to
the smartphone.

A firmware update will be transferred via BLE to the connected device and
then spread to all devices in the mesh that need this update.

I think there are two possible ways to achieve this:

1.) The update gets transferred to the target devices via bluetooth.
This happens in the Zephyr application and gets stored in a filesystem
(on internal or external flash memory). The bootloader performs the
update from the filesystem after a reboot.

2.) The bootloader starts and receives the firmware (on the fly) from
the next device in the mesh network (which is running zephyr, too).

The whole process should be optimized for the memory usage.

Regards,
Richard


David Brown
 

On Tue, Aug 29, 2017 at 09:14:31AM +0000, Cufi, Carles wrote:

One other protocol I just realized is already out there is lwm2m.
There is starting to be some support for it in Zephyr, it works over
other transports, supports device management, and has support for
firmware update.
I just read through the highlights of the spec and indeed this
matches relatively closely the concept we are trying to push here
with a "management" protocol. After looking through it a bit here are
the problems I see:

a) Complexity: By reading through the specification[i] this looks
like a pretty complex protocol to me, which in many cases might be a
drawback to users wanting to reduce ROM and RAM size. This is
particularly important for very constrained devices that only need to
send some sensor data over BLE for example
I had a conversation with Sterling Hughes yesterday, and he explained
that this was pretty much the primary reason for developing the Newt
Manager Protocol instead of just using lwm2m.

b) Suitability for other transports: The specification clearly states
2 main transports: UDP and SMS. While adapting this to other
transports would likely be feasible, the protocol doesn't look
designed for it
c) Model: the protocol seems to rest on the basis of a "pull" model,
where clients are the target devices. For the reasons stated before,
this might not be suitable to simple UART, BLE or USB CDC ACM
usecases.
This is the other main reason for its infeasibility.

That said, the protocol does match the Newt Manager Protocol quite
closely when it comes to supported functionality and purpose. My vote
here would be to have support for both, because I do not think
running LWM2M over UART or BLE is a good match for tiny constrained
applications that only require simple firmware updates.
Agreed. I think that lwm2m is going to end up needing to be
implemented because there will be environments that will require that
specific protocol. But, we will want something like newtmgr for other
cases, and situations where less code is desired.

It is also possible for newtmgr to be layered differently, depending
on the situation. For serial, it can either be used directly, or in a
console friendly manner (with escape characters and base-64 encoding).
It is possible to leave minicom or picocom running, and have newtmgr
connect to the serial port to exchange packets.

On BLE, it can be transported directly over GATT.

And for network interfaces, layering it over COAP or COAPS makes
sense.
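
As a rough illustration of the console-friendly serial layering, the sketch below wraps a binary packet between a start byte and a newline, with base-64 in between, so that it can share a line-oriented console. The `START` and `END` bytes and the sample packet are assumptions made for this example; the real newtmgr framing may use different markers and carry extra fields.

```python
import base64

# Hypothetical control bytes chosen for illustration only.
START = b"\x06"
END = b"\n"


def frame(packet: bytes) -> bytes:
    """Wrap a binary packet so it can share a text console."""
    return START + base64.b64encode(packet) + END


def unframe(line: bytes) -> bytes:
    """Recover the binary packet from one framed line."""
    if not (line.startswith(START) and line.endswith(END)):
        raise ValueError("not a framed packet")
    return base64.b64decode(line[len(START):-len(END)], validate=True)


packet = bytes([0x02, 0x00, 0x00, 0x04, 0xDE, 0xAD, 0xBE, 0xEF])
wire = frame(packet)
```

Because the base-64 alphabet never contains the start byte or a newline, a receiver can tell framed packets apart from ordinary console text on the same port.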

David


David Brown
 

On Tue, Aug 29, 2017 at 02:58:55PM +0200, Richard Peters wrote:

My devices are in a BLE mesh network with no direct internet
connectivity to the outer world.
The user can connect with a smartphone or tablet to one of the devices
in the mesh over BLE.
There is an App, which downloads the latest firmware for the devices to
the smartphone.

A firmware update will be transferred via BLE to the connected device and
then spread to all devices in the mesh that need this update.

I think there are two possible ways to achieve this:

1.) The update gets transferred to the target devices via bluetooth.
This happens in the Zephyr application and gets stored in a filesystem
(on internal or external flash memory). The bootloader performs the
update from the filesystem after a reboot.
How this works in Mynewt (and will with Zephyr, if we use newtmgr) is
that mcuboot has two partitions: slot0 is where the primary code
lives, and slot1 is where the upgrade is written. It can be written
a bit at a time until complete. Then, a reboot into mcuboot will
cause the bootloader to detect the upgrade, and initiate swapping the
two images.
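
The flow above can be sketched with a toy model in Python. The `Flash` class, its method names and the 16-byte slot size are hypothetical and only mirror the description; they are not the mcuboot or newtmgr API.

```python
class Flash:
    """Toy model of the two image partitions described above."""

    def __init__(self, slot_size):
        self.slot0 = bytearray(slot_size)  # primary (running) image
        self.slot1 = bytearray(slot_size)  # where the upgrade is written
        self.upgrade_len = 0

    def write_chunk(self, offset, data):
        # Chunks may arrive one at a time over a slow transport.
        if offset + len(data) > len(self.slot1):
            raise ValueError("chunk does not fit in slot1")
        self.slot1[offset:offset + len(data)] = data
        self.upgrade_len = max(self.upgrade_len, offset + len(data))

    def swap(self):
        # What the bootloader would do after the reboot.
        self.slot0, self.slot1 = self.slot1, self.slot0


flash = Flash(slot_size=16)
flash.slot0[:3] = b"old"
new_image = b"new-image"
for offset in range(0, len(new_image), 4):
    flash.write_chunk(offset, new_image[offset:offset + 4])
flash.swap()
```

The running application only ever writes to slot1, so an interrupted transfer leaves the image in slot0 untouched.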

2.) The bootloader starts and receives the firmware (on the fly) from
the next device in the mesh network (which is running zephyr, too).
I'm not sure if this has been implemented, but it certainly could be.
Again, it would take the firmware from slot0 on the source device, and
place it into slot1 on the upgrading device.

The whole process should be optimized for the memory usage.
There will need to be sufficient flash set aside for the second image.
In most configurations, this is a dedicated partition, which avoids
needing to have filesystem management code in the bootloader.

The images themselves are signed, and upgrades with invalid signatures
will just be ignored.

David


David Brown
 

On Tue, Aug 29, 2017 at 09:14:31AM +0000, Cufi, Carles wrote:

While having one single protocol would definitely be a boon, I am not
sure LWM2M will fit the bill in terms of RAM and ROM requirements,
and we still need something for the UART recovery mode in the
bootloader, which will probably end up being the Newt Manager
Protocol since I don't think we can fit LWM2M into a bootloader.
Given that there are other parties that have an interest in lwm2m, I
think we should put our focus into supporting newtmgr for upgrades.
The other protocols (lwm2m, and USB DFU) will probably be implemented
as there is need for them.

What would be nice would be to take the target-side newtmgr code, and
make it into its own project. We would need to refactor and abstract
the operating system interfaces so that we can use the same codebase
for multiple platforms. This would be similar to how mcuboot is now
its own project that works on Zephyr and Mynewt (and soon RIOT).

David


Richard Peters <mail@...>
 

Hi David,

There will need to be sufficient flash set aside for the second image.
In most configurations, this is a dedicated partition, which avoids
needing to have filesystem management code in the bootloader.
Might zip compression be helpful to reduce the image size in flash?

Richard


David Brown
 

On Tue, Aug 29, 2017 at 04:03:13PM +0200, Richard Peters wrote:

There will need to be sufficient flash set aside for the second image.
In most configurations, this is a dedicated partition, which avoids
needing to have filesystem management code in the bootloader.
Might zip compression be helpful to reduce the image size in flash?
Something like this is certainly doable (choosing a compression
algorithm that doesn't have large memory requirements, though, is
important).

However, one of the considerations of the bootloader is that it has to
be immutable (it can never be upgraded), since it is the beginning of
the root of trust. We'd like to keep as much complexity out of it as
possible. I've even pushed to get rid of the "swap" code it currently
has, and instead move that complexity up a layer or two, and deploy one
of two images at both addresses, and just run the images in place in
the slot containing the desired image.

David


Pushpal Sidhu
 

On Tue, Aug 29, 2017 at 8:24 AM, David Brown <david.brown@...> wrote:

On Tue, Aug 29, 2017 at 04:03:13PM +0200, Richard Peters wrote:

There will need to be sufficient flash set aside for the second image.
In most configurations, this is a dedicated partition, which avoids
needing to have filesystem management code in the bootloader.

Might zip compression be helpful to reduce the image size in flash?

Something like this is certainly doable (choosing a compression
algorithm that doesn't have large memory requirements, though, is
important).

However, one of the considerations of the bootloader is that it has to
be immutable (it can never be upgraded), since it is the beginning of
the root of trust. We'd like to keep as much complexity out of it as
possible. I've even pushed to get rid of the "swap" code it currently
has, and instead move that complexity up a layer or two, and deploy one
of two images at both addresses, and just run the images in place in
the slot containing the desired image.
Sounds like you want an SPL (which I'm for).


David



David Brown
 

On Tue, Aug 29, 2017 at 12:03:04PM -0700, Pushpal Sidhu wrote:

However, one of the considerations of the bootloader is that it has to
be immutable (it can never be upgraded), since it is the beginning of
the root of trust. We'd like to keep as much complexity out of it as
possible. I've even pushed to get rid of the "swap" code it currently
has, and instead move that complexity up a layer or two, and deploy one
of two images at both addresses, and just run the images in place in
the slot containing the desired image.
Sounds like you want an SPL (which I'm for).
We've discussed this, because it does seem like it could be useful.
But, the conclusion we mostly come to is that nearly everything that
would be in the secondary loader has to be in the primary, and the
secondary doesn't end up doing much.

It is also hard to work with such memory constrained devices. It is
difficult to get mcuboot down to 16KB (depends on the signature
algorithms), and with needing two code partitions to safely upgrade,
it limits a lot of what we can do with this.
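
To put rough numbers on this, here is a small Python calculation. The 256 KB and 128 KB flash sizes are hypothetical examples, and the helper ignores any scratch or storage area a real layout may also need.

```python
def slot_size_kb(flash_kb, bootloader_kb, slots=2):
    """Space left for each image slot once the bootloader is placed."""
    return (flash_kb - bootloader_kb) // slots


# Hypothetical parts: 256 KB and 128 KB of flash, 16 KB bootloader.
large = slot_size_kb(flash_kb=256, bootloader_kb=16)  # 120
small = slot_size_kb(flash_kb=128, bootloader_kb=16)  # 56
```

Under these assumptions less than half of the flash is available to the application image, which is why every extra kilobyte in the bootloader matters.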

Maybe doing a two stage boot would be useful for environments that
have larger codespace.

David


Pushpal Sidhu
 

On Tue, Aug 29, 2017 at 12:24 PM, David Brown <david.brown@...> wrote:
On Tue, Aug 29, 2017 at 12:03:04PM -0700, Pushpal Sidhu wrote:

Sounds like you want an SPL (which I'm for).
We've discussed this, because it does seem like it could be useful.
But, the conclusion we mostly come to is that nearly everything that
would be in the secondary loader has to be in the primary, and the
secondary doesn't end up doing much.
I'm having trouble finding that discussion. Could you point me to it?
I'm curious as to why the primary was thought to re-perform the same
functions as the SPL in this case. We don't have to follow the
traditional u-boot model I would think.

To be honest, I'm unfamiliar with mcuboot for now so I'm not totally
sure what it's doing. I'm also unaware of how zephyr is loaded by the
bootloader (I'm new to this project) so I may be speaking nonsense.

It is also hard to work with such memory constrained devices. It is
difficult to get mcuboot down to 16KB (depends on the signature
algorithms), and with needing two code partitions to safely upgrade,
it limits a lot of what we can do with this.

Maybe doing a two stage boot would be useful for environments that
have larger codespace.

David


Carles Cufi
 

Hi David,

-----Original Message-----
From: David Brown [mailto:david.brown@...]
Sent: 29 August 2017 15:49
To: Cufi, Carles <Carles.Cufi@...>
Cc: zephyr-devel@...
Subject: Re: [Zephyr-devel] Zephyr DFU protocol

On Tue, Aug 29, 2017 at 09:14:31AM +0000, Cufi, Carles wrote:

One other protocol I just realized is already out there is lwm2m.
There is starting to be some support for it in Zephyr, it works over
other transports, supports device management, and has support for
firmware update.
I just read through the highlights of the spec and indeed this matches
relatively closely the concept we are trying to push here with a
"management" protocol. After looking through it a bit here are the
problems I see:

a) Complexity: By reading through the specification[i] this looks like
a pretty complex protocol to me, which in many cases might be a
drawback to users wanting to reduce ROM and RAM size. This is
particularly important for very constrained devices that only need to
send some sensor data over BLE for example
I had a conversation with Sterling Hughes yesterday, and he explained
that this was pretty much the primary reason for developing the Newt
Manager Protocol instead of just using lwm2m.
Good to hear we found the same issues then. The thing is that Device Firmware Upgrade can target a wide variety of different uses in wildly unrelated hardware. In the context of small sensor-type devices with a BLE stack and little more, having LWM2M seems to be overkill, so there does not seem to be a universal firmware update protocol that would fit the whole spectrum of MCUs and applications that can run Zephyr.


b) Suitability for other transports: The specification clearly states
2 main transports: UDP and SMS. While adapting this to other transports
would likely be feasible, the protocol doesn't look designed for it
c) Model: the protocol seems to rest on the basis of a "pull" model,
where clients are the target devices. For the reasons stated before,
this might not be suitable to simple UART, BLE or USB CDC ACM usecases.
This is the other main reason for its infeasibility.
Yep, I can't quite see a UART "client" polling for a server on the other side. I'm sure it can be done, I'm less sure that it makes sense.

But all in all, it seems we are completely aligned with Runtime with regard to the necessity for a protocol like NMP.


That said, the protocol does match the Newt Manager Protocol quite
closely when it comes to supported functionality and purpose. My vote
here would be to have support for both, because I do not think running
LWM2M over UART or BLE is a good match for tiny constrained
applications that only require simple firmware updates.
Agreed. I think that lwm2m is going to end up needing to be implemented
because there will be environments that will require that specific
protocol. But, we will want something like newtmgr for other cases, and
situations where less code is desired.
Yes, and probably not only LWM2M. For example our Nordic Thread SDK uses multicast to distribute images to all devices in the Thread network. That might also be a valid mechanism (i.e. "multi-push") for certain transports. It is actually similar to what NMP is doing with OIC/OCF.

It is also possible for newtmgr to be layered differently, depending on
the situation. For serial, it can either be used directly, or in a
console friendly manner (with escape characters and base-64 encoding).
It is possible to leave minicom or picocom running, and have newtmgr
connect to the serial port to exchange packets.

On BLE, it can be transported directly over GATT.

And for network interfaces, layering it over COAP or COAPS makes sense.
Yep, and there are transports already for those except the CoAP one in the current Mynewt codebase (unless you count the OIC variant). That is one of the strengths of NMP: it makes almost no assumptions about the underlying transport.

I have also spoken to Sterling btw, and it looks like we are all of the same opinion.

Regards,

Carles


Carles Cufi
 

Hi David,

-----Original Message-----
From: David Brown [mailto:david.brown@...]
Sent: 29 August 2017 15:57
To: Cufi, Carles <Carles.Cufi@...>
Cc: zephyr-devel@...
Subject: Re: [Zephyr-devel] Zephyr DFU protocol

On Tue, Aug 29, 2017 at 09:14:31AM +0000, Cufi, Carles wrote:

While having one single protocol would definitely be a boon, I am not
sure LWM2M will fit the bill in terms of RAM and ROM requirements, and
we still need something for the UART recovery mode in the bootloader,
which will probably end up being the Newt Manager Protocol since I
don't think we can fit LWM2M into a bootloader.
Given that there are other parties that have an interest in lwm2m, I
think we should put our focus into supporting newtmgr for upgrades.
The other protocols (lwm2m, and USB DFU) will probably be implemented as
there is need for them.
I completely agree with you. I would like to focus our (Nordic's) efforts into the simple protocol first, that works over UART and BLE, while the work towards LWM2M and other advanced protocols proceeds in parallel. The image management code will of course be protocol-independent, so once those PRs that Andrzej has sent are merged all other teams will be able to benefit from them.


What would be nice would be to take the target-side newtmgr code, and
make it into its own project. We would need to refactor and abstract
the operating system interfaces so that we can use the same codebase for
multiple platforms. This would be similar to how mcuboot is now its own
project that works on Zephyr and Mynewt (and soon RIOT).
That would be my preference as well, but it might not be as trivial as it sounds. I need to discuss this further with the Mynewt developers, because some of the abstractions (namely mbuf) might not be easy to port. Once we choose a protocol, and if this ends up being NMP, I would like to start those discussions ASAP with the contributions of the Mynewt community.

Regards,

Carles


David Brown
 

On Tue, Aug 29, 2017 at 04:24:14PM -0700, Pushpal Sidhu wrote:
On Tue, Aug 29, 2017 at 12:24 PM, David Brown <david.brown@...> wrote:
On Tue, Aug 29, 2017 at 12:03:04PM -0700, Pushpal Sidhu wrote:

Sounds like you want an SPL (which I'm for).
We've discussed this, because it does seem like it could be useful.
But, the conclusion we mostly come to is that nearly everything that
would be in the secondary loader has to be in the primary, and the
secondary doesn't end up doing much.
I'm having trouble finding that discussion. Could you point me to it?
I'm curious as to why the primary was thought to re-perform the same
functions as the SPL in this case. We don't have to follow the
traditional u-boot model I would think.
I'm not sure it was written down, or just verbal or IRC chat. I
didn't find any logs about it.

To be honest, I'm unfamiliar with mcuboot for now so I'm not totally
sure what it's doing. I'm also unaware of how zephyr is loaded by the
bootloader (I'm new to this project) so I may be speaking nonsense.
Currently, the bootloader doesn't really "load" anything, since it is
only being used on execute-in-place targets. Its main job is to:

- Verify the signature of the image before booting it.

- Detect a properly signed image in an upgrade partition, and safely
exchange the two images.
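
These two jobs can be sketched as a small decision function in Python. This is only an illustration: an HMAC tag stands in for the real image signature (it is a different technique that uses a shared key rather than a public one), and `KEY`, `sign`, `is_valid` and `boot` are hypothetical names, not mcuboot functions. A real bootloader also has to record that the exchange already happened so it is not repeated on the next boot; that bookkeeping is left out.

```python
import hashlib
import hmac

KEY = b"demo-key"  # hypothetical; stands in for the real verification key
TAG_LEN = hashlib.sha256().digest_size


def sign(payload: bytes) -> bytes:
    """Append an HMAC tag; a stand-in for a real image signature."""
    return payload + hmac.new(KEY, payload, hashlib.sha256).digest()


def is_valid(image: bytes) -> bool:
    """Check the trailing tag against the payload."""
    if len(image) <= TAG_LEN:
        return False
    payload, tag = image[:-TAG_LEN], image[-TAG_LEN:]
    expected = hmac.new(KEY, payload, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


def boot(slot0: bytes, slot1: bytes):
    """Return (slot0, slot1, image_to_run) after the bootloader's checks."""
    if is_valid(slot1):
        slot0, slot1 = slot1, slot0  # exchange the two images
    if not is_valid(slot0):
        raise RuntimeError("no valid image to boot")
    return slot0, slot1, slot0


old = sign(b"old firmware")
new = sign(b"new firmware")
tampered = new[:-1] + bytes([new[-1] ^ 1])

_, _, running = boot(old, new)        # valid upgrade: runs the new image
_, _, fallback = boot(old, tampered)  # invalid upgrade is ignored
```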

David


David Brown
 

On Wed, Aug 30, 2017 at 09:38:44AM +0000, Cufi, Carles wrote:

That would be my preference as well, but it might not be as trivial
as it sounds. I need to discuss this further with the Mynewt
developers, because some of the abstractions (namely mbuf) might not
be easy to port. Once we choose a protocol, and if this ends up being
NMP, I would like to start those discussions ASAP with the
contributions of the Mynewt community.
I think we should go ahead and start the conversation with them, on
mailing lists.

Unfortunately, the Mynewt mailing lists add a Reply-to header, which
causes that list to "steal" replies that are cross posted. So, if
you send to both the Zephyr and the Mynewt list, the replies will
randomly discard the Zephyr list, which tends to fork threads
(randomly because it depends on which mailing list server replies
quicker, and which message a given recipient's mail system decides to
use, gmail tends to use the first one, for example).

Feel free to help me apply pressure to get their list configuration
fixed.

David