USB: Question about usb_write() API


Sundar Subramaniyan
 

Hi Loic,

On Wed, Jan 31, 2018 at 4:13 PM, Loic Poulain <loic.poulain@...> wrote:

Today there is no transfer concept at usb device driver level, you
only read/write the TX/RX hardware FIFO (usb_dw), this is up to each
class driver to manage packet splitting and transfer start/end,
including short/zero packet management.
This also means that class drivers need to know the endpoint FIFO size
to delineate packets and transfers. This is significant extra work for
each class driver and not really agnostic.

I agree with you on this one. In the long term, packet fragmentation and blocking (if required)
should be taken care at a higher level.

This is why I would like to introduce transfer concept, letting the
device drivers deal with transfer management.
The interface could be something like: usb_transfer(endpoint, buffer,
buffer_len, callback, flags)
The implementation is hardware specific but usb controller usually
have registers to program transfer size, packet count and IRQ to
indicate transfer completion, FIFO empty...
This is a simplified version of Linux USB Request Block (URB).

This is a good thing and much needed one.
I'll need to look at your implementation to understand how it's done on the driver side.
But why do we need a new driver API to do this?

Thanks,
Sundar
 


On 31 January 2018 at 09:28, andrei.emeltchenko@...
<andrei.emeltchenko.news@gmail.com> wrote:
> Hi,
>
> On Wed, Jan 31, 2018 at 08:09:12AM +0530, Sundar Subramaniyan wrote:
>>    Hi Paul,
>>    On Tue, Jan 30, 2018 at 9:55 PM, Paul Sokolovsky
>>    <[1]paul.sokolovsky@...> wrote:
>>
>>
>>        A good clarification thus would be:
>>
>>        1. bytes_ret is always not-NULL.
>>        2. There can always be short writes, number of bytes written is
>>        returned in *bytes_ret, caller must retry with remaining data as
>>        needed.
>>
>>        Note that this doesn't call for updates to existing drivers - they can
>>        keep accepting bytes_ret as NULL, and loop/block inside. But clients
>>        of this API definitely need to be updated, because otherwise they
>>        simply won't work with some drivers.
>>
>>    I think the same too. For now, new drivers should handle both the methods
>>    to be coherent with the API definition. If ret_bytes is NULL, then it
>>    shall to
>>    a blocked write taking care of fragmentation of the packet internally
>>    within
>>    the driver. Otherwise if it is != NULL, it can let the upper layer do the
>>    fragmentation
>>    and shall *not* block-wait for the completion of the current write on the
>>    USB bus,
>>    since we have write callback for notification purpose.
>
> I think we do not have now this requirement not to block-wait. When
> working with sending fragmented packets you would prefer to block until all
> fragments are sent.
>
> Best regards
> Andrei Emeltchenko
> _______________________________________________
> Zephyr-devel mailing list
> Zephyr-devel@lists.zephyrproject.org
> https://lists.zephyrproject.org/mailman/listinfo/zephyr-devel


Sundar Subramaniyan
 

Hi Andrei,

On Wed, Jan 31, 2018 at 1:58 PM, andrei.emeltchenko@... <andrei.emeltchenko.news@...> wrote:
I think we do not have now this requirement not to block-wait. When
working with sending fragmented packets you would prefer to block until all
fragments are sent.
 
If block wait is mandated, we can call usb_write() only on one endpoint at a time which will hamper performance.
Although USB bus transactions are sequential we can however queue the write buffers to a fifo and process them
later from a work queue independently after previous completion which might help design the driver in a better way.

w.r.t callbacks, well there are two cases:

Application has:
1. set a valid callback function
2. set callback as NULL

If the callback is NULL then does that implicitly mean we have to block till write is complete?
I would prefer the driver implementation be simple rather complicating with all the synchronization and fragmentation stuff.

Thanks,
Sundar


Loic Poulain
 

Hi,

As defined by the documentation, write is asynchronous, you can only
be sure write is complete when usb_ep_callback is called.
In practice, there are some hacks in current drivers to make write
block if previous one is not completed (active polling on FIFO for the
dw driver, semaphore for stm32 one).

Today there is no transfer concept at usb device driver level, you
only read/write the TX/RX hardware FIFO (usb_dw), this is up to each
class driver to manage packet splitting and transfer start/end,
including short/zero packet management.
This also means that class drivers need to know the endpoint FIFO size
to delineate packets and transfers. This is significant extra work for
each class driver and not really agnostic.

This is why I would like to introduce transfer concept, letting the
device drivers deal with transfer management.
The interface could be something like: usb_transfer(endpoint, buffer,
buffer_len, callback, flags)
The implementation is hardware specific but usb controller usually
have registers to program transfer size, packet count and IRQ to
indicate transfer completion, FIFO empty...
This is a simplified version of Linux USB Request Block (URB).

Regards,
Loic


On 31 January 2018 at 09:28, andrei.emeltchenko@...
<andrei.emeltchenko.news@...> wrote:
Hi,

On Wed, Jan 31, 2018 at 08:09:12AM +0530, Sundar Subramaniyan wrote:
Hi Paul,
On Tue, Jan 30, 2018 at 9:55 PM, Paul Sokolovsky
<[1]paul.sokolovsky@...> wrote:


A good clarification thus would be:

1. bytes_ret is always not-NULL.
2. There can always be short writes, number of bytes written is
returned in *bytes_ret, caller must retry with remaining data as
needed.

Note that this doesn't call for updates to existing drivers - they can
keep accepting bytes_ret as NULL, and loop/block inside. But clients
of this API definitely need to be updated, because otherwise they
simply won't work with some drivers.

I think the same too. For now, new drivers should handle both the methods
to be coherent with the API definition. If ret_bytes is NULL, then it
shall to
a blocked write taking care of fragmentation of the packet internally
within
the driver. Otherwise if it is != NULL, it can let the upper layer do the
fragmentation
and shall *not* block-wait for the completion of the current write on the
USB bus,
since we have write callback for notification purpose.
I think we do not have now this requirement not to block-wait. When
working with sending fragmented packets you would prefer to block until all
fragments are sent.

Best regards
Andrei Emeltchenko
_______________________________________________
Zephyr-devel mailing list
Zephyr-devel@...
https://lists.zephyrproject.org/mailman/listinfo/zephyr-devel


Andrei
 

Hi,

On Wed, Jan 31, 2018 at 08:09:12AM +0530, Sundar Subramaniyan wrote:
Hi Paul,
On Tue, Jan 30, 2018 at 9:55 PM, Paul Sokolovsky
<[1]paul.sokolovsky@...> wrote:
 

A good clarification thus would be:

1. bytes_ret is always not-NULL.
2. There can always be short writes, number of bytes written is
returned in *bytes_ret, caller must retry with remaining data as
needed.

Note that this doesn't call for updates to existing drivers - they can
keep accepting bytes_ret as NULL, and loop/block inside. But clients
of this API definitely need to be updated, because otherwise they
simply won't work with some drivers.

I think the same too. For now, new drivers should handle both the methods
to be coherent with the API definition. If ret_bytes is NULL, then it
shall to
a blocked write taking care of fragmentation of the packet internally
within
the driver. Otherwise if it is != NULL, it can let the upper layer do the
fragmentation
and shall *not* block-wait for the completion of the current write on the
USB bus,
since we have write callback for notification purpose.
I think we do not have now this requirement not to block-wait. When
working with sending fragmented packets you would prefer to block until all
fragments are sent.

Best regards
Andrei Emeltchenko


Sundar Subramaniyan
 

Hi Johann,

On Tue, Jan 30, 2018 at 9:12 PM, Johann Fischer <johann_fischer@...> wrote:
Please use subsys/usb/class for testing and as reference.

The IN endpoint callback just informs that a endpoint buffer was transmitted to
the host.

I used the DW driver as a reference,
see https://github.com/zephyrproject-rtos/zephyr/pull/542
The Kinetis USB controller is a bit similar to NRF52, btw I would be glad
about review.

Thanks, will take a look at it.
 
There is an effort to implement the transfer method. This is very convenient
from the function point of view, but may complicate the drivers.
Please review https://github.com/zephyrproject-rtos/zephyr/pull/5231

Thanks, will look into this. I need to understand the transfer method. So after it's
been pulled, does the driver need to support it by default? Since there's no driver
registration using f_ops to the USB subsystem, there would be linking issues, no?
 
Regards,
Johann Fischer


Thanks,
Sundar


Sundar Subramaniyan
 

Hi Andrei,

On Tue, Jan 30, 2018 at 9:01 PM, andrei.emeltchenko@... <andrei.emeltchenko.news@...> wrote:
I think API states it supports both methods:
https://github.com/zephyrproject-rtos/zephyr/blob/master/include/usb/usb_device.h#L209

Yes, it does support both of them. So new drivers should support both of them to be inline with the API definition?
As Paul pointed out, some old drivers might need to be modified since some applications assume the drivers' fragmentation support.
 
Fragmentation to USB packets is done on a higher level, in function_ecm
and finction_rndis files.


Thanks for pointing this. I'll look how this is being used.
 
Please also look to usb_dc_ep_transfer() implemented for DW and STM32:

https://github.com/zephyrproject-rtos/zephyr/pull/5231

https://github.com/zephyrproject-rtos/zephyr/pull/5214

Will look into it. Thanks.
 
Best regards
Andrei Emeltchenko

Thanks,
Sundar


Sundar Subramaniyan
 

Hi Paul,

On Tue, Jan 30, 2018 at 9:55 PM, Paul Sokolovsky <paul.sokolovsky@...> wrote:
 
A good clarification thus would be:

1. bytes_ret is always not-NULL.
2. There can always be short writes, number of bytes written is
returned in *bytes_ret, caller must retry with remaining data as needed.

Note that this doesn't call for updates to existing drivers - they can
keep accepting bytes_ret as NULL, and loop/block inside. But clients
of this API definitely need to be updated, because otherwise they
simply won't work with some drivers.

I think the same too. For now, new drivers should handle both the methods
to be coherent with the API definition. If ret_bytes is NULL, then it shall to
a blocked write taking care of fragmentation of the packet internally within
the driver. Otherwise if it is != NULL, it can let the upper layer do the fragmentation
and shall *not* block-wait for the completion of the current write on the USB bus,
since we have write callback for notification purpose.

Is this a fair understanding?

Thanks,
Sundar


Paul Sokolovsky
 

On Tue, 30 Jan 2018 14:36:40 +0000
"Cufi, Carles" <Carles.Cufi@...> wrote:

[]

I'd suggest to look at this as a generic problem, not specific to
USB subsystem (e.g., there's the same issue in networking
subsystem, etc.), and consider the known best practices dealing
with it. For example, POSIX, when dealing with streams-of-bytes,
[]

Although you raise a good point, I'm not convinced that this applies
here. I think that what Sundar wants to know is whether this
particular API call, usb_write(), was originally designed to be able
to buffer data and fragment it into smaller USB-sized packets or
instead it should mirror the state of whatever hardware FIFO it is
using below, returning a number of bytes written different from the
ones the caller asked to send. Beyond that there's the issue of the
API call being blocking or not, which seems unclear given the
different ways it is used in Zephyr. To sum up, I thin he's trying to
clarify the original intentions of the API designer in order to adapt
the driver he's working on to it, not looking to modify the API
itself.
But clarification of the API is also what I'm talking about. Other
points are also collinear, for example, POSIX supports concept of short
reads/writes exactly to let drivers avoid blocking (or avoid extensive
resource usage in general).

usb_write() is described as:

* @brief write data to the specified endpoint
*
* Function to write data to the specified endpoint. The supplied
* usb_ep_callback will be called when transmission is done.
*
* @param[in] ep Endpoint address corresponding to the one listed in the
* device configuration table
* @param[in] data Pointer to data to write
* @param[in] data_len Length of data requested to write. This may be zero for
* a zero length status packet.
* @param[out] bytes_ret Bytes written to the EP FIFO. This value may be NULL if
* the application expects all bytes to be written

That's not explicit enough. The best reading I can get of it is that it
supports *both* of the scenarios above: a) if bytes_ret is non-NULL, it
may perform short write; b) if it's non-NULL, short write is not
allowed. But in all fairness, there's a bit of "reading between the
lines" in that. And such specification puts additional burden on a
driver: it must support both not-looping/not-blocking and
looping-blocking operation. Besides problems with testing these both
modes, we'd find that some drivers simply omit implementation of one or
another case (I see that in Zephyr all the time).

A good clarification thus would be:

1. bytes_ret is always not-NULL.
2. There can always be short writes, number of bytes written is
returned in *bytes_ret, caller must retry with remaining data as needed.

Note that this doesn't call for updates to existing drivers - they can
keep accepting bytes_ret as NULL, and loop/block inside. But clients
of this API definitely need to be updated, because otherwise they
simply won't work with some drivers.


Thanks,

Carles


--
Best Regards,
Paul

Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog


Johann Fischer
 

Hi,

Here's what bt_usb does:
https://github.com/zephyrproject-rtos/zephyr/blob/master/samples/bluetooth/hci_usb/src/main.c#L665

When I looked at other drivers to see how they are implemented, the
designware USB driver seems to assume
the application/class drivers always support the "write in parts" method.
I'm not blaming the driver or it's
implementation but I just wanted to point out that there are drivers which
do not support handling chunked
write within them.
This sample was written for the DW driver, there was no other device driver
recently.

Another class driver, netusb does it a little differently by writing in
parts, but this too assumes the usb_write()
to be synchronous.

Here's what it does:
https://github.com/zephyrproject-rtos/zephyr/blob/master/subsys/usb/class/netusb/netusb.c#L189
Please use subsys/usb/class for testing and as reference.

One more thing to consider is the write callbacks. In most cases, the
application/class driver is the one that registers
the Bulk/Interrupt endpoint callbacks and the device stack has no way of
knowing when to perform the next write, even
if it wants to handle this in the USB device stack.
The IN endpoint callback just informs that a endpoint buffer was transmitted to
the host.


My question now is, what is the ideal place to take care of truncation and
chunk handling as per the USB device stack design?
Why the driver needs to handle the multi-part write and synchronization?
I used the DW driver as a reference,
see https://github.com/zephyrproject-rtos/zephyr/pull/542
The Kinetis USB controller is a bit similar to NRF52, btw I would be glad
about review.

There is an effort to implement the transfer method. This is very convenient
from the function point of view, but may complicate the drivers.
Please review https://github.com/zephyrproject-rtos/zephyr/pull/5231

Regards,
Johann Fischer


Andrei
 

Hi,

On Tue, Jan 30, 2018 at 02:31:46PM +0000, Cufi, Carles wrote:
+ Andrei

 

Carles

 

From: zephyr-devel-bounces@...
[mailto:zephyr-devel-bounces@...] On Behalf Of Sundar
Subramaniyan
Sent: 30 January 2018 13:18
To: zephyr-devel@...
Subject: [Zephyr-devel] USB: Question about usb_write() API

 

Hi,

While developing a USB device controller driver for the Nordic nRF52840
SoC,
I ran into the problem of handling chunked/partial write() while coding
the usb_dc_ep_write() function.

From what I could gather, the usb_write() API documentation states that
the application calling
this function may choose to write all the data buffer at once even if it
is larger than the maximum
packet size supported by the USB endpoint to which we're writing to.
Let's call this method "write all at once".

If the application or the class driver can manage to write in parts, then
it must use the "bytes_ret" argument
to know how much data is written on the bus so as to know if there's
anymore data remaining to be written in
the subsequent call. Let's call this method "write in parts".
I think API states it supports both methods:
https://github.com/zephyrproject-rtos/zephyr/blob/master/include/usb/usb_device.h#L209


In my case, I'm using bt_usb and it tries to write all at once - i.e.
there's no handling of remaining data.
This mandates the USB device controller driver to take care of the writing
in parts within the driver itself.
While this can be done in a simple loop if the driver talks synchronously
to the USBD HW and do a poll and
a yield or use a semaphore till the HW completes the DMA and there's room
to write more.

Here's what bt_usb does:
[1]https://github.com/zephyrproject-rtos/zephyr/blob/master/samples/bluetooth/hci_usb/src/main.c#L665

When I looked at other drivers to see how they are implemented, the
designware USB driver seems to assume
the application/class drivers always support the "write in parts" method.
I'm not blaming the driver or it's
implementation but I just wanted to point out that there are drivers which
do not support handling chunked
write within them.

Here's the DW driver implementation:
[2]https://github.com/zephyrproject-rtos/zephyr/blob/master/drivers/usb/device/usb_dc_dw.c#L1004
[3]https://github.com/zephyrproject-rtos/zephyr/blob/master/drivers/usb/device/usb_dc_dw.c#L423

Another class driver, netusb does it a little differently by writing in
parts, but this too assumes the usb_write()
to be synchronous.

Here's what it does:
[4]https://github.com/zephyrproject-rtos/zephyr/blob/master/subsys/usb/class/netusb/netusb.c#L189
Fragmentation to USB packets is done on a higher level, in function_ecm
and finction_rndis files.

One more thing to consider is the write callbacks. In most cases, the
application/class driver is the one that registers
the Bulk/Interrupt endpoint callbacks and the device stack has no way of
knowing when to perform the next write, even
if it wants to handle this in the USB device stack.

My question now is, what is the ideal place to take care of truncation and
chunk handling as per the USB device stack design?
Why the driver needs to handle the multi-part write and synchronization?

I understand that there are not going to be multiple USBD controllers on a
given chip and taking care of this in the
driver won't cost much from memory/CPU cycles point of view but from USB
DC driver implementation perspective
this might save a lot of time if the stack can abstract them.
Please also look to usb_dc_ep_transfer() implemented for DW and STM32:

https://github.com/zephyrproject-rtos/zephyr/pull/5231

https://github.com/zephyrproject-rtos/zephyr/pull/5214

Best regards
Andrei Emeltchenko


Carles Cufi
 

Hi Paul,

-----Original Message-----
From: zephyr-devel-bounces@... [mailto:zephyr-devel-
bounces@...] On Behalf Of Paul Sokolovsky
Sent: 30 January 2018 15:21
To: Sundar Subramaniyan <sundar.subramaniyan@...>
Cc: zephyr-devel@...
Subject: Re: [Zephyr-devel] USB: Question about usb_write() API

Hello Sundar,

On Tue, 30 Jan 2018 17:48:03 +0530
Sundar Subramaniyan <sundar.subramaniyan@...> wrote:

[]


My question now is, what is the ideal place to take care of truncation
and chunk handling as per the USB device stack design?
Why the driver needs to handle the multi-part write and
synchronization?

I understand that there are not going to be multiple USBD controllers
on a given chip and taking care of this in the driver won't cost much
from memory/CPU cycles point of view but from USB DC driver
implementation perspective this might save a lot of time if the stack
can abstract them.

Please let me know your opinions.
I'd suggest to look at this as a generic problem, not specific to USB
subsystem (e.g., there's the same issue in networking subsystem, etc.),
and consider the known best practices dealing with it. For example,
POSIX, when dealing with streams-of-bytes, always mandates possibility
of short reads/short writes (what you called "write in parts"). That
means that not drivers, but higher levels should take care of repeating
"long" operations.

This approach allows to implement simpler drivers (== easier to write,
less bugs), and allows for more flexibility on the higher levels.
But yes, the drawback of this approach is that it requires higher levels
to deal with repeating operations (POSIX passes that responsibility all
the way up to the application level, so each and every application needs
to have repetition loops*).

* Of course, that can be abstracted to a library. Different libraries
actually, because as mentioned, that's a flexible approach allowing to
implement different data flows and handling policies.
Although you raise a good point, I'm not convinced that this applies here. I think that what Sundar wants to know is whether this particular API call, usb_write(), was originally designed to be able to buffer data and fragment it into smaller USB-sized packets or instead it should mirror the state of whatever hardware FIFO it is using below, returning a number of bytes written different from the ones the caller asked to send. Beyond that there's the issue of the API call being blocking or not, which seems unclear given the different ways it is used in Zephyr.
To sum up, I thin he's trying to clarify the original intentions of the API designer in order to adapt the driver he's working on to it, not looking to modify the API itself.

Thanks,

Carles


Carles Cufi
 

+ Andrei

 

Carles

 

From: zephyr-devel-bounces@... [mailto:zephyr-devel-bounces@...] On Behalf Of Sundar Subramaniyan
Sent: 30 January 2018 13:18
To: zephyr-devel@...
Subject: [Zephyr-devel] USB: Question about usb_write() API

 

Hi,

While developing a USB device controller driver for the Nordic nRF52840 SoC,
I ran into the problem of handling chunked/partial write() while coding the usb_dc_ep_write() function.

From what I could gather, the usb_write() API documentation states that the application calling
this function may choose to write all the data buffer at once even if it is larger than the maximum
packet size supported by the USB endpoint to which we're writing to.
Let's call this method "write all at once".

If the application or the class driver can manage to write in parts, then it must use the "bytes_ret" argument
to know how much data is written on the bus so as to know if there's anymore data remaining to be written in
the subsequent call. Let's call this method "write in parts".

In my case, I'm using bt_usb and it tries to write all at once - i.e. there's no handling of remaining data.
This mandates the USB device controller driver to take care of the writing in parts within the driver itself.
While this can be done in a simple loop if the driver talks synchronously to the USBD HW and do a poll and
a yield or use a semaphore till the HW completes the DMA and there's room to write more.

Here's what bt_usb does:
https://github.com/zephyrproject-rtos/zephyr/blob/master/samples/bluetooth/hci_usb/src/main.c#L665

When I looked at other drivers to see how they are implemented, the designware USB driver seems to assume
the application/class drivers always support the "write in parts" method. I'm not blaming the driver or it's
implementation but I just wanted to point out that there are drivers which do not support handling chunked
write within them.

Here's the DW driver implementation:
https://github.com/zephyrproject-rtos/zephyr/blob/master/drivers/usb/device/usb_dc_dw.c#L1004
https://github.com/zephyrproject-rtos/zephyr/blob/master/drivers/usb/device/usb_dc_dw.c#L423

Another class driver, netusb does it a little differently by writing in parts, but this too assumes the usb_write()
to be synchronous.

Here's what it does:
https://github.com/zephyrproject-rtos/zephyr/blob/master/subsys/usb/class/netusb/netusb.c#L189

One more thing to consider is the write callbacks. In most cases, the application/class driver is the one that registers
the Bulk/Interrupt endpoint callbacks and the device stack has no way of knowing when to perform the next write, even
if it wants to handle this in the USB device stack.

My question now is, what is the ideal place to take care of truncation and chunk handling as per the USB device stack design?
Why the driver needs to handle the multi-part write and synchronization?

I understand that there are not going to be multiple USBD controllers on a given chip and taking care of this in the
driver won't cost much from memory/CPU cycles point of view but from USB DC driver implementation perspective
this might save a lot of time if the stack can abstract them.

Please let me know your opinions.


Thanks,
Sundar


Paul Sokolovsky
 

Hello Sundar,

On Tue, 30 Jan 2018 17:48:03 +0530
Sundar Subramaniyan <sundar.subramaniyan@...> wrote:

[]


My question now is, what is the ideal place to take care of
truncation and chunk handling as per the USB device stack design?
Why the driver needs to handle the multi-part write and
synchronization?

I understand that there are not going to be multiple USBD controllers
on a given chip and taking care of this in the
driver won't cost much from memory/CPU cycles point of view but from
USB DC driver implementation perspective
this might save a lot of time if the stack can abstract them.

Please let me know your opinions.
I'd suggest to look at this as a generic problem, not specific to USB
subsystem (e.g., there's the same issue in networking subsystem, etc.),
and consider the known best practices dealing with it. For example,
POSIX, when dealing with streams-of-bytes, always mandates possibility
of short reads/short writes (what you called "write in parts"). That
means that not drivers, but higher levels should take care of repeating
"long" operations.

This approach allows to implement simpler drivers (== easier to
write, less bugs), and allows for more flexibility on the higher levels.
But yes, the drawback of this approach is that it requires higher levels
to deal with repeating operations (POSIX passes that responsibility
all the way up to the application level, so each and every application
needs to have repetition loops*).

* Of course, that can be abstracted to a library. Different libraries
actually, because as mentioned, that's a flexible approach allowing
to implement different data flows and handling policies.


Thanks,
Sundar


--
Best Regards,
Paul

Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog


Sundar Subramaniyan
 

Hi,

While developing a USB device controller driver for the Nordic nRF52840 SoC,
I ran into the problem of handling chunked/partial write() while coding the usb_dc_ep_write() function.

From what I could gather, the usb_write() API documentation states that the application calling
this function may choose to write all the data buffer at once even if it is larger than the maximum
packet size supported by the USB endpoint to which we're writing to.
Let's call this method "write all at once".

If the application or the class driver can manage to write in parts, then it must use the "bytes_ret" argument
to know how much data is written on the bus so as to know if there's anymore data remaining to be written in
the subsequent call. Let's call this method "write in parts".

In my case, I'm using bt_usb and it tries to write all at once - i.e. there's no handling of remaining data.
This mandates the USB device controller driver to take care of the writing in parts within the driver itself.
While this can be done in a simple loop if the driver talks synchronously to the USBD HW and do a poll and
a yield or use a semaphore till the HW completes the DMA and there's room to write more.

Here's what bt_usb does:
https://github.com/zephyrproject-rtos/zephyr/blob/master/samples/bluetooth/hci_usb/src/main.c#L665

When I looked at other drivers to see how they are implemented, the designware USB driver seems to assume
the application/class drivers always support the "write in parts" method. I'm not blaming the driver or it's
implementation but I just wanted to point out that there are drivers which do not support handling chunked
write within them.

Here's the DW driver implementation:
https://github.com/zephyrproject-rtos/zephyr/blob/master/drivers/usb/device/usb_dc_dw.c#L1004
https://github.com/zephyrproject-rtos/zephyr/blob/master/drivers/usb/device/usb_dc_dw.c#L423

Another class driver, netusb does it a little differently by writing in parts, but this too assumes the usb_write()
to be synchronous.

Here's what it does:
https://github.com/zephyrproject-rtos/zephyr/blob/master/subsys/usb/class/netusb/netusb.c#L189

One more thing to consider is the write callbacks. In most cases, the application/class driver is the one that registers
the Bulk/Interrupt endpoint callbacks and the device stack has no way of knowing when to perform the next write, even
if it wants to handle this in the USB device stack.

My question now is, what is the ideal place to take care of truncation and chunk handling as per the USB device stack design?
Why the driver needs to handle the multi-part write and synchronization?

I understand that there are not going to be multiple USBD controllers on a given chip and taking care of this in the
driver won't cost much from memory/CPU cycles point of view but from USB DC driver implementation perspective
this might save a lot of time if the stack can abstract them.

Please let me know your opinions.

Thanks,
Sundar