Topics

[RFC] Device: device_get_binding() returns NULL if device fails initialization


Daniel Leung <daniel.leung@...>
 

Problem Statement:
Currently, there is no way to know if a driver fails initialization.

Problem Description:
Zephyr currently does not provide a way to check programmatically
whether a device can be used. The device_get_binding() always
returns a valid device struct (if such device driver instance
exists). This causes a bit of headache when debugging apps as
the developer assumes the devices are ready to be used.

Solution:
The solution was actually proposed in [1] a while ago. The idea
is to piggy-back onto driver_api pointer. If the pointer is NULL,
the device driver has failed in its own initialization function.
The driver_api is set to NULL (or in other words, never set to
the actual API struct). When device_get_binding() sess that
the driver_api is NULL for a particular device struct, it returns
NULL to the caller, and thus preventing its use in the app.
Since this is a binary state, instead of creating another variable
inside device struct, this uses driver_api to avoid unnecessarily
enlarging the ROM size. Since this is a simple change, a patch
has been created in [2].

[1] https://lists.zephyrproject.org/archives/list/devel(a)lists.zephyrproject.org/message/MZB5PYBSRHV3NIEHJYXYQVLTPFIIHPB3/
[2] https://gerrit.zephyrproject.org/r/1261


----------
Daniel Leung


Tomasz Bursztyka
 

Hi Daniel,

That's actually a good point.

And that reminds me, that most of our drivers do not set driver API to
NULL if they fail.
But they do return an explicit code which no one care about:
see _sys_device_do_config_level() in kernel/nanokernel/device.c

I think there you can add also a tiny test:

if (device->init(info) < 0) {
info->driver_api = NULL;
}

(actually, naming in this function should be reversed: struct device
should be "device" and
struct device_config should be "info" or "config", even better. But I am
diverging from subject)

It's better to let device.c doing it, instead of making sure all driver
behaves well on that aspect.
As long as they return a relevant status, it will be fine.

Tomasz

Problem Statement:
Currently, there is no way to know if a driver fails initialization.

Problem Description:
Zephyr currently does not provide a way to check programmatically
whether a device can be used. The device_get_binding() always
returns a valid device struct (if such device driver instance
exists). This causes a bit of headache when debugging apps as
the developer assumes the devices are ready to be used.

Solution:
The solution was actually proposed in [1] a while ago. The idea
is to piggy-back onto driver_api pointer. If the pointer is NULL,
the device driver has failed in its own initialization function.
The driver_api is set to NULL (or in other words, never set to
the actual API struct). When device_get_binding() sess that
the driver_api is NULL for a particular device struct, it returns
NULL to the caller, and thus preventing its use in the app.
Since this is a binary state, instead of creating another variable
inside device struct, this uses driver_api to avoid unnecessarily
enlarging the ROM size. Since this is a simple change, a patch
has been created in [2].

[1] https://lists.zephyrproject.org/archives/list/devel(a)lists.zephyrproject.org/message/MZB5PYBSRHV3NIEHJYXYQVLTPFIIHPB3/
[2] https://gerrit.zephyrproject.org/r/1261


----------
Daniel Leung


Kalowsky, Daniel <daniel.kalowsky@...>
 

-----Original Message-----
From: Tomasz Bursztyka [mailto:tomasz.bursztyka(a)linux.intel.com]
Sent: Wednesday, April 6, 2016 12:00 AM
To: devel(a)lists.zephyrproject.org
Subject: [devel] Re: [RFC] Device: device_get_binding() returns NULL if
device fails initialization

Hi Daniel,

That's actually a good point.

And that reminds me, that most of our drivers do not set driver API to
NULL if they fail.
But they do return an explicit code which no one care about:
see _sys_device_do_config_level() in kernel/nanokernel/device.c

I think there you can add also a tiny test:

if (device->init(info) < 0) {
info->driver_api = NULL;
}
Good point Tomasz. Might need to be part of the RFC patch that Daniel posted.

(actually, naming in this function should be reversed: struct device
should be "device" and
struct device_config should be "info" or "config", even better. But I am
diverging from subject)

It's better to let device.c doing it, instead of making sure all driver
behaves well on that aspect.
As long as they return a relevant status, it will be fine.
Correct. If the OS is going to depend upon a feature/function being set to indicate a specific status, the OS should be doing this not the device driver. Although it doesn't hurt if the device driver does it as well :-)



Tomasz

Problem Statement:
Currently, there is no way to know if a driver fails initialization.

Problem Description:
Zephyr currently does not provide a way to check programmatically
whether a device can be used. The device_get_binding() always
returns a valid device struct (if such device driver instance
exists). This causes a bit of headache when debugging apps as
the developer assumes the devices are ready to be used.

Solution:
The solution was actually proposed in [1] a while ago. The idea
is to piggy-back onto driver_api pointer. If the pointer is NULL,
the device driver has failed in its own initialization function.
The driver_api is set to NULL (or in other words, never set to
the actual API struct). When device_get_binding() sess that
the driver_api is NULL for a particular device struct, it returns
NULL to the caller, and thus preventing its use in the app.
Since this is a binary state, instead of creating another variable
inside device struct, this uses driver_api to avoid unnecessarily
enlarging the ROM size. Since this is a simple change, a patch
has been created in [2].

[1]
https://lists.zephyrproject.org/archives/list/devel(a)lists.zephyrproject.org/
message/MZB5PYBSRHV3NIEHJYXYQVLTPFIIHPB3/
[2] https://gerrit.zephyrproject.org/r/1261


----------
Daniel Leung


Daniel Leung <daniel.leung@...>
 

On Wed, Apr 06, 2016 at 02:02:17PM +0000, Kalowsky, Daniel wrote:
-----Original Message-----
From: Tomasz Bursztyka [mailto:tomasz.bursztyka(a)linux.intel.com]
Sent: Wednesday, April 6, 2016 12:00 AM
To: devel(a)lists.zephyrproject.org
Subject: [devel] Re: [RFC] Device: device_get_binding() returns NULL if
device fails initialization

Hi Daniel,

That's actually a good point.

And that reminds me, that most of our drivers do not set driver API to
NULL if they fail.
But they do return an explicit code which no one care about:
see _sys_device_do_config_level() in kernel/nanokernel/device.c

I think there you can add also a tiny test:

if (device->init(info) < 0) {
info->driver_api = NULL;
}
Good point Tomasz. Might need to be part of the RFC patch that Daniel posted.
The original discussion was actually started about getting rid of returns of
initialization functions. The idea was that the kernel could not really do
anything when a driver failed initialization, so why not getting rid of returns
to save a few bytes. The burden of determining whether it is a critical error
during init is up to the driver. If there is a critical error, the driver
should assert (and setting driver_api to NULL).

There are situations where there are non-critical erros during initialization.
For example, the I2C driver, failure to set default configuration is not
critical. As long as the developer sets it later, the driver can still function.
However, with the above code, this non-critical error will cause driver_api
to be NULL, which prevents its usage at all. The driver writer should know
whether an error prevents the device from functioning at all. So it should be
up to driver writer to set driver_api to NULL. One can argue that a non-critical
error should not return anything other than 0. But then it is not passing on
the information that there is a non-critical error (though the kernel does not
really care).

(actually, naming in this function should be reversed: struct device
should be "device" and
struct device_config should be "info" or "config", even better. But I am
diverging from subject)

It's better to let device.c doing it, instead of making sure all driver
behaves well on that aspect.
As long as they return a relevant status, it will be fine.
Correct. If the OS is going to depend upon a feature/function being set to
indicate a specific status, the OS should be doing this not the device driver.
Although it doesn't hurt if the device driver does it as well :-)
(see above.)


Tomasz

Problem Statement:
Currently, there is no way to know if a driver fails initialization.

Problem Description:
Zephyr currently does not provide a way to check programmatically
whether a device can be used. The device_get_binding() always
returns a valid device struct (if such device driver instance
exists). This causes a bit of headache when debugging apps as
the developer assumes the devices are ready to be used.

Solution:
The solution was actually proposed in [1] a while ago. The idea
is to piggy-back onto driver_api pointer. If the pointer is NULL,
the device driver has failed in its own initialization function.
The driver_api is set to NULL (or in other words, never set to
the actual API struct). When device_get_binding() sess that
the driver_api is NULL for a particular device struct, it returns
NULL to the caller, and thus preventing its use in the app.
Since this is a binary state, instead of creating another variable
inside device struct, this uses driver_api to avoid unnecessarily
enlarging the ROM size. Since this is a simple change, a patch
has been created in [2].

[1]
https://lists.zephyrproject.org/archives/list/devel(a)lists.zephyrproject.org/
message/MZB5PYBSRHV3NIEHJYXYQVLTPFIIHPB3/
[2] https://gerrit.zephyrproject.org/r/1261


----------
Daniel Leung


Tomasz Bursztyka
 

Hi Daniel,

And that reminds me, that most of our drivers do not set driver API to
NULL if they fail.
But they do return an explicit code which no one care about:
see _sys_device_do_config_level() in kernel/nanokernel/device.c

I think there you can add also a tiny test:

if (device->init(info) < 0) {
info->driver_api = NULL;
}
Good point Tomasz. Might need to be part of the RFC patch that Daniel posted.
The original discussion was actually started about getting rid of returns of
initialization functions.
Ok I missed that discussion.

The idea was that the kernel could not really do
anything when a driver failed initialization, so why not getting rid of returns
to save a few bytes. The burden of determining whether it is a critical error
during init is up to the driver. If there is a critical error, the driver
should assert (and setting driver_api to NULL).

There are situations where there are non-critical erros during initialization.
For example, the I2C driver, failure to set default configuration is not
critical. As long as the developer sets it later, the driver can still function.
However, with the above code, this non-critical error will cause driver_api
to be NULL, which prevents its usage at all. The driver writer should know
whether an error prevents the device from functioning at all. So it should be
up to driver writer to set driver_api to NULL. One can argue that a non-critical
error should not return anything other than 0. But then it is not passing on
the information that there is a non-critical error (though the kernel does not
really care).
Your RFC patch was not clear about that. It would require to change the
init function signature
and all the existing drivers in one go as well.
I hope we don't consider Kernel device API (init and stuff) as part of
"API 1.0".

Tomasz


Daniel Leung <daniel.leung@...>
 

On Mon, Apr 11, 2016 at 09:37:25AM +0200, Tomasz Bursztyka wrote:
Hi Daniel,

And that reminds me, that most of our drivers do not set driver API to
NULL if they fail.
But they do return an explicit code which no one care about:
see _sys_device_do_config_level() in kernel/nanokernel/device.c

I think there you can add also a tiny test:

if (device->init(info) < 0) {
info->driver_api = NULL;
}
Good point Tomasz. Might need to be part of the RFC patch that Daniel posted.
The original discussion was actually started about getting rid of returns of
initialization functions.
Ok I missed that discussion.
You can follow the link and go to the top message. The idea was first proposed by
Benjamin Walsh to get rid of return values. This RFC was just a tiny part of that.


The idea was that the kernel could not really do
anything when a driver failed initialization, so why not getting rid of returns
to save a few bytes. The burden of determining whether it is a critical error
during init is up to the driver. If there is a critical error, the driver
should assert (and setting driver_api to NULL).

There are situations where there are non-critical erros during initialization.
For example, the I2C driver, failure to set default configuration is not
critical. As long as the developer sets it later, the driver can still function.
However, with the above code, this non-critical error will cause driver_api
to be NULL, which prevents its usage at all. The driver writer should know
whether an error prevents the device from functioning at all. So it should be
up to driver writer to set driver_api to NULL. One can argue that a non-critical
error should not return anything other than 0. But then it is not passing on
the information that there is a non-critical error (though the kernel does not
really care).
Your RFC patch was not clear about that. It would require to change the
init function signature
and all the existing drivers in one go as well.
I hope we don't consider Kernel device API (init and stuff) as part of
"API 1.0".
The init change will be in another RFC. I am testing locally to see how much
space we can actually save. If it is minimal or non-existent, there is no need
to do that.

AFAIK, the device init interface is not API from my understanding. Application
should not be using these in their code. They are internal to the kernel and
drivers.


----------
Daniel Leung


Nashif, Anas
 

Hi




On 12/04/2016, 00:23, "Daniel Leung" <daniel.leung(a)intel.com> wrote:

and all the existing drivers in one go as well.
I hope we don't consider Kernel device API (init and stuff) as part of
"API 1.0".
The init change will be in another RFC. I am testing locally to see how much
space we can actually save. If it is minimal or non-existent, there is no need
to do that.

AFAIK, the device init interface is not API from my understanding. Application
should not be using these in their code. They are internal to the kernel and
drivers.
IMO the driver interface is still considered public APIs.
Someone might be using those to write drivers external to Zephyr, changing the signatures will break such drivers.


Anas



----------
Daniel Leung


Thomas, Ramesh
 

On Mon, 2016-04-11 at 09:23 -0700, Daniel Leung wrote:
On Mon, Apr 11, 2016 at 09:37:25AM +0200, Tomasz Bursztyka wrote:
Hi Daniel,

And that reminds me, that most of our drivers do not set driver API to
NULL if they fail.
But they do return an explicit code which no one care about:
see _sys_device_do_config_level() in kernel/nanokernel/device.c

I think there you can add also a tiny test:

if (device->init(info) < 0) {
info->driver_api = NULL;
}
Good point Tomasz. Might need to be part of the RFC patch that Daniel posted.
The original discussion was actually started about getting rid of returns of
initialization functions.
Ok I missed that discussion.
You can follow the link and go to the top message. The idea was first proposed by
Benjamin Walsh to get rid of return values. This RFC was just a tiny part of that.


The idea was that the kernel could not really do
anything when a driver failed initialization, so why not getting rid of returns
to save a few bytes. The burden of determining whether it is a critical error
during init is up to the driver. If there is a critical error, the driver
should assert (and setting driver_api to NULL).

There are situations where there are non-critical erros during initialization.
For example, the I2C driver, failure to set default configuration is not
critical. As long as the developer sets it later, the driver can still function.
However, with the above code, this non-critical error will cause driver_api
to be NULL, which prevents its usage at all. The driver writer should know
whether an error prevents the device from functioning at all. So it should be
up to driver writer to set driver_api to NULL. One can argue that a non-critical
error should not return anything other than 0. But then it is not passing on
the information that there is a non-critical error (though the kernel does not
Not sure how the method in the RFC differentiates between critical and
non-critical errors. Isn't the driver also *not* passing on the
non-critical error status to the app by not setting device_api = NULL in
those cases? Then how will the app know that it needs to do something to
correct such non-critical errors?

If this is merely a way to indicate that the driver is in an unusable
error state, then how is it different from critical error? - which is
not expected to happen in production releases.

really care).
Your RFC patch was not clear about that. It would require to change the
init function signature
and all the existing drivers in one go as well.
I hope we don't consider Kernel device API (init and stuff) as part of
"API 1.0".
The init change will be in another RFC. I am testing locally to see how much
space we can actually save. If it is minimal or non-existent, there is no need
to do that.

AFAIK, the device init interface is not API from my understanding. Application
should not be using these in their code. They are internal to the kernel and
drivers.


----------
Daniel Leung


Tomasz Bursztyka
 

Hi Ramesh,

The idea was that the kernel could not really do
anything when a driver failed initialization, so why not getting rid of returns
to save a few bytes. The burden of determining whether it is a critical error
during init is up to the driver. If there is a critical error, the driver
should assert (and setting driver_api to NULL).

There are situations where there are non-critical erros during initialization.
For example, the I2C driver, failure to set default configuration is not
critical. As long as the developer sets it later, the driver can still function.
However, with the above code, this non-critical error will cause driver_api
to be NULL, which prevents its usage at all. The driver writer should know
whether an error prevents the device from functioning at all. So it should be
up to driver writer to set driver_api to NULL. One can argue that a non-critical
error should not return anything other than 0. But then it is not passing on
the information that there is a non-critical error (though the kernel does not
Not sure how the method in the RFC differentiates between critical and
non-critical errors. Isn't the driver also*not* passing on the
non-critical error status to the app by not setting device_api = NULL in
those cases? Then how will the app know that it needs to do something to
correct such non-critical errors?

If this is merely a way to indicate that the driver is in an unusable
error state, then how is it different from critical error? - which is
not expected to happen in production releases.
The idea is to let the driver decide whether it is still functioning or not.
Unusable state is a critical error to me.
A non critical error, means it is still ok to work, thus the driver API
will be set.
If the driver stops at non-critical error and do not set the API, it's
not really a non-critical error then.
(or it's a bug)

I don't see much trouble with that. Up to drivers to decide

And anyway, in 99.9% of the drivers: there will be critical errors only
I guess (unable to get
a device binding, to configure some register...).

Tomasz


Tomasz Bursztyka
 

Hi Daniel and Anas,

and all the existing drivers in one go as well.
I hope we don't consider Kernel device API (init and stuff) as part of
"API 1.0".
The init change will be in another RFC. I am testing locally to see how much
space we can actually save. If it is minimal or non-existent, there is no need
to do that.
Saving even a minimal amount of bytes is good to consider, as long as it
does not reduce
features etc... and that RFC is actually fixing things.

AFAIK, the device init interface is not API from my understanding. Application
should not be using these in their code. They are internal to the kernel and
drivers.
IMO the driver interface is still considered public APIs.
Someone might be using those to write drivers external to Zephyr, changing the signatures will break such drivers.
How long are we supposed to support this API 1.0? Can't we drop some of
its specifics in, let's say, 1.5?
The more we will have to support all of it, the less we will be able to
proceed on some interesting changes. :(

Tomasz


Benjamin Walsh <benjamin.walsh@...>
 

On Tue, Apr 12, 2016 at 09:12:19AM +0200, Tomasz Bursztyka wrote:
Hi Daniel and Anas,

and all the existing drivers in one go as well.
I hope we don't consider Kernel device API (init and stuff) as part of
"API 1.0".
The init change will be in another RFC. I am testing locally to see how much
space we can actually save. If it is minimal or non-existent, there is no need
to do that.
Saving even a minimal amount of bytes is good to consider, as long
as it does not reduce
features etc... and that RFC is actually fixing things.

AFAIK, the device init interface is not API from my understanding. Application
should not be using these in their code. They are internal to the kernel and
drivers.
IMO the driver interface is still considered public APIs.
Someone might be using those to write drivers external to Zephyr, changing the signatures will break such drivers.
How long are we supposed to support this API 1.0? Can't we drop some
of its specifics in, let's say, 1.5?
The more we will have to support all of it, the less we will be able
to proceed on some interesting changes. :(
I already got feedback from people developing either drivers or sample
application for Rocket, basically playing to role of customers of
Zephyr, and I can already say that, for people used to writing software
against APIs that experience a very minimal amount of changes (i.e.
VxWorks), they did _not_ like the fact that even a small amount of our
APIs is in flux. It basically causes lost time and frustration.

I'm not advocating anything w.r.t. a policiy for changing APIs right
now, but this is a data point.


Tomasz Bursztyka
 

Hi Benjamin,

AFAIK, the device init interface is not API from my understanding. Application
should not be using these in their code. They are internal to the kernel and
drivers.
IMO the driver interface is still considered public APIs.
Someone might be using those to write drivers external to Zephyr, changing the signatures will break such drivers.
How long are we supposed to support this API 1.0? Can't we drop some
of its specifics in, let's say, 1.5?
The more we will have to support all of it, the less we will be able
to proceed on some interesting changes.:(
I already got feedback from people developing either drivers or sample
application for Rocket, basically playing to role of customers of
Zephyr, and I can already say that, for people used to writing software
against APIs that experience a very minimal amount of changes (i.e.
VxWorks), they did_not_ like the fact that even a small amount of our
APIs is in flux. It basically causes lost time and frustration.

I'm not advocating anything w.r.t. a policiy for changing APIs right
now, but this is a data point.
Thus why we may enforce a version of the API vs a Zephyr version. And
move on with a new one.
If they want to get stuck with API 1.0 they could use Zephyr 1.2 (1.3
maybe) and that's it.

Our APIs are, on many aspects, like prototype APIs.
It's not surprising we are willing to modify them, and some quite a lot
actually.

Also: we may keep the public API compatible, but the device driver API
below might change.
Unless we want to bloat our drivers.

Tomasz


Daniel Leung <daniel.leung@...>
 

On Tue, Apr 12, 2016 at 10:22:13AM -0400, Benjamin Walsh wrote:
On Tue, Apr 12, 2016 at 09:12:19AM +0200, Tomasz Bursztyka wrote:
Hi Daniel and Anas,

and all the existing drivers in one go as well.
I hope we don't consider Kernel device API (init and stuff) as part of
"API 1.0".
The init change will be in another RFC. I am testing locally to see how much
space we can actually save. If it is minimal or non-existent, there is no need
to do that.
Saving even a minimal amount of bytes is good to consider, as long
as it does not reduce
features etc... and that RFC is actually fixing things.

AFAIK, the device init interface is not API from my understanding. Application
should not be using these in their code. They are internal to the kernel and
drivers.
IMO the driver interface is still considered public APIs.
Someone might be using those to write drivers external to Zephyr, changing the signatures will break such drivers.
How long are we supposed to support this API 1.0? Can't we drop some
of its specifics in, let's say, 1.5?
The more we will have to support all of it, the less we will be able
to proceed on some interesting changes. :(
I already got feedback from people developing either drivers or sample
application for Rocket, basically playing to role of customers of
Zephyr, and I can already say that, for people used to writing software
against APIs that experience a very minimal amount of changes (i.e.
VxWorks), they did _not_ like the fact that even a small amount of our
APIs is in flux. It basically causes lost time and frustration.

I'm not advocating anything w.r.t. a policiy for changing APIs right
now, but this is a data point.
Is this only about changing the APIs (like adding parameters, etc)?
How about adding APIs (but still keep the existing API unmodified)?


----------
Daniel Leung


Daniel Leung <daniel.leung@...>
 

On Tue, Apr 12, 2016 at 09:07:00AM +0200, Tomasz Bursztyka wrote:
Hi Ramesh,

> > > The idea was that the kernel could not really do
> > > anything when a driver failed initialization, so why not getting rid of returns
> > > to save a few bytes. The burden of determining whether it is a critical error
> > > during init is up to the driver. If there is a critical error, the driver
> > > should assert (and setting driver_api to NULL).
> > >
> > > There are situations where there are non-critical erros during initialization.
> > > For example, the I2C driver, failure to set default configuration is not
> > > critical. As long as the developer sets it later, the driver can still function.
> > > However, with the above code, this non-critical error will cause driver_api
> > > to be NULL, which prevents its usage at all. The driver writer should know
> > > whether an error prevents the device from functioning at all. So it should be
> > > up to driver writer to set driver_api to NULL. One can argue that a non-critical
> > > error should not return anything other than 0. But then it is not passing on
> > > the information that there is a non-critical error (though the kernel does not

Not sure how the method in the RFC differentiates between critical and
non-critical errors. Isn't the driver also *not* passing on the
non-critical error status to the app by not setting device_api = NULL in
those cases? Then how will the app know that it needs to do something to
correct such non-critical errors?

If this is merely a way to indicate that the driver is in an unusable
error state, then how is it different from critical error? - which is
not expected to happen in production releases.

The idea is to let the driver decide whether it is still functioning or
not.
Unusable state is a critical error to me.
A non critical error, means it is still ok to work, thus the driver API
will be set.
If the driver stops at non-critical error and do not set the API, it's not
really a non-critical error then.
(or it's a bug)

I don't see much trouble with that. Up to drivers to decide

And anyway, in 99.9% of the drivers: there will be critical errors only I
guess (unable to get
a device binding, to configure some register...).

Tomasz
Ramesh, in a production release, there should not be any non-critical errors
at device initialization (hence "production"). In a production environment,
the null-returning can be used as a mechanism to signify a major problem,
for example, a peripheral breaks down in the field. Notification then can be
made to replace the faulty hardware, if so desired.


----------
Daniel Leung


Benjamin Walsh <benjamin.walsh@...>
 

On Tue, Apr 12, 2016 at 10:22:21AM -0700, Daniel Leung wrote:
On Tue, Apr 12, 2016 at 10:22:13AM -0400, Benjamin Walsh wrote:
On Tue, Apr 12, 2016 at 09:12:19AM +0200, Tomasz Bursztyka wrote:
Hi Daniel and Anas,

and all the existing drivers in one go as well.
I hope we don't consider Kernel device API (init and stuff) as part of
"API 1.0".
The init change will be in another RFC. I am testing locally to see how much
space we can actually save. If it is minimal or non-existent, there is no need
to do that.
Saving even a minimal amount of bytes is good to consider, as long
as it does not reduce
features etc... and that RFC is actually fixing things.

AFAIK, the device init interface is not API from my understanding. Application
should not be using these in their code. They are internal to the kernel and
drivers.
IMO the driver interface is still considered public APIs.
Someone might be using those to write drivers external to Zephyr, changing the signatures will break such drivers.
How long are we supposed to support this API 1.0? Can't we drop some
of its specifics in, let's say, 1.5?
The more we will have to support all of it, the less we will be able
to proceed on some interesting changes. :(
I already got feedback from people developing either drivers or sample
application for Rocket, basically playing to role of customers of
Zephyr, and I can already say that, for people used to writing software
against APIs that experience a very minimal amount of changes (i.e.
VxWorks), they did _not_ like the fact that even a small amount of our
APIs is in flux. It basically causes lost time and frustration.

I'm not advocating anything w.r.t. a policiy for changing APIs right
now, but this is a data point.
Is this only about changing the APIs (like adding parameters, etc)?
How about adding APIs (but still keep the existing API unmodified)?
Changing APIs.


Thomas, Ramesh
 

On Tue, Apr 12, 2016 at 10:27:20, Leung, Daniel wrote:
On Tue, Apr 12, 2016 at 09:07:00AM +0200, Tomasz Bursztyka wrote:
Hi Ramesh,

> > > The idea was that the kernel could not really do
> > > anything when a driver failed initialization, so why not getting rid
of returns
> > > to save a few bytes. The burden of determining whether it is a
critical error
> > > during init is up to the driver. If there is a critical error, the driver
> > > should assert (and setting driver_api to NULL).
> > >
> > > There are situations where there are non-critical erros during
initialization.
> > > For example, the I2C driver, failure to set default configuration is
not
> > > critical. As long as the developer sets it later, the driver can still
function.
> > > However, with the above code, this non-critical error will cause
driver_api
> > > to be NULL, which prevents its usage at all. The driver writer
should know
> > > whether an error prevents the device from functioning at all. So
it should be
> > > up to driver writer to set driver_api to NULL. One can argue that a
non-critical
> > > error should not return anything other than 0. But then it is not
passing on
> > > the information that there is a non-critical error (though the
kernel does not

Not sure how the method in the RFC differentiates between critical
and
non-critical errors. Isn't the driver also *not* passing on the
non-critical error status to the app by not setting device_api = NULL in
those cases? Then how will the app know that it needs to do
something to
correct such non-critical errors?

If this is merely a way to indicate that the driver is in an unusable
error state, then how is it different from critical error? - which is
not expected to happen in production releases.

The idea is to let the driver decide whether it is still functioning or
not.
Unusable state is a critical error to me.
A non critical error, means it is still ok to work, thus the driver API
will be set.
If the driver stops at non-critical error and do not set the API, it's not
really a non-critical error then.
(or it's a bug)

I don't see much trouble with that. Up to drivers to decide

And anyway, in 99.9% of the drivers: there will be critical errors only I
guess (unable to get
a device binding, to configure some register...).

Tomasz
Ramesh, in a production release, there should not be any non-critical
errors
at device initialization (hence "production"). In a production
environment,
the null-returning can be used as a mechanism to signify a major
problem,
for example, a peripheral breaks down in the field. Notification then can
be
made to replace the faulty hardware, if so desired.
I see. So the method in the RFC is to assist in diagnosis or notification of critical
errors.



----------
Daniel Leung


Boie, Andrew P
 

I already got feedback from people developing either drivers or sample
application for Rocket, basically playing to role of customers of
Zephyr, and I can already say that, for people used to writing software
against APIs that experience a very minimal amount of changes (i.e.
I'm not advocating anything w.r.t. a policiy for changing APIs right
now, but this is a data point.
I think we shouldn't change our existing APIs at all. They should be treated as a contract. Someone who builds code against 1.0 should be able to do so against later versions until we make a conscious decision otherwise.

I think if we want to change something, we mark the old APIs with __attribute__((deprecated)) so that it's clear that the old API is going away at some point by generating a warning. But they should still be *usable*.

Then, in the fullness of time, remove the APIs that were previously deprecated. Whether that is Zephyr 1.5, 2.0, doesn't matter to me.

VxWorks), they did _not_ like the fact that even a small amount of our
APIs is in flux. It basically causes lost time and frustration.
What specifically changed that caused consternation?

Andrew


Benjamin Walsh <benjamin.walsh@...>
 

On Tue, Apr 12, 2016 at 08:58:34PM +0000, Boie, Andrew P wrote:

I already got feedback from people developing either drivers or sample
application for Rocket, basically playing to role of customers of
Zephyr, and I can already say that, for people used to writing software
against APIs that experience a very minimal amount of changes (i.e.
I'm not advocating anything w.r.t. a policiy for changing APIs right
now, but this is a data point.
I think we shouldn't change our existing APIs at all. They should be
treated as a contract. Someone who builds code against 1.0 should be
able to do so against later versions until we make a conscious
decision otherwise.

I think if we want to change something, we mark the old APIs with
__attribute__((deprecated)) so that it's clear that the old API is
going away at some point by generating a warning. But they should
still be *usable*.

Then, in the fullness of time, remove the APIs that were previously
deprecated. Whether that is Zephyr 1.5, 2.0, doesn't matter to me.

VxWorks), they did _not_ like the fact that even a small amount of
our APIs is in flux. It basically causes lost time and frustration.
What specifically changed that caused consternation?
I don't remember exactly. One of the them was some renaming of either
pinmux or gpio functions/structures.


Davidoaia, Bogdan M <bogdan.m.davidoaia@...>
 

Hi Daniel,

I didn't catch this at first, but the patch causes a problem for the
Grove LCD driver (not sure if this problem is also present for other
drivers).

The way it was written, the Grove LCD driver doesn't use the driver_api
pointer at all (there is no LCD API). As such, it is NULL by default and
the call to device_get_binding will return NULL even though the device
was correctly initialized.

Bogdan

On Ma, 2016-04-05 at 16:36 -0700, Daniel Leung wrote:
Problem Statement:
Currently, there is no way to know if a driver fails initialization.

Problem Description:
Zephyr currently does not provide a way to check programmatically
whether a device can be used. The device_get_binding() always
returns a valid device struct (if such device driver instance
exists). This causes a bit of headache when debugging apps as
the developer assumes the devices are ready to be used.

Solution:
The solution was actually proposed in [1] a while ago. The idea
is to piggy-back onto driver_api pointer. If the pointer is NULL,
the device driver has failed in its own initialization function.
The driver_api is set to NULL (or in other words, never set to
the actual API struct). When device_get_binding() sess that
the driver_api is NULL for a particular device struct, it returns
NULL to the caller, and thus preventing its use in the app.
Since this is a binary state, instead of creating another variable
inside device struct, this uses driver_api to avoid unnecessarily
enlarging the ROM size. Since this is a simple change, a patch
has been created in [2].

[1] https://lists.zephyrproject.org/archives/list/devel(a)lists.zephyrproject.org/message/MZB5PYBSRHV3NIEHJYXYQVLTPFIIHPB3/
[2] https://gerrit.zephyrproject.org/r/1261


----------
Daniel Leung


Daniel Leung <daniel.leung@...>
 

On Thu, Apr 14, 2016 at 09:15:50AM +0000, Davidoaia, Bogdan M wrote:


I didn't catch this at first, but the patch causes a problem for the
Grove LCD driver (not sure if this problem is also present for other
drivers).

The way it was written, the Grove LCD driver doesn't use the driver_api
pointer at all (there is no LCD API). As such, it is NULL by default and
the call to device_get_binding will return NULL even though the device
was correctly initialized.
Thank you for letting me know. I will send a patch to fix it.


--------
Daniel Leung