Re: [RFC] Power Management Infrastructure


Nashif, Anas
 

Amir,
All commits need to pass Jenkins, so you should make the changes and organise the commits in a way that keeps Jenkins happy all the way.

Thanks,
Anas

On 18/08/2016, 12:03, "Kaplan, Amir" <amir.kaplan(a)intel.com> wrote:

Hi,
Following Ramesh's and Ku-Lang's feedback, we have split the changes into multiple small patches (instead of the original delivery).

Note: Because the driver changes depend on the change in device.h, only the last delivery (https://gerrit.zephyrproject.org/r/#/c/4161/) passes all required verifications in Jenkins.

Below are the updated deliveries:
Device API:
https://gerrit.zephyrproject.org/r/#/c/4142/

gpio:
https://gerrit.zephyrproject.org/r/#/c/4143/

interrupt_controller:
https://gerrit.zephyrproject.org/r/#/c/4145/

pwm:
https://gerrit.zephyrproject.org/r/#/c/4147/

rtc:
https://gerrit.zephyrproject.org/r/#/c/4148/

uart:
https://gerrit.zephyrproject.org/r/#/c/4149/

spi:
https://gerrit.zephyrproject.org/r/#/c/4150/

timer:
https://gerrit.zephyrproject.org/r/#/c/4159/

samples:
https://gerrit.zephyrproject.org/r/#/c/4161/

Regards,
Amir & Keren

-----Original Message-----
From: Kaplan, Amir
Sent: Monday, August 15, 2016 13:36
To: devel(a)lists.zephyrproject.org
Cc: Rahamim, Hezi <hezi.rahamim(a)intel.com>; Siman-tov, Keren <keren.siman-tov(a)intel.com>; Kanner, Noa <noa.kanner(a)intel.com>
Subject: RE: [devel] Re: Re: Re: Re: Re: [RFC] Power Management Infrastructure

Hi all,

The corresponding Gerrit link:
https://gerrit.zephyrproject.org/r/4081


-----Original Message-----
From: Kaplan, Amir
Sent: Monday, August 15, 2016 11:18
To: 'devel(a)lists.zephyrproject.org' <devel(a)lists.zephyrproject.org>
Cc: Rahamim, Hezi <hezi.rahamim(a)intel.com>; Siman-tov, Keren <keren.siman-tov(a)intel.com>; Kaplan, Amir <amir.kaplan(a)intel.com>; Kanner, Noa <noa.kanner(a)intel.com>
Subject: RE: [devel] Re: Re: Re: Re: Re: [RFC] Power Management Infrastructure

Hi all,
After reviewing all the comments and consulting Ramesh, below is the updated RFC:

Current state
===========

In the current Zephyr implementation the driver power hooks distinguish between only two driver states (suspend and resume). Drivers may have more than two states (i.e. D-states) and can move between those states. The state change today is limited to active-to-suspend, while an application may request other transitions.
For example, a transition from DEVICE_PM_LOW_POWER_STATE to DEVICE_PM_OFF_STATE (please see the suggested device power states below).

Moreover, the current device state cannot be queried by an application or a Power Manager service.

Below are the current Zephyr PM hooks:

struct device_pm_ops {
    int (*suspend)(struct device *device, int pm_policy);
    int (*resume)(struct device *device, int pm_policy);
};

Proposed changes
===============

1. Have one function that can be used for all possible device PM purposes, selected by a control code, instead of the suspend and resume functions:

int (*device_pm_control)(struct device *device, int pm_command, int device_power_state);

The first version will support the DEVICE_SET_POWER_STATE and DEVICE_GET_POWER_STATE commands.

2. Add the below device power states:
Note: Many devices do not have all four power states defined.

DEVICE_PM_ACTIVE_STATE
--------------------------------------------
Normal operation of the device. All device context is retained.

DEVICE_PM_LOW_POWER_STATE
-------------------------------------------------------
Device context is preserved by the HW and need not be restored by the driver.
The device does not allow the Power Manager service to power it down.

DEVICE_PM_SUSPEND_STATE
------------------------------------------------
Most device context is lost by the hardware.
Device drivers must save and restore or reinitialize any context lost by the hardware.
The device can be powered down.
The device allows a wake signal to return it to the active state.

DEVICE_PM_OFF_STATE
---------------------------------------
Power has been fully removed from the device. The device context is lost when this state is entered, so the OS software will reinitialize the device when powering it back on.
The device may not wake itself, as the SoC needs to reinitialize the device.

3. The set state functionality (via device_pm_control) will behave according to the state transition of a specific driver. E.g. a transition from active state to suspend state in a UART device will save the device state and gate the clock.
The get state functionality (via device_pm_control) will enable the Power Manager service to know the state of a driver if needed, thus enabling it to configure the SoC power behavior.
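
To illustrate items 1 to 3, a minimal sketch of how a UART-style driver might implement the single control function follows. Only the device_pm_control signature and the state and command names come from this RFC. The enum encodings, the reduced struct device, the uart_sketch_* names, the baud-rate field standing in for device context, the -ENOTSUP return for states the device does not implement, and returning the current state from DEVICE_GET_POWER_STATE are all assumptions made for illustration, not the content of the submitted patches.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical encodings: the RFC names these values but does not number them. */
enum device_pm_state {
    DEVICE_PM_ACTIVE_STATE,
    DEVICE_PM_LOW_POWER_STATE,
    DEVICE_PM_SUSPEND_STATE,
    DEVICE_PM_OFF_STATE,
};

enum device_pm_command {
    DEVICE_SET_POWER_STATE,
    DEVICE_GET_POWER_STATE,
};

/* Stand-in for Zephyr's struct device, reduced to what the sketch needs. */
struct device {
    void *driver_data;
};

/* Hypothetical per-instance data of a UART driver. */
struct uart_sketch_data {
    int power_state;   /* current enum device_pm_state */
    int baud;          /* models a live hardware register */
    int saved_baud;    /* context saved before the clock is gated */
    int clock_enabled; /* 1 while the device clock is running */
};

static int uart_sketch_set_state(struct uart_sketch_data *d, int new_state)
{
    switch (new_state) {
    case DEVICE_PM_ACTIVE_STATE:
        d->clock_enabled = 1;
        if (d->power_state == DEVICE_PM_SUSPEND_STATE) {
            d->baud = d->saved_baud; /* restore context lost in suspend */
        }
        break;
    case DEVICE_PM_SUSPEND_STATE:
        d->saved_baud = d->baud; /* save context */
        d->baud = 0;             /* model the hardware losing it */
        d->clock_enabled = 0;    /* gate the clock */
        break;
    default:
        /* Many devices do not define all four states. */
        return -ENOTSUP;
    }
    d->power_state = new_state;
    return 0;
}

/* Same shape as the proposed hook: one entry point, selected by pm_command. */
int uart_sketch_pm_control(struct device *device, int pm_command,
                           int device_power_state)
{
    assert(device != NULL && device->driver_data != NULL);
    struct uart_sketch_data *d = device->driver_data;

    switch (pm_command) {
    case DEVICE_SET_POWER_STATE:
        return uart_sketch_set_state(d, device_power_state);
    case DEVICE_GET_POWER_STATE:
        /* Assumption: the state is the return value; the argument is ignored. */
        return d->power_state;
    default:
        return -ENOTSUP;
    }
}
```

Because the states are non-negative and errors are negative, a caller of this sketch can tell a queried state from a failure by the sign of the return value.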

The advantages of the new method:
1. Active device PM that does not need the system to go idle. Any component can call it. Multiple PM states are supported, and transitions need not always be between active and low power states.
2. Reduces memory use and complexity because now there is only one function.
3. Compatible with legacy suspend/resume done from a central PMA during idle.
4. Scalable: in future, more control codes can be added to support other device PM operations without having to change the infrastructure.
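
The legacy flow mentioned in advantage 3 can be pictured as a central PMA routine that, during idle, drives every device through the same control function. The following is a hypothetical sketch only: pma_suspend_all, pma_resume_all, sketch_pm_control, the reduced struct device, the enum encodings and the roll-back-on-failure behavior are assumptions for illustration and are not taken from the patches.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical encodings of the states and commands named in the RFC. */
enum { DEVICE_PM_ACTIVE_STATE, DEVICE_PM_LOW_POWER_STATE,
       DEVICE_PM_SUSPEND_STATE, DEVICE_PM_OFF_STATE };
enum { DEVICE_SET_POWER_STATE, DEVICE_GET_POWER_STATE };

struct device;
typedef int (*device_pm_control_fn)(struct device *device, int pm_command,
                                    int device_power_state);

/* Stand-in for Zephyr's struct device: just the hook and a state field. */
struct device {
    device_pm_control_fn device_pm_control;
    int power_state;
};

/* Trivial handler that only records the requested state. */
int sketch_pm_control(struct device *device, int pm_command,
                      int device_power_state)
{
    assert(device != NULL);
    switch (pm_command) {
    case DEVICE_SET_POWER_STATE:
        device->power_state = device_power_state;
        return 0;
    case DEVICE_GET_POWER_STATE:
        return device->power_state;
    default:
        return -EINVAL;
    }
}

/* Legacy "suspend": move every device to the suspend state, in order.
 * If one device refuses, the devices already suspended are reactivated. */
int pma_suspend_all(struct device *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int rc = devs[i].device_pm_control(&devs[i], DEVICE_SET_POWER_STATE,
                                           DEVICE_PM_SUSPEND_STATE);
        if (rc < 0) {
            while (i-- > 0) {
                devs[i].device_pm_control(&devs[i], DEVICE_SET_POWER_STATE,
                                          DEVICE_PM_ACTIVE_STATE);
            }
            return rc;
        }
    }
    return 0;
}

/* Legacy "resume": reactivate in reverse order, reporting the first error. */
int pma_resume_all(struct device *devs, size_t n)
{
    int first_err = 0;

    for (size_t i = n; i-- > 0; ) {
        int rc = devs[i].device_pm_control(&devs[i], DEVICE_SET_POWER_STATE,
                                           DEVICE_PM_ACTIVE_STATE);
        if (rc < 0 && first_err == 0) {
            first_err = rc;
        }
    }
    return first_err;
}
```

Under these assumptions the old suspend and resume hooks reduce to two fixed calls of the new function, so a central PMA would not need a separate code path.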

Regards,
Amir, Keren, Hezi

-----Original Message-----
From: Rahamim, Hezi
Sent: Wednesday, August 10, 2016 10:18
To: Kaplan, Amir <amir.kaplan(a)intel.com>; Siman-tov, Keren <keren.siman-tov(a)intel.com>
Subject: FW: [devel] Re: Re: Re: Re: Re: [RFC] Power Management Infrastructure



-----Original Message-----
From: Thomas, Ramesh
Sent: Friday, July 15, 2016 06:22
To: Rahamim, Hezi <hezi.rahamim(a)intel.com>; devel(a)lists.zephyrproject.org
Subject: Re: [devel] Re: Re: Re: Re: Re: [RFC] Power Management Infrastructure



On 07/14/2016 06:17 AM, Rahamim, Hezi wrote:
> Hi Ramesh,
>
> Please see my comments below.
>
> Regards,
> Hezi
>
> -----Original Message-----
> From: Thomas, Ramesh [mailto:ramesh.thomas(a)intel.com]
> Sent: Thursday, July 14, 2016 10:32
> To: devel(a)lists.zephyrproject.org
> Subject: [devel] Re: Re: Re: Re: Re: [RFC] Power Management
> Infrastructure
>
> On 7/13/2016 11:40 PM, Rahamim, Hezi wrote:
>> Hi Dimitriy,
>>
>> The get_state is there only for symmetry and good practice.
>> As mentioned below, the power manager service will probably not use it, as it is not efficient to call get_state on every device just to learn all the device states...
>> The more important part of the RFC is adding the set_state function and the device policies.
>
> That made me think why we originally came up with 2 functions when one was enough. Probably we thought the same way - to keep symmetry. Problem is that we will keep getting more needs and we will either add more functions to device_pm_ops or will have to change existing ones.
>
> How about having one function that can be used for all possible device
> PM purposes using a control code? Something like following :-
>
> int device_pm_control(device, flags);
>
> flags = (CONTROL_CODE | SYSTEM_POWER_STATE | DEVICE_POWER_STATE)
>
> CONTROL_CODE = {SET_POWER_STATE, GET_POWER_STATE, ...}
> DEVICE_POWER_STATE = {device PM states}
> SYSTEM_POWER_STATE = {system power policies}
>
> (We can add additional parameters if flags param is overloaded)
>
> returns value based on CONTROL_CODE
> e.g. returns device power state if CONTROL_CODE == GET_POWER_STATE
>
> (We probably don't need device_pm_ops if we have only one function.)
>
> [HR] Looks good. If the PM service is designed as a driver, then it can use the SYSTEM_POWER_STATE and a device driver will use the DEVICE_POWER_STATE.
>
>
> ***I also have some questions inline below***
>
>
>>
>> Thanks for the comment,
>> Hezi
>>
>> -----Original Message-----
>> From: Dmitriy Korovkin [mailto:dmitriy.korovkin(a)windriver.com]
>> Sent: Thursday, July 14, 2016 00:41
>> To: devel(a)lists.zephyrproject.org
>> Subject: [devel] Re: Re: Re: [RFC] Power Management Infrastructure
>>
>> Hi Hezi,
>> I think RFC needs to be extended with the description of the idea of controlling power state of each device (if I got you correctly).
>> Without it the need of
>> int (*get_state)(struct device *device, int device_pm_policy); looks very unclear.
>> If all you need is to provide more than two states, then set_state() looks quite enough.
>>
>> Regards,
>>
>> Dmitriy Korovkin
>>
>> On 16-07-13 12:11 PM, Rahamim, Hezi wrote:
>>> Hi Ramesh,
>>>
>>> Please see my comments below.
>>>
>>> Thanks for the comments,
>>> Hezi
>>>
>>> -----Original Message-----
>>> From: Thomas, Ramesh [mailto:ramesh.thomas(a)intel.com]
>>> Sent: Wednesday, July 13, 2016 09:41
>>> To: devel(a)lists.zephyrproject.org
>>> Subject: [devel] Re: [RFC] Power Management Infrastructure
>>>
>>> On 7/12/2016 2:03 AM, Rahamim, Hezi wrote:
>>>> Hi all,
>>>>
>>>> Current state
>>>>
>>>> ===========
>>>>
>>>> In the current Zephyr implementation the driver power hooks
>>>> distinguish only
>>>>
>>>> between two driver states (suspend and resume). Drivers may have
>>>> more than two
>>>
>>> Currently suspend and resume are not actually states but a notification of the state transition. There is a second argument to those functions that specify the current policy for which the transition is happening.
>>>
>>>>
>>>> states (i.e. D-states) and can traverse between those states. The
>>>> state change
>>>>
>>>> today is limited only from active to suspend while there can be
>>>> cases of other
>>>>
>>>> transitions requested by the application.
>>>>
>>>> Please look at the below suggested device power states E.g.
>>>> transition between
>>>>
>>>> DEVICE_PM_LOW_POWER_STATE to DEVICE_PM_OFF_STATE.
>>>>
>>>> Moreover, the current device state cannot be queried by an
>>>> application or
>>>>
>>>> a Power Manager service.
>>>>
>>>> Below is the current Zephyr PM hooks:
>>>>
>>>> struct device_pm_ops {
>>>>
>>>> int (*suspend)(struct device *device, int pm_policy);
>>>>
>>>> int (*resume)(struct device *device, int pm_policy);
>>>>
>>>> };
>>>>
>>>> Proposed changes
>>>>
>>>> ===============
>>>>
>>>> First proposed change is to have a set state and get state driver
>>>> functions
>>>>
>>>> instead of the suspend resume functions:
>>>>
>>>> struct device_pm_ops {
>>>>
>>>> int (*set_state)(struct device *device, int
>>>> device_pm_policy);
>>>>
>>>> int (*get_state)(struct device *device, int
>>>> device_pm_policy);
>>>>
>>>> };
>>>>
>>>> The set_state function will behave according to the state
>>>> transition of a specific
>>>>
>>>> driver. E.g. transition from active state to suspend state in a
>>>> UART device will
>>>>
>>>> save device states and gate the clock.
>>>
>>> The proposal, as I understand, is to use the pm hooks to actively
>>> control the power states instead of using them as just notifications
>>> of the SOC's power transitions. Considering this, we had one power
>>> policy called "device_suspend_only". That is open to be broken down
>>> into more device specific power states.
>>>
>>> [HR] You are correct, we intend to use the pm driver hooks to
>>> actively control the device power states. We will use Zephyr's
>>> current power policies to indicate the system power state. As you
>>> mentioned, when devices will not be in active state the system can still be at "device_suspend_only" state.
>
> Do you see any issues with the apps/drivers keeping the PM service updated of the device's current state in real time? What about race conditions? Complexity of communication framework?
> [HR] A communication framework and a device state database
> lock may be needed. For example, inter-processor communication may be
> needed if in a specific SoC there are shared power resources between
> two cores (in AtP3 we may avoid that...)
>
>>>
>>>>
>>>> The get_state function will enable the Power Manager service to
>>>> know the state
>>>>
>>>> of each driver thus enable it to configure the SoC power behavior.
>>>>
>>>
>>> The set_state function looks ok. It is the same as the current
>>> suspend but with the name change. The purpose of the name change is
>>> to add a corresponding get_state. The RFC is not giving much
>>> details of the use of the get_state function.
>>>
>>> I assume there is a need for the PM service to build a device tree,
>>> with power hierarchy. It would be helpful if you could explain how
>>> this function fits in your larger design of the PM service's power
>>> policy management of the devices.
>>>
>>> [HR] I will give an example:
>>> A user application decides to suspend the I2C and the SPI devices.
>>> The application will then call the corresponding set_state functions of these devices.
>>> The set_state functions will perform the store of the relevant
>>> device state and gate the device clock. In the next idle time the _sys_soc_suspend will be called.
>>> This will trigger the power manager service which will decide what
>>> should be done to optimize the power (clock gate a branch or change the system power state).
>>> The decision of the power manager service will be based on the
>>> devices states which can be obtained also by using the get_state functions.
>>>
>>> Since the PM service is expected to have communication established
>>> with all components in the system, wouldn't it know what state each
>>> device is set to? Querying each device and building a tree every
>>> time there is an opportunity to suspend, may take some time causing delay in suspend.
>>>
>>> [HR] You are correct, using the get_state function will lead to a
>>> less optimal Power manager service and it will need to use a more optimized method.
>>> However, it is a good practice to have this function as the
>>> application may want to query the device state.
>>>
>>>> Second proposed change is to add the below device power states:
>>>>
>>>> Note: Many devices do not have all four power states defined.
>>>>
>>>> DEVICE_PM_ACTIVE_STATE
>>>>
>>>> --------------------------------------------
>>>>
>>>> Normal operation of the device. All device context is retained.
>>>>
>>>> DEVICE_PM_LOW_POWER_STATE
>>>>
>>>> -------------------------------------------------------
>>>>
>>>> Device context is preserved by the HW and need not be restored by
>>>> the driver.
>>>>
>>>> The device do not allow the Power Manager service to power it down.
>>>>
>>>> DEVICE_PM_SUSPEND_STATE
>>>>
>>>> ------------------------------------------------
>>>>
>>>> Most device context is lost by the hardware.
>>>>
>>>> Device drivers must save and restore or reinitialize any context
>>>> lost
>>>>
>>>> by the hardware.
>>>>
>>>> The device can be powered down.
>>>>
>>>> The device is allowing a wake signal to send them to active state.
>>>>
>>>> DEVICE_PM_OFF_STATE
>>>>
>>>> ---------------------------------------
>>>>
>>>> Power has been fully removed from the device. The device context is
>>>> lost
>>>>
>>>> when this state is entered, so the OS software will reinitialize
>>>> the device
>>>>
>>>> when powering it back on.
>>>>
>>>> Device may not wake itself as the SoC need to reinitialize the device.
>>>>
>>>
>>> The description of the power states here sounds like they are
>>> notifications. It sounds like some other component is setting the
>>> power state and notifies using these values and the drivers do
>>> save/restore or other operations based on the notification. Are the
>>> drivers expected to gate clocks, turn off peripherals etc. in these notifications?
>>>
>>> [HR] These device power states serve two purposes:
>>> 1. The drivers are expected to perform all the power/clock changes
>>> they can perform for the selected power state that do not
>>> influence other drivers.
>>> 2. The power manager service will use the driver states to decide
>>> which system power policy to go to (it can also stay in
>>> SYS_PM_DEVICE_SUSPEND_ONLY but optimize the clock gating scheme)
>
> Would these become part of a specification that all device drivers would need to implement? In this scheme, the PM responsibilities are shared between PM Service, various apps and drivers. So some plan needs to be in place to ensure all of them cooperate as expected.
> [HR] You are right, there is a need to define the PM responsibilities of the PM service, drivers and apps. However, this RFC was written to add the ability to support more than two device states, define the available states and to enable transition between them.
> We will also be happy to contribute to defining the above.

The device PM states look ok to me. They are architecture independent and the drivers can map them to device specific operations.

I think this RFC should be part of other RFCs that define the bigger picture of how it is used. As I see it, the kind of device PM you propose can function independently of system idle. In my opinion, it would be good to define it independently of system power management. The two will coordinate, but that should not be a requirement. That way, either infrastructure can be used independently by users. Also, there would be implementations that would want to do all device PM in the PM service for various reasons.

>
>>>
>>> The initial part of the RFC does mention the application can set the
>>> power state of the device and that is what the proposed set_state
>>> function also suggests.
>>>
>>> Do they serve both purposes? Maybe an example of how the device's
>>> power state is actively changed and who and when does it, with
>>> respect to these notifications, would help.
>>>
>>> [HR] Here is an example:
>>> There are three peripherals in a certain SOC: UART, I2C and SPI.
>>> Both I2C and SPI are fed from the same PLL and the UART from a second one.
>>> At the beginning the three peripherals are at DEVICE_PM_ACTIVE_STATE.
>>> The user application decides that the I2C and the SPI should go to suspend.
>>> It then calls the set_state function of these devices with DEVICE_PM_SUSPEND_STATE.
>>> When idle comes, the PM service is called and sees that it can close the SPI and I2C PLL.
>>> However, it cannot move to SYS_PM_DEEP_SLEEP as the UART is still active.
>
> Will the PM service also put devices to suspend state, or only the apps will do it? Looks like the PM Service will never enter Deep Sleep if any device is on - any exceptions?
> [HR] Only apps will do that. The PM service can decide in some cases to go to deep sleep even if specific device is active (e.g. the device is located in the always on power domain). The decision to change power state is SoC specific.
>
> In the above example, the system had to go to idle for the PLL to get turned off. If you had a central scheme to turn off clocks then the PLL could have been turned off when both i2c and spi got turned off. Just an observation.
> [HR] There are indeed several ways to solve this and there will be a need to choose the best one for the specific SoC.
>>>
>>>> Comments/concerns welcome.
>>>>
>>>> Thanks,
>>>>
>>>> Hezi
>>>>
>>>> ---------------------------------------------------------------------
>>>> A member of the Intel Corporation group of companies
>>>>
>>>> This e-mail and any attachments may contain confidential material
>>>> for the sole use of the intended recipient(s). Any review or
>>>> distribution by others is strictly prohibited. If you are not the
>>>> intended recipient, please contact the sender and delete all copies.
>>>>