linux/drivers/cpuidle/cpuidle-psci.c
Linus Torvalds 2504ba8b01 Power management updates for 6.3-rc1
- Add EPP support to the AMD P-state cpufreq driver (Perry Yuan, Wyes
    Karny, Arnd Bergmann, Bagas Sanjaya).
 
  - Drop the custom cpufreq driver for loongson1 that is not necessary
    any more and the corresponding cpufreq platform device (Keguang
    Zhang).
 
  - Remove "select SRCU" from system sleep, cpufreq and OPP Kconfig
    entries (Paul E. McKenney).
 
  - Enable thermal cooling for Tegra194 (Yi-Wei Wang).
 
  - Register module device table and add missing compatibles for
    cpufreq-qcom-hw (Nícolas F. R. A. Prado, Abel Vesa and Luca Weiss).
 
  - Various dt binding updates for qcom-cpufreq-nvmem and opp-v2-kryo-cpu
    (Christian Marangi).
 
  - Make kobj_type structure in the cpufreq core constant (Thomas
    Weißschuh).
 
  - Make cpufreq_unregister_driver() return void (Uwe Kleine-König).
 
  - Make the TEO cpuidle governor check CPU utilization in order to refine
   idle state selection (Kajetan Puchalski).
 
  - Make Kconfig select the haltpoll cpuidle governor when the haltpoll
    cpuidle driver is selected and replace a default_idle() call in that
    driver with arch_cpu_idle() to allow MWAIT to be used (Li RongQing).
 
  - Add Emerald Rapids Xeon support to the intel_idle driver (Artem
    Bityutskiy).
 
  - Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to
    avoid randconfig build failures (Arnd Bergmann).
 
  - Make kobj_type structures used in the cpuidle sysfs interface
    constant (Thomas Weißschuh).
 
  - Make the cpuidle driver registration code update microsecond values
    of idle state parameters in accordance with their nanosecond values
    if they are provided (Rafael Wysocki).
 
  - Make the PSCI cpuidle driver prevent topology CPUs from being
    suspended on PREEMPT_RT (Krzysztof Kozlowski).
 
  - Document that pm_runtime_force_suspend() cannot be used with
    DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald).
 
  - Add EXPORT macros for exporting PM functions from drivers (Richard
    Fitzgerald).
 
  - Remove /** from non-kernel-doc comments in hibernation code (Randy
    Dunlap).
 
  - Fix possible name leak in powercap_register_zone() (Yang Yingliang).
 
  - Add Meteor Lake and Emerald Rapids support to the intel_rapl power
    capping driver (Zhang Rui).
 
  - Modify the idle_inject power capping facility to support 100% idle
    injection (Srinivas Pandruvada).
 
  - Fix large time windows handling in the intel_rapl power capping
    driver (Zhang Rui).
 
  - Fix memory leaks with using debugfs_lookup() in the generic PM
    domains and Energy Model code (Greg Kroah-Hartman).
 
  - Add missing 'cache-unified' property in the example for kryo OPP
    bindings (Rob Herring).
 
  - Fix error checking in opp_migrate_dentry() (Qi Zheng).
 
  - Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad
    Dybcio).
 
  - Modify some power management utilities to use the canonical ftrace
    path (Ross Zwisler).
 
  - Correct spelling problems for Documentation/power/ as reported by
    codespell (Randy Dunlap).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmPuJfMSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx/5kQAJNOVImLEPLerLP8xufw30//LuDU5Gi0
 STsyDOMql/I2MpkeqeCcgrSbpy6NlEglOvg16gfpQ3qqTCLF9ypENxs9E5BGGvW0
 aEdCzvaoqmvi9PCr/jmj0EPP70/U+rIX5m/k0QdjLh9x0aLoAEe3uRJTfR9QVqXf
 I7JX0N9kjKi7YxpA5DlkHrS7J7GPPiWlesJ3p4wXuHMo3jf+6fgkoPFt8yRrGWeh
 AHzGT2BLrsy7aAUjGZB65Qx9q3fnSXMmXOjmn0Xh2njQah+zRZDwrNzwoY2HTLL/
 KQ6/Ww16USYRZtCS1fmGwAj9I+ddq6AOvhPCMn0vLXXmKVAMUrVVWnQS/0+vpm9y
 suUMK9Tndkgxd1vjby2246ThJn27uDd/ERFan4ouQo2j22uICY+SDo3osj2hMXka
 wq4zthXkY8KgjZ+MuXnZxPhcOvo8KRvfxAU0fy5efQnSkbtwY9UlMvjPBMBHm/RA
 21/6kjQNtq5vMmI37oC8DH+oPrRQ7sUKuY7HNqwO9P3QNKWVmNe7cF5UtXXxME7Q
 ULvP1d+u+TNNdHFLryPwCSzBO34wQEccdRZBjalZ8tBe6JiDWUFHC3giSURZSuzZ
 GDvzVaNX6PkgToyv4inBTB8lTp6pAuUjaWNvNJzVvUXiEKHB0ihzg5vpJW5NdwlH
 15Tn8cjH7pp0
 =lZLx
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These add EPP support to the AMD P-state cpufreq driver, add support
  for new platforms to the Intel RAPL power capping driver, intel_idle
  and the Qualcomm cpufreq driver, enable thermal cooling for Tegra194,
  drop the custom cpufreq driver for loongson1 that is not necessary any
  more (and the corresponding cpufreq platform device), fix assorted
  issues and clean up code.

  Specifics:

   - Add EPP support to the AMD P-state cpufreq driver (Perry Yuan, Wyes
     Karny, Arnd Bergmann, Bagas Sanjaya)

   - Drop the custom cpufreq driver for loongson1 that is not necessary
     any more and the corresponding cpufreq platform device (Keguang
     Zhang)

   - Remove "select SRCU" from system sleep, cpufreq and OPP Kconfig
     entries (Paul E. McKenney)

   - Enable thermal cooling for Tegra194 (Yi-Wei Wang)

   - Register module device table and add missing compatibles for
     cpufreq-qcom-hw (Nícolas F. R. A. Prado, Abel Vesa and Luca Weiss)

   - Various dt binding updates for qcom-cpufreq-nvmem and
     opp-v2-kryo-cpu (Christian Marangi)

   - Make kobj_type structure in the cpufreq core constant (Thomas
     Weißschuh)

   - Make cpufreq_unregister_driver() return void (Uwe Kleine-König)

   - Make the TEO cpuidle governor check CPU utilization in order to
     refine idle state selection (Kajetan Puchalski)

   - Make Kconfig select the haltpoll cpuidle governor when the haltpoll
     cpuidle driver is selected and replace a default_idle() call in
     that driver with arch_cpu_idle() to allow MWAIT to be used (Li
     RongQing)

   - Add Emerald Rapids Xeon support to the intel_idle driver (Artem
     Bityutskiy)

   - Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to
     avoid randconfig build failures (Arnd Bergmann)

   - Make kobj_type structures used in the cpuidle sysfs interface
     constant (Thomas Weißschuh)

   - Make the cpuidle driver registration code update microsecond values
     of idle state parameters in accordance with their nanosecond values
     if they are provided (Rafael Wysocki)

   - Make the PSCI cpuidle driver prevent topology CPUs from being
     suspended on PREEMPT_RT (Krzysztof Kozlowski)

   - Document that pm_runtime_force_suspend() cannot be used with
     DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald)

   - Add EXPORT macros for exporting PM functions from drivers (Richard
     Fitzgerald)

   - Remove /** from non-kernel-doc comments in hibernation code (Randy
     Dunlap)

   - Fix possible name leak in powercap_register_zone() (Yang Yingliang)

   - Add Meteor Lake and Emerald Rapids support to the intel_rapl power
     capping driver (Zhang Rui)

   - Modify the idle_inject power capping facility to support 100% idle
     injection (Srinivas Pandruvada)

   - Fix large time windows handling in the intel_rapl power capping
     driver (Zhang Rui)

   - Fix memory leaks with using debugfs_lookup() in the generic PM
     domains and Energy Model code (Greg Kroah-Hartman)

   - Add missing 'cache-unified' property in the example for kryo OPP
     bindings (Rob Herring)

   - Fix error checking in opp_migrate_dentry() (Qi Zheng)

   - Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad
     Dybcio)

   - Modify some power management utilities to use the canonical ftrace
     path (Ross Zwisler)

   - Correct spelling problems for Documentation/power/ as reported by
     codespell (Randy Dunlap)"

* tag 'pm-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (53 commits)
  Documentation: amd-pstate: disambiguate user space sections
  cpufreq: amd-pstate: Fix invalid write to MSR_AMD_CPPC_REQ
  dt-bindings: opp: opp-v2-kryo-cpu: enlarge opp-supported-hw maximum
  dt-bindings: cpufreq: qcom-cpufreq-nvmem: make cpr bindings optional
  dt-bindings: cpufreq: qcom-cpufreq-nvmem: specify supported opp tables
  PM: Add EXPORT macros for exporting PM functions
  cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT
  MIPS: loongson32: Drop obsolete cpufreq platform device
  powercap: intel_rapl: Fix handling for large time window
  cpuidle: driver: Update microsecond values of state parameters as needed
  cpuidle: sysfs: make kobj_type structures constant
  cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies
  PM: EM: fix memory leak with using debugfs_lookup()
  PM: domains: fix memory leak with using debugfs_lookup()
  cpufreq: Make kobj_type structure constant
  cpufreq: davinci: Fix clk use after free
  cpufreq: amd-pstate: avoid uninitialized variable use
  cpufreq: Make cpufreq_unregister_driver() return void
  OPP: fix error checking in opp_migrate_dentry()
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add SM8550 compatible
  ...
2023-02-21 12:13:58 -08:00

451 lines
10 KiB
C

// SPDX-License-Identifier: GPL-2.0-only
/*
* PSCI CPU idle driver.
*
* Copyright (C) 2019 ARM Ltd.
* Author: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
*/
#define pr_fmt(fmt) "CPUidle PSCI: " fmt
#include <linux/cpuhotplug.h>
#include <linux/cpu_cooling.h>
#include <linux/cpuidle.h>
#include <linux/cpumask.h>
#include <linux/cpu_pm.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/of.h>
#include <linux/of_device.h>
#include <linux/platform_device.h>
#include <linux/psci.h>
#include <linux/pm_domain.h>
#include <linux/pm_runtime.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/syscore_ops.h>
#include <asm/cpuidle.h>
#include "cpuidle-psci.h"
#include "dt_idle_states.h"
struct psci_cpuidle_data {
u32 *psci_states;
struct device *dev;
};
static DEFINE_PER_CPU_READ_MOSTLY(struct psci_cpuidle_data, psci_cpuidle_data);
static DEFINE_PER_CPU(u32, domain_state);
static bool psci_cpuidle_use_cpuhp;
void psci_set_domain_state(u32 state)
{
__this_cpu_write(domain_state, state);
}
static inline u32 psci_get_domain_state(void)
{
return __this_cpu_read(domain_state);
}
static __cpuidle int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int idx,
bool s2idle)
{
struct psci_cpuidle_data *data = this_cpu_ptr(&psci_cpuidle_data);
u32 *states = data->psci_states;
struct device *pd_dev = data->dev;
u32 state;
int ret;
ret = cpu_pm_enter();
if (ret)
return -1;
/* Do runtime PM to manage a hierarchical CPU toplogy. */
if (s2idle)
dev_pm_genpd_suspend(pd_dev);
else
pm_runtime_put_sync_suspend(pd_dev);
state = psci_get_domain_state();
if (!state)
state = states[idx];
ret = psci_cpu_suspend_enter(state) ? -1 : idx;
if (s2idle)
dev_pm_genpd_resume(pd_dev);
else
pm_runtime_get_sync(pd_dev);
cpu_pm_exit();
/* Clear the domain state to start fresh when back from idle. */
psci_set_domain_state(0);
return ret;
}
static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int idx)
{
return __psci_enter_domain_idle_state(dev, drv, idx, false);
}
static int psci_enter_s2idle_domain_idle_state(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int idx)
{
return __psci_enter_domain_idle_state(dev, drv, idx, true);
}
static int psci_idle_cpuhp_up(unsigned int cpu)
{
struct device *pd_dev = __this_cpu_read(psci_cpuidle_data.dev);
if (pd_dev)
pm_runtime_get_sync(pd_dev);
return 0;
}
static int psci_idle_cpuhp_down(unsigned int cpu)
{
struct device *pd_dev = __this_cpu_read(psci_cpuidle_data.dev);
if (pd_dev) {
pm_runtime_put_sync(pd_dev);
/* Clear domain state to start fresh at next online. */
psci_set_domain_state(0);
}
return 0;
}
static void psci_idle_syscore_switch(bool suspend)
{
bool cleared = false;
struct device *dev;
int cpu;
for_each_possible_cpu(cpu) {
dev = per_cpu_ptr(&psci_cpuidle_data, cpu)->dev;
if (dev && suspend) {
dev_pm_genpd_suspend(dev);
} else if (dev) {
dev_pm_genpd_resume(dev);
/* Account for userspace having offlined a CPU. */
if (pm_runtime_status_suspended(dev))
pm_runtime_set_active(dev);
/* Clear domain state to re-start fresh. */
if (!cleared) {
psci_set_domain_state(0);
cleared = true;
}
}
}
}
static int psci_idle_syscore_suspend(void)
{
psci_idle_syscore_switch(true);
return 0;
}
static void psci_idle_syscore_resume(void)
{
psci_idle_syscore_switch(false);
}
static struct syscore_ops psci_idle_syscore_ops = {
.suspend = psci_idle_syscore_suspend,
.resume = psci_idle_syscore_resume,
};
static void psci_idle_init_cpuhp(void)
{
int err;
if (!psci_cpuidle_use_cpuhp)
return;
register_syscore_ops(&psci_idle_syscore_ops);
err = cpuhp_setup_state_nocalls(CPUHP_AP_CPU_PM_STARTING,
"cpuidle/psci:online",
psci_idle_cpuhp_up,
psci_idle_cpuhp_down);
if (err)
pr_warn("Failed %d while setup cpuhp state\n", err);
}
static __cpuidle int psci_enter_idle_state(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int idx)
{
u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
return CPU_PM_CPU_IDLE_ENTER_PARAM_RCU(psci_cpu_suspend_enter, idx, state[idx]);
}
static const struct of_device_id psci_idle_state_match[] = {
{ .compatible = "arm,idle-state",
.data = psci_enter_idle_state },
{ },
};
int psci_dt_parse_state_node(struct device_node *np, u32 *state)
{
int err = of_property_read_u32(np, "arm,psci-suspend-param", state);
if (err) {
pr_warn("%pOF missing arm,psci-suspend-param property\n", np);
return err;
}
if (!psci_power_state_is_valid(*state)) {
pr_warn("Invalid PSCI power state %#x\n", *state);
return -EINVAL;
}
return 0;
}
static int psci_dt_cpu_init_topology(struct cpuidle_driver *drv,
struct psci_cpuidle_data *data,
unsigned int state_count, int cpu)
{
/* Currently limit the hierarchical topology to be used in OSI mode. */
if (!psci_has_osi_support())
return 0;
if (IS_ENABLED(CONFIG_PREEMPT_RT))
return 0;
data->dev = psci_dt_attach_cpu(cpu);
if (IS_ERR_OR_NULL(data->dev))
return PTR_ERR_OR_ZERO(data->dev);
/*
* Using the deepest state for the CPU to trigger a potential selection
* of a shared state for the domain, assumes the domain states are all
* deeper states.
*/
drv->states[state_count - 1].flags |= CPUIDLE_FLAG_RCU_IDLE;
drv->states[state_count - 1].enter = psci_enter_domain_idle_state;
drv->states[state_count - 1].enter_s2idle = psci_enter_s2idle_domain_idle_state;
psci_cpuidle_use_cpuhp = true;
return 0;
}
static int psci_dt_cpu_init_idle(struct device *dev, struct cpuidle_driver *drv,
struct device_node *cpu_node,
unsigned int state_count, int cpu)
{
int i, ret = 0;
u32 *psci_states;
struct device_node *state_node;
struct psci_cpuidle_data *data = per_cpu_ptr(&psci_cpuidle_data, cpu);
state_count++; /* Add WFI state too */
psci_states = devm_kcalloc(dev, state_count, sizeof(*psci_states),
GFP_KERNEL);
if (!psci_states)
return -ENOMEM;
for (i = 1; i < state_count; i++) {
state_node = of_get_cpu_state_node(cpu_node, i - 1);
if (!state_node)
break;
ret = psci_dt_parse_state_node(state_node, &psci_states[i]);
of_node_put(state_node);
if (ret)
return ret;
pr_debug("psci-power-state %#x index %d\n", psci_states[i], i);
}
if (i != state_count)
return -ENODEV;
/* Initialize optional data, used for the hierarchical topology. */
ret = psci_dt_cpu_init_topology(drv, data, state_count, cpu);
if (ret < 0)
return ret;
/* Idle states parsed correctly, store them in the per-cpu struct. */
data->psci_states = psci_states;
return 0;
}
static int psci_cpu_init_idle(struct device *dev, struct cpuidle_driver *drv,
unsigned int cpu, unsigned int state_count)
{
struct device_node *cpu_node;
int ret;
/*
* If the PSCI cpu_suspend function hook has not been initialized
* idle states must not be enabled, so bail out
*/
if (!psci_ops.cpu_suspend)
return -EOPNOTSUPP;
cpu_node = of_cpu_device_node_get(cpu);
if (!cpu_node)
return -ENODEV;
ret = psci_dt_cpu_init_idle(dev, drv, cpu_node, state_count, cpu);
of_node_put(cpu_node);
return ret;
}
static void psci_cpu_deinit_idle(int cpu)
{
struct psci_cpuidle_data *data = per_cpu_ptr(&psci_cpuidle_data, cpu);
psci_dt_detach_cpu(data->dev);
psci_cpuidle_use_cpuhp = false;
}
static int psci_idle_init_cpu(struct device *dev, int cpu)
{
struct cpuidle_driver *drv;
struct device_node *cpu_node;
const char *enable_method;
int ret = 0;
cpu_node = of_cpu_device_node_get(cpu);
if (!cpu_node)
return -ENODEV;
/*
* Check whether the enable-method for the cpu is PSCI, fail
* if it is not.
*/
enable_method = of_get_property(cpu_node, "enable-method", NULL);
if (!enable_method || (strcmp(enable_method, "psci")))
ret = -ENODEV;
of_node_put(cpu_node);
if (ret)
return ret;
drv = devm_kzalloc(dev, sizeof(*drv), GFP_KERNEL);
if (!drv)
return -ENOMEM;
drv->name = "psci_idle";
drv->owner = THIS_MODULE;
drv->cpumask = (struct cpumask *)cpumask_of(cpu);
/*
* PSCI idle states relies on architectural WFI to be represented as
* state index 0.
*/
drv->states[0].enter = psci_enter_idle_state;
drv->states[0].exit_latency = 1;
drv->states[0].target_residency = 1;
drv->states[0].power_usage = UINT_MAX;
strcpy(drv->states[0].name, "WFI");
strcpy(drv->states[0].desc, "ARM WFI");
/*
* If no DT idle states are detected (ret == 0) let the driver
* initialization fail accordingly since there is no reason to
* initialize the idle driver if only wfi is supported, the
* default archictectural back-end already executes wfi
* on idle entry.
*/
ret = dt_init_idle_driver(drv, psci_idle_state_match, 1);
if (ret <= 0)
return ret ? : -ENODEV;
/*
* Initialize PSCI idle states.
*/
ret = psci_cpu_init_idle(dev, drv, cpu, ret);
if (ret) {
pr_err("CPU %d failed to PSCI idle\n", cpu);
return ret;
}
ret = cpuidle_register(drv, NULL);
if (ret)
goto deinit;
cpuidle_cooling_register(drv);
return 0;
deinit:
psci_cpu_deinit_idle(cpu);
return ret;
}
/*
* psci_idle_probe - Initializes PSCI cpuidle driver
*
* Initializes PSCI cpuidle driver for all CPUs, if any CPU fails
* to register cpuidle driver then rollback to cancel all CPUs
* registration.
*/
static int psci_cpuidle_probe(struct platform_device *pdev)
{
int cpu, ret;
struct cpuidle_driver *drv;
struct cpuidle_device *dev;
for_each_possible_cpu(cpu) {
ret = psci_idle_init_cpu(&pdev->dev, cpu);
if (ret)
goto out_fail;
}
psci_idle_init_cpuhp();
return 0;
out_fail:
while (--cpu >= 0) {
dev = per_cpu(cpuidle_devices, cpu);
drv = cpuidle_get_cpu_driver(dev);
cpuidle_unregister(drv);
psci_cpu_deinit_idle(cpu);
}
return ret;
}
static struct platform_driver psci_cpuidle_driver = {
.probe = psci_cpuidle_probe,
.driver = {
.name = "psci-cpuidle",
},
};
static int __init psci_idle_init(void)
{
struct platform_device *pdev;
int ret;
ret = platform_driver_register(&psci_cpuidle_driver);
if (ret)
return ret;
pdev = platform_device_register_simple("psci-cpuidle", -1, NULL, 0);
if (IS_ERR(pdev)) {
platform_driver_unregister(&psci_cpuidle_driver);
return PTR_ERR(pdev);
}
return 0;
}
device_initcall(psci_idle_init);