IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Users may disable HWP in firmware, in which case intel_pstate wouldn't load
unless the CPU model is explicitly supported.
Add TIGERLAKE to the list of CPUs that can register intel_pstate while not
advertising the HWP capability. Without this change, an TIGERLAKE in no-HWP
mode could only use the acpi_cpufreq frequency scaling driver.
See also commits:
d8de7a44e1: cpufreq: intel_pstate: Add Skylake servers support
fbdc21e9b0: cpufreq: intel_pstate: Add Icelake servers support in no-HWP mode
706c532885: cpufreq: intel_pstate: Add Cometlake support in no-HWP mode
Reported by: M. Cargi Ari <cagriari@pm.me>
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
To avoid some new AMD processors use wrong highest perf when amd pstate
driver loaded, this fix will query the highest perf from MSR register
MSR_AMD_CPPC_CAP1 and cppc_acpi interface firstly, then compare with the
highest perf value got by calling amd_get_highest_perf() function.
The lower value will be the correct highest perf we need to use.
Otherwise the CPU max MHz will be incorrect if the
amd_get_highest_perf() did not cover the new process family and model ID.
Like this lscpu info, the max frequency is incorrect.
Vendor ID: AuthenticAMD
Socket(s): 1
Stepping: 2
CPU max MHz: 5410.0000
CPU min MHz: 400.0000
BogoMIPS: 5600.54
Fixes: 3743d55b28 (x86, sched: Fix the AMD CPPC maximum performance value on certain AMD Ryzen generations)
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Return the value returned by smp_call_function_single() directly instead
of storing it in another redundant variable.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
[ Viresh: Minor update to commit log ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Change the default transition latency to be 20ms that is more
reasonable transition delay for AMD processors in non-EPP driver mode.
Update transition delay time to 1ms, in the AMD CPU autonomous mode and
non-autonomous mode, CPPC firmware will decide frequency at 1ms timescale
based on the workload utilization.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The patch will fix the invalid desired perf value for powersave
governor. This issue is found when testing on one AMD EPYC system, the
actual des_perf is smaller than the min_perf value, that is invalid
value. because the min_perf is the lowest_perf system can support in
idle state.
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix the wrong lowest perf value reading which is used for new
des_perf calculation by governor requested, the incorrect min_perf will
get incorrect des_perf to be set , that will cause the system frequency
changing unexpectedly.
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Su Jinzhou <jinzhou.su@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Remove the white space and correct mixed-up indentation
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
move the cpudata assignment to cpudata declaration which
will simplify the functions.
No functional change intended.
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Use module_init instead of device_initcall.
- Add a function for module_exit to unregister driver.
Signed-off-by: Zhang Jianhua <chris.zjh@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Make acpi_cpc_valid() check if ACPI is disabled, so that its callers
don't need to check that separately. This will also cause the AMD
pstate driver to refuse to load right away when ACPI is disabled.
Also update the warning message in amd_pstate_init() to mention the
ACPI disabled case for completeness.
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
[ rjw: Subject edits, new changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is no need to check if the cpufreq driver implements callback
cpufreq_driver::target_index. The logic in the __resolve_freq uses
the frequency table available in the policy. It doesn't matter if the
driver provides 'target_index' or 'target' callback. It just has to
populate the 'policy->freq_table'.
Thus, check only frequency table during the frequency resolving call.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The Qualcomm SM6115 platform uses the
qcom-cpufreq-hw driver, so add it to the cpufreq-dt-platdev driver's
blocklist.
Signed-off-by: Adam Skladowski <a39.skl@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
- Fix return error code in mtk_cpu_dvfs_info_init (Yang Yingliang).
- Minor cleanups and support for new boards for Qcom cpufreq drivers
(Bryan O'Donoghue, Konrad Dybcio, Pierre Gondois, and Yicong Yang).
- Fix sparse warnings for Tegra cpufreq driver (Viresh Kumar).
- Make dev_pm_opp_set_regulators() accept NULL terminated list (Viresh
Kumar).
- Add dev_pm_opp_set_config() and friends and migrate other users and
helpers to using them (Viresh Kumar).
- Add support for multiple clocks for a device (Viresh Kumar and
Krzysztof Kozlowski).
- Configure resources before adding OPP table for Venus (Stanimir
Varbanov).
- Keep reference count up for opp->np and opp_table->np while they are
still in use (Liang He).
- Minor OPP cleanups (Viresh Kumar and Yang Li).
- Add a trace event for cpuidle to track missed (too deep or too
shallow) wakeups (Kajetan Puchalski).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmLxUA0SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxypYQAK/sYS76XzKRjVsPmC082FVlA9Helhsa
Op50DSnhfzYAWrtRZM5VPsV2CgQkmc5KCmZJSd1ZKIFcOpjlJT/rvaVaSH7Ltcn5
52GOus6KXKCL3FegQLy3bLcmKkEJIXb3uhWE2VlSuj2cxx6KE2g4bUwPE0pRr++Y
RkfaT6hcUzxxOAKw1cQhdXgBoXKL/ZeypmpZ95joYuas/mozKskM5SQFX455JCQ9
t4vaRzrsHzxi5ELiML75TYMY97sF367wSs+4jZSgPBllbJcRXEMg+JkTccKRYrsZ
k/kDvP5xVFzKT/dYpNpW3u/pl94+xZuh5WLF9/AqwC/qs7kLPJJ0/8mfTTd63DjZ
3KrkimiQ3d2XMAL4L6FoK+T8v6MwzmlN0elmHHdtmu9mY+v01CwAzjpxdvaFoELK
V6BCRRX8KNwYsrAJ4EpDK9TvPYJf8yT3jvGDcjPZY9RYlebje0Q825XOcxea4Dfe
oFxiEWgfK9gzOBvaa24oifKDy2RVy6FvR43qQeiPG4AWAFjr4qP9cDO4q5OL/BuE
sXpsGY5NE/e8JH9hkgmUK1ms50zk4UMbRC5ZoZuHWyiaFlJdMRF3cUGHe3ylPrxb
XOFZz8Zl4WeAqBjGGHuiMedwEbmQH2RhdAMCQO1nxoq3UXy6E2/ojI1G1uQ9IEm0
5FFouJ+bEnqO
=LBb0
-----END PGP SIGNATURE-----
Merge tag 'pm-5.20-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These are ARM cpufreq updates and operating performance points (OPP)
updates plus one cpuidle update adding a new trace point.
Specifics:
- Fix return error code in mtk_cpu_dvfs_info_init (Yang Yingliang).
- Minor cleanups and support for new boards for Qcom cpufreq drivers
(Bryan O'Donoghue, Konrad Dybcio, Pierre Gondois, and Yicong Yang).
- Fix sparse warnings for Tegra cpufreq driver (Viresh Kumar).
- Make dev_pm_opp_set_regulators() accept NULL terminated list
(Viresh Kumar).
- Add dev_pm_opp_set_config() and friends and migrate other users and
helpers to using them (Viresh Kumar).
- Add support for multiple clocks for a device (Viresh Kumar and
Krzysztof Kozlowski).
- Configure resources before adding OPP table for Venus (Stanimir
Varbanov).
- Keep reference count up for opp->np and opp_table->np while they
are still in use (Liang He).
- Minor OPP cleanups (Viresh Kumar and Yang Li).
- Add a trace event for cpuidle to track missed (too deep or too
shallow) wakeups (Kajetan Puchalski)"
* tag 'pm-5.20-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (55 commits)
cpuidle: Add cpu_idle_miss trace event
venus: pm_helpers: Fix warning in OPP during probe
OPP: Don't drop opp->np reference while it is still in use
OPP: Don't drop opp_table->np reference while it is still in use
cpufreq: tegra194: Staticize struct tegra_cpufreq_soc instances
dt-bindings: cpufreq: cpufreq-qcom-hw: Add SM6375 compatible
dt-bindings: opp: Add msm8939 to the compatible list
dt-bindings: opp: Add missing compat devices
dt-bindings: opp: opp-v2-kryo-cpu: Fix example binding checks
cpufreq: Change order of online() CB and policy->cpus modification
cpufreq: qcom-hw: Remove deprecated irq_set_affinity_hint() call
cpufreq: qcom-hw: Disable LMH irq when disabling policy
cpufreq: qcom-hw: Reset cancel_throttle when policy is re-enabled
cpufreq: qcom-cpufreq-hw: use HZ_PER_KHZ macro in units.h
cpufreq: mediatek: fix error return code in mtk_cpu_dvfs_info_init()
OPP: Remove dev{m}_pm_opp_of_add_table_noclk()
PM / devfreq: tegra30: Register config_clks helper
OPP: Allow config_clks helper for single clk case
OPP: Provide a simple implementation to configure multiple clocks
OPP: Assert clk_count == 1 for single clk helpers
...
Here is the set of SPDX comment updates for 6.0-rc1.
Nothing huge here, just a number of updated SPDX license tags and
cleanups based on the review of a number of common patterns in GPLv2
boilerplate text. Also included in here are a few other minor updates,
2 USB files, and one Documentation file update to get the SPDX lines
correct.
All of these have been in the linux-next tree for a very long time.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYupz3g8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynPUgCgslaf2ssCgW5IeuXbhla+ZBRAzisAnjVgOvLN
4AKdqbiBNlFbCroQwmeQ
=v1sg
-----END PGP SIGNATURE-----
Merge tag 'spdx-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
Pull SPDX updates from Greg KH:
"Here is the set of SPDX comment updates for 6.0-rc1.
Nothing huge here, just a number of updated SPDX license tags and
cleanups based on the review of a number of common patterns in GPLv2
boilerplate text.
Also included in here are a few other minor updates, two USB files,
and one Documentation file update to get the SPDX lines correct.
All of these have been in the linux-next tree for a very long time"
* tag 'spdx-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: (28 commits)
Documentation: samsung-s3c24xx: Add blank line after SPDX directive
x86/crypto: Remove stray comment terminator
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_406.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_398.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_391.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_390.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_385.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_320.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_319.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_318.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_298.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_292.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_179.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_168.RULE (part 2)
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_168.RULE (part 1)
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_160.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_152.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_149.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_147.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_133.RULE
...
- Make dev_pm_opp_set_regulators() accept NULL terminated list (Viresh
Kumar).
- Add dev_pm_opp_set_config() and friends and migrate other
users/helpers to using them (Viresh Kumar).
- Add support for multiple clocks for a device (Viresh Kumar and
Krzysztof Kozlowski).
- Configure resources before adding OPP table for Venus (Stanimir
Varbanov).
- Keep reference count up for opp->np and opp_table->np while they are
still in use (Liang He).
- Minor cleanups (Viresh Kumar and Yang Li).
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEx73Crsp7f6M6scA70rkcPK6BEhwFAmLoprIACgkQ0rkcPK6B
Ehz2nBAApgYUDGkEjWcJufIxW1mH77uonzmWUV2jQBEcCvYnjdwhJ0RpQUT75Xnk
hTYJ5v9UKwOVl+puPguUe7UzSmWcsI9AzJCj0Vr/LBiln+sawoI51lqOaNjCJkmZ
VytQJB23DNsYJAG/0xM42+syu+IONJ4vCP/9m35sWlevfFihbfQsEK+iEKsseVgd
sEwPvHyixLWyeaoAf+6apOBP2Lf+/3R8h6Iv0U8n8jOzUpQQ5r/RSDyZeARP7gze
64aXvsvr7D0Mc9GpevDJKGtPFbRNfq5I4Lg5MOZ8NQVjXOqlWJil3oYEnKQxIH0Y
EEzcrSuWi3SEeHrQfj+GFs/D7z2ZHqmbg7yb4P7zSeqLvG+7Ey9aYOXOg5LykrYk
1rZQzenLMF91RnhdRLI22SJngokOYZjWBFp62mPqmEYtx2VsYQlxqGtJoCHYDRx3
QRp0ZYJBnHQMt7saiIRFdAAIz7/G5lkiUplVzqAWe7AEpUG3Y7kvIqfwi69s3I5S
ERSf3qqx3dUGFXYoxwglEwaf8ZvKQnPOzOLmbyc9Hrj2MclfKf9vW+0/4J6iiDlu
ITpsqEWUhtEjwCt3lbM6PWNRrCJHi6YkKw0sORxEWR639cqckmk6ZAuaRPeOob6a
nZ/UvwU2LMRG1zZyrsB3bbUkijJ019RPySmjCXApLsoNT1qpUr0=
=NHBQ
-----END PGP SIGNATURE-----
Merge tag 'opp-updates-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull operating performance points (OPP) updates for 5.20-rc1 from Viresh
Kumar:
"- Make dev_pm_opp_set_regulators() accept NULL terminated list (Viresh
Kumar).
- Add dev_pm_opp_set_config() and friends and migrate other
users/helpers to using them (Viresh Kumar).
- Add support for multiple clocks for a device (Viresh Kumar and
Krzysztof Kozlowski).
- Configure resources before adding OPP table for Venus (Stanimir
Varbanov).
- Keep reference count up for opp->np and opp_table->np while they are
still in use (Liang He).
- Minor cleanups (Viresh Kumar and Yang Li)."
* tag 'opp-updates-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: (43 commits)
venus: pm_helpers: Fix warning in OPP during probe
OPP: Don't drop opp->np reference while it is still in use
OPP: Don't drop opp_table->np reference while it is still in use
OPP: Remove dev{m}_pm_opp_of_add_table_noclk()
PM / devfreq: tegra30: Register config_clks helper
OPP: Allow config_clks helper for single clk case
OPP: Provide a simple implementation to configure multiple clocks
OPP: Assert clk_count == 1 for single clk helpers
OPP: Add key specific assert() method to key finding helpers
OPP: Compare bandwidths for all paths in _opp_compare_key()
OPP: Allow multiple clocks for a device
dt-bindings: opp: accept array of frequencies
OPP: Make dev_pm_opp_set_opp() independent of frequency
OPP: Reuse _opp_compare_key() in _opp_add_static_v2()
OPP: Remove rate_not_available parameter to _opp_add()
OPP: Use consistent names for OPP table instances
OPP: Use generic key finding helpers for bandwidth key
OPP: Use generic key finding helpers for level key
OPP: Add generic key finding helpers and use them for freq APIs
OPP: Remove dev_pm_opp_find_freq_ceil_by_volt()
...
- Fix return error code in mtk_cpu_dvfs_info_init (Yang Yingliang).
- Minor cleanups and support for new boards for Qcom cpufreq drivers
(Bryan O'Donoghue, Konrad Dybcio, Pierre Gondois, and Yicong Yang).
- Fix sparse warnings for Tegra driver (Viresh Kumar).
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEx73Crsp7f6M6scA70rkcPK6BEhwFAmLoqRYACgkQ0rkcPK6B
EhzcNA/6A52UpkbXwfd63B+hCeLPTjjvnn+Z1FeRUG+ymbEDcMYQ7Z+ae3clKQ86
MmkK9UkYae50riCz9lv10UihI5JtxVkyCiCvgASxmRAvKrbODqSOu0UmMKR+CwJ/
GryoB8xCL/aAencyb3qin4fZHlwdjqBP7mgRQtoRbxjT0liFonauoG84DPhWhgMi
i22V83j4pZpDIPyfNN9AvBOiaGXx3n3zG0ODx1E7Y7P9Eg/3n1AdRPST3C4Y4y3X
79WbwBur13NKi9Ww5jw5guRKehNC8l79Ly80UQ4pRg2y4RtmyABX9qteOfvbj9Ix
JFWV+EdDLhU+mk1hHMx3GzFDWsaUYS+ZrS9oDREZ06LkN1ZDf5OFOP8nxr7SRvSD
3ZYlVA2IClt1Q//GsRiGy7kJK1qFtWVVWEE/NHS27W6VorugoHiyfNx8jYYBaOez
bNyrKklYlIFsgkRoVG0e2/2ZrzCeTVZLjsFl9992pnoDG8Fv9fBUpsHQajLkolJ4
KSZ35sxSvMzREKtuy4g8XeV2l9cES5FZ6B2iGqVo2BuoJdyqMfcQqtOyVbXAlkOz
pXSPxVuNr5w9uVQhuE0cSAiqe6joMgE5jT+5A2dIThC34nRzDLTrmQohR31nDZRj
nCsb3Pkee6j8DVwGVSNFuVJ0xvOnw1Sk8RCidiKZ9+Otu/blqaM=
=fX64
-----END PGP SIGNATURE-----
Merge tag 'cpufreq-arm-updates-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull cpufreq/ARM updates for 5.20-rc1 from Viresh Kumar:
"- Fix return error code in mtk_cpu_dvfs_info_init (Yang Yingliang).
- Minor cleanups and support for new boards for Qcom cpufreq drivers
(Bryan O'Donoghue, Konrad Dybcio, Pierre Gondois, and Yicong Yang).
- Fix sparse warnings for Tegra driver (Viresh Kumar)."
* tag 'cpufreq-arm-updates-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: tegra194: Staticize struct tegra_cpufreq_soc instances
dt-bindings: cpufreq: cpufreq-qcom-hw: Add SM6375 compatible
dt-bindings: opp: Add msm8939 to the compatible list
dt-bindings: opp: Add missing compat devices
dt-bindings: opp: opp-v2-kryo-cpu: Fix example binding checks
cpufreq: Change order of online() CB and policy->cpus modification
cpufreq: qcom-hw: Remove deprecated irq_set_affinity_hint() call
cpufreq: qcom-hw: Disable LMH irq when disabling policy
cpufreq: qcom-hw: Reset cancel_throttle when policy is re-enabled
cpufreq: qcom-cpufreq-hw: use HZ_PER_KHZ macro in units.h
cpufreq: mediatek: fix error return code in mtk_cpu_dvfs_info_init()
Merge core device power management changes for v5.20-rc1:
- Extend support for wakeirq to callback wrappers used during system
suspend and resume (Ulf Hansson).
- Defer waiting for device probe before loading a hibernation image
till the first actual device access to avoid possible deadlocks
reported by syzbot (Tetsuo Handa).
- Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP (Bjorn
Helgaas).
- Add Raptor Lake-P to the list of processors supported by the Intel
RAPL driver (George D Sworo).
- Add Alder Lake-N and Raptor Lake-P to the list of processors for
which Power Limit4 is supported in the Intel RAPL driver (Sumeet
Pawnikar).
- Make pm_genpd_remove() check genpd_debugfs_dir against NULL before
attempting to remove it (Hsin-Yi Wang).
- Change the Energy Model code to represent power in micro-Watts and
adjust its users accordingly (Lukasz Luba).
* pm-core:
PM: runtime: Extend support for wakeirq for force_suspend|resume
* pm-sleep:
PM: hibernate: defer device probing when resuming from hibernation
PM: wakeup: Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP
* powercap:
powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P
powercap: intel_rapl: Add support for RAPTORLAKE_P
* pm-domains:
PM: domains: Ensure genpd_debugfs_dir exists before remove
* pm-em:
cpufreq: scmi: Support the power scale in micro-Watts in SCMI v3.1
firmware: arm_scmi: Get detailed power scale from perf
Documentation: EM: Switch to micro-Watts scale
PM: EM: convert power field to micro-Watts precision and align drivers
A cpumask structure on the stack can cause a warning with
CONFIG_NR_CPUS=8192 (e.g. Ubuntu 22.04 uses this):
drivers/cpufreq/cpufreq_ondemand.c: In function 'od_set_powersave_bias':
drivers/cpufreq/cpufreq_ondemand.c:449:1: warning: the frame size of
1032 bytes is larger than 1024 bytes [-Wframe-larger-than=]
449 | }
| ^
CONFIG_CPUMASK_OFFSTACK=y is enabled by default for most distros, and
hence we can work around the warning by using cpumask_var_t.
Signed-off-by: Zhao Liu <zhao1.liu@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use the possessive "its" instead of the contraction "it's"
where appropriate.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
From a state where all policy->related_cpus are offline, putting one
of the policy's CPU back online re-activates the policy by:
1. Calling cpufreq_driver->online()
2. Setting the CPU in policy->cpus
qcom_cpufreq_hw_cpu_online() makes use of policy->cpus. Thus 1. and 2.
should be inverted to avoid having a policy->cpus empty. The
qcom-cpufreq-hw is the only driver affected by this.
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
With the new design in place, the show() and store() callbacks check if
the policy is active or not before proceeding any further to avoid
potential races. And in order to guarantee that cpufreq_policy_free()
must be called after clearing the policy->cpus mask, i.e. by marking the
policy inactive.
In order to avoid introducing a bug around this later, print a warning
message if we end up freeing an active policy.
Also update cpufreq_online() a bit to make sure we clear the cpus mask
for each error case before calling cpufreq_policy_free().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The SCMI v3.1 adds support for power values in micro-Watts. They are not
always in milli-Watts anymore (ignoring the bogo-Watts). Thus, the power
must be converted conditionally before sending to Energy Model. Add the
logic which handles the needed checks and conversions.
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The milli-Watts precision causes rounding errors while calculating
efficiency cost for each OPP. This is especially visible in the 'simple'
Energy Model (EM), where the power for each OPP is provided from OPP
framework. This can cause some OPPs to be marked inefficient, while
using micro-Watts precision that might not happen.
Update all EM users which access 'power' field and assume the value is
in milli-Watts.
Solve also an issue with potential overflow in calculation of energy
estimation on 32bit machine. It's needed now since the power value
(thus the 'cost' as well) are higher.
Example calculation which shows the rounding error and impact:
power = 'dyn-power-coeff' * volt_mV * volt_mV * freq_MHz
power_a_uW = (100 * 600mW * 600mW * 500MHz) / 10^6 = 18000
power_a_mW = (100 * 600mW * 600mW * 500MHz) / 10^9 = 18
power_b_uW = (100 * 605mW * 605mW * 600MHz) / 10^6 = 21961
power_b_mW = (100 * 605mW * 605mW * 600MHz) / 10^9 = 21
max_freq = 2000MHz
cost_a_mW = 18 * 2000MHz/500MHz = 72
cost_a_uW = 18000 * 2000MHz/500MHz = 72000
cost_b_mW = 21 * 2000MHz/600MHz = 70 // <- artificially better
cost_b_uW = 21961 * 2000MHz/600MHz = 73203
The 'cost_b_mW' (which is based on old milli-Watts) is misleadingly
better that the 'cost_b_uW' (this patch uses micro-Watts) and such
would have impact on the 'inefficient OPPs' information in the Cpufreq
framework. This patch set removes the rounding issue.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
commit 65c7cdedeb ("genirq: Provide new interfaces for affinity hints")
deprecates irq_set_affinity_hint(). Use the new
irq_set_affinity_and_hint() instead.
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
If LMH (Limits Management Hardware) is available, when a policy is
disabled by unplugging the last online CPU of policy->cpus, the LMH
irq is left enabled.
When the policy is re-enabled with any of the CPU in policy->cpus
being plugged in, qcom_cpufreq_ready() re-enables the irq. This
triggers the following warning:
[ 379.160106] Unbalanced enable for IRQ 154
[ 379.160120] WARNING: CPU: 7 PID: 48 at kernel/irq/manage.c:774 __enable_irq+0x84/0xc0
Thus disable the irq.
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
If LMH (Limits Management Hardware) is available, when a policy is
disabled by unplugging the last online CPU of policy->cpus,
qcom_cpufreq_hw_cpu_offline() sets cancel_throttle=true.
cancel_throttle is not reset when the policy is re-enabled with any
of the CPU in policy->cpus being plugged in. So reset it.
This patch also adds an early exit check.
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
HZ macros has been centralized in units.h since [1]. Use it to avoid
duplicated definition.
[1] commit e2c77032fc ("units: add the HZ macros")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
If regulator_get_voltage() fails, it should return the error code in
mtk_cpu_dvfs_info_init().
Fixes: 0daa47325b ("cpufreq: mediatek: Link CCI device to CPU")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Rex-BC Chen <rex-bc.chen@mediatek.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
If the regulator_get_optional() call for the SRAM regulator returns
a probe deferral, we must bail out and retry probing later: failing
to do this will produce unstabilities on platforms requiring the
handling for this regulator.
Fixes: ffa7bdf7f3 ("cpufreq: mediatek: Make sram regulator optional")
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Now that we have a central API to handle all OPP table configurations,
migrate the set-prop-name family of helpers to use the new
infrastructure.
The return type and parameter to the APIs change a bit due to this,
update the current users as well in the same commit in order to avoid
breaking builds.
Acked-by: Samuel Holland <samuel@sholland.org> # sun50i
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Now that we have a central API to handle all OPP table configurations,
migrate the set-supported-hw family of helpers to use the new
infrastructure.
The return type and parameter to the APIs change a bit due to this,
update the current users as well in the same commit in order to avoid
breaking builds.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Now that we have a central API to handle all OPP table configurations,
migrate the set-regulators family of helpers to use the new
infrastructure.
The return type and parameter to the APIs change a bit due to this,
update the current users as well in the same commit in order to avoid
breaking builds.
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The OPP core now provides a unified API for setting all configuration
types, i.e. dev_pm_opp_set_config().
Lets start using it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The OPP core now provides a unified API for setting all configuration
types, i.e. dev_pm_opp_set_config().
Lets start using it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The OPP core now provides a unified API for setting all configuration
types, i.e. dev_pm_opp_set_config().
Lets start using it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Make dev_pm_opp_set_regulators() accept a NULL terminated list of names
instead of making the callers keep the two parameters in sync, which
creates an opportunity for bugs to get in.
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Steven Price <steven.price@arm.com> # panfrost
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Recent Zhaoxin/Centaur CPUs support X86_FEATURE_IDA and the turbo boost
can be dynamically enabled or disabled through MSR 0x1a0[38] in the same
way as Intel. So add turbo boost control support for these CPUs too.
Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This SoC shall use the mediatek-cpufreq driver, or the system will
crash upon any clock scaling request: add it to the cpufreq-dt-platdev
blocklist.
Fixes: 39b360102f ("cpufreq: mediatek: Add support for MT8186")
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
In pmac_cpufreq_init_MacRISC3(), we need to add corresponding
of_node_put() for the three node pointers whose refcount have
been incremented by of_find_node_by_name().
Signed-off-by: Liang He <windhl@126.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Offlining cpu6 and cpu7 and then onlining cpu6 hangs on
sc7180-trogdor-lazor because the throttle interrupt doesn't exist.
Similarly, things go sideways when suspend/resume runs. That's because
the qcom_cpufreq_hw_cpu_online() and qcom_cpufreq_hw_lmh_exit()
functions are calling genirq APIs with an interrupt value of '-6', i.e.
-ENXIO, and that isn't good.
Check the value of the throttle interrupt like we already do in other
functions in this file and bail out early from lmh code to fix the hang.
Reported-by: Rob Clark <robdclark@chromium.org>
Cc: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Fixes: a1eb080a04 ("cpufreq: qcom-hw: provide online/offline operations")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
In qoriq_cpufreq_probe(), of_find_matching_node() will return a
node pointer with refcount incremented. We should use of_node_put()
when it is not used anymore.
Fixes: 157f527639 ("cpufreq: qoriq: convert to a platform driver")
[ Viresh: Fixed Author's name in commit log ]
Signed-off-by: Liang He <windhl@126.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
When system resumes from S3, the CPPC enable register will be
cleared and reset to 0.
So enable the CPPC interface by writing 1 to this register on
system resume and disable it during system suspend.
Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
[ rjw: Subject and changelog edits ]
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This change was introduced long back by commit 4f750c9308 ("cpufreq:
Synchronize the cpufreq store_*() routines with CPU hotplug").
Since then, both cpufreq and hotplug core have been reworked and have
much better locking in place. The race mentioned in commit 4f750c9308
isn't possible anymore.
Drop the unnecessary locking.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Instead of specially adding a space for each CPU, except the first one,
lets add space for each of them and remove it at the end.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Based on the normalized pattern:
this program is free software you can redistribute it and/or modify it
under the terms of the gnu general public license version 2 as
published by the free software foundation this program is distributed
as is without any warranty of any kind whether express or implied
without even the implied warranty of merchantability or fitness for a
particular purpose see the gnu general public license for more details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference.
Reviewed-by: Allison Randal <allison@lohutok.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on the normalized pattern:
this file is licensed under the terms of the gnu general public
license version 2 this program is licensed as is without any warranty
of any kind whether express or implied
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The second part of the multiplatform changes now converts the
Intel/Marvell PXA platform along with the rest. The patches went through
several rebases before the merge window as bugs were found, so they
remained separate.
This has to touch a lot of drivers, in particular the touchscreen,
pcmcia, sound and clk bits, to detach the driver files from the
platform and board specific header files.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmKZKqsACgkQmmx57+YA
GNnO/w//dgJBlkmoIIKlG2eJsvoUKwDt7MuLEMCqSqYYUSvMENFwKK66INMDIJ3l
PmKf94JadlpBm2OB2vzW+D1EtaLGX9eXZkKD+vyB1I1yFkKdzEPcAfitfrRwe58E
pR4nQd/jVL4UCY+pp442O1q9VvMpMV9P4ILJGPS/PpsD5CT9Gn8m9svIIuNuDRFd
nwpyZC3l32jVLo9iuLmwZUvxtOWI3hTqZrnxhByBhlvtnGexRsq/VhfubK2uzBi1
CyWHjqzOSmseGmsUDwv9LFqVV9YRCeisS3IElA5L0VgM0XvHKA+f9qyF7V6zI20g
y9LtqhdAtiTpE/aUrOW2LDYaM/bc7RilYZrWchoZbCEsHhV4C+ld3QoTyxvGscvG
tbznhvZKdUNX8LHS0J9NqIj1q1YGN5ei5r/C5R8DBj1q8VcTVnq3dms8xzVTd35o
xS5BbLFliiI96jc7S6LaQizXheYjAfdPhmXUAxNXvWIVQ6SXnf8/U/RB9Zzjb8hm
FH2Gu8m/Dh2MHKBBRWSVw8VahV0V7WiEaWeYuwwTbW1wUrsWiizVaPnqrt6Cq9DW
oJZgBvktWEXUQz73qrnvwo9GjcKqAxaWKWq05hHKHKuLGezsPAyIhIKr51V2xqqw
cp2OIMCsN5GYENOhHvt6BMRAI5iA4VyFDtWAqw9B6EIwno6N7Z4=
=cnSb
-----END PGP SIGNATURE-----
Merge tag 'arm-multiplatform-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull more ARM multiplatform updates from Arnd Bergmann:
"The second part of the multiplatform changes now converts the
Intel/Marvell PXA platform along with the rest. The patches went
through several rebases before the merge window as bugs were found, so
they remained separate.
This has to touch a lot of drivers, in particular the touchscreen,
pcmcia, sound and clk bits, to detach the driver files from the
platform and board specific header files"
* tag 'arm-multiplatform-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (48 commits)
ARM: pxa/mmp: remove traces of plat-pxa
ARM: pxa: convert to multiplatform
ARM: pxa/sa1100: move I/O space to PCI_IOBASE
ARM: pxa: remove support for MTD_XIP
ARM: pxa: move mach/*.h to mach-pxa/
ARM: PXA: fix multi-cpu build of xsc3
ARM: pxa: move plat-pxa to drivers/soc/
ARM: mmp: rename pxa_register_device
ARM: mmp: remove tavorevb board support
ARM: pxa: remove unused mach/bitfield.h
ARM: pxa: move clk register definitions to driver
ARM: pxa: move smemc register access from clk to platform
cpufreq: pxa3: move clk register access to clk driver
ARM: pxa: remove get_clk_frequency_khz()
ARM: pxa: pcmcia: move smemc configuration back to arch
ASoC: pxa: i2s: use normal MMIO accessors
ASoC: pxa: ac97: use normal MMIO accessors
ASoC: pxa: use pdev resource for FIFO regs
Input: wm97xx - get rid of irq_enable method in wm97xx_mach_ops
Input: wm97xx - switch to using threaded IRQ
...
Building the cppc_cpufreq driver with for arm64 with
CONFIG_ENERGY_MODEL=n triggers the following warnings:
drivers/cpufreq/cppc_cpufreq.c:550:12: error: ‘cppc_get_cpu_cost’ defined but not used
[-Werror=unused-function]
550 | static int cppc_get_cpu_cost(struct device *cpu_dev, unsigned long KHz,
| ^~~~~~~~~~~~~~~~~
drivers/cpufreq/cppc_cpufreq.c:481:12: error: ‘cppc_get_cpu_power’ defined but not used
[-Werror=unused-function]
481 | static int cppc_get_cpu_power(struct device *cpu_dev,
| ^~~~~~~~~~~~~~~~~~
Move the Energy Model related functions into specific guards.
This allows to fix the warning and prevent doing extra work
when the Energy Model is not present.
Fixes: 740fcdc2c2 ("cpufreq: CPPC: Register EM based on efficiency class information")
Reported-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Tested-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If CONFIG_ACPI_CPPC_CPUFREQ_FIE is not set, building fails:
drivers/cpufreq/cppc_cpufreq.c: In function ‘populate_efficiency_class’:
drivers/cpufreq/cppc_cpufreq.c:584:2: error: ‘cppc_cpufreq_driver’ undeclared (first use in this function); did you mean ‘cpufreq_driver’?
cppc_cpufreq_driver.register_em = cppc_cpufreq_register_em;
^~~~~~~~~~~~~~~~~~~
cpufreq_driver
Make declare of cppc_cpufreq_driver out of CONFIG_ACPI_CPPC_CPUFREQ_FIE
to fix this.
Fixes: 740fcdc2c2 ("cpufreq: CPPC: Register EM based on efficiency class information")
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Tegra234 cpufreq support (Sumit Gupta).
- Mediatek cleanups and enhancements (Wan Jiabing, Rex-BC Chen, and
Jia-Wei Chang).
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEx73Crsp7f6M6scA70rkcPK6BEhwFAmKMabYACgkQ0rkcPK6B
Ehz/aA//S/UOQPcDA+rTvIhnrNOKPd0Ppios+xWbw5TSuNyJYRK+DWHdjBeKE6oW
/kjq9UMub3ju1TLQbkf3QYRl0shdIAhP5/779YOTILvCOw6j2CDgjPls+rNpxUpl
Aob/XVkPh/pjjaYUdrvYz/KhduxX/tOowXiVPAxB15+jfm7xA3QFXHh9+m4Amlr8
iFqQSxyJEupw5DG1I/NUgRYX/tXFl38L0jyMCK1po+n5EWFWkRrJAMGc6FWpiy6x
UH0zIZ991tYnEk1sIFWOVWf5Nj4XqmsJ9Sz9PqFihlYLMHVHm7W6NnsYKqOx47ba
073r6Jgt5K5/mhtXVpolnRxFBHOtxS48IsQqVmteGrgdbvMCJ5dYDacEbO4QizyH
u9AB1Oljn1pwzOear+xEPeqfr4+6iO/LutDeTypNvYVfnjPWYPnPEp6wOeaPD3qJ
dOX0T9mywavcJ9gFyhpc2tzRBkSia7spEYw5gDvtCsU0iVPMesdjazQQbmiomQWU
v4DaMIk23htG8tqLDXbFenZ29SKZvvTpGXcNF4EAW8DUk3eruIw23Rd9ZmL90q+i
yQj49owl7t0O0RapbUlhhkv0nLaanNK20jTopSpQW+okD8bmPpMsiQgfnDDD3qgO
lBGSs5wuJfznoUjjNB7UMzkOkLr0UnwwgEL/+xEezFJKeb8sgTA=
=DJhA
-----END PGP SIGNATURE-----
Merge tag 'cpufreq-arm-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull ARM cpufreq updates for 5.19-rc1 from Viresh Kumar:
- Tegra234 cpufreq support (Sumit Gupta).
- Mediatek cleanups and enhancements (Wan Jiabing, Rex-BC Chen, and
Jia-Wei Chang).
* tag 'cpufreq-arm-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: (21 commits)
cpufreq: mediatek: Add support for MT8186
cpufreq: mediatek: Link CCI device to CPU
dt-bindings: cpufreq: mediatek: Add MediaTek CCI property
cpufreq: mediatek: Fix potential deadlock problem in mtk_cpufreq_set_target
cpufreq: mediatek: Add opp notification support
cpufreq: mediatek: Refine mtk_cpufreq_voltage_tracking()
cpufreq: mediatek: Move voltage limits to platform data
cpufreq: mediatek: Unregister platform device on exit
cpufreq: mediatek: Fix NULL pointer dereference in mediatek-cpufreq
cpufreq: mediatek: Make sram regulator optional
cpufreq: mediatek: Record previous target vproc value
cpufreq: mediatek: Replace old_* with pre_*
cpufreq: mediatek: Use device print to show logs
cpufreq: mediatek: Enable clocks and regulators
cpufreq: mediatek: Remove unused headers
cpufreq: mediatek: Cleanup variables and error handling in mtk_cpu_dvfs_info_init()
cpufreq: mediatek: Use module_init and add module_exit
arm64: tegra: add node for tegra234 cpufreq
cpufreq: tegra194: Add support for Tegra234
cpufreq: tegra194: add soc data to support multiple soc
...
The communication mean of the _CPC desired performance can be
PCC, System Memory, System IO, or Functional Fixed Hardware (FFH).
PCC, SystemMemory and SystemIo address spaces are available from any
CPU. Thus, dvfs_possible_from_any_cpu should be enabled in such case.
For FFH, let the FFH implementation do smp_call_function_*() calls.
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The communication mean of the _CPC desired performance can be
PCC, System Memory, System IO, or Functional Fixed Hardware.
commit b7898fda5b ("cpufreq: Support for fast frequency switching")
fast_switching is 'for switching CPU frequencies from interrupt
context'.
Writes to SystemMemory and SystemIo are fast and suitable this.
This is not the case for PCC and might not be the case for FFH.
Enable fast_switching for the cppc_cpufreq driver in above cases.
Add cppc_allow_fast_switch() to check the desired performance
register address space and set fast_switching accordingly.
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpufreq_offline() calls offline() and exit() under the policy rwsem
But they are called outside the rwsem in cpufreq_online().
Make cpufreq_online() call offline() and exit() as well as online() and
init() under the policy rwsem to achieve a clear lock relationship.
All of the init() and online() implementations in the tree only
initialize the policy object without attempting to acquire the policy
rwsem and they won't call cpufreq APIs attempting to acquire it.
Signed-off-by: Schspa Shi <schspa@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If policy initialization fails after the sysfs files are created,
there is a possibility to end up running show()/store() callbacks
for half-initialized policies, which may have unpredictable
outcomes.
Abort show()/store() in such a case by making sure the policy is active.
Also dectivate the policy on such failures.
Signed-off-by: Schspa Shi <schspa@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, cpufreq_remove_dev() invokes the ->exit() driver callback
without holding the policy rwsem which is inconsistent with what
happens if ->exit() is invoked directly from cpufreq_offline().
It also manipulates the real_cpus mask and removes the CPU device
symlink without holding the policy rwsem, but cpufreq_offline() holds
the rwsem around the modifications thereof.
For consistency, modify cpufreq_remove_dev() to hold the policy rwsem
until the ->exit() callback has been called (or it has been determined
that it is not necessary to call it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Split the "core" part running under the policy rwsem out of
cpufreq_offline() to allow the locking in cpufreq_remove_dev() to be
rearranged more easily.
As a side-effect this eliminates the unlock label that's not needed
any more.
No expected functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Notice that cpufreq_offline() only needs to check policy_is_inactive()
once and rearrange the code in there to make that happen.
No expected functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
The platform data of MT8186 is different from previous MediaTek SoCs,
so we add a new compatible and platform data for it.
Signed-off-by: Jia-Wei Chang <jia-wei.chang@mediatek.com>
Signed-off-by: Rex-BC Chen <rex-bc.chen@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
In some MediaTek SoCs, like MT8183, CPU and CCI share the same power
supplies. Cpufreq needs to check if CCI devfreq exists and wait until
CCI devfreq ready before scaling frequency.
Before CCI devfreq is ready, we record the voltage when booting to
kernel and use the max(cpu target voltage, booting voltage) to
prevent cpufreq adjust to the lower voltage which will cause the CCI
crash because of high frequency and low voltage.
- Add is_ccifreq_ready() to link CCI device to CPI, and CPU will start
DVFS when CCI is ready.
- Add platform data for MT8183.
Signed-off-by: Jia-Wei Chang <jia-wei.chang@mediatek.com>
Signed-off-by: Rex-BC Chen <rex-bc.chen@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
add_cpu_dev_symlink() is responsible for setting the CPUs in the
real_cpus mask, the reverse of which should be done from
remove_cpu_dev_symlink() to make it look clean and avoid any breakage
later on.
Move the call to clear the mask to remove_cpu_dev_symlink().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Prevent intel_pstate to load when OOB (Out Of Band) P-states mode is
enabled in Sapphire Rapids. The OOB identifying bits are same as the
prior generation CPUs like Ice Lake servers. So, also add Sapphire
Rapids to intel_pstate_cpu_oob_ids list.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix following coccichek error:
./drivers/cpufreq/mediatek-cpufreq.c:199:2-8: preceding lock on line
./drivers/cpufreq/mediatek-cpufreq.c:208:2-8: preceding lock on line
mutex_lock is acquired but not released before return.
Use 'goto out' to help releasing the mutex_lock.
Fixes: c210063b40 ("cpufreq: mediatek: Add opp notification support")
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Rex-BC Chen <rex-bc.chen@mediatek.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
This reverts commit f346e96267.
The commit tried to fix a possible real bug but it made it even worse.
The fix was simply buggy as now an error out to out_offline_policy or
out_exit_policy will try to release a semaphore which was never taken in
the first place. This works fine only if we failed late, i.e. via
out_destroy_policy.
Fixes: f346e96267 ("cpufreq: Fix possible race in cpufreq online error path")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The driver needs some low-level register access for setting
the core and bus frequencies. These registers are owned
by the clk driver, so move the low-level access into that
driver with a slightly higher-level interface and avoid
any machine header file dependencies.
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-clk@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
get_clk_frequency_khz() is not a proper name for a global function,
and there is only one caller.
Convert viper to use the properly namespaced
pxa25x_get_clk_frequency_khz() and remove the other references.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Cc: linux-pm@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Performance states and energy consumption values are not advertised
in ACPI. In the GicC structure of the MADT table, the "Processor
Power Efficiency Class field" (called efficiency class from now)
allows to describe the relative energy efficiency of CPUs.
To leverage the EM and EAS, the CPPC driver creates a set of
artificial performance states and registers them in the Energy Model
(EM), such as:
- Every 20 capacity unit, a performance state is created.
- The energy cost of each performance state gradually increases.
No power value is generated as only the cost is used in the EM.
During task placement, a task can raise the frequency of its whole
pd. This can make EAS place a task on a pd with CPUs that are
individually less energy efficient.
As cost values are artificial, and to place tasks on CPUs with the
lower efficiency class, a gap in cost values is generated for adjacent
efficiency classes.
E.g.:
- efficiency class = 0, capacity is in [0-1024], so cost values
are in [0: 51] (one performance state every 20 capacity unit)
- efficiency class = 1, capacity is in [0-1024], cost values
are in [1*gap+0: 1*gap+51].
The value of the cost gap is chosen to absorb a the energy of 4 CPUs
at their maximum capacity. This means that between:
1- a pd of 4 CPUs, each of them being used at almost their full
capacity. Their efficiency class is N.
2- a CPU using almost none of its capacity. Its efficiency class is
N+1
EAS will choose the first option.
This patch also populates the (struct cpufreq_driver).register_em
callback if the valid efficiency_class ACPI values are provided.
Signed-off-by: Pierre Gondois <Pierre.Gondois@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In ACPI, describing power efficiency of CPUs can be done through the
following arm specific field:
ACPI 6.4, s5.2.12.14 'GIC CPU Interface (GICC) Structure',
'Processor Power Efficiency Class field':
Describes the relative power efficiency of the associated pro-
cessor. Lower efficiency class numbers are more efficient than
higher ones (e.g. efficiency class 0 should be treated as more
efficient than efficiency class 1). However, absolute values
of this number have no meaning: 2 isn’t necessarily half as
efficient as 1.
The efficiency_class field is stored in the GicC structure of the
ACPI MADT table and it's currently supported in Linux for arm64 only.
Thus, this new functionality is introduced for arm64 only.
To allow the cppc_cpufreq driver to know and preprocess the
efficiency_class values of all the CPUs, add a per_cpu efficiency_class
variable to store them.
At least 2 different efficiency classes must be present,
otherwise there is no use in creating an Energy Model.
The efficiency_class values are squeezed in [0:#efficiency_class-1]
while conserving the order. For instance, efficiency classes of:
[111, 212, 250]
will be mapped to:
[0 (was 111), 1 (was 212), 2 (was 250)].
Each policy being independently registered in the driver, populating
the per_cpu efficiency_class is done only once at the driver
initialization. This prevents from having each policy re-searching the
efficiency_class values of other CPUs. The EM will be registered in a
following patch.
The patch also exports acpi_cpu_get_madt_gicc() to fetch the GicC
structure of the ACPI MADT table for each CPU.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Pierre Gondois <Pierre.Gondois@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
For some platforms, the frequency returned by hardware may be slightly
different from what is provided in the frequency table. For example,
hardware may return 499 MHz instead of 500 MHz. In such cases it is
better to avoid getting into unnecessary frequency updates, as we may
end up switching policy->cur between the two and sending unnecessary
pre/post update notifications, etc.
This patch has chosen allows the hardware frequency and table frequency
to deviate by 1 MHz for now, we may want to increase it a bit later on
if someone still complains.
Reported-by: Rex-BC Chen <rex-bc.chen@mediatek.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Jia-wei Chang <jia-wei.chang@mediatek.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
From this opp notifier, cpufreq should listen to opp notification and do
proper actions when receiving events of disable and voltage adjustment.
One of the user for this opp notifier is MediaTek SVS.
The MediaTek Smart Voltage Scaling (SVS) is a hardware which calculates
suitable SVS bank voltages to OPP voltage table.
Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
Signed-off-by: Jia-Wei Chang <jia-wei.chang@mediatek.com>
Signed-off-by: Rex-BC Chen <rex-bc.chen@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
[ Viresh: Renamed opp_freq as current_freq and moved its initialization ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Because the difference of sram and proc should in a range of min_volt_shift
and max_volt_shift. We need to adjust the sram and proc step by step.
We replace VOLT_TOL (voltage tolerance) with the platform data and update the
logic to determine the voltage boundary and invoking regulator_set_voltage.
- Use 'sram_min_volt' and 'sram_max_volt' to determine the voltage boundary
of sram regulator.
- Use (sram_min_volt - min_volt_shift) and 'proc_max_volt' to determine the
voltage boundary of vproc regulator.
Moreover, to prevent infinite loop when tracking voltage, we calculate the
maximum value for each platform data.
We assume min voltage is 0 and tracking target voltage using
min_volt_shift for each iteration.
The retry_max is 3 times of expeted iteration count.
Signed-off-by: Jia-Wei Chang <jia-wei.chang@mediatek.com>
Signed-off-by: Rex-BC Chen <rex-bc.chen@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Voltages and shifts are defined as macros originally.
There are different requirements of these values for each MediaTek SoCs.
Therefore, we add the platform data and move these values into it.
Signed-off-by: Jia-Wei Chang <jia-wei.chang@mediatek.com>
Signed-off-by: Rex-BC Chen <rex-bc.chen@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
We register the platform device when driver inits. However, we do not
unregister it when driver exits.
To resolve this, we declare the platform data to be a global static
variable and rename it to be "cpufreq_pdev". With this global variable,
we can do platform_device_unregister() when driver exits.
Fixes: 501c574f4e ("cpufreq: mediatek: Add support of cpufreq to MT2701/MT7623 SoC")
Signed-off-by: Rex-BC Chen <rex-bc.chen@mediatek.com>
[ Viresh: Commit log and Subject ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Fix following coccicheck error:
drivers/cpufreq/mediatek-cpufreq.c:464:16-23: ERROR: info is NULL but dereferenced.
Use pr_err instead of dev_err to avoid dereferring a NULL pointer.
Fixes: f52b16ba9fe4 ("cpufreq: mediatek: Use device print to show logs")
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
For some MediaTek SoCs, like MT8186, it's possible that the sram regulator
is shared between CPU and CCI.
We hope regulator framework can return error for error handling rather
than a dummy handler from regulator_get api.
Therefore, we choose to use regulator_get_optional.
Signed-off-by: Jia-Wei Chang <jia-wei.chang@mediatek.com>
Signed-off-by: Rex-BC Chen <rex-bc.chen@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
We found the buck voltage may not be exactly the same with what we set
because CPU may share the same buck with other module.
Therefore, we need to record the previous desired value instead of reading
it from regulators.
Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
Signed-off-by: Jia-Wei Chang <jia-wei.chang@mediatek.com>
Signed-off-by: Rex-BC Chen <rex-bc.chen@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
To make driver more readable, replace old_* with pre_*.
Signed-off-by: Rex-BC Chen <rex-bc.chen@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
- Replace pr_* with dev_* to show logs.
- Remove usage of __func__.
Signed-off-by: Rex-BC Chen <rex-bc.chen@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
We need to enable regulators so that the max and min requested values will
be recorded.
The intermediate clock is not always enabled by CCF in different projects,
so we should enable it in the cpufreq driver.
Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
Signed-off-by: Jia-Wei Chang <jia-wei.chang@mediatek.com>
Signed-off-by: Rex-BC Chen <rex-bc.chen@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
- Remove several unnecessary varaibles in mtk_cpu_dvfs_info_init().
- Unify error message format and use dev_err_probe() if possible.
Signed-off-by: Jia-Wei Chang <jia-wei.chang@mediatek.com>
Signed-off-by: Rex-BC Chen <rex-bc.chen@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
- Use module_init instead of device_initcall.
- Add a function for module_exit to unregister driver.
Signed-off-by: Jia-Wei Chang <jia-wei.chang@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
This patch adds driver support for Tegra234 cpufreq.
Tegra234 has per core MMIO registers instead of system registers for
cpu frequency requests and to read the counters for re-constructing
the cpu frequency. Also, MPIDR affinity info in Tegra234 is different
from Tegra194.
Added ops hooks and soc data for Tegra234. This will help to easily
add variants of Tegra234 and future SoC's which use similar logic to
{get|set} the cpu frequency.
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Adding SoC data and ops to support multiple SoC's in same driver.
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
It's noted that dcvs interrupts are not self-clearing, thus an interrupt
handler runs constantly, which leads to a severe regression in runtime.
To fix the problem an explicit write to clear interrupt register is
required, note that on OSM platforms the register may not be present.
Fixes: 275157b367 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
When cpufreq online fails, the policy->cpus mask is not cleared and
policy->rwsem is released too early, so the driver can be invoked
via the cpuinfo_cur_freq sysfs attribute while its ->offline() or
->exit() callbacks are being run.
Take policy->clk as an example:
static int cpufreq_online(unsigned int cpu)
{
...
// policy->cpus != 0 at this time
down_write(&policy->rwsem);
ret = cpufreq_add_dev_interface(policy);
up_write(&policy->rwsem);
return 0;
out_destroy_policy:
for_each_cpu(j, policy->real_cpus)
remove_cpu_dev_symlink(policy, get_cpu_device(j));
up_write(&policy->rwsem);
...
out_exit_policy:
if (cpufreq_driver->exit)
cpufreq_driver->exit(policy);
clk_put(policy->clk);
// policy->clk is a wild pointer
...
^
|
Another process access
__cpufreq_get
cpufreq_verify_current_freq
cpufreq_generic_get
// acces wild pointer of policy->clk;
|
|
out_offline_policy: |
cpufreq_policy_free(policy); |
// deleted here, and will wait for no body reference
cpufreq_policy_put_kobj(policy);
}
Address this by modifying cpufreq_online() to release policy->rwsem
in the error path after the driver callbacks have run and to clear
policy->cpus before releasing the semaphore.
Fixes: 7106e02bae ("cpufreq: release policy->rwsem on error")
Signed-off-by: Schspa Shi <schspa@gmail.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The mach/hardware.h is included in lots of places, and it provides
three different things on pxa:
- the cpu_is_pxa* macros
- an indirect inclusion of mach/addr-map.h
- the __REG() and io_pv2() helper macros
Split it up into separate <linux/soc/pxa/cpu.h> and mach/pxa-regs.h
headers, then change all the files that use mach/hardware.h to
include the exact set of those three headers that they actually
need, allowing for further more targeted cleanup.
linux/soc/pxa/cpu.h can remain permanently exported and is now in
a global location along with similar headers. pxa-regs.h and
addr-map.h are only used in a very small number of drivers now
and can be moved to arch/arm/mach-pxa/ directly when those drivers
are to pass the necessary data as resources.
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Cc: Pavel Machek <pavel@ucw.cz>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Acked-by: Mark Brown <broonie@kernel.org>
Cc: linux-clk@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-input@vger.kernel.org
Cc: linux-leds@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-mtd@lists.infradead.org
Cc: linux-rtc@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-fbdev@vger.kernel.org
Cc: linux-watchdog@vger.kernel.org
Cc: alsa-devel@alsa-project.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Problem statement:
Once the user has disabled turbo frequency by
# echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
the cfs_rq's util_avg becomes quite small when compared with
CPU capacity.
Step to reproduce:
# echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
# ./x86_cpuload --count 1 --start 3 --timeout 100 --busy 99
would launch 1 thread and bind it to CPU3, lasting for 100 seconds,
with a CPU utilization of 99%. [1]
top result:
%Cpu3 : 98.4 us, 0.0 sy, 0.0 ni, 1.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
check util_avg:
cat /sys/kernel/debug/sched/debug | grep "cfs_rq\[3\]" -A 20 | grep util_avg
.util_avg : 611
So the util_avg/cpu capacity is 611/1024, which is much smaller than
98.4% shown in the top result.
This might impact some logic in the scheduler. For example,
group_is_overloaded() would compare the group_capacity and group_util
in the sched group, to check if this sched group is overloaded or not.
With this gap, even when there is a nearly 100% workload, the sched
group will not be regarded as overloaded. Besides group_is_overloaded(),
there are also other victims. There is a ongoing work that aims to
optimize the task wakeup in a LLC domain. The main idea is to stop
searching idle CPUs if the sched domain is overloaded[2]. This proposal
also relies on the util_avg/CPU capacity to decide whether the LLC
domain is overloaded.
Analysis:
CPU frequency invariance has caused this difference. In summary,
the util_sum of cfs rq would decay quite fast when the CPU is in
idle, when the CPU frequency invariance is enabled.
The detail is as followed:
As depicted in update_rq_clock_pelt(), when the frequency invariance
is enabled, there would be two clock variables on each rq, clock_task
and clock_pelt:
The clock_pelt scales the time to reflect the effective amount of
computation done during the running delta time but then syncs back to
clock_task when rq is idle.
absolute time | 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|16
@ max frequency ------******---------------******---------------
@ half frequency ------************---------************---------
clock pelt | 1| 2| 3| 4| 7| 8| 9| 10| 11|14|15|16
The fast decay of util_sum during idle is due to:
1. rq->clock_pelt is always behind rq->clock_task
2. rq->last_update is updated to rq->clock_pelt' after invoking
___update_load_sum()
3. Then the CPU becomes idle, the rq->clock_pelt' would be suddenly
increased a lot to rq->clock_task
4. Enters ___update_load_sum() again, the idle period is calculated by
rq->clock_task - rq->last_update, AKA, rq->clock_task - rq->clock_pelt'.
The lower the CPU frequency is, the larger the delta =
rq->clock_task - rq->clock_pelt' will be. Since the idle period will be
used to decay the util_sum only, the util_sum drops significantly during
idle period.
Proposal:
This symptom is not only caused by disabling turbo frequency, but it
would also appear if the user limits the max frequency at runtime.
Because, if the frequency is always lower than the max frequency,
CPU frequency invariance would decay the util_sum quite fast during
idle.
As some end users would disable turbo after boot up, this patch aims to
present this symptom and deals with turbo scenarios for now.
It might be ideal if CPU frequency invariance is aware of the max CPU
frequency (user specified) at runtime in the future.
Link: https://github.com/yu-chen-surf/x86_cpuload.git#1
Link: https://lore.kernel.org/lkml/20220310005228.11737-1-yu.c.chen@intel.com/#2
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
powerpc's asm/prom.h brings some headers that it doesn't
need itself.
In order to clean it up, first add missing headers in
users of asm/prom.h
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The .active_power() callback passes the device pointer when it's called.
Aligned with a convetion present in other subsystems and pass the 'dev'
as a first argument. It looks more cleaner.
Adjust all affected drivers which implement that API callback.
Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The struct dbs_data embeds a struct gov_attr_set and
the struct gov_attr_set embeds a kobject. Since every kobject must have
a release() method and we can't use kfree() to free it directly,
so introduce cpufreq_dbs_data_release() to release the dbs_data via
the kobject::release() method. This fixes the calltrace like below:
ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x34
WARNING: CPU: 12 PID: 810 at lib/debugobjects.c:505 debug_print_object+0xb8/0x100
Modules linked in:
CPU: 12 PID: 810 Comm: sh Not tainted 5.16.0-next-20220120-yocto-standard+ #536
Hardware name: Marvell OcteonTX CN96XX board (DT)
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : debug_print_object+0xb8/0x100
lr : debug_print_object+0xb8/0x100
sp : ffff80001dfcf9a0
x29: ffff80001dfcf9a0 x28: 0000000000000001 x27: ffff0001464f0000
x26: 0000000000000000 x25: ffff8000090e3f00 x24: ffff80000af60210
x23: ffff8000094dfb78 x22: ffff8000090e3f00 x21: ffff0001080b7118
x20: ffff80000aeb2430 x19: ffff800009e8f5e0 x18: 0000000000000000
x17: 0000000000000002 x16: 00004d62e58be040 x15: 013590470523aff8
x14: ffff8000090e1828 x13: 0000000001359047 x12: 00000000f5257d14
x11: 0000000000040591 x10: 0000000066c1ffea x9 : ffff8000080d15e0
x8 : ffff80000a1765a8 x7 : 0000000000000000 x6 : 0000000000000001
x5 : ffff800009e8c000 x4 : ffff800009e8c760 x3 : 0000000000000000
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0001474ed040
Call trace:
debug_print_object+0xb8/0x100
__debug_check_no_obj_freed+0x1d0/0x25c
debug_check_no_obj_freed+0x24/0xa0
kfree+0x11c/0x440
cpufreq_dbs_governor_exit+0xa8/0xac
cpufreq_exit_governor+0x44/0x90
cpufreq_set_policy+0x29c/0x570
store_scaling_governor+0x110/0x154
store+0xb0/0xe0
sysfs_kf_write+0x58/0x84
kernfs_fop_write_iter+0x12c/0x1c0
new_sync_write+0xf0/0x18c
vfs_write+0x1cc/0x220
ksys_write+0x74/0x100
__arm64_sys_write+0x28/0x3c
invoke_syscall.constprop.0+0x58/0xf0
do_el0_svc+0x70/0x170
el0_svc+0x54/0x190
el0t_64_sync_handler+0xa4/0x130
el0t_64_sync+0x1a0/0x1a4
irq event stamp: 189006
hardirqs last enabled at (189005): [<ffff8000080849d0>] finish_task_switch.isra.0+0xe0/0x2c0
hardirqs last disabled at (189006): [<ffff8000090667a4>] el1_dbg+0x24/0xa0
softirqs last enabled at (188966): [<ffff8000080106d0>] __do_softirq+0x4b0/0x6a0
softirqs last disabled at (188957): [<ffff80000804a618>] __irq_exit_rcu+0x108/0x1a4
[ rjw: Because can be freed by the gov_attr_set_put() in
cpufreq_dbs_governor_exit() now, it is also necessary to put the
invocation of the governor ->exit() callback into the new
cpufreq_dbs_data_release() function. ]
Fixes: c443563036 ("cpufreq: governor: New sysfs show/store callbacks for governor tunables")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
On QCOM platforms with EPSS flavour of cpufreq IP a throttled frequency is
obtained from another register REG_DOMAIN_STATE, thus the helper function
qcom_lmh_get_throttle_freq() should be modified accordingly, as for now
it returns gibberish since .reg_current_vote is unset for EPSS hardware.
To exclude a hardcoded magic number 19200 it is replaced by "xo" clock rate
in KHz.
Fixes: 275157b367 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Provide lightweight online and offline operations. This saves us from
parsing and tearing down the OPP tables each time the CPU is put online
or offline.
Tested-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Reviewed-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Pull ARM cpufreq updates for 5.18-rc1 fron Viresh Kumar:
"- Add per core DVFS support for QCom SoC (Bjorn Andersson), convert to yaml
binding (Manivannan Sadhasivam) and various other fixes to the QCom drivers
(Luca Weiss).
- Add OPP table for imx7s SoC (Denys Drozdov) and minor fixes (Stefan Agner).
- Fix CPPC driver's freq/performance conversions (Pierre Gondois).
- Minor generic cleanups (Yury Norov)."
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
dt-bindings: cpufreq: cpufreq-qcom-hw: Convert to YAML bindings
dt-bindings: dvfs: Use MediaTek CPUFREQ HW as an example
cpufreq: blocklist Qualcomm sc8280xp and sa8540p in cpufreq-dt-platdev
cpufreq: qcom-hw: Add support for per-core-dcvs
cpufreq: CPPC: Fix performance/frequency conversion
cpufreq: Add i.MX7S to cpufreq-dt-platdev blocklist
ARM: dts: imx7s: Define operating points table for cpufreq
cpufreq: qcom-cpufreq-nvmem: fix reading of PVS Valid fuse
cpufreq: replace cpumask_weight with cpumask_empty where appropriate
Merge power management utilities changes for 5.18-rc1:
- Add tracer tool for the amd-pstate driver (Jinzhou Su).
- Fix PC6 displaying in turbostat on some systems (Artem Bityutskiy).
- Add AMD P-State support to the cpupower utility (Huang Rui).
* pm-tools:
Documentation: amd-pstate: add tracer tool introduction
tools/power/x86/amd_pstate_tracer: Add tracer tool for AMD P-state
tools/power/x86/intel_pstate_tracer: make tracer as a module
cpufreq: amd-pstate: Add more tracepoint for AMD P-State module
turbostat: fix PC6 displaying on some systems
cpupower: Add "perf" option to print AMD P-State information
cpupower: Add function to print AMD P-State performance capabilities
cpupower: Move print_speed function into misc helper
cpupower: Enable boost state support for AMD P-State module
cpupower: Add AMD P-State sysfs definition and access helper
cpupower: Introduce ACPI CPPC library
cpupower: Add the function to get the sysfs value from specific table
cpupower: Initial AMD P-State capability
cpupower: Add the function to check AMD P-State enabled
cpupower: Add AMD P-State capability flag
tools/power/cpupower/{ToDo => TODO}: Rename the todo file
tools: cpupower: fix typo in cpupower-idle-set(1) manpage
The powernow-k8 driver will do checks at startup that the current
active driver is acpi-cpufreq and show a warning when they're not
expected.
Because of this the following warning comes up on systems that
support amd-pstate and compiled in both drivers:
`WTF driver: amd-pstate`
The systems that support powernow-k8 will not support amd-pstate,
so re-order the checks to validate the CPU model number first to
avoid this warning being displayed on modern SOCs.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
For some specific platforms (E.g. AlderLake) the balance performance
EPP is updated from the hard coded value in the driver. This acts as
the default and balance_performance EPP. The purpose of this EPP
update is to reach maximum 1 core turbo frequency (when possible) out
of the box.
Although we can achieve the objective by using hard coded value in the
driver, there can be other EPP which can be better in terms of power.
But that will be very subjective based on platform and use cases.
This is not practical to have a per platform specific default hard coded
in the driver.
If a platform wants to specify default EPP, it can be set in the firmware.
If this EPP is not the chipset default of 0x80 (balance_perf_epp unless
driver changed it) and more performance oriented but not 0, the driver
can use this as the default and balanced_perf EPP. In this case no driver
update is required every time there is some new platform and default EPP.
If the firmware didn't update the EPP from the chipset default then
the hard coded value is used as per existing implementation.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Usually, sysfs attributes have .show and .store and their naming
convention is filename_show() and filename_store().
But in cpufreq the naming convention of these functions is
show_filename() and store_filename() which prevents __ATTR_RW() and
__ATTR_RO() from being used in there to simplify code.
Accordingly, change the naming convention of the sysfs .show and
.store methods in cpufreq to follow the one expected by __ATTR_RW()
and __ATTR_RO() and use these macros in that code.
Signed-off-by: Lianjie Zhang <zhanglianjie@uniontech.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add frequency, mperf, aperf and tsc in the trace. This can be used
to debug and tune the performance of AMD P-state driver.
Use the time difference between amd_pstate_update to calculate CPU
frequency. There could be sleep in arch_freq_get_on_cpu, so do not
use it here.
Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
Co-developed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The Qualcomm sc8280xp and sa8540p platforms also uses the
qcom-cpufreq-hw driver, so add them to the cpufreq-dt-platdev driver's
blocklist.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The OSM and EPSS hardware controls the frequency of each cluster in the
system based on requests from the OS and various limiting factors, such
as input from LMH.
In most systems the vote from the OS is done using a single register per
cluster, but some systems are configured to instead take one request per
core. In this configuration a set of consecutive registers are used for
the OS to request the frequency of each of the cores within the cluster.
The information is then aggregated in the hardware and the frequency for
the cluster is determined.
As the current implementation ends up only requesting a frequency for
the first core in each cluster and only the vote of non-idle cores are
considered it's often the case that the cluster will be clocked (much)
lower than expected.
It's possible that there are benefits of performing the per-core
requests from the OS, but more investigation of the outcome is needed
before introducing such support. As such this patch extends the request
for the cluster to be written to all the cores.
The weight of the policy's related_cpus is used to determine how many
cores, and hence consecutive registers, each cluster has.
The OS is not permitted to disable the per-core dcvs feature.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
CPUfreq governors request CPU frequencies using information
on current CPU usage. The CPPC driver converts them to
performance requests. Frequency targets are computed as:
target_freq = (util / cpu_capacity) * max_freq
target_freq is then clamped between [policy->min, policy->max].
The CPPC driver converts performance values to frequencies
(and vice-versa) using cppc_cpufreq_perf_to_khz() and
cppc_cpufreq_khz_to_perf(). These functions both use two different
factors depending on the range of the input value. For
cppc_cpufreq_khz_to_perf():
- (NOMINAL_PERF / NOMINAL_FREQ) or
- (LOWEST_PERF / LOWEST_FREQ)
and for cppc_cpufreq_perf_to_khz():
- (NOMINAL_FREQ / NOMINAL_PERF) or
- ((NOMINAL_PERF - LOWEST_FREQ) / (NOMINAL_PERF - LOWEST_PERF))
This means:
1- the functions are not inverse for some values:
(perf_to_khz(khz_to_perf(x)) != x)
2- cppc_cpufreq_perf_to_khz(LOWEST_PERF) can sometimes give
a different value from LOWEST_FREQ due to integer approximation
3- it is implied that performance and frequency are proportional
(NOMINAL_FREQ / NOMINAL_PERF) == (LOWEST_PERF / LOWEST_FREQ)
This patch changes the conversion functions to an affine function.
This fixes the 3 points above.
Suggested-by: Lukasz Luba <lukasz.luba@arm.com>
Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Pierre Gondois <Pierre.Gondois@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The i.MX 7Solo currently does not have multiple operating points,
however, in order for the i.MX Thermal driver to successfully probe
a cpufreq device is required. Add it to the cpufreq-dt-platdev
driver's blocklist to allow using imx-cpufreq-dt.
Signed-off-by: Stefan Agner <stefan.agner@toradex.com>
Cc: Stefan Agner <stefan@agner.ch>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The fuse consists of 64 bits, with this statement we're supposed to get
the upper 32 bits but it actually read out of bounds and got 0 instead
of the desired value which lead to the "PVS bin not set." codepath being
run resetting our pvs value.
Fixes: a8811ec764 ("cpufreq: qcom: Add support for krait based socs")
Signed-off-by: Luca Weiss <luca@z3ntu.xyz>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
drivers/cpufreq calls cpumask_weight() to check if any bit of a given
cpumask is set. We can do it more efficiently with cpumask_empty() because
cpumask_empty() stops traversing the cpumask as soon as it finds first set
bit, while cpumask_weight() counts all bits unconditionally.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> (for SCMI cpufreq driver)
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
In the event that the SoC is under thermal pressure while booting it's
possible for the dcvs notification to happen inbetween the cpufreq
framework calling init and it actually updating the policy's
related_cpus cpumask.
Prior to the introduction of the thermal pressure update helper an empty
cpumask would simply result in the thermal pressure of no cpus being
updated, but the new code will attempt to dereference an invalid per_cpu
variable.
Avoid this problem by using the newly reintroduced "ready" callback, to
postpone enabling the IRQ until the related_cpus cpumask is filled in.
Fixes: 0258cb19c7 ("cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function")
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
This effectively revert '4bf8e582119e ("cpufreq: Remove ready()
callback")', in order to reintroduce the ready callback.
This is needed in order to be able to leave the thermal pressure
interrupts in the Qualcomm CPUfreq driver disabled during
initialization, so that it doesn't fire while related_cpus are still 0.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
[ Viresh: Added the Chinese translation as well and updated commit msg ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Replace acpi_bus_get_device() that is going to be dropped with
acpi_fetch_acpi_dev().
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The AMD P-State driver is based on ACPI CPPC function, so ACPI should be
dependence of this driver in the kernel config.
In file included from ../drivers/cpufreq/amd-pstate.c:40:0:
../include/acpi/processor.h:226:2: error: unknown type name ‘phys_cpuid_t’
phys_cpuid_t phys_id; /* CPU hardware ID such as APIC ID for x86 */
^~~~~~~~~~~~
../include/acpi/processor.h:355:1: error: unknown type name ‘phys_cpuid_t’; did you mean ‘phys_addr_t’?
phys_cpuid_t acpi_get_phys_id(acpi_handle, int type, u32 acpi_id);
^~~~~~~~~~~~
phys_addr_t
CC drivers/rtc/rtc-rv3029c2.o
../include/acpi/processor.h:356:1: error: unknown type name ‘phys_cpuid_t’; did you mean ‘phys_addr_t’?
phys_cpuid_t acpi_map_madt_entry(u32 acpi_id);
^~~~~~~~~~~~
phys_addr_t
../include/acpi/processor.h:357:20: error: unknown type name ‘phys_cpuid_t’; did you mean ‘phys_addr_t’?
int acpi_map_cpuid(phys_cpuid_t phys_id, u32 acpi_id);
^~~~~~~~~~~~
phys_addr_t
See https://lore.kernel.org/lkml/20e286d4-25d7-fb6e-31a1-4349c805aae3@infradead.org/.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Huang Rui <ray.huang@amd.com>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add the description of @req and @boost_supported in struct amd_cpudata
kernel-doc comment to remove warnings found by running scripts/kernel-doc,
which is caused by using 'make W=1'.
drivers/cpufreq/amd-pstate.c:104: warning: Function parameter or member
'req' not described in 'amd_cpudata'
drivers/cpufreq/amd-pstate.c:104: warning: Function parameter or member
'boost_supported' not described in 'amd_cpudata'
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Introduce sysfs attributes to get the different level AMD P-State
performances.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Introduce sysfs attributes to get the different level processor
frequencies.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If the sbios supports the boost mode of AMD P-State, let's switch to
boost enabled by default.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add trace event to monitor the performance value changes which is
controlled by cpu governors.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In some of Zen2 and Zen3 based processors, they are using the shared
memory that exposed from ACPI SBIOS. In this kind of the processors,
there is no MSR support, so we add acpi cppc function as the backend for
them.
It is using a module param (shared_mem) to enable related processors
manually. We will enable this by default once we address performance
issue on this solution.
Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Introduce the fast switch function for AMD P-State on the AMD processors
which support the full MSR register control. It's able to decrease the
latency on interrupt context.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
AMD P-State is the AMD CPU performance scaling driver that introduces a
new CPU frequency control mechanism on AMD Zen based CPU series in Linux
kernel. The new mechanism is based on Collaborative processor
performance control (CPPC) which is finer grain frequency management
than legacy ACPI hardware P-States. Current AMD CPU platforms are using
the ACPI P-states driver to manage CPU frequency and clocks with
switching only in 3 P-states. AMD P-State is to replace the ACPI
P-states controls, allows a flexible, low-latency interface for the
Linux kernel to directly communicate the performance hints to hardware.
AMD P-State leverages the Linux kernel governors such as *schedutil*,
*ondemand*, etc. to manage the performance hints which are provided by CPPC
hardware functionality. The first version for AMD P-State is to support one
of the Zen3 processors, and we will support more in future after we verify
the hardware and SBIOS functionalities.
There are two types of hardware implementations for AMD P-State: one is full
MSR support and another is shared memory support. It can use
X86_FEATURE_CPPC feature flag to distinguish the different types.
Using the new AMD P-State method + kernel governors (*schedutil*,
*ondemand*, ...) to manage the frequency update is the most appropriate
bridge between AMD Zen based hardware processor and Linux kernel, the
processor is able to adjust to the most efficiency frequency according to
the kernel scheduler loading.
Please check the detailed CPU feature and MSR register description in
Processor Programming Reference (PPR) for AMD Family 19h Model 51h,
Revision A1 Processors:
https://www.amd.com/system/files/TechDocs/56569-A1-PUB.zip
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull ARM cpufreq updates for 5.17-rc1 from Viresh Kumar:
"- Qcom cpufreq driver updates improve irq support (Ard Biesheuvel, Stephen Boyd,
and Vladimir Zapolskiy).
- Fixes double devm_remap for mediatek driver (Hector Yuan).
- Introduces thermal pressure helpers (Lukasz Luba)."
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: mediatek-hw: Fix double devm_remap in hotplug case
cpufreq: qcom-hw: Use optional irq API
cpufreq: qcom-hw: Set CPU affinity of dcvsh interrupts
cpufreq: qcom-hw: Fix probable nested interrupt handling
cpufreq: qcom-cpufreq-hw: Avoid stack buffer for IRQ name
arch_topology: Remove unused topology_set_thermal_pressure() and related
cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function
cpufreq: qcom-cpufreq-hw: Update offline CPUs per-cpu thermal pressure
thermal: cpufreq_cooling: Use new thermal pressure update function
arch_topology: Introduce thermal pressure update function
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field. Move the cpufreq code to use default_groups field which has been
the preferred way since aa30f47cf6 ("kobject: Add support for default
attribute groups to kobj_type") so that we can soon get rid of the
obsolete default_attrs field.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When hotpluging policy cpu, cpu policy init will be called multiple times.
Unplug CPU7 -> CPU6 -> CPU5 -> CPU4, then plug CPU4 again.
In this case, devm_remap will double remap and resource allocate fail.
So replace devm_remap to ioremap and release resources in cpu policy exit.
Signed-off-by: Hector.Yuan <hector.yuan@mediatek.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
With HWP enabled, when the turbo range of performance levels is
disabled by the platform firmware, the CPU capacity is given by
the "guaranteed performance" field in MSR_HWP_CAPABILITIES which
is generally dynamic. When it changes, the kernel receives an HWP
notification interrupt handled by notify_hwp_interrupt().
When the "guaranteed performance" value changes in the above
configuration, the CPU performance scaling needs to be adjusted so
as to use the new CPU capacity in computations, which means that
the cpuinfo.max_freq value needs to be updated for that CPU.
Accordingly, modify intel_pstate_notify_work() to read
MSR_HWP_CAPABILITIES and update cpuinfo.max_freq to reflect the
new configuration (this update can be carried out even if the
configuration doesn't actually change, because it simply doesn't
matter then and it takes less time to update it than to do extra
checks to decide whether or not a change has really occurred).
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The min and max frequency QoS requests in the cpufreq core are
initialized to whatever the current min and max frequency values are
at the init time, but if any of these values change later (for
example, cpuinfo.max_freq is updated by the driver), these initial
request values will be limiting the CPU frequency unnecessarily
unless they are changed by user space via sysfs.
To address this, initialize min_freq_req and max_freq_req to
FREQ_QOS_MIN_DEFAULT_VALUE and FREQ_QOS_MAX_DEFAULT_VALUE,
respectively, so they don't really limit anything until user
space updates them.
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is an expectation from users that they can get frequency specified
by cpufreq/cpuinfo_max_freq when conditions permit. But with AlderLake
mobile it may not be possible. This is possible that frequency is clipped
based on the system power-up EPP value. In this case users can update
cpufreq/energy_performance_preference to some performance oriented EPP to
limit clipping of frequencies.
To get out of box behavior as the prior generations of CPUs, update EPP
for AlderLake mobile CPUs on boot. On prior generations of CPUs EPP = 128
was enough to get maximum frequency, but with AlderLake mobile the
equivalent EPP is 102. Since EPP is model specific, this is possible that
they have different meaning on each generation of CPU.
The current EPP string "balance_performance" corresponds to EPP = 128.
Change the EPP corresponding to "balance_performance" to 102 for only
AlderLake mobile CPUs and update this on each CPU during boot.
To implement reuse epp_values[] array and update the modified EPP at the
index for BALANCE_PERFORMANCE. Add a dummy EPP_INDEX_DEFAULT to
epp_values[] to match indexes in the energy_perf_strings[].
After HWP PM is enabled also update EPP when "balance_performance" is
redefined for the very first time after the boot on each CPU. On
subsequent suspend/resume or offline/online the old EPP is restored,
so no specific action is needed.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is not necessary to call intel_pstate_get_hwp_cap() from
intel_pstate_update_perf_limits(), because it gets called from
intel_pstate_verify_cpu_policy() which is either invoked directly
right before intel_pstate_update_perf_limits(), in
intel_cpufreq_verify_policy() in the passive mode, or called
from driver callbacks in a sequence that causes it to be followed
by an immediate intel_pstate_update_perf_limits().
Namely, in the active mode intel_cpufreq_verify_policy() is called
by intel_pstate_verify_policy() which is the ->verify() callback
routine of intel_pstate and gets called by the cpufreq core right
before intel_pstate_set_policy(), which is the driver's ->setoplicy()
callback routine, where intel_pstate_update_perf_limits() is called.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use platform_get_irq_optional() to avoid a noisy error message when the
irq isn't specified. The irq is definitely optional given that we only
care about errors that are -EPROBE_DEFER here.
Cc: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Make the comment in blocking_notifier_call_chain() easier to
understand.
Signed-off-by: Tang Yizhou <tangyizhou@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When I hot added a CPU, I found 'cpufreq' directory was not created
below /sys/devices/system/cpu/cpuX/.
It is because get_cpu_device() failed in add_cpu_dev_symlink().
cpufreq_add_dev() is the .add_dev callback of a CPU subsys interface.
It will be called when the CPU device registered into the system.
The call chain is as follows:
register_cpu()
->device_register()
->device_add()
->bus_probe_device()
->cpufreq_add_dev()
But only after the CPU device has been registered, we can get the
CPU device by get_cpu_device(), otherwise it will return NULL.
Since we already have the CPU device in cpufreq_add_dev(), pass
it to add_cpu_dev_symlink().
I noticed that the 'kobj' of the CPU device has been added into
the system before cpufreq_add_dev().
Fixes: 2f0ba790df ("cpufreq: Fix creation of symbolic links to policy directories")
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In runtime CPU cluster specific dcvsh interrupts may be handled on
unrelated CPU cores, it leads to an issue of too excessive number of
received and handled interrupts, but this is not observed, if CPU
affinity of the interrupt handler is set in accordance to CPU clusters.
The change reduces a number of received interrupts in about 10-100 times.
Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Re-enabling an interrupt from its own interrupt handler may cause
an interrupt storm, if there is a pending interrupt and because its
handling is disabled due to already done entrance into the handler
above in the stack.
Also, apparently it is improper to lock a mutex in an interrupt contex.
Fixes: 275157b367 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Registering an IRQ requires the string buffer containing the name to
remain allocated, as the name is not copied into another buffer.
So let's add a irq_name field to the data struct instead, which is
guaranteed to have the appropriate lifetime.
Cc: Thara Gopinath <thara.gopinath@linaro.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Andy Gross <agross@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>
Tested-by: Steev Klimaszewski <steev@kali.org>
Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
On systems with overclocking enabled, CPPC Highest Performance can be
hard coded to 0xff. In this case even if we have cores with different
highest performance, ITMT can't be enabled as the current implementation
depends on CPPC Highest Performance.
On such systems we can use MSR_HWP_CAPABILITIES maximum performance field
when CPPC.Highest Performance is 0xff.
Due to legacy reasons, we can't solely depend on MSR_HWP_CAPABILITIES as
in some older systems CPPC Highest Performance is the only way to identify
different performing cores.
Reported-by: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After commit 4adcf2e582 ("cpufreq: intel_pstate: Add ->offline and
->online callbacks") the EPP value set by the "performance" scaling
algorithm in the active mode is not restored after an offline/online
cycle which replaces it with the saved EPP value coming from user
space.
Address this issue by forcing intel_pstate_hwp_set() to set a new
EPP value when it runs first time after online.
Fixes: 4adcf2e582 ("cpufreq: intel_pstate: Add ->offline and ->online callbacks")
Link: https://lore.kernel.org/linux-pm/adc7132c8655bd4d1c8b6129578e931a14fe1db2.camel@linux.intel.com/
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit fbdc21e9b0 ("cpufreq: intel_pstate: Add Icelake servers
support in no-HWP mode") enabled the use of Intel P-State driver
for Ice Lake servers.
But it doesn't cover the case when OS can't control P-States.
Therefore, for Ice Lake server, if MSR_MISC_PWR_MGMT bits 8 or 18
are enabled, then the Intel P-State driver should exit as OS can't
control P-States.
Fixes: fbdc21e9b0 ("cpufreq: intel_pstate: Add Icelake servers support in no-HWP mode")
Signed-off-by: Adamos Ttofari <attofari@amazon.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Thermal pressure provides a new API, which allows to use CPU frequency
as an argument. That removes the need of local conversion to capacity.
Use this new API and remove old local conversion code.
The new arch_update_thermal_pressure() also accepts boost frequencies,
which solves issue in the driver code with wrong reduced capacity
calculation. The reduced capacity was calculated wrongly due to
'policy->cpuinfo.max_freq' used as a divider. The value present there was
actually the boost frequency. Thus, even a normal maximum frequency value
which corresponds to max CPU capacity (arch_scale_cpu_capacity(cpu_id))
is not able to remove the capping.
The second side effect which is solved is that the reduced frequency wasn't
properly translated into the right reduced capacity,
e.g.
boost frequency = 3000MHz (stored in policy->cpuinfo.max_freq)
max normal frequency = 2500MHz (which is 1024 capacity)
2nd highest frequency = 2000MHz (which translates to 819 capacity)
Then in a scenario when the 'throttled_freq' max allowed frequency was
2000MHz the driver translated it into 682 capacity:
capacity = 1024 * 2000 / 3000 = 682
Then set the pressure value bigger than actually applied by the HW:
max_capacity - capacity => 1024 - 682 = 342 (<- thermal pressure)
Which was causing higher throttling and misleading task scheduler
about available CPU capacity.
A proper calculation in such case should be:
capacity = 1024 * 2000 / 2500 = 819
1024 - 819 = 205 (<- thermal pressure)
This patch relies on the new arch_update_thermal_pressure() handling
correctly such use case (with boost frequencies).
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The thermal pressure signal gives information to the scheduler about
reduced CPU capacity due to thermal. It is based on a value stored in
a per-cpu 'thermal_pressure' variable. The online CPUs will get the
new value there, while the offline won't. Unfortunately, when the CPU
is back online, the value read from per-cpu variable might be wrong
(stale data). This might affect the scheduler decisions, since it
sees the CPU capacity differently than what is actually available.
Fix it by making sure that all online+offline CPUs would get the
proper value in their per-cpu variable when there is throttling
or throttling is removed.
Fixes: 275157b367 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
It is possible that some performance excursions happened before OS boot
or enable HWP interrupts. So clear MSR_HWP_STATUS bits when we enable
HWP interrupt. In this way a next excursion will results in a HWP
interrupt.
The status bits of MSR_HWP_STATUS must be cleared (0) by software so
that a new status condition change will cause the hardware to set the
bit again and issue the notification.
Fixes: 57577c996d ("cpufreq: intel_pstate: Process HWP Guaranteed change notification")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is possible that on some platforms HWP interrupts are disabled. In
that case accessing MSR 0x773 will result in warning.
So check X86_FEATURE_HWP_NOTIFY feature to access MSR 0x773. The other
places in code where this MSR is accessed, already checks this feature
except during disable path called during cpufreq offline and suspend
callbacks.
Fixes: 57577c996d ("cpufreq: intel_pstate: Process HWP Guaranteed change notification")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>