1980 Commits

Author SHA1 Message Date
Rafael J. Wysocki
ba41e1bc28 cpufreq: intel_pstate: Fix HWP on boot CPU after system resume
Commit 41cfd64cf49fc "Update frequencies of policy->cpus only from
->set_policy()" changed the way the intel_pstate driver's ->set_policy
callback updates the HWP (hardware-managed P-states) settings.
A side effect of it is that if those settings are modified on the
boot CPU during system suspend and wakeup, they will never be
restored during subsequent system resume.

To address this problem, allow cpufreq drivers that don't provide
->target or ->target_index callbacks to use ->suspend and ->resume
callbacks and add a ->resume callback to intel_pstate to restore
the HWP settings on the CPUs that belong to the given policy.

Fixes: 41cfd64cf49fc "Update frequencies of policy->cpus only from ->set_policy()"
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-05-02 13:48:15 +02:00
Aneesh Kumar K.V
d2adba3fd1 powerpc/mm: Abstraction for switch_mmu_context()
How we switch MMU context differs between hash and radix. For hash we
need to switch the SLB details and for radix we need to switch the PID
SPR.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:33:04 +10:00
Rafael J. Wysocki
81be193b7e Merge branch 'pm-cpufreq-fixes'
* pm-cpufreq-fixes:
  cpufreq: intel_pstate: Fix processing for turbo activation ratio
  Revert "cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC"
2016-04-29 14:22:25 +02:00
Sudeep Holla
2482bc31ca cpufreq: st: enable selective initialization based on the platform
The sti-cpufreq does unconditional registration of the cpufreq-dt driver
which causes issue on an multi-platform build. For example, on Vexpress
TC2 platform, we get the following error on boot:

cpu cpu0: OPP-v2 not supported
cpu cpu0: Not doing voltage scaling
cpu: dev_pm_opp_of_cpumask_add_table: couldn't find opp table
	for cpu:0, -19
cpu cpu0: dev_pm_opp_get_max_volt_latency: Invalid regulator (-6)
...
arm_big_little: bL_cpufreq_register: Failed registering platform driver:
		vexpress-spc, err: -17

The actual driver fails to initialise as cpufreq-dt is probed
successfully, which is incorrect. This issue can happen to any platform
not using cpufreq-dt in a multi-platform build.

This patch adds a check to do selective initialization of the driver.

Fixes: ab0ea257fc58 (cpufreq: st: Provide runtime initialised driver for ST's platforms)
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Lee Jones <lee.jones@linaro.org>
Cc: 4.5+ <stable@vger.kernel.org> # 4.5+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-28 15:25:56 +02:00
Viresh Kumar
9f123def55 cpufreq: mvebu: Move cpufreq code into drivers/cpufreq/
Move cpufreq bits for mvebu into drivers/cpufreq/ directory, that's
where they really belong to.

Compiled tested only.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-28 15:22:43 +02:00
Viresh Kumar
eb96924acd cpufreq: dt: Kill platform-data
There are no more users of platform-data for cpufreq-dt driver, get rid
of it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-28 15:22:43 +02:00
Viresh Kumar
1530b9963e cpufreq: dt: Identify cpu-sharing for platforms without operating-points-v2
Existing platforms, which do not support operating-points-v2, can
explicitly tell the opp core that some of the CPUs share opp tables,
with help of dev_pm_opp_set_sharing_cpus().

For such platforms, explicitly ask the opp core to provide list of CPUs
sharing the opp table with current cpu device, before falling back to
platform data.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-28 15:22:42 +02:00
Rafael J. Wysocki
b4f4b4b371 cpufreq: governor: Change confusing struct field and variable names
The name of the prev_cpu_wall field in struct cpu_dbs_info is
confusing, because it doesn't represent wall time, but the previous
update time as returned by get_cpu_idle_time() (that may be the
current value of jiffies_64 in some cases, for example).

Moreover, the names of some related variables in dbs_update() take
that confusion further.

Rename all of those things to make their names reflect the purpose
more accurately.  While at it, drop unnecessary parens from one of
the updated expressions.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Chen Yu <yu.c.chen@intel.com>
2016-04-28 15:10:08 +02:00
Srinivas Pandruvada
2b3ec76505 cpufreq: intel_pstate: Enable PPC enforcement for servers
For platforms which are controlled via remove node manager, enable _PPC by
default. These platforms are mostly categorized as enterprise server or
performance servers. These platforms needs to go through some
certifications tests, which tests control via _PPC.
The relative risk of enabling by default is  low as this is is less likely
that these systems have broken _PSS table.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-28 01:01:39 +02:00
Srinivas Pandruvada
3be9200d51 cpufreq: intel_pstate: Adjust policy->max
When policy->max is changed via _PPC or sysfs and is more than the max non
turbo frequency, it does not really change resulting performance in some
processors. When policy->max results in a P-State ratio more than the
turbo activation ratio, then processor can choose any P-State up to max
turbo. So the user or _PPC setting has no value, but this can cause
undesirable side effects like:
- Showing reduced max percentage in Intel P-State sysfs
- It can cause reduced max performance under certain boundary conditions:
The requested max scaling frequency either via _PPC or via cpufreq-sysfs,
will be converted into a fixed floating point max percent scale. In
majority of the cases this will result in correct max. But not 100% of the
time. If the _PPC is requested at a point where the calculation lead to a
lower max, this can result in a lower P-State then expected and it will
impact performance.
Example of this condition using a Broadwell laptop with config TDP.

ACPI _PSS table from a Broadwell laptop
2301000 2300000 2200000 2000000 1900000 1800000 1700000 1500000 1400000
1300000 1100000 1000000 900000 800000 600000 500000

The actual results by disabling config TDP so that we can get what is
requested on or below 2300000Khz.

scaling_max_freq        Max Requested P-State   Resultant scaling
max
---------------------------------------- ----------------------
2400000                 18                      2900000 (max
turbo)
2300000                 17                      2300000 (max
physical non turbo)
2200000                 15                      2100000
2100000                 15                      2100000
2000000                 13                      1900000
1900000                 13                      1900000
1800000                 12                      1800000
1700000                 11                      1700000
1600000                 10                      1600000
1500000                 f                       1500000
1400000                 e                       1400000
1300000                 d                       1300000
1200000                 c                       1200000
1100000                 a                       1000000
1000000                 a                       1000000
900000                  9                        900000
800000                  8                        800000
700000                  7                        700000
600000                  6                        600000
500000                  5                        500000
------------------------------------------------------------------

Now set the config TDP level 1 ratio as 0x0b (equivalent to 1100000KHz)
in BIOS (not every system will let you adjust this).
The turbo activation ratio will be set to one less than that, which will
be 0x0a (So any request above 1000000KHz should result in turbo region
assuming no thermal limits).
Here _PPC will request max to 1100000KHz (which basically should still
result in turbo as this is more than the turbo activation ratio up to
max allowable turbo frequency), but actual calculation resulted in a max
ceiling P-State which is 0x0a. So under any load condition, this driver
will not request turbo P-States. This will be a huge performance hit.

When config TDP feature is ON, if the _PPC points to a frequency above
turbo activation ratio, the performance can still reach max turbo. In this
case we don't need to treat this as the reduced frequency in set_policy
callback.

In this change when config TDP is active (by checking if the physical max
non turbo ratio is more than the current max non turbo ratio), any request
above current max non turbo is treated as full performance.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Minor cleanups ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-28 01:01:39 +02:00
Srinivas Pandruvada
9522a2ff9c cpufreq: intel_pstate: Enforce _PPC limits
Use ACPI _PPC notification to limit max P state driver will request.
ACPI _PPC change notification is sent by BIOS to limit max P state
in several cases:
- Reduce impact of platform thermal condition
- When Config TDP feature is used, a changed _PPC is sent to
follow TDP change
- Remote node managers in server want to control platform power
via baseboard management controller (BMC)

This change registers with ACPI processor performance lib so that
_PPC changes are notified to cpufreq core, which in turns will
result in call to .setpolicy() callback. Also the way _PSS
table identifies a turbo frequency is not compatible to max turbo
frequency in intel_pstate, so the very first entry in _PSS needs
to be adjusted.

This feature can be turned on by using kernel parameters:
intel_pstate=support_acpi_ppc

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Minor cleanups ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-28 01:01:39 +02:00
Akshay Adiga
eaa2c3aeef cpufreq: powernv: Ramp-down global pstate slower than local-pstate
The frequency transition latency from pmin to pmax is observed to be in
few millisecond granurality. And it usually happens to take a performance
penalty during sudden frequency rampup requests.

This patch set solves this problem by using an entity called "global
pstates". The global pstate is a Chip-level entity, so the global entitiy
(Voltage) is managed across the cores. The local pstate is a Core-level
entity, so the local entity (frequency) is managed across threads.

This patch brings down global pstate at a slower rate than the local
pstate. Hence by holding global pstates higher than local pstate makes
the subsequent rampups faster.

A per policy structure is maintained to keep track of the global and
local pstate changes. The global pstate is brought down using a parabolic
equation. The ramp down time to pmin is set to ~5 seconds. To make sure
that the global pstates are dropped at regular interval , a timer is
queued for every 2 seconds during ramp-down phase, which eventually brings
the pstate down to local pstate.

Iozone results show fairly consistent performance boost.
YCSB on redis shows improved Max latencies in most cases.

Iozone write/rewite test were made with filesizes 200704Kb and 401408Kb
with different record sizes . The following table shows IOoperations/sec
with and without patch.

Iozone Results ( in op/sec) ( mean over 3 iterations )
---------------------------------------------------------------------
file size-                      with            without		  %
recordsize-IOtype               patch           patch		change
----------------------------------------------------------------------
200704-1-SeqWrite               1616532         1615425         0.06
200704-1-Rewrite                2423195         2303130         5.21
200704-2-SeqWrite               1628577         1602620         1.61
200704-2-Rewrite                2428264         2312154         5.02
200704-4-SeqWrite               1617605         1617182         0.02
200704-4-Rewrite                2430524         2351238         3.37
200704-8-SeqWrite               1629478         1600436         1.81
200704-8-Rewrite                2415308         2298136         5.09
200704-16-SeqWrite              1619632         1618250         0.08
200704-16-Rewrite               2396650         2352591         1.87
200704-32-SeqWrite              1632544         1598083         2.15
200704-32-Rewrite               2425119         2329743         4.09
200704-64-SeqWrite              1617812         1617235         0.03
200704-64-Rewrite               2402021         2321080         3.48
200704-128-SeqWrite             1631998         1600256         1.98
200704-128-Rewrite              2422389         2304954         5.09
200704-256 SeqWrite             1617065         1616962         0.00
200704-256-Rewrite              2432539         2301980         5.67
200704-512-SeqWrite             1632599         1598656         2.12
200704-512-Rewrite              2429270         2323676         4.54
200704-1024-SeqWrite            1618758         1616156         0.16
200704-1024-Rewrite             2431631         2315889         4.99
401408-1-SeqWrite               1631479         1608132         1.45
401408-1-Rewrite                2501550         2459409         1.71
401408-2-SeqWrite               1617095         1626069         -0.55
401408-2-Rewrite                2507557         2443621         2.61
401408-4-SeqWrite               1629601         1611869         1.10
401408-4-Rewrite                2505909         2462098         1.77
401408-8-SeqWrite               1617110         1626968         -0.60
401408-8-Rewrite                2512244         2456827         2.25
401408-16-SeqWrite              1632609         1609603         1.42
401408-16-Rewrite               2500792         2451405         2.01
401408-32-SeqWrite              1619294         1628167         -0.54
401408-32-Rewrite               2510115         2451292         2.39
401408-64-SeqWrite              1632709         1603746         1.80
401408-64-Rewrite               2506692         2433186         3.02
401408-128-SeqWrite             1619284         1627461         -0.50
401408-128-Rewrite              2518698         2453361         2.66
401408-256-SeqWrite             1634022         1610681         1.44
401408-256-Rewrite              2509987         2446328         2.60
401408-512-SeqWrite             1617524         1628016         -0.64
401408-512-Rewrite              2504409         2442899         2.51
401408-1024-SeqWrite            1629812         1611566         1.13
401408-1024-Rewrite             2507620          2442968        2.64

Tested with YCSB workload (50% update + 50% read) over redis for 1 million
records and 1 million operation. Each test was carried out with target
operations per second and persistence disabled.

Max-latency (in us)( mean over 5 iterations )
---------------------------------------------------------------
op/s    Operation       with patch      without patch   %change
---------------------------------------------------------------
15000   Read            61480.6         50261.4         22.32
15000   cleanup         215.2           293.6           -26.70
15000   update          25666.2         25163.8         2.00

25000   Read            32626.2         89525.4         -63.56
25000   cleanup         292.2           263.0           11.10
25000   update          32293.4         90255.0         -64.22

35000   Read            34783.0         33119.0         5.02
35000   cleanup         321.2           395.8           -18.8
35000   update          36047.0         38747.8         -6.97

40000   Read            38562.2         42357.4         -8.96
40000   cleanup         371.8           384.6           -3.33
40000   update          27861.4         41547.8         -32.94

45000   Read            42271.0         88120.6         -52.03
45000   cleanup         263.6           383.0           -31.17
45000   update          29755.8         81359.0         -63.43

(test without target op/s)
47659   Read            83061.4         136440.6        -39.12
47659   cleanup         195.8           193.8           1.03
47659   update          73429.4         124971.8        -41.24

Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-27 23:56:58 +02:00
Shilpasri G Bhat
2920e9ce8f cpufreq: powernv: Remove flag use-case of policy->driver_data
commit 1b0289848d5d ("cpufreq: powernv: Add sysfs attributes to show
throttle stats") used policy->driver_data as a flag for one-time creation
of throttle sysfs files. Instead of this use 'kernfs_find_and_get()' to
check if the attribute already exists. This is required as
policy->driver_data is used for other purposes in the later patch.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-27 23:56:58 +02:00
Javier Martinez Canillas
6de0dc4b53 cpufreq: e_powersaver: Use IS_ENABLED() instead of checking for built-in or module
The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either
built-in or as a module, use that macro instead of open coding the same.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-27 22:42:34 +02:00
Srinivas Pandruvada
1becf03545 cpufreq: intel_pstate: Fix processing for turbo activation ratio
When the config TDP level is not nominal (level = 0), the MSR values for
reading level 1 and level 2 ratios contain power in low 14 bits and actual
ratio bits are at bits [23:16]. The current processing for level 1 and
level 2 is wrong as there is no shift done to get actual ratio.

Fixes: 6a35fc2d6c22 (cpufreq: intel_pstate: get P1 from TAR when available)
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 23:39:09 +02:00
Rafael J. Wysocki
ba1ca654f3 cpufreq: governor: Fix prev_load initialization in cpufreq_governor_start()
The way cpufreq_governor_start() initializes j_cdbs->prev_load is
questionable.

First off, j_cdbs->prev_cpu_wall used as a denominator in the
computation may be zero.  The case this happens is when
get_cpu_idle_time_us() returns -1 and get_cpu_idle_time_jiffy()
used to return that number is called exactly at the jiffies_64
wrap time.  It is rather hard to trigger that error, but it is not
impossible and it will just crash the kernel then.

Second, j_cdbs->prev_load is computed as the average load during
the entire time since the system started and it may not reflect the
load in the previous sampling period (as it is expected to).
That doesn't play well with the way dbs_update() uses that value.
Namely, if the update time delta (wall_time) happens do be greater
than twice the sampling rate on the first invocation of it, the
initial value of j_cdbs->prev_load (which may be completely off) will
be returned to the caller as the current load (unless it is equal to
zero and unless another CPU sharing the same policy object has a
greater load value).

For this reason, notice that the prev_load field of struct cpu_dbs_info
is only used by dbs_update() and only in that one place, so if
cpufreq_governor_start() is modified to always initialize it to 0,
it will make dbs_update() always compute the actual load first time
it checks the update time delta against the doubled sampling rate
(after initialization) and there won't be any side effects of it.

Consequently, modify cpufreq_governor_start() as described.

Fixes: 18b46abd0009 (cpufreq: governor: Be friendly towards latency-sensitive bursty workloads)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-04-25 16:21:34 +02:00
Viresh Kumar
3920be471c cpufreq: hisilicon: Use generic platdev driver
The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform
device now, reuse that and remove similar code from platform code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:18:24 +02:00
Viresh Kumar
5e4249c6d9 cpufreq: zynq: Use generic platdev driver
The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform
device now, reuse that and remove similar code from platform code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:18:24 +02:00
Viresh Kumar
117d4f59af cpufreq: sunxi: Use generic platdev driver
The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform
device now, reuse that and remove similar code from platform code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:18:24 +02:00
Viresh Kumar
a399dc9fc5 cpufreq: shmobile: Use generic platdev driver
The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform
device now, reuse that and remove similar code from platform code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:18:23 +02:00
Finley Xiao
014400c127 cpufreq: rockchip: Use generic platdev driver
This patch add rockchip's compatible string to the compat list and
remove similar code from platform code for supporting generic platdev
driver.

Signed-off-by: Finley Xiao <finley.xiao@rock-chips.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:18:23 +02:00
Viresh Kumar
7694ca6e1d cpufreq: omap: Use generic platdev driver
The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform
device now, reuse that and remove similar code from platform code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:18:23 +02:00
Viresh Kumar
7ead83f6df cpufreq: imx: Use generic platdev driver
The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform
device now, reuse that and remove similar code from platform code.

Note that the complete routine imx27_dt_init() is removed as

of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL);

has same effect as a NULL .init_machine machine callback pointer.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:18:23 +02:00
Viresh Kumar
a59511d1da cpufreq: berlin: Use generic platdev driver
The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform
device now, reuse that and remove similar code from platform code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:18:23 +02:00
Viresh Kumar
e92bb16674 cpufreq: dt: Mark platdev machines array as __initconst
The machines array in cpufreq-dt-platdev is used only once at boot time
and so should be marked with __initconst, so that kernel can free up
memory used for it, if required.

Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:18:22 +02:00
Jia Hongtao
394cb8316b cpufreq: qoriq: Fix cooling device registration issue during suspend
Cooling device is registered by ready callback. It's also invoked while
system resuming from sleep (Enabling non-boot cpus). Thus cooling device
may be multiple registered. Matchable unregistration is added to exit
callback to fix this issue.

Signed-off-by: Jia Hongtao <hongtao.jia@nxp.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:07:02 +02:00
Jia Hongtao
495c716f17 cpufreq: qoriq: Remove __exit macro from .exit callback
.exit callback (qoriq_cpufreq_cpu_exit()) is also used during suspend.
So __exit macro should be removed or the function will be discarded.

Signed-off-by: Jia Hongtao <hongtao.jia@nxp.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:07:02 +02:00
Jia Hongtao
27b8fe8daf cpufreq: qoriq: Don't show cooling device messages if THERMAL_OF undefined
When THERMAL_OF is undefined the cooling device messages should not be
shown. -ENOSYS is returned from of_cpufreq_cooling_register() when
THERMAL_OF is undefined.

Signed-off-by: Jia Hongtao <hongtao.jia@nxp.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 16:00:42 +02:00
Ashwin Chaugule
a29a1e7678 cpufreq: ACPI / CPPC: Add module support for cppc_cpufreq driver
Add a function to cleanup at module exit and export
appropriate GPL string to enable moduler support
for the cppc_cpufreq driver.

Reported-by: Srinivas Pandruvada <srinivas.pandruvada@intel.com>
Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 15:59:35 +02:00
Philippe Longepe
bdcaa23fb8 cpufreq: intel_pstate: Use average P-State instead of current P-State
The result returned by pid_calc() is subtracted from current_pstate
(which is the P-State requested during the last period) in order to
obtain the target P-State for the current iteration.

However, current_pstate may not reflect the real current P-State of
the CPU. In particular, that P-State may be higher because of the
frequency sharing per module.

The theory is:
 - The load is the percentage of time spent in C0 and is related to
   the average P-State during the same period.
 - The last requested P-State can be completely different than the
   average P-State (because of frequency sharing or throttling).
 - The P-State shift computed by the pid_calc is based on the load
   computed at average P-State, so the shift must be relative to
   this average P-State.

Using the average P-State instead of current P-State improves power
without significant performance penalty in cases when a task migrates
from one core to other core sharing frequency and voltage.

Performance and power comparison with this patch on Cherry Trail
platform using Android:

Benchmark               ?Perf    ?Power
FishTank                10.45%    3.1%
SmartBench-Gaming       -0.1%   -10.4%
SmartBench-Productivity -0.8%   -10.4%
CandyCrush                n/a   -17.4%
AngryBirds                n/a    -5.9%
videoPlayback             n/a   -13.9%
audioPlayback             n/a    -4.9%
IcyRocks-20-50           0.0%   -38.4%
iozone RR               -0.16%  -1.3%
iozone RW                0.74%  -1.3%

Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-25 15:45:11 +02:00
Rafael J. Wysocki
1cbc99dfe5 Merge back cpufreq changes for v4.7. 2016-04-25 15:44:01 +02:00
Rafael J. Wysocki
94862a62df Revert "cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC"
Revert commit 0df35026c6a5 (cpufreq: governor: Fix negative idle_time
when configured with CONFIG_HZ_PERIODIC) that introduced a regression
by causing the ondemand cpufreq governor to misbehave for
CONFIG_TICK_CPU_ACCOUNTING unset (the frequency goes up to the max at
one point and stays there indefinitely).

The revert takes subsequent modifications of the code in question into
account.

Fixes: 0df35026c6a5 (cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=115261
Reported-and-tested-by: Timo Valtoaho <timo.valtoaho@gmail.com>
Cc: 4.5+ <stable@vger.kernel.org> # 4.5+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-04-25 02:39:20 +02:00
Rafael J. Wysocki
395da1259a Merge branch 'pm-cpufreq-fixes'
* pm-cpufreq-fixes:
  cpufreq: Abort cpufreq_update_current_freq() for cpufreq_suspended set
  intel_pstate: Avoid getting stuck in high P-states when idle
2016-04-21 20:57:46 +02:00
Ingo Molnar
6666ea558b Linux 4.6-rc4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXFELfAAoJEHm+PkMAQRiGRYIH+wWsUva7TR9arN1ZrURvI17b
 KqyQH8Ov9zJBsIaq/rFXOr5KfNgx7BU9BL9h7QkBy693HXTWf+GTZ1czHM4N12C3
 0ZdHGrLwTHo2zdisiQaFORZSfhSVTUNGXGHXw13bUMgEqatPgkozXEnsvXXNdt1Z
 HtlcuJn3pcj+QIY7qDXZgTLTwgn248hi1AgNag+ntFcWiz21IYaMIi7/mCY9QUIi
 AY+Y3hqFQM7/8cVyThGS5wZPTg1YzdhsLJpoCk0TbS8FvMEnA+ylcTgc15C78bwu
 AxOwM3OCmH4gMsd7Dd/O+i9lE3K6PFrgzdDisYL3P7eHap+EdiLDvVzPDPPx0xg=
 =Q7r3
 -----END PGP SIGNATURE-----

Merge tag 'v4.6-rc4' into x86/asm, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-19 10:38:52 +02:00
Rafael J. Wysocki
c9d9c929e6 cpufreq: Abort cpufreq_update_current_freq() for cpufreq_suspended set
Since governor operations are generally skipped if cpufreq_suspended
is set, cpufreq_start_governor() should do nothing in that case.

That function is called in the cpufreq_online() path, and may also
be called from cpufreq_offline() in some cases, which are invoked
by the nonboot CPUs disabing/enabling code during system suspend
to RAM and resume.  That happens when all devices have been
suspended, so if the cpufreq driver relies on things like I2C to
get the current frequency, it may not be ready to do that then.

To prevent problems from happening for this reason, make
cpufreq_update_current_freq(), which is the only function invoked
by cpufreq_start_governor() that doesn't check cpufreq_suspended
already, return 0 upfront if cpufreq_suspended is set.

Fixes: 3bbf8fe3ae08 (cpufreq: Always update current frequency before startig governor)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-04-18 23:47:42 +02:00
Borislav Petkov
93984fbd4e x86/cpufeature: Replace cpu_has_apic with boot_cpu_has() usage
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: iommu@lists.linux-foundation.org
Cc: linux-pm@vger.kernel.org
Cc: oprofile-list@lists.sf.net
Link: http://lkml.kernel.org/r/1459801503-15600-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:41 +02:00
Paul Gortmaker
6a0bcab9c6 drivers/cpufreq: make ppc_cbe_cpufreq_pmi driver explicitly non-modular
The Kconfig for this driver is currently:

config CPU_FREQ_CBE_PMI
    bool "CBE frequency scaling using PMI interface"

...meaning that it currently is not being built as a module by
anyone.  Lets remove the modular and unused code here, so that
when reading the driver there is no doubt it is builtin-only.

Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit.

We also delete the MODULE_LICENSE tag etc. since all that information
is already contained at the top of the file in the comments.

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Christian Krafft <krafft@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-pm@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-11 20:30:43 +10:00
Rafael J. Wysocki
ffb810563c intel_pstate: Avoid getting stuck in high P-states when idle
Jörg Otte reports that commit a4675fbc4a7a (cpufreq: intel_pstate:
Replace timers with utilization update callbacks) caused the CPUs in
his Haswell-based system to stay in the very high frequency region
even if the system is completely idle.

That turns out to be an existing problem in the intel_pstate driver's
P-state selection algorithm for Core processors.  Namely, all
decisions made by that algorithm are based on the average frequency
of the CPU between sampling events and on the P-state requested on
the last invocation, so it may get stuck at a very hight frequency
even if the utilization of the CPU is very low (in fact, it may get
stuck in a inadequate P-state regardless of the CPU utilization).
The only way to kick it out of that limbo is a sufficiently long idle
period (3 times longer than the prescribed sampling interval), but if
that doesn't happen often enough (eg. due to a timing change like
after the above commit), the P-state of the CPU may be inadequate
pretty much all the time.

To address the most egregious manifestations of that issue, reset the
core_busy value used to determine the next P-state to request if the
utilization of the CPU, determined with the help of the MPERF
feedback register and the TSC, is below 1%.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=115771
Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-10 05:59:10 +02:00
Viresh Kumar
8cee1eed8e cpufreq: ACPI: Remove freq_table from acpi_cpufreq_data
The freq-table is stored in struct cpufreq_policy also and there is
absolutely no need of keeping a copy of its reference in struct
acpi_cpufreq_data. Drop it.

Also policy->freq_table can't be NULL in the target() callback, remove
the useless check as well.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:54:31 +02:00
Viresh Kumar
9b55f55af8 cpufreq: ACPI: policy->driver_data can't be NULL in ->exit()
Its always set by ->init() and so it will always be there in ->exit().
There is no need to have a special check for just that.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:41:50 +02:00
Rafael J. Wysocki
a794d6138c cpufreq: Rearrange cpufreq_add_dev()
Reorganize the code in cpufreq_add_dev() to avoid using the ret
variable and reduce the indentation level in it.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-04-09 01:37:07 +02:00
Rafael J. Wysocki
cd73e9b01f cpufreq: Simplify switch () in cpufreq_cpu_callback()
Merge two switch entries that do the same thing in
cpufreq_cpu_callback().

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-04-09 01:37:07 +02:00
Joe Perches
1c5864e26c cpufreq: Use consistent prefixing via pr_fmt
Use the more common kernel style adding a define for pr_fmt.

Miscellanea:

o Remove now unused PFX defines

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:35:18 +02:00
Joe Perches
b49c22a6ca cpufreq: Convert printk(KERN_<LEVEL> to pr_<level>
Use the more common logging style.

Miscellanea:

o Coalesce formats
o Realign arguments
o Add a missing space between a coalesced format

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:35:18 +02:00
Joe Perches
4836df173a intel_pstate: Use pr_fmt
Prefix the output using the more common kernel style.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:33:42 +02:00
Geliang Tang
d2499d05f0 cpufreq: mt8173: use list_for_each_entry*()
Use list_for_each_entry*() instead of list_for_each*() to simplify
the code.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:27:54 +02:00
Rafael J. Wysocki
22590efb98 intel_pstate: Avoid pointless FRAC_BITS shifts under div_fp()
There are multiple places in intel_pstate where int_tofp() is applied
to both arguments of div_fp(), but this is pointless, because int_tofp()
simply shifts its argument to the left by FRAC_BITS which mathematically
is equivalent to multuplication by 2^FRAC_BITS, so if this is done
to both arguments of a division, the extra factors will cancel each
other during that operation anyway.

Drop the pointless int_tofp() applied to div_fp() arguments throughout
the driver.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:25:58 +02:00
Viresh Kumar
2249c00a0b cpufreq: exynos: Use generic platdev driver
The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform
device now, reuse that and remove similar code from platform code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:18:42 +02:00
Viresh Kumar
f56aad1d98 cpufreq: dt: Add generic platform-device creation support
Multiple platforms are using the generic cpufreq-dt driver now, and all
of them are required to create a platform device with name "cpufreq-dt",
in order to get the cpufreq-dt probed.

Many of them do it from platform code, others have special drivers just
to do that.

It would be more sensible to do this at a generic place, where all such
platform can mark their entries.

This patch adds a separate file to get this device created. Currently
the compat list of platforms that we support is empty, and will be
filled in as and when we move platforms to use it.

It always compiles as part of the kernel and so doesn't need a
module-exit operation.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:18:42 +02:00
Viresh Kumar
f3f24dea2c cpufreq: tegra124: No need of setting platform-data
All CPUs on Tegra platform share clock/voltage lines and there is
absolutely no need of setting platform data for 'cpufreq-dt' platform
device, as that's the default case.

Stop setting platform data for cpufreq-dt device.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Thierry Reding <treding@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09 01:12:09 +02:00