b473d6a9d6
When the hardware reduces the power to the minimum possible, the power floor is notified via an interrupt. This can happen when user space requests a power limit via powercap RAPL interface, which forces the system to enter to the lowest power. This power floor indication can be used as a hint to resort to other methods of reducing power than via RAPL power limit. Before power floor status can be read or the firmware can trigger notifications regarding it, it needs to be configured via a mailbox command. The actual power floor status is read via bit 39 of MMIO offset 0x5B18 of the processor thermal PCI device. To show the current power floor status and get notification on a sysfs attribute, add 2 new attributes to /sys/bus/pci/devices/0000\:00\:04.0/power_limits/ power_floor_enable : This attribute is present when power floor notifications are supported. This attribute allows to enable/disable power floor notifications. power_floor_status : This attribute is present when power floor notifications are supported. When enabled via power_floor_enable, this attribute shows the current power floor status. The power floor implementation provides interfaces which are called from the sysfs callbacks to enable/disable and read power floor status. It also provides two additional interfaces to check if the current processor thermal device interrupt is for power floor status and to send notifications to user space. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Changelog and documentation changes edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
382 lines
12 KiB
ReStructuredText
382 lines
12 KiB
ReStructuredText
.. SPDX-License-Identifier: GPL-2.0
|
||
|
||
===============================================================
|
||
Intel(R) Dynamic Platform and Thermal Framework Sysfs Interface
|
||
===============================================================
|
||
|
||
:Copyright: © 2022 Intel Corporation
|
||
|
||
:Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
|
||
|
||
Introduction
|
||
------------
|
||
|
||
Intel(R) Dynamic Platform and Thermal Framework (DPTF) is a platform
|
||
level hardware/software solution for power and thermal management.
|
||
|
||
As a container for multiple power/thermal technologies, DPTF provides
|
||
a coordinated approach for different policies to effect the hardware
|
||
state of a system.
|
||
|
||
Since it is a platform level framework, this has several components.
|
||
Some parts of the technology is implemented in the firmware and uses
|
||
ACPI and PCI devices to expose various features for monitoring and
|
||
control. Linux has a set of kernel drivers exposing hardware interface
|
||
to user space. This allows user space thermal solutions like
|
||
"Linux Thermal Daemon" to read platform specific thermal and power
|
||
tables to deliver adequate performance while keeping the system under
|
||
thermal limits.
|
||
|
||
DPTF ACPI Drivers interface
|
||
----------------------------
|
||
|
||
:file:`/sys/bus/platform/devices/<N>/uuids`, where <N>
|
||
=INT3400|INTC1040|INTC1041|INTC10A0
|
||
|
||
``available_uuids`` (RO)
|
||
A set of UUIDs strings presenting available policies
|
||
which should be notified to the firmware when the
|
||
user space can support those policies.
|
||
|
||
UUID strings:
|
||
|
||
"42A441D6-AE6A-462b-A84B-4A8CE79027D3" : Passive 1
|
||
|
||
"3A95C389-E4B8-4629-A526-C52C88626BAE" : Active
|
||
|
||
"97C68AE7-15FA-499c-B8C9-5DA81D606E0A" : Critical
|
||
|
||
"63BE270F-1C11-48FD-A6F7-3AF253FF3E2D" : Adaptive performance
|
||
|
||
"5349962F-71E6-431D-9AE8-0A635B710AEE" : Emergency call
|
||
|
||
"9E04115A-AE87-4D1C-9500-0F3E340BFE75" : Passive 2
|
||
|
||
"F5A35014-C209-46A4-993A-EB56DE7530A1" : Power Boss
|
||
|
||
"6ED722A7-9240-48A5-B479-31EEF723D7CF" : Virtual Sensor
|
||
|
||
"16CAF1B7-DD38-40ED-B1C1-1B8A1913D531" : Cooling mode
|
||
|
||
"BE84BABF-C4D4-403D-B495-3128FD44dAC1" : HDC
|
||
|
||
``current_uuid`` (RW)
|
||
User space can write strings from available UUIDs, one at a
|
||
time.
|
||
|
||
:file:`/sys/bus/platform/devices/<N>/`, where <N>
|
||
=INT3400|INTC1040|INTC1041|INTC10A0
|
||
|
||
``imok`` (WO)
|
||
User space daemon write 1 to respond to firmware event
|
||
for sending keep alive notification. User space receives
|
||
THERMAL_EVENT_KEEP_ALIVE kobject uevent notification when
|
||
firmware calls for user space to respond with imok ACPI
|
||
method.
|
||
|
||
``odvp*`` (RO)
|
||
Firmware thermal status variable values. Thermal tables
|
||
calls for different processing based on these variable
|
||
values.
|
||
|
||
``data_vault`` (RO)
|
||
Binary thermal table. Refer to
|
||
https:/github.com/intel/thermal_daemon for decoding
|
||
thermal table.
|
||
|
||
``production_mode`` (RO)
|
||
When different from zero, manufacturer locked thermal configuration
|
||
from further changes.
|
||
|
||
ACPI Thermal Relationship table interface
|
||
------------------------------------------
|
||
|
||
:file:`/dev/acpi_thermal_rel`
|
||
|
||
This device provides IOCTL interface to read standard ACPI
|
||
thermal relationship tables via ACPI methods _TRT and _ART.
|
||
These IOCTLs are defined in
|
||
drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.h
|
||
|
||
IOCTLs:
|
||
|
||
ACPI_THERMAL_GET_TRT_LEN: Get length of TRT table
|
||
|
||
ACPI_THERMAL_GET_ART_LEN: Get length of ART table
|
||
|
||
ACPI_THERMAL_GET_TRT_COUNT: Number of records in TRT table
|
||
|
||
ACPI_THERMAL_GET_ART_COUNT: Number of records in ART table
|
||
|
||
ACPI_THERMAL_GET_TRT: Read binary TRT table, length to read is
|
||
provided via argument to ioctl().
|
||
|
||
ACPI_THERMAL_GET_ART: Read binary ART table, length to read is
|
||
provided via argument to ioctl().
|
||
|
||
DPTF ACPI Sensor drivers
|
||
-------------------------
|
||
|
||
DPTF Sensor drivers are presented as standard thermal sysfs thermal_zone.
|
||
|
||
|
||
DPTF ACPI Cooling drivers
|
||
--------------------------
|
||
|
||
DPTF cooling drivers are presented as standard thermal sysfs cooling_device.
|
||
|
||
|
||
DPTF Processor thermal PCI Driver interface
|
||
--------------------------------------------
|
||
|
||
:file:`/sys/bus/pci/devices/0000\:00\:04.0/power_limits/`
|
||
|
||
Refer to Documentation/power/powercap/powercap.rst for powercap
|
||
ABI.
|
||
|
||
``power_limit_0_max_uw`` (RO)
|
||
Maximum powercap sysfs constraint_0_power_limit_uw for Intel RAPL
|
||
|
||
``power_limit_0_step_uw`` (RO)
|
||
Power limit increment/decrements for Intel RAPL constraint 0 power limit
|
||
|
||
``power_limit_0_min_uw`` (RO)
|
||
Minimum powercap sysfs constraint_0_power_limit_uw for Intel RAPL
|
||
|
||
``power_limit_0_tmin_us`` (RO)
|
||
Minimum powercap sysfs constraint_0_time_window_us for Intel RAPL
|
||
|
||
``power_limit_0_tmax_us`` (RO)
|
||
Maximum powercap sysfs constraint_0_time_window_us for Intel RAPL
|
||
|
||
``power_limit_1_max_uw`` (RO)
|
||
Maximum powercap sysfs constraint_1_power_limit_uw for Intel RAPL
|
||
|
||
``power_limit_1_step_uw`` (RO)
|
||
Power limit increment/decrements for Intel RAPL constraint 1 power limit
|
||
|
||
``power_limit_1_min_uw`` (RO)
|
||
Minimum powercap sysfs constraint_1_power_limit_uw for Intel RAPL
|
||
|
||
``power_limit_1_tmin_us`` (RO)
|
||
Minimum powercap sysfs constraint_1_time_window_us for Intel RAPL
|
||
|
||
``power_limit_1_tmax_us`` (RO)
|
||
Maximum powercap sysfs constraint_1_time_window_us for Intel RAPL
|
||
|
||
``power_floor_status`` (RO)
|
||
When set to 1, the power floor of the system in the current
|
||
configuration has been reached. It needs to be reconfigured to allow
|
||
power to be reduced any further.
|
||
|
||
``power_floor_enable`` (RW)
|
||
When set to 1, enable reading and notification of the power floor
|
||
status. Notifications are triggered for the power_floor_status
|
||
attribute value changes.
|
||
|
||
:file:`/sys/bus/pci/devices/0000\:00\:04.0/`
|
||
|
||
``tcc_offset_degree_celsius`` (RW)
|
||
TCC offset from the critical temperature where hardware will throttle
|
||
CPU.
|
||
|
||
:file:`/sys/bus/pci/devices/0000\:00\:04.0/workload_request`
|
||
|
||
``workload_available_types`` (RO)
|
||
Available workload types. User space can specify one of the workload type
|
||
it is currently executing via workload_type. For example: idle, bursty,
|
||
sustained etc.
|
||
|
||
``workload_type`` (RW)
|
||
User space can specify any one of the available workload type using
|
||
this interface.
|
||
|
||
DPTF Processor thermal RFIM interface
|
||
--------------------------------------------
|
||
|
||
RFIM interface allows adjustment of FIVR (Fully Integrated Voltage Regulator),
|
||
DDR (Double Data Rate) and DLVR (Digital Linear Voltage Regulator)
|
||
frequencies to avoid RF interference with WiFi and 5G.
|
||
|
||
Switching voltage regulators (VR) generate radiated EMI or RFI at the
|
||
fundamental frequency and its harmonics. Some harmonics may interfere
|
||
with very sensitive wireless receivers such as Wi-Fi and cellular that
|
||
are integrated into host systems like notebook PCs. One of mitigation
|
||
methods is requesting SOC integrated VR (IVR) switching frequency to a
|
||
small % and shift away the switching noise harmonic interference from
|
||
radio channels. OEM or ODMs can use the driver to control SOC IVR
|
||
operation within the range where it does not impact IVR performance.
|
||
|
||
Some products use DLVR instead of FIVR as switching voltage regulator.
|
||
In this case attributes of DLVR must be adjusted instead of FIVR.
|
||
|
||
While shifting the frequencies additional clock noise can be introduced,
|
||
which is compensated by adjusting Spread spectrum percent. This helps
|
||
to reduce the clock noise to meet regulatory compliance. This spreading
|
||
% increases bandwidth of signal transmission and hence reduces the
|
||
effects of interference, noise and signal fading.
|
||
|
||
DRAM devices of DDR IO interface and their power plane can generate EMI
|
||
at the data rates. Similar to IVR control mechanism, Intel offers a
|
||
mechanism by which DDR data rates can be changed if several conditions
|
||
are met: there is strong RFI interference because of DDR; CPU power
|
||
management has no other restriction in changing DDR data rates;
|
||
PC ODMs enable this feature (real time DDR RFI Mitigation referred to as
|
||
DDR-RFIM) for Wi-Fi from BIOS.
|
||
|
||
|
||
FIVR attributes
|
||
|
||
:file:`/sys/bus/pci/devices/0000\:00\:04.0/fivr/`
|
||
|
||
``vco_ref_code_lo`` (RW)
|
||
The VCO reference code is an 11-bit field and controls the FIVR
|
||
switching frequency. This is the 3-bit LSB field.
|
||
|
||
``vco_ref_code_hi`` (RW)
|
||
The VCO reference code is an 11-bit field and controls the FIVR
|
||
switching frequency. This is the 8-bit MSB field.
|
||
|
||
``spread_spectrum_pct`` (RW)
|
||
Set the FIVR spread spectrum clocking percentage
|
||
|
||
``spread_spectrum_clk_enable`` (RW)
|
||
Enable/disable of the FIVR spread spectrum clocking feature
|
||
|
||
``rfi_vco_ref_code`` (RW)
|
||
This field is a read only status register which reflects the
|
||
current FIVR switching frequency
|
||
|
||
``fivr_fffc_rev`` (RW)
|
||
This field indicated the revision of the FIVR HW.
|
||
|
||
|
||
DVFS attributes
|
||
|
||
:file:`/sys/bus/pci/devices/0000\:00\:04.0/dvfs/`
|
||
|
||
``rfi_restriction_run_busy`` (RW)
|
||
Request the restriction of specific DDR data rate and set this
|
||
value 1. Self reset to 0 after operation.
|
||
|
||
``rfi_restriction_err_code`` (RW)
|
||
0 :Request is accepted, 1:Feature disabled,
|
||
2: the request restricts more points than it is allowed
|
||
|
||
``rfi_restriction_data_rate_Delta`` (RW)
|
||
Restricted DDR data rate for RFI protection: Lower Limit
|
||
|
||
``rfi_restriction_data_rate_Base`` (RW)
|
||
Restricted DDR data rate for RFI protection: Upper Limit
|
||
|
||
``ddr_data_rate_point_0`` (RO)
|
||
DDR data rate selection 1st point
|
||
|
||
``ddr_data_rate_point_1`` (RO)
|
||
DDR data rate selection 2nd point
|
||
|
||
``ddr_data_rate_point_2`` (RO)
|
||
DDR data rate selection 3rd point
|
||
|
||
``ddr_data_rate_point_3`` (RO)
|
||
DDR data rate selection 4th point
|
||
|
||
``rfi_disable (RW)``
|
||
Disable DDR rate change feature
|
||
|
||
DLVR attributes
|
||
|
||
:file:`/sys/bus/pci/devices/0000\:00\:04.0/dlvr/`
|
||
|
||
``dlvr_hardware_rev`` (RO)
|
||
DLVR hardware revision.
|
||
|
||
``dlvr_freq_mhz`` (RO)
|
||
Current DLVR PLL frequency in MHz.
|
||
|
||
``dlvr_freq_select`` (RW)
|
||
Sets DLVR PLL clock frequency. Once set, and enabled via
|
||
dlvr_rfim_enable, the dlvr_freq_mhz will show the current
|
||
DLVR PLL frequency.
|
||
|
||
``dlvr_pll_busy`` (RO)
|
||
PLL can't accept frequency change when set.
|
||
|
||
``dlvr_rfim_enable`` (RW)
|
||
0: Disable RF frequency hopping, 1: Enable RF frequency hopping.
|
||
|
||
``dlvr_spread_spectrum_pct`` (RW)
|
||
Sets DLVR spread spectrum percent value.
|
||
|
||
``dlvr_control_mode`` (RW)
|
||
Specifies how frequencies are spread using spread spectrum.
|
||
0: Down spread,
|
||
1: Spread in the Center.
|
||
|
||
``dlvr_control_lock`` (RW)
|
||
1: future writes are ignored.
|
||
|
||
DPTF Power supply and Battery Interface
|
||
----------------------------------------
|
||
|
||
Refer to Documentation/ABI/testing/sysfs-platform-dptf
|
||
|
||
DPTF Fan Control
|
||
----------------------------------------
|
||
|
||
Refer to Documentation/admin-guide/acpi/fan_performance_states.rst
|
||
|
||
Workload Type Hints
|
||
----------------------------------------
|
||
|
||
The firmware in Meteor Lake processor generation is capable of identifying
|
||
workload type and passing hints regarding it to the OS. A special sysfs
|
||
interface is provided to allow user space to obtain workload type hints from
|
||
the firmware and control the rate at which they are provided.
|
||
|
||
User space can poll attribute "workload_type_index" for the current hint or
|
||
can receive a notification whenever the value of this attribute is updated.
|
||
|
||
file:`/sys/bus/pci/devices/0000:00:04.0/workload_hint/`
|
||
Segment 0, bus 0, device 4, function 0 is reserved for the processor thermal
|
||
device on all Intel client processors. So, the above path doesn't change
|
||
based on the processor generation.
|
||
|
||
``workload_hint_enable`` (RW)
|
||
Enable firmware to send workload type hints to user space.
|
||
|
||
``notification_delay_ms`` (RW)
|
||
Minimum delay in milliseconds before firmware will notify OS. This is
|
||
for the rate control of notifications. This delay is between changing
|
||
the workload type prediction in the firmware and notifying the OS about
|
||
the change. The default delay is 1024 ms. The delay of 0 is invalid.
|
||
The delay is rounded up to the nearest power of 2 to simplify firmware
|
||
programming of the delay value. The read of notification_delay_ms
|
||
attribute shows the effective value used.
|
||
|
||
``workload_type_index`` (RO)
|
||
Predicted workload type index. User space can get notification of
|
||
change via existing sysfs attribute change notification mechanism.
|
||
|
||
The supported index values and their meaning for the Meteor Lake
|
||
processor generation are as follows:
|
||
|
||
0 - Idle: System performs no tasks, power and idle residency are
|
||
consistently low for long periods of time.
|
||
|
||
1 – Battery Life: Power is relatively low, but the processor may
|
||
still be actively performing a task, such as video playback for
|
||
a long period of time.
|
||
|
||
2 – Sustained: Power level that is relatively high for a long period
|
||
of time, with very few to no periods of idleness, which will
|
||
eventually exhaust RAPL Power Limit 1 and 2.
|
||
|
||
3 – Bursty: Consumes a relatively constant average amount of power, but
|
||
periods of relative idleness are interrupted by bursts of
|
||
activity. The bursts are relatively short and the periods of
|
||
relative idleness between them typically prevent RAPL Power
|
||
Limit 1 from being exhausted.
|
||
|
||
4 – Unknown: Can't classify.
|