Commit Graph

2885 Commits

Author SHA1 Message Date
Linus Torvalds
fbe7ef3f2f Thermal control fixes for 6.10-rc5
- Remove the filtered mode for mt8188 from lvts_thermal as it is not
    supported on this platform and fail the lvts_thermal initialization
    when the golden temperature is zero as that means the efuse data is
    not correctly set (Julien Panis).
 
  - Update the processor_thermal part of the Intel int340x driver to
    support shared interrupts as the processor thermal device interrupt
    may in fact be shared with PCI devices (Srinivas Pandruvada).
 
  - Synchronize the suspend-prepare and post-suspend actions of the
    thermal PM notifier to avoid a destructive race condition and
    change the priority of that notifier to the minimum to avoid
    interference between the work items spawned by it and the other
    PM notifiers during system resume (Rafael Wysocki).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmZ1clgSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxftEQAKE9MJHvo1zTgGq3jU198pYj6/oIf4F8
 GR8wTSIhmSO+YgUINbjcGaUoCbwROeaLzLDyAXsrG0vS1t1/c5TGGujvQnMCnrdz
 rSwGoXtJeRK2yYNUWz/Mt1e/ai04sn/i9/9gZJOagRQaas45SdUIxLqpQs17R5cP
 R/8tFAyjMVyQ1MvI4mP3zK1yvE4fBeC18ZGKXNM57tzGBr0dWcSLrVJH/vS1/3Zx
 947LCzngfw8rPx1v+1LSIaCja62StUHBfmkTHnXDaegCFaWCuyUGgHV7yV3RW5sc
 +jQfMo6WH+O10TWvP28tc9dwamB0kfGr7oZXDH0ucpTwg621tThQZL/urxcAX1BS
 i5HQlATfUD/qWq1c5KUEZMRVje/rU8+SnSbf/xZeQKF4albOoZUWAk6VWqyJiwrr
 VTnMsVVCEZ62Vawyk7tchdMu233GMoMndRPu/Xq+ourKf8cDzzXdzn678cofahpn
 5u+droa/n9O7gkku8Mz4zTmgVyXyJtLi1tMGHYT/M7XM/OrKlwvqmpAnUSuAfaQ8
 C8ywxyrk1GJQoy8LLStaSvUFcAtrdmBpv5hSFPDXkbmy/xHaKw7+vccjL9CybmXo
 P4u2psBQ4opeg0x5I+dS5Tp0WcFXcdo5UGSaw2dRT+7WDxA9ALk3hs/s+vHzv1zO
 Fe7iya1SwbZB
 =qXau
 -----END PGP SIGNATURE-----

Merge tag 'thermal-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
 "These fix the Mediatek lvts_thermal driver, the Intel int340x driver,
  and the thermal core (two issues related to system suspend).

  Specifics:

   - Remove the filtered mode for mt8188 from lvts_thermal as it is not
     supported on this platform and fail the lvts_thermal initialization
     when the golden temperature is zero as that means the efuse data is
     not correctly set (Julien Panis)

   - Update the processor_thermal part of the Intel int340x driver to
     support shared interrupts as the processor thermal device interrupt
     may in fact be shared with PCI devices (Srinivas Pandruvada)

   - Synchronize the suspend-prepare and post-suspend actions of the
     thermal PM notifier to avoid a destructive race condition and
     change the priority of that notifier to the minimum to avoid
     interference between the work items spawned by it and the other
     PM notifiers during system resume (Rafael Wysocki)"

* tag 'thermal-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: int340x: processor_thermal: Support shared interrupts
  thermal: core: Change PM notifier priority to the minimum
  thermal: core: Synchronize suspend-prepare and post-suspend actions
  thermal/drivers/mediatek/lvts_thermal: Return error in case of invalid efuse data
  thermal/drivers/mediatek/lvts_thermal: Remove filtered mode for mt8188
2024-06-21 11:16:56 -07:00
Srinivas Pandruvada
096597cfe4 thermal: int340x: processor_thermal: Support shared interrupts
On some systems the processor thermal device interrupt is shared with
other PCI devices. In this case return IRQ_NONE from the interrupt
handler when the interrupt is not for the processor thermal device.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Fixes: f0658708e8 ("thermal: int340x: processor_thermal: Use non MSI interrupts by default")
Cc: 6.7+ <stable@vger.kernel.org> # 6.7+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-19 16:52:15 +02:00
Rafael J. Wysocki
d284d6cdaa - Remove the filtered mode for mt8188 as it is not supported on this
platform (Julien Panis)
 
 - Fail in case the golden temperature is zero as that means the efuse
   data is not correctly set (Julien Panis)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmZvS8YACgkQqDIjiipP
 6E+lHQgAsdwIcybdNpwqoD+D6heY4TsL+++h0RS4id6U5e1jvIemfUw1KaSbdaXH
 w6lOwSjycl1z8+pywRZ7P1psZGIe9SGXkqoV7WsHr22JfqWToV5IX3d9HJxnvOlq
 pvXPH9qJgwb4m2mRtcQq4E+WMAMoXDhbtcFNKtrRqZJOeJQlBm42pdCyTyFLU0Gg
 CkQ4d1pWlVZ6hixzsBF/69kDC8aeGVQnYN32X0CkERhqg/RO0DTneGS2hwT5F6oK
 icgV4aBh5oSn/gymbdqybZWWmm1dvEmgR24OtTrgNYCgPhb55/QsfHO+gU8r4HIu
 q6/D5V7vDr5jZ9FTofm08I2SY2kPvQ==
 =O7Iv
 -----END PGP SIGNATURE-----

Merge tag 'thermal-v6.10-rc4' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux

Merge thermal driver fixes for 6.10-rc5 from Daniel Lezcano:

"- Remove the filtered mode for mt8188 as it is not supported on this
   platform (Julien Panis)

 - Fail in case the golden temperature is zero as that means the efuse
   data is not correctly set (Julien Panis)"

* tag 'thermal-v6.10-rc4' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
  thermal/drivers/mediatek/lvts_thermal: Return error in case of invalid efuse data
  thermal/drivers/mediatek/lvts_thermal: Remove filtered mode for mt8188
2024-06-17 13:58:39 +02:00
Rafael J. Wysocki
494c7d0550 thermal: core: Change PM notifier priority to the minimum
It is reported that commit 5a5efdaffd ("thermal: core: Resume thermal
zones asynchronously") causes battery data in sysfs on Thinkpad P1 Gen2
to become invalid after a resume from S3 (and it is necessary to reboot
the machine to restore correct battery data).  Some investigation into
the problem indicated that it happened because, after the commit in
question, the ACPI battery PM notifier ran in parallel with
thermal_zone_device_resume() for one of the thermal zones which
apparently confused the platform firmware on the affected system.

While the exact reason for the firmware confusion remains unclear, it
is arguably not particularly relevant, and the expected behavior of the
affected system can be restored by making the thermal PM notifier run
at the lowest priority which avoids interference between work items
spawned by it and the other PM notifiers (that will run before those
work items now).

Fixes: 5a5efdaffd ("thermal: core: Resume thermal zones asynchronously")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218881
Reported-by: fhortner@yahoo.de
Tested-by: fhortner@yahoo.de
Cc: 6.8+ <stable@vger.kernel.org> # 6.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-14 17:36:59 +02:00
Rafael J. Wysocki
d2278f3533 thermal: core: Synchronize suspend-prepare and post-suspend actions
After commit 5a5efdaffd ("thermal: core: Resume thermal zones
asynchronously") it is theoretically possible that, if a system suspend
starts immediately after a system resume, thermal_zone_device_resume()
spawned by the thermal PM notifier for one of the thermal zones at the
end of the system resume will run after the PM thermal notifier for the
suspend-prepare action.  If that happens, tz->suspended set by the latter
will be reset by the former which may lead to unexpected consequences.

To avoid that race, synchronize thermal_zone_device_resume() with the
suspend-prepare thermal PM notifier with the help of additional bool
field and completion in struct thermal_zone_device.

Note that this also ensures running __thermal_zone_device_update() at
least once for each thermal zone between system resume and the following
system suspend in case it is needed to start thermal mitigation.

Fixes: 5a5efdaffd ("thermal: core: Resume thermal zones asynchronously")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-14 17:36:59 +02:00
Julien Panis
72cacd06e4 thermal/drivers/mediatek/lvts_thermal: Return error in case of invalid efuse data
This patch prevents from registering thermal entries and letting the
driver misbehave if efuse data is invalid. A device is not properly
calibrated if the golden temperature is zero.

Fixes: f5f633b182 ("thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver")
Signed-off-by: Julien Panis <jpanis@baylibre.com>
Reviewed-by: Nicolas Pitre <npitre@baylibre.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20240604-mtk-thermal-calib-check-v2-1-8f258254051d@baylibre.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-06-12 19:07:34 +02:00
Rafael J. Wysocki
b684682698 thermal: gov_step_wise: Restore passive polling management
Consider a thermal zone with one passive trip point, a cooling device
with 3 states (0, 1, 2) bound to it, passive polling enabled (nonzero
passive_delay_jiffies) and no regular polling (polling_delay_jiffies
equal to 0) that is managed by the Step-Wise governor.  Suppose that
the initial state of the cooling device is 0 and the zone temperature
is below the trip point to start with.

When the trip point is crossed, tz->passive is incremented by the
thermal core and the governor's .manage() callback is invoked.  It
sets 'throttle' to 'true' for the trip in question and
get_target_state() returns 1 for the instance corresponding to the
cooling device (say that 'upper' and 'lower' are set to 2 and 0 for
it, respectively), so its state changes to 1.

Passive polling is still active for the zone, so next time the
temperature is updated, the governor's .manage() callback will be
invoked again.  If the temperature is still rising, it will change
the state of the cooling device to 2.

Now suppose that next time the zone temperature is updated, it falls
below the trip point, so tz->passive is decremented for the zone (say
it becomes 0 then) and the governor's .manage() callbacks runs.

It finds that the temperature trend for the zone is 'falling' and
'throttle' will be set to 'false' for the trip in question, so the
cooling device's state will be changed to 1.  However, because
tz->polling is 0 for the zone, the governor's .manage() callback
may not be invoked again for a long time and the cooling device's
state will not be reset back to 0.

This can happen because commit 042a3d80f1 ("thermal: core: Move
passive polling management to the core") removed passive polling
management from the Step-Wise governor.

Before that change, thermal_zone_trip_update() would bump up
tz->passive when changing the target state for a thermal instance
from "no target" to a specific value and it would drop tz->passive
when changing it back to "no target" which would cause passive
polling to be active for the zone until the governor has reset the
states of all cooling devices.  In particular, in the example above
tz->passive would be incremented when changing the state of the
cooling device from 0 to 1 and then it would be still nonzero when
the state of the cooling device was changed from 2 to 1.

To prevent this problem from occurring, restore the passive polling
management in the Step-Wise governor by partially reverting the
commit in question and update the comment in the restored code
to explain its role more clearly.

Fixes: 042a3d80f1 ("thermal: core: Move passive polling management to the core")
Closes: https://lore.kernel.org/linux-pm/ZmVfcEOxmjUHZTSX@hovoldconsulting.com
Reported-by: Johan Hovold <johan+linaro@kernel.org>
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-11 21:00:44 +02:00
Rafael J. Wysocki
1af89dedc8 thermal: core: Do not fail cdev registration because of invalid initial state
It is reported that commit 31a0fa0019 ("thermal/debugfs: Pass cooling
device state to thermal_debug_cdev_add()") causes the ACPI fan driver
to fail probing on some systems which turns out to be due to the _FST
control method returning an invalid value until _FSL is first evaluated
for the given fan.  If this happens, the .get_cur_state() cooling device
callback returns an error and __thermal_cooling_device_register() fails
as uses that callback after commit 31a0fa0019.

Arguably, _FST should not return an invalid value even if it is
evaluated before _FSL, so this may be regarded as a platform firmware
issue, but at the same time it is not a good enough reason for failing
the cooling device registration where the initial cooling device state
is only needed to initialize a thermal debug facility.

Accordingly, modify __thermal_cooling_device_register() to avoid
calling thermal_debug_cdev_add() instead of returning an error if the
initial .get_cur_state() callback invocation fails.

Fixes: 31a0fa0019 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()")
Closes: https://lore.kernel.org/linux-acpi/20240530153727.843378-1-laura.nao@collabora.com
Reported-by: Laura Nao <laura.nao@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Laura Nao <laura.nao@collabora.com>
2024-06-07 13:51:51 +02:00
Julien Panis
d9fef76e89 thermal/drivers/mediatek/lvts_thermal: Remove filtered mode for mt8188
Filtered mode is not supported on mt8188 SoC and is the source of bad
results. Move to immediate mode which provides good temperatures.

Fixes: f4745f546e ("thermal/drivers/mediatek/lvts_thermal: Add MT8188 support")
Reviewed-by: Nicolas Pitre <npitre@baylibre.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Julien Panis <jpanis@baylibre.com>
Link: https://lore.kernel.org/r/20240516-mtk-thermal-mt8188-mode-fix-v2-1-40a317442c62@baylibre.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-06-04 19:40:51 +02:00
Rafael J. Wysocki
ae2170d6ea thermal: trip: Trigger trip down notifications when trips involved in mitigation become invalid
When a trip point becomes invalid after being crossed on the way up,
it is involved in a mitigation episode that needs to be adjusted to
compensate for the trip going away.

For this reason, introduce thermal_zone_trip_down() as a wrapper
around thermal_trip_crossed() and make thermal_zone_set_trip_temp()
call it if the new temperature of the trip at hand is equal to
THERMAL_TEMP_INVALID and it has been crossed on the way up to trigger
all of the necessary adjustments in user space, the thermal debug
code and the zone governor.

Fixes: 8c69a777e4 ("thermal: core: Fix the handling of invalid trip points")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-27 13:00:00 +02:00
Rafael J. Wysocki
cb573eec60 thermal: core: Introduce thermal_trip_crossed()
Add a helper function called thermal_trip_crossed() to be invoked by
__thermal_zone_device_update() in order to notify user space, the
thermal debug code and the zone governor about trip crossing.

Subsequently, this will also be used in the case when a trip point
becomes invalid after being crossed on the way up.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-27 13:00:00 +02:00
Rafael J. Wysocki
5a599e10e5 thermal/debugfs: Allow tze_seq_show() to print statistics for invalid trips
Commit a6258fde8d ("thermal/debugfs: Make tze_seq_show() skip invalid
trips and trips with no stats") modified tze_seq_show() to skip invalid
trips, but it overlooked the fact that a trip may become invalid during
a mitigation eposide involving it, in which case its statistics should
still be reported.

For this reason, remove the invalid trip temperature check from the
main loop in tze_seq_show().

The trips that have never been valid will still be skipped after this
change because there are no statistics to report for them.

Fixes: a6258fde8d ("thermal/debugfs: Make tze_seq_show() skip invalid trips and trips with no stats")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-27 13:00:00 +02:00
Rafael J. Wysocki
9e69acc1de thermal/debugfs: Print initial trip temperature and hysteresis in tze_seq_show()
The temperature and hysteresis of a trip point may change during a
mitigation episode it is involved in (it may even become invalid
altogether), so in order to avoid possible confusion related to that,
store the temperature and hysteresis of trip points at the time they
are crossed on the way up and print those values instead of their
current temperature and hysteresis.

Fixes: 7ef01f228c ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-27 13:00:00 +02:00
Steven Rostedt (Google)
2c92ca849f tracing/treewide: Remove second parameter of __assign_str()
With the rework of how the __string() handles dynamic strings where it
saves off the source string in field in the helper structure[1], the
assignment of that value to the trace event field is stored in the helper
value and does not need to be passed in again.

This means that with:

  __string(field, mystring)

Which use to be assigned with __assign_str(field, mystring), no longer
needs the second parameter and it is unused. With this, __assign_str()
will now only get a single parameter.

There's over 700 users of __assign_str() and because coccinelle does not
handle the TRACE_EVENT() macro I ended up using the following sed script:

  git grep -l __assign_str | while read a ; do
      sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file;
      mv /tmp/test-file $a;
  done

I then searched for __assign_str() that did not end with ';' as those
were multi line assignments that the sed script above would fail to catch.

Note, the same updates will need to be done for:

  __assign_str_len()
  __assign_rel_str()
  __assign_rel_str_len()

I tested this with both an allmodconfig and an allyesconfig (build only for both).

[1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/

Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts.
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for
Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org>	# xfs
Tested-by: Guenter Roeck <linux@roeck-us.net>
2024-05-22 20:14:47 -04:00
Linus Torvalds
d90be6e4aa Driver core changes for 6.10-rc1
Here is the small set of driver core and kernfs changes for 6.10-rc1.
 
 Nothing major here at all, just a small set of changes for some driver
 core apis, and minor fixups.  Included in here are:
   - sysfs_bin_attr_simple_read() helper added and used
   - device_show_string() helper added and used
 All usages of these were acked by the various maintainers.  Also in here
 are:
   - kernfs minor cleanup
   - removed unused functions
   - typo fix in documentation
   - pay attention to sysfs_create_link() failures in module.c finally.
 
 All of these have been in linux-next for a very long time with no
 reported problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZk3+hQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylfTwCfUyHWkDZuZ7ehdtjzfmcd4EKZBK8An3AAV99G
 ox8PXMxuFTaUEdT/69FQ
 =2sEo
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the small set of driver core and kernfs changes for 6.10-rc1.

  Nothing major here at all, just a small set of changes for some driver
  core apis, and minor fixups. Included in here are:

   - sysfs_bin_attr_simple_read() helper added and used

   - device_show_string() helper added and used

  All usages of these were acked by the various maintainers. Also in
  here are:

   - kernfs minor cleanup

   - removed unused functions

   - typo fix in documentation

   - pay attention to sysfs_create_link() failures in module.c finally

  All of these have been in linux-next for a very long time with no
  reported problems"

* tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  device property: Fix a typo in the description of device_get_child_node_count()
  kernfs: mount: Remove unnecessary ‘NULL’ values from knparent
  scsi: Use device_show_string() helper for sysfs attributes
  platform/x86: Use device_show_string() helper for sysfs attributes
  perf: Use device_show_string() helper for sysfs attributes
  IB/qib: Use device_show_string() helper for sysfs attributes
  hwmon: Use device_show_string() helper for sysfs attributes
  driver core: Add device_show_string() helper for sysfs attributes
  treewide: Use sysfs_bin_attr_simple_read() helper
  sysfs: Add sysfs_bin_attr_simple_read() helper
  module: don't ignore sysfs_create_link() failures
  driver core: Remove unused platform_notify, platform_notify_remove
2024-05-22 12:13:40 -07:00
Linus Torvalds
5b5a5ad5a5 Thermal control fixes for 6.10-rc1
- Fix and clean up the MediaTek lvts_thermal driver (Julien Panis).
 
  - Prevent invalid trip point handling from triggering spurious trip
    point crossing events and allow passive polling to stop when a
    passive trip point involved in it becomes invalid (Rafael Wysocki).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmZMYXoSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxWKsP/R4PjdTgXYKg1NOGjsBJB5kYUVJZZkgZ
 Kr123fp2pmJZ7MI7AsoD3I6d0Z9Kfj2MEHgUYyU7anN+Ub+2JpQufK48CV/KoQSW
 LSC2hJsgHqRaOmP9i+tNGvKY3lnRTCKlLGhK+uSNC5ghHaFJuF/3ihEXHm0RWm4N
 w4VBN3y721wcHGNXCOeQRxM4dq6Z2Z2L5qWsgvEszPz9PwxflJ5ek2KfCyZDjtWu
 zh7xhk6WQ8kHkft2XhrCD9/AVIESeBLO9arvHh3TXUiJHHijyY/oTOHFP6VZ2n/A
 HZiJeRGKMNaBktkkuhCzQQbu0PpZ97GqCyrFc18wDFzdeiFSwd4vNYeeOBK+RyNo
 tUrlfm9X1O5e4Rary9mQ7DW9LJJ6DJSao4FFH0+wh2cSfJtl2/03lg6/obmb8hXh
 uuvKCLJUqoE6IF5bXL1943QWENGyCEHjvJWS46NuGSy+Ly8+khS49rPEfJg1Qj7g
 F12+1GL0uryTDLbL5V+IOtjFtUE786j5bHrxHTsasIvzbNvbm/fHbP/7JJW3l/Wf
 MDwMx7MaijNeHeO4cLGWnaWA4U9kaxsphYii7icuUjw55khqBbt+EmE3Npc/fVlM
 mWVkEZBGWKKpZb2zww7WX6FgBSZb/SZjMoxTUu/2YQgrwwiMe1LIHUFC3U5OATVG
 vKXn8ALYHSly
 =EjP7
 -----END PGP SIGNATURE-----

Merge tag 'thermal-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
 "These fix the MediaTek lvts_thermal driver and the handling of trip
  points that start as invalid and are adjusted later by user space via
  sysfs.

  Specifics:

   - Fix and clean up the MediaTek lvts_thermal driver (Julien Panis)

   - Prevent invalid trip point handling from triggering spurious trip
     point crossing events and allow passive polling to stop when a
     passive trip point involved in it becomes invalid (Rafael Wysocki)"

* tag 'thermal-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: core: Fix the handling of invalid trip points
  thermal/drivers/mediatek/lvts_thermal: Fix wrong lvts_ctrl index
  thermal/drivers/mediatek/lvts_thermal: Remove unused members from struct lvts_ctrl_data
  thermal/drivers/mediatek/lvts_thermal: Check NULL ptr on lvts_data
2024-05-21 11:35:45 -07:00
Rafael J. Wysocki
8c69a777e4 thermal: core: Fix the handling of invalid trip points
Commit 9ad18043fb ("thermal: core: Send trip crossing notifications
at init time if needed") overlooked the case when a trip point that
has started as invalid is set to a valid temperature later.  Namely,
the initial threshold value for all trips is zero, so if a previously
invalid trip becomes valid and its (new) low temperature is above the
zone temperature, a spurious trip crossing notification will occur and
it may trigger the WARN_ON() in handle_thermal_trip().

To address this, set the initial threshold for all trips to INT_MAX.

There is also the case when a valid writable trip becomes invalid that
requires special handling.  First, in accordance with the change
mentioned above, the trip's threshold needs to be set to INT_MAX to
avoid the same issue.  Second, if the trip in question is passive and
it has been crossed by the thermal zone temperature on the way up, the
zone's passive count has been incremented and it is in the passive
polling mode, so its passive count needs to be adjusted to allow the
passive polling to be turned off eventually.

Fixes: 9ad18043fb ("thermal: core: Send trip crossing notifications at init time if needed")
Fixes: 042a3d80f1 ("thermal: core: Move passive polling management to the core")
Reported-by: Zhang Rui <zhang.rui@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Wendy Wang <wendy.wang@intel.com>
2024-05-17 12:37:33 +02:00
Rafael J. Wysocki
9dbbcd6c83 - Check for a NULL pointer before using it in the probe routine of the
Mediatek LVTS driver (Julien Panis)
 
 - Remove the num_lvts_sensor and cal_offset fields of the
   lvts_ctrl_data as they are not used. These are not functional fixes
   but slight memory usage fix of the Mediatek LVTS driver (Julien
   Panis)
 
 - Fix wrong lvts_ctrl index leading to a NULL pointer dereference in
   the Mediatek LVTS driver (Julien Panis)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmZDL8IACgkQqDIjiipP
 6E9IGwgAlzaIbugHk6qgrkC1s6JhxdAay/sUAk4zJ7WYirtykdWS0/56hh2DL/Ub
 VttvTWgDf6W4k/+gaV/hVWjEq5og82n249DSFlJmcAAcvTQ0Stk2LG2TTqRBmfb4
 qPKqLoVWlMhDUlbhnNZoS04jApowRitgfOvtC1G0owqkmbzIOKryJJIbvK6VRWpm
 qPo7aR99DJW7Cm3lLujyIC41PGb4vKGS2jFFFrf/s69vj40IVHw2jAHOkSJDS+H1
 Jro7fTHtlf7eskg4abL9zUzDSJMJTNOTyyj8vJ0524V2I7SavZxbUBoWVfYc/OSf
 gWqCZFlY6sWX7k331hhxrpWjdALlyA==
 =e53Q
 -----END PGP SIGNATURE-----

Merge tag 'thermal-v6.10-rc1-2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux

Merge thermal driver fixes for 6.10-rc1 from Daniel Lezcano:

"- Check for a NULL pointer before using it in the probe routine of the
   Mediatek LVTS driver (Julien Panis)

 - Remove the num_lvts_sensor and cal_offset fields of the
   lvts_ctrl_data as they are not used. These are not functional fixes
   but slight memory usage fix of the Mediatek LVTS driver (Julien
   Panis)

 - Fix wrong lvts_ctrl index leading to a NULL pointer dereference in
   the Mediatek LVTS driver (Julien Panis)"

* tag 'thermal-v6.10-rc1-2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
  thermal/drivers/mediatek/lvts_thermal: Fix wrong lvts_ctrl index
  thermal/drivers/mediatek/lvts_thermal: Remove unused members from struct lvts_ctrl_data
  thermal/drivers/mediatek/lvts_thermal: Check NULL ptr on lvts_data
2024-05-15 12:00:38 +02:00
Linus Torvalds
101b7a9714 ACPI updates for 6.10-rc1
- Add EINJ CXL error types to actbl1.h (Ben Cheatham).
 
  - Add support for RAS2 table to ACPICA (Shiju Jose).
 
  - Fix various spelling mistakes in text files and code comments in
    ACPICA (Colin Ian King).
 
  - Fix spelling and typos in ACPICA (Saket Dumbre).
 
  - Modify ACPI_OBJECT_COMMON_HEADER (lijun).
 
  - Add RISC-V RINTC affinity structure support to ACPICA (Haibo Xu).
 
  - Fix CXL 3.0 structure (RDPAS) in the CEDT table (Hojin Nam).
 
  - Add missin increment of registered GPE count to ACPICA (Daniil
    Tatianin).
 
  - Mark new ACPICA release 20240322 (Saket Dumbre).
 
  - Add support for the AEST V2 table to ACPICA (Ruidong Tian).
 
  - Disable -Wstringop-truncation for some ACPICA code in the kernel to
    avoid a compiler warning  that is not very useful (Arnd Bergmann).
 
  - Make the kernel indicate support for several ACPI features that are
    in fact supported to the platform firmware through _OSC and fix the
    Generic Initiator Affinity _OSC bit (Armin Wolf).
 
  - Make the ACPI core set the owner value for ACPI drivers, drop the
    owner setting from a number of drivers and eliminate the owner
    field from struct acpi_driver (Krzysztof Kozlowski).
 
  - Rearrange fields in several structures to effectively eliminate
    computations from container_of() in some cases (Andy Shevchenko).
 
  - Do some assorted cleanups of the ACPI device enumeration code (Andy
    Shevchenko).
 
  - Make the ACPI device enumeration code skip devices with _STA values
    clearly identified by the specification as invalid (Rafael Wysocki).
 
  - Rework the handling of the NHLT table to simplify and clarify it and
    drop some obsolete pieces (Cezary Rojewski).
 
  - Add ACPI IRQ override quirks for Asus Vivobook Pro N6506MV, TongFang
    GXxHRXx and GMxHGxx, and XMG APEX 17 M23 (Guenter Schafranek, Tamim
    Khan, Christoffer Sandberg).
 
  - Add reference to UEFI DSD Guide to the documentation related to the
    ACPI handling of device properties (Sakari Ailus).
 
  - Fix SRAT lookup of CFMWS ranges with numa_fill_memblks(), remove
    lefover architecture-dependent code from the ACPI NUMA handling code
    and simplify it on top of that (Robert Richter).
 
  - Add a num-cs device property to specify the number of chip selects
    for Intel Braswell to the ACPI LPSS (Intel SoC) driver and remove a
    nested CONFIG_PM #ifdef from it (Andy Shevchenko).
 
  - Move three x86-specific ACPI files to the x86 directory (Andy
    Shevchenko).
 
  - Mark SMO8810 accel on Dell XPS 15 9550 as always present and add a
    PNP_UART1_SKIP quirk for Lenovo Blade2 tablets (Hans de Goede).
 
  - Move acpi_blacklisted() declaration to asm/acpi.h (Kuppuswamy
    Sathyanarayanan).
 
  - Add Lunar Lake support to the ACPI DPTF driver (Sumeet Pawnikar).
 
  - Mark the einj_driver driver's remove callback as __exit because it
    cannot get unbound via sysfs (Uwe Kleine-König).
 
  - Fix a typo in the ACPI documentation regarding the layout of sysfs
    subdirectory representing the ACPI namespace (John Watts).
 
  - Make the ACPI pfrut utility print the update_cap field during
    capability query (Chen Yu).
 
  - Add HAS_IOPORT dependencies to PNP (Niklas Schnelle).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmZCZksSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxQHMQAIaab/0AY6MXznyZtf9bfVXzwa/HXWJV
 CWVhubByKHKSJjGcsMbxAXvRq5EDX91QiuaoExEzKY7G9q+jFl1rUMvGWVrySveS
 ae1+jMF9Kd3QXTXB6bCbre4/7bn/LZN0U5nvD5yFpuEEl8GqGafKiUVRVEIQmMuB
 1yi5HU2orKxLYU84mCGIiiWlTAi/6MlSeJ5NZwxPfj/+B24pPWTdu4AsD+b7gC2G
 g36KlAZpnGJqRZ1pg5KQDCV0eKGCJuh9X7ACM8t/XDg4S7FL75X2fWMRUh+4sYp9
 12DdxaEPYCNaVebm2Ae7vrLtcifHNsMoirs3Me0RUOmFqRPaH5YILw3Un2jabiBp
 OWgVuIAa1+M4pKoVVnimM1f9i8rUr8XEl/kheb0c0OinQlecYpPeza9T5U+mMOsy
 knOs5jTm5oIhHS4sQxPu4I1/HX4/5kczDuMs1XW+6ypuT28WCbTj+mrryAQOQHyD
 vFsDBdInU5Yiit4APuJjiTSeEtFkClNI54oKz29mesPRRdDMuYBubdEuXG4lbsF1
 n4SWds3ELFqWlWvyshCMQdyjTN7UUMc2Af+xY3AEIq7uvPTdTQWUmf5LQi2vfgSO
 y3vhCjjt0ONyTXqbkP7HYcOXx5DZsFb/1uXqAAo4/RlZ6LWx/K1+onttECsoQIRH
 rR3pwW76z0Vy
 =UiEQ
 -----END PGP SIGNATURE-----

Merge tag 'acpi-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI updates from Rafael Wysocki:
 "These are ACPICA updates coming from the 20240322 release upstream, an
  ACPI DPTF driver update adding new platform support for it, some new
  quirks and some assorted fixes and cleanups.

  Specifics:

   - Add EINJ CXL error types to actbl1.h (Ben Cheatham)

   - Add support for RAS2 table to ACPICA (Shiju Jose)

   - Fix various spelling mistakes in text files and code comments in
     ACPICA (Colin Ian King)

   - Fix spelling and typos in ACPICA (Saket Dumbre)

   - Modify ACPI_OBJECT_COMMON_HEADER (lijun)

   - Add RISC-V RINTC affinity structure support to ACPICA (Haibo Xu)

   - Fix CXL 3.0 structure (RDPAS) in the CEDT table (Hojin Nam)

   - Add missin increment of registered GPE count to ACPICA (Daniil
     Tatianin)

   - Mark new ACPICA release 20240322 (Saket Dumbre)

   - Add support for the AEST V2 table to ACPICA (Ruidong Tian)

   - Disable -Wstringop-truncation for some ACPICA code in the kernel to
     avoid a compiler warning that is not very useful (Arnd Bergmann)

   - Make the kernel indicate support for several ACPI features that are
     in fact supported to the platform firmware through _OSC and fix the
     Generic Initiator Affinity _OSC bit (Armin Wolf)

   - Make the ACPI core set the owner value for ACPI drivers, drop the
     owner setting from a number of drivers and eliminate the owner
     field from struct acpi_driver (Krzysztof Kozlowski)

   - Rearrange fields in several structures to effectively eliminate
     computations from container_of() in some cases (Andy Shevchenko)

   - Do some assorted cleanups of the ACPI device enumeration code (Andy
     Shevchenko)

   - Make the ACPI device enumeration code skip devices with _STA values
     clearly identified by the specification as invalid (Rafael Wysocki)

   - Rework the handling of the NHLT table to simplify and clarify it
     and drop some obsolete pieces (Cezary Rojewski)

   - Add ACPI IRQ override quirks for Asus Vivobook Pro N6506MV,
     TongFang GXxHRXx and GMxHGxx, and XMG APEX 17 M23 (Guenter
     Schafranek, Tamim Khan, Christoffer Sandberg)

   - Add reference to UEFI DSD Guide to the documentation related to the
     ACPI handling of device properties (Sakari Ailus)

   - Fix SRAT lookup of CFMWS ranges with numa_fill_memblks(), remove
     lefover architecture-dependent code from the ACPI NUMA handling
     code and simplify it on top of that (Robert Richter)

   - Add a num-cs device property to specify the number of chip selects
     for Intel Braswell to the ACPI LPSS (Intel SoC) driver and remove a
     nested CONFIG_PM #ifdef from it (Andy Shevchenko)

   - Move three x86-specific ACPI files to the x86 directory (Andy
     Shevchenko)

   - Mark SMO8810 accel on Dell XPS 15 9550 as always present and add a
     PNP_UART1_SKIP quirk for Lenovo Blade2 tablets (Hans de Goede)

   - Move acpi_blacklisted() declaration to asm/acpi.h (Kuppuswamy
     Sathyanarayanan)

   - Add Lunar Lake support to the ACPI DPTF driver (Sumeet Pawnikar)

   - Mark the einj_driver driver's remove callback as __exit because it
     cannot get unbound via sysfs (Uwe Kleine-König)

   - Fix a typo in the ACPI documentation regarding the layout of sysfs
     subdirectory representing the ACPI namespace (John Watts)

   - Make the ACPI pfrut utility print the update_cap field during
     capability query (Chen Yu)

   - Add HAS_IOPORT dependencies to PNP (Niklas Schnelle)"

* tag 'acpi-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (72 commits)
  ACPI/NUMA: Squash acpi_numa_memory_affinity_init() into acpi_parse_memory_affinity()
  ACPI/NUMA: Squash acpi_numa_slit_init() into acpi_parse_slit()
  ACPI/NUMA: Remove architecture dependent remainings
  x86/numa: Fix SRAT lookup of CFMWS ranges with numa_fill_memblks()
  ACPI: video: Add backlight=native quirk for Lenovo Slim 7 16ARH7
  ACPI: scan: Avoid enumerating devices with clearly invalid _STA values
  ACPI: Move acpi_blacklisted() declaration to asm/acpi.h
  ACPI: resource: Skip IRQ override on Asus Vivobook Pro N6506MV
  ACPICA: AEST: Add support for the AEST V2 table
  ACPI: tools: pfrut: Print the update_cap field during capability query
  ACPI: property: Add reference to UEFI DSD Guide
  Documentation: firmware-guide: ACPI: Fix namespace typo
  PNP: add HAS_IOPORT dependencies
  ACPI: resource: Do IRQ override on TongFang GXxHRXx and GMxHGxx
  ACPI: resource: Do IRQ override on GMxBGxx (XMG APEX 17 M23)
  ACPICA: Update acpixf.h for new ACPICA release 20240322
  ACPICA: events/evgpeinit: don't forget to increment registered GPE count
  ACPICA: Fix CXL 3.0 structure (RDPAS) in the CEDT table
  ACPICA: SRAT: Add dump and compiler support for RINTC affinity structure
  ACPICA: SRAT: Add RISC-V RINTC affinity structure
  ...
2024-05-14 13:31:24 -07:00
Linus Torvalds
f952b6c863 Thermal control updates for 6.10-rc1
- Redesign the thermal governor interface to allow the governors to
    work in a more straightforward way (Rafael Wysocki).
 
  - Make thermal governors take the current trip point thresholds into
    account in their computations which allows trip hysteresis to be
    observed more accurately (Rafael Wysocki).
 
  - Make the thermal core manage passive polling for thermal zones and
    remove passive polling management from thermal governors  (Rafael
    Wysocki).
 
  - Refactor trip point representation and move the definition of
    thermal governor and thermal zone device structures to the thermal
    core (Rafael Wysocki).
 
  - Sort trip point crossing notifications and debug recording of trip
    point crossing events by temperature (Rafael Wysocki).
 
  - Improve the handling of cooling device states and thermal mitigation
    episodes in progress in the thermal debug code (Rafael Wysocki).
 
  - Avoid excessive updates of trip point statistics and clean up the
    printing of thermal mitigation episode information (Rafael Wysocki).
 
  - Clean up thermal governors and thermal core (Rafael Wysocki).
 
  - Allow thermal drivers to register notifiers that will be invoked
    on netlink events like BIND and UNBIND, so that they can adjust
    their activity depending on whether or not there are any
    subscribers of netlink messages coming from them, and make
    the Intel HFI driver use this mechanism (Stanislaw Gruszka).
 
  - Adjust the update delay and capabilities-per-event values in the
    Intel HFI thermal driver to prevent it from missing events and allow
    it to process more data in one go (Ricardo Neri).
 
  - Add missing MODULE_DESCRIPTION() to multiple files in the
    int340x_thermal and intel_soc_dts_iosf drivers (Srinivas Pandruvada).
 
  - Replace deprecated strncpy() with strscpy() in the int340x_thermal
    driver (Justin Stitt).
 
  - Add QCM2290 compatible DT bindings for Lmh and fix a NULL pointer
    dereference in the lmh driver when the SCM is not present (Konrad
    Dybcio).
 
  - Use the strreplace() function instead of doing it manually in the
    Armada driver (Rasmus Villemoes).
 
  - Convert st,stih407-thermal to DT schema and fix up missing
    properties (Raphael Gallais-Pou).
 
  - Add suspend/resume by restoring the context of the tsens sensor
    (Priyansh Jain).
 
  - Support A1 SoC family Thermal Sensor controller and add the DT
    bindings (Dmitry Rokosov).
 
  - Improve the temperature approximation calculation and consolidate
    the Tj constant into a shared area of the structure instead of
    duplicating it on the Rcar Gen3 (Niklas Söderlund).
 
  - Fix the Mediatek LVTS sensor coefficient for the MT8192 in order to support
    it correctly (Hsin-Te Yuan).
 
  - Fix a NULL pointer dereference in the tsens driver when the function
    compute_intercept_slope() is called with a NULL parameter (Aleksandr
    Mishin).
 
  - Remove some unused fields in struct qpnp_tm_chip and k3_bandgap
    (Christophe Jaillet).
 
  - Fix up calibration efuse data decoding, consolidate the code by
    checking boundaries and refactor some part of the LVTS Mediatek
    driver. After setting the scene, add MT8186 and MT8188 along with
    the DT bindings (Nicolas Pitre).
 
  - Add Loongson-2K2000 support after some minor code adjustements and
    providing the DT bindings definition (Binbin Zhou).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmZCZd4SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxg28P/03RJlnog56dF6Isv5m+wPf6S66aDwQ/
 Dv5tUTk7Fy9JU7ICny5TMfWCwAwaxJVmZ08s/BvowQsLFUnZvuKjxjX3ALTe68GU
 u93wX8xN/FGSTY/SnGxwhcS12TpZ33khlM2Ci16Nlfl2RFRkA7CJsjoEpUYMNU9h
 4WNS3gVRamKdrT+KHxZmJ4+ZwPxG3KOpSMgYtOW8Bg7uDTUgMzezL7au7z5YDlNd
 uxxWR2F/Ts0HceuYIXOIun5N+sqy9QEGHT9lJVaMYvQHzx+xUvz10nsUOpB56dZv
 m1CzP5IOy+JloldVEjt9BohzSlEfx4cvkqPePoToOVWlCCt++LUm2tJGEWvZU4XZ
 vo+9Ed3y/5Mp7ws5InSrM51PyC2l+P1Hdh6prgsoaq3XHn5b+DH0Nytly2fut9zF
 rKIN9xBO/UI8k7jYgB9Gk3WfekJWFu9QkA1+udzf8vmPYFyJOt1PdxBFXRy7DdpU
 vXF//E2SIZ428LVolHyQlUCw72ZiHuc4lV1UmqK/9s1vpfp4Ksn0ou3mZIwqfEio
 doI8A1GSym7a8YrnZnrHCyRNCyPj+Wmk42oIDQRGw3VEA9q4jLCSsmYwVptvQM4C
 tgAogbbRfbZ9hTWGhMyMM9f60oDdjbxBZ2uG607UXO/Hqz4P+C1P0I/J55Fp2OM2
 B8KoOSd82URw
 =CzKP
 -----END PGP SIGNATURE-----

Merge tag 'thermal-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control updates from Rafael Wysocki:
 "The most significant part of this is a rework of thermal governors,
  including a redesign of the thermal governor interface and changes to
  make some of them take trip point hysteresis into account properly, as
  well as some related cleanups of the thermal governors and thermal
  core.

  The above is based on preliminary changes refactoring thermal data
  structures and moving the definitions of some of them into the thermal
  core which also ensure that trip point crossing notifications will be
  sent to user space via netlink and recorded in the debug statistics in
  temperature order.

  In addition, netlink bind/unbind notifications are added to the
  thermal core and the Intel HFI driver is modified to use them to avoid
  sending netlink messages until there are subscribers.

  Apart from that, multiple thermal drivers are updated which includes
  new hardware support (MediaTek MT8188 and MT8186, Amlogic A1 thermal
  sensor, Loongson-2K2000, Lmh QCM2290), fixes, cleanups and
  documentation updates, and the recently added thermal debug code is
  fixed and cleaned up.

  Specifics:

   - Redesign the thermal governor interface to allow the governors to
     work in a more straightforward way (Rafael Wysocki)

   - Make thermal governors take the current trip point thresholds into
     account in their computations which allows trip hysteresis to be
     observed more accurately (Rafael Wysocki)

   - Make the thermal core manage passive polling for thermal zones and
     remove passive polling management from thermal governors (Rafael
     Wysocki)

   - Refactor trip point representation and move the definition of
     thermal governor and thermal zone device structures to the thermal
     core (Rafael Wysocki)

   - Sort trip point crossing notifications and debug recording of trip
     point crossing events by temperature (Rafael Wysocki)

   - Improve the handling of cooling device states and thermal
     mitigation episodes in progress in the thermal debug code (Rafael
     Wysocki)

   - Avoid excessive updates of trip point statistics and clean up the
     printing of thermal mitigation episode information (Rafael Wysocki)

   - Clean up thermal governors and thermal core (Rafael Wysocki)

   - Allow thermal drivers to register notifiers that will be invoked on
     netlink events like BIND and UNBIND, so that they can adjust their
     activity depending on whether or not there are any subscribers of
     netlink messages coming from them, and make the Intel HFI driver
     use this mechanism (Stanislaw Gruszka)

   - Adjust the update delay and capabilities-per-event values in the
     Intel HFI thermal driver to prevent it from missing events and
     allow it to process more data in one go (Ricardo Neri)

   - Add missing MODULE_DESCRIPTION() to multiple files in the
     int340x_thermal and intel_soc_dts_iosf drivers (Srinivas
     Pandruvada)

   - Replace deprecated strncpy() with strscpy() in the int340x_thermal
     driver (Justin Stitt)

   - Add QCM2290 compatible DT bindings for Lmh and fix a NULL pointer
     dereference in the lmh driver when the SCM is not present (Konrad
     Dybcio)

   - Use the strreplace() function instead of doing it manually in the
     Armada driver (Rasmus Villemoes)

   - Convert st,stih407-thermal to DT schema and fix up missing
     properties (Raphael Gallais-Pou)

   - Add suspend/resume by restoring the context of the tsens sensor
     (Priyansh Jain)

   - Support A1 SoC family Thermal Sensor controller and add the DT
     bindings (Dmitry Rokosov)

   - Improve the temperature approximation calculation and consolidate
     the Tj constant into a shared area of the structure instead of
     duplicating it on the Rcar Gen3 (Niklas Söderlund)

   - Fix the Mediatek LVTS sensor coefficient for the MT8192 in order to
     support it correctly (Hsin-Te Yuan)

   - Fix a NULL pointer dereference in the tsens driver when the
     function compute_intercept_slope() is called with a NULL parameter
     (Aleksandr Mishin)

   - Remove some unused fields in struct qpnp_tm_chip and k3_bandgap
     (Christophe Jaillet)

   - Fix up calibration efuse data decoding, consolidate the code by
     checking boundaries and refactor some part of the LVTS Mediatek
     driver. After setting the scene, add MT8186 and MT8188 along with
     the DT bindings (Nicolas Pitre)

   - Add Loongson-2K2000 support after some minor code adjustements and
     providing the DT bindings definition (Binbin Zhou)"

* tag 'thermal-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (72 commits)
  thermal: intel: hfi: Increase the number of CPU capabilities per netlink event
  thermal: intel: hfi: Rename HFI_MAX_THERM_NOTIFY_COUNT
  thermal: intel: hfi: Shorten the thermal netlink event delay to 100ms
  thermal: intel: hfi: Rename HFI_UPDATE_INTERVAL
  thermal: intel: Add missing module description
  thermal: core: Move passive polling management to the core
  thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid
  thermal: trip: Add missing empty code line
  thermal/debugfs: Avoid printing zero duration for mitigation events in progress
  thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()
  thermal/debugfs: Create records for cdev states as they get used
  thermal: core: Introduce thermal_governor_trip_crossed()
  thermal/debugfs: Make tze_seq_show() skip invalid trips and trips with no stats
  thermal/debugfs: Rename thermal_debug_update_temp() to thermal_debug_update_trip_stats()
  thermal/debugfs: Clean up thermal_debug_update_temp()
  thermal/debugfs: Avoid excessive updates of trip point statistics
  thermal: core: Relocate critical and hot trip handling
  thermal: core: Drop the .throttle() governor callback
  thermal: gov_user_space: Use .trip_crossed() instead of .throttle()
  thermal: gov_fair_share: Eliminate unnecessary integer divisions
  ...
2024-05-14 12:53:26 -07:00
Linus Torvalds
6e5a0c30b6 Scheduler changes for v6.10:
- Add cpufreq pressure feedback for the scheduler
 
  - Rework misfit load-balancing wrt. affinity restrictions
 
  - Clean up and simplify the code around ::overutilized and
    ::overload access.
 
  - Simplify sched_balance_newidle()
 
  - Bump SCHEDSTAT_VERSION to 16 due to a cleanup of CPU_MAX_IDLE_TYPES
    handling that changed the output.
 
  - Rework & clean up <asm/vtime.h> interactions wrt. arch_vtime_task_switch()
 
  - Reorganize, clean up and unify most of the higher level
    scheduler balancing function names around the sched_balance_*()
    prefix.
 
  - Simplify the balancing flag code (sched_balance_running)
 
  - Miscellaneous cleanups & fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmZBtA0RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gQEw//WiCiV7zTlWShSiG/g8GTfoAvl53QTWXF
 0jQ8TUcoIhxB5VeGgxVG1srYt8f505UXjH7L0MJLrbC3nOgRCg4NK57WiQEachKK
 HORIJHT0tMMsKIwX9D5Ovo4xYJn+j7mv7j/caB+hIlzZAbWk+zZPNWcS84p0ZS/4
 appY6RIcp7+cI7bisNMGUuNZS14+WMdWoX3TgoI6ekgDZ7Ky+kQvkwGEMBXsNElO
 qZOj6yS/QUE4Htwz0tVfd6h5svoPM/VJMIvl0yfddPGurfNw6jEh/fjcXnLdAzZ6
 9mgcosETncQbm0vfSac116lrrZIR9ygXW/yXP5S7I5dt+r+5pCrBZR2E5g7U4Ezp
 GjX1+6J9U6r6y12AMLRjadFOcDvxdwtszhZq4/wAcmS3B9dvupnH/w7zqY9ho3wr
 hTdtDHoAIzxJh7RNEHgeUC0/yQX3wJ9THzfYltDRIIjHTuvl4d5lHgsug+4Y9ClE
 pUIQm/XKouweQN9TZz2ULle4ZhRrR9sM9QfZYfirJ/RppmuKool4riWyQFQNHLCy
 mBRMjFFsTpFIOoZXU6pD4EabOpWdNrRRuND/0yg3WbDat2gBWq6jvSFv2UN1/v7i
 Un5jijTuN7t8yP5lY5Tyf47kQfLlA9bUx1v56KnF9mrpI87FyiDD3MiQVhDsvpGX
 rP96BIOrkSo=
 =obph
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - Add cpufreq pressure feedback for the scheduler

 - Rework misfit load-balancing wrt affinity restrictions

 - Clean up and simplify the code around ::overutilized and
   ::overload access.

 - Simplify sched_balance_newidle()

 - Bump SCHEDSTAT_VERSION to 16 due to a cleanup of CPU_MAX_IDLE_TYPES
   handling that changed the output.

 - Rework & clean up <asm/vtime.h> interactions wrt arch_vtime_task_switch()

 - Reorganize, clean up and unify most of the higher level
   scheduler balancing function names around the sched_balance_*()
   prefix

 - Simplify the balancing flag code (sched_balance_running)

 - Miscellaneous cleanups & fixes

* tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  sched/pelt: Remove shift of thermal clock
  sched/cpufreq: Rename arch_update_thermal_pressure() => arch_update_hw_pressure()
  thermal/cpufreq: Remove arch_update_thermal_pressure()
  sched/cpufreq: Take cpufreq feedback into account
  cpufreq: Add a cpufreq pressure feedback for the scheduler
  sched/fair: Fix update of rd->sg_overutilized
  sched/vtime: Do not include <asm/vtime.h> header
  s390/irq,nmi: Include <asm/vtime.h> header directly
  s390/vtime: Remove unused __ARCH_HAS_VTIME_TASK_SWITCH leftover
  sched/vtime: Get rid of generic vtime_task_switch() implementation
  sched/vtime: Remove confusing arch_vtime_task_switch() declaration
  sched/balancing: Simplify the sg_status bitmask and use separate ->overloaded and ->overutilized flags
  sched/fair: Rename set_rd_overutilized_status() to set_rd_overutilized()
  sched/fair: Rename SG_OVERLOAD to SG_OVERLOADED
  sched/fair: Rename {set|get}_rd_overload() to {set|get}_rd_overloaded()
  sched/fair: Rename root_domain::overload to ::overloaded
  sched/fair: Use helper functions to access root_domain::overload
  sched/fair: Check root_domain::overload value before update
  sched/fair: Combine EAS check with root_domain::overutilized access
  sched/fair: Simplify the continue_balancing logic in sched_balance_newidle()
  ...
2024-05-13 17:18:51 -07:00
Rafael J. Wysocki
d9f87a7e9a Merge branches 'acpi-x86', 'acpi-dptf' and 'acpi-apei'
Merge x86-specific ACPI updates, an ACPI DPTF driver update adding new
platform support to it, and an ACPI APEI update:

 - Add a num-cs device property to specify the number of chip selects
   for Intel Braswell to the ACPI LPSS (Intel SoC) driver and remove a
   nested CONFIG_PM #ifdef from it (Andy Shevchenko).

 - Move three x86-specific ACPI files to the x86 directory (Andy
   Shevchenko).

 - Mark SMO8810 accel on Dell XPS 15 9550 as always present and add a
   PNP_UART1_SKIP quirk for Lenovo Blade2 tablets (Hans de Goede).

 - Move acpi_blacklisted() declaration to asm/acpi.h (Kuppuswamy
   Sathyanarayanan).

 - Add Lunar Lake support to the ACPI DPTF driver (Sumeet Pawnikar).

 - Mark the einj_driver driver's remove callback as __exit because it
   cannot get unbound via sysfs (Uwe Kleine-König).

* acpi-x86:
  ACPI: Move acpi_blacklisted() declaration to asm/acpi.h
  ACPI: x86: Add PNP_UART1_SKIP quirk for Lenovo Blade2 tablets
  ACPI: x86: utils: Mark SMO8810 accel on Dell XPS 15 9550 as always present
  ACPI: x86: Move LPSS to x86 folder
  ACPI: x86: Move blacklist to x86 folder
  ACPI: x86: Move acpi_cmos_rtc to x86 folder
  ACPI: x86: Introduce a Makefile
  ACPI: LPSS: Remove nested ifdeffery for CONFIG_PM
  ACPI: LPSS: Advertise number of chip selects via property

* acpi-dptf:
  ACPI: DPTF: Add Lunar Lake support

* acpi-apei:
  ACPI: APEI: EINJ: mark remove callback as __exit
2024-05-13 20:58:14 +02:00
Rafael J. Wysocki
3a47fbdd1a Merge branch 'thermal-intel'
Merge updates of Intel thermal drivers for v6.10:

 - Add missing MODULE_DESCRIPTION() to multiple files in the
   int340x_thermal and intel_soc_dts_iosf drivers (Srinivas Pandruvada).

 - Adjust the update delay and capabilities-per-event values in the
   Intel HFI thermal driver to prevent it from missing events and allow
   it to process more data in one go (Ricardo Neri).

* thermal-intel:
  thermal: intel: hfi: Increase the number of CPU capabilities per netlink event
  thermal: intel: hfi: Rename HFI_MAX_THERM_NOTIFY_COUNT
  thermal: intel: hfi: Shorten the thermal netlink event delay to 100ms
  thermal: intel: hfi: Rename HFI_UPDATE_INTERVAL
  thermal: intel: Add missing module description
2024-05-10 21:37:57 +02:00
Ricardo Neri
608fa85235 thermal: intel: hfi: Increase the number of CPU capabilities per netlink event
The number of updated CPU capabilities per netlink event is hard-coded to
16. On systems with more than 16 CPUs (a common case), it takes more than
one thermal netlink event to relay all the new capabilities after an HFI
interrupt. This adds unnecessary overhead to both the kernel and user space
entities.

Increase the number of CPU capabilities updated per event to 64. Any system
with 64 CPUs or less can now update all the capabilities in a single
thermal netlink event.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-08 14:02:02 +02:00
Ricardo Neri
07c6f3a7ff thermal: intel: hfi: Rename HFI_MAX_THERM_NOTIFY_COUNT
When processing a hardware update, HFI generates as many thermal netlink
events as needed to relay all the updated CPU capabilities to user space.
The constant HFI_MAX_THERM_NOTIFY_COUNT is the number of CPU capabilities
updated per each of those events.

Give this constant a more descriptive name.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-08 14:02:02 +02:00
Ricardo Neri
ba1a587ed6 thermal: intel: hfi: Shorten the thermal netlink event delay to 100ms
The delay between an HFI interrupt and its corresponding thermal netlink
event has so far been hard-coded to CONFIG_HZ jiffies (1 second). This
delay is too long for hardware that generates updates every tens of
milliseconds.

The HFI driver uses a delayed workqueue to send thermal netlink events. No
subsequent events will be sent if there is pending work.

As a result, much of the information of consecutive hardware updates will
be lost if the workqueue delay is too long. User space entities may act on
obsolete data. If the delay is too short, multiple events may overwhelm
listeners.

Set the delay to 100ms to strike a balance between too many and too few
events. Use milliseconds instead of jiffies to improve readability.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-08 14:02:02 +02:00
Ricardo Neri
564a88eb7a thermal: intel: hfi: Rename HFI_UPDATE_INTERVAL
The name of the constant HFI_UPDATE_INTERVAL is misleading. It is not a
periodic interval at which HFI updates are processed. It is the delay in
the processing of an HFI update after the arrival of an HFI interrupt.

Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-08 14:02:02 +02:00
Rafael J. Wysocki
9396b2a669 Merge branch 'thermal-core'
This includes a major rework of thermal governors and part of the
thermal core interacting with them as well as some fixes and cleanups
of the thermal debug code:

 - Redesign the thermal governor interface to allow the governors to
   work in a more straightforward way.

 - Make thermal governors take the current trip point thresholds into
   account in their computations which allows trip hysteresis to be
   observed more accurately.

 - Clean up thermal governors.

 - Make the thermal core manage passive polling for thermal zones and
   remove passive polling management from thermal governors.

 - Improve the handling of cooling device states and thermal mitigation
   episodes in progress in the thermal debug code.

 - Avoid excessive updates of trip point statistics and clean up the
   printing of thermal mitigation episode information.

* thermal-core: (27 commits)
  thermal: core: Move passive polling management to the core
  thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid
  thermal: trip: Add missing empty code line
  thermal/debugfs: Avoid printing zero duration for mitigation events in progress
  thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()
  thermal/debugfs: Create records for cdev states as they get used
  thermal: core: Introduce thermal_governor_trip_crossed()
  thermal/debugfs: Make tze_seq_show() skip invalid trips and trips with no stats
  thermal/debugfs: Rename thermal_debug_update_temp() to thermal_debug_update_trip_stats()
  thermal/debugfs: Clean up thermal_debug_update_temp()
  thermal/debugfs: Avoid excessive updates of trip point statistics
  thermal: core: Relocate critical and hot trip handling
  thermal: core: Drop the .throttle() governor callback
  thermal: gov_user_space: Use .trip_crossed() instead of .throttle()
  thermal: gov_fair_share: Eliminate unnecessary integer divisions
  thermal: gov_fair_share: Use trip thresholds instead of trip temperatures
  thermal: gov_fair_share: Use .manage() callback instead of .throttle()
  thermal: gov_step_wise: Clean up thermal_zone_trip_update()
  thermal: gov_step_wise: Use trip thresholds instead of trip temperatures
  thermal: gov_step_wise: Use .manage() callback instead of .throttle()
  ...
2024-05-06 12:24:30 +02:00
Rafael J. Wysocki
0021102527 Merge back thermal cotntrol material for v6.10. 2024-05-06 12:23:50 +02:00
Julien Panis
b66c079aab thermal/drivers/mediatek/lvts_thermal: Fix wrong lvts_ctrl index
In 'lvts_should_update_thresh()' and 'lvts_ctrl_start()' functions,
the parameter passed to 'lvts_for_each_valid_sensor()' macro is always
'lvts_ctrl->lvts_data->lvts_ctrl'. In other words, the array index 0
is systematically passed as 'struct lvts_ctrl_data' type item, even
when another item should be consumed instead.

Hence, the 'valid_sensor_mask' value which is selected can be wrong
because unrelated to the 'struct lvts_ctrl_data' type item that should
be used. Hence, some thermal zone can be registered for a sensor 'i'
that does not actually exist. Because of the invalid address used
as 'lvts_sensor[i].msr', this situation ends up with a crash in
'lvts_get_temp()' function, where this 'msr' pointer is passed to
'readl_poll_timeout()' function. The following message is output:
"Unable to handle kernel NULL pointer dereference at virtual
address <msr>", with <msr> = 0.

This patch fixes the issue.

Fixes: 11e6f4c314 ("thermal/drivers/mediatek/lvts_thermal: Allow early empty sensor slots")
Signed-off-by: Julien Panis <jpanis@baylibre.com>
Reviewed-by: Nicolas Pitre <npitre@baylibre.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20240503-mtk-thermal-lvts-ctrl-idx-fix-v1-2-f605c50ca117@baylibre.com
2024-05-06 10:33:26 +02:00
Julien Panis
e2d2266a2a thermal/drivers/mediatek/lvts_thermal: Remove unused members from struct lvts_ctrl_data
In struct lvts_ctrl_data, num_lvts_sensor and cal_offset[] are not used.

Signed-off-by: Julien Panis <jpanis@baylibre.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Nicolas Pitre <npitre@baylibre.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20240503-mtk-thermal-lvts-ctrl-idx-fix-v1-1-f605c50ca117@baylibre.com
2024-05-06 10:33:26 +02:00
Julien Panis
a1191a7735 thermal/drivers/mediatek/lvts_thermal: Check NULL ptr on lvts_data
Verify that lvts_data is not NULL before using it.

Signed-off-by: Julien Panis <jpanis@baylibre.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20240502-mtk-thermal-lvts-data-v1-1-65f1b0bfad37@baylibre.com
2024-05-02 15:56:41 +02:00
Srinivas Pandruvada
48d722fd39 thermal: intel: Add missing module description
Fix warnings displayed by "make W=1" build:

WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/thermal/intel/intel_soc_dts_iosf.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/thermal/intel/int340x_thermal/processor_thermal_rapl.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/thermal/intel/int340x_thermal/processor_thermal_rfim.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/thermal/intel/int340x_thermal/processor_thermal_mbox.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/thermal/intel/int340x_thermal/processor_thermal_wt_req.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/thermal/intel/int340x_thermal/processor_thermal_wt_hint.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/thermal/intel/int340x_thermal/processor_thermal_power_floor

Reported-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-02 12:51:48 +02:00
Rafael J. Wysocki
042a3d80f1 thermal: core: Move passive polling management to the core
Passive polling is enabled by setting the 'passive' field in
struct thermal_zone_device to a positive value so long as the
'passive_delay_jiffies' field is greater than zero.  It causes
the thermal core to actively check the thermal zone temperature
periodically which in theory should be done after crossing a
passive trip point on the way up in order to allow governors to
react more rapidly to temperature changes and adjust mitigation
more precisely.

However, the 'passive' field in struct thermal_zone_device is currently
managed by governors which is quite problematic.  First of all, only
two governors, Step-Wise and Power Allocator, update that field at
all, so the other governors do not benefit from passive polling,
although in principle they should.  Moreover, if the zone governor is
changed from, say, Step-Wise to Fair-Share after 'passive' has been
incremented by the former, it is not going to be reset back to zero by
the latter even if the zone temperature falls down below all passive
trip points.

For this reason, make handle_thermal_trip() increment 'passive'
to enable passive polling for the given thermal zone whenever a
passive trip point is crossed on the way up and decrement it
whenever a passive trip point is crossed on the way down.  Also
remove the 'passive' field updates from governors and additionally
clear it in thermal_zone_device_init() to prevent passive polling
from being enabled after a system resume just beacuse it was enabled
before suspending the system.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-30 21:16:13 +02:00
Rafael J. Wysocki
202aa0d4bb thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid
Make __thermal_zone_device_update() bail out if update_temperature()
fails to update the zone temperature because __thermal_zone_get_temp()
has returned an error and the current zone temperature is
THERMAL_TEMP_INVALID (user space receiving netlink thermal messages,
thermal debug code and thermal governors may get confused otherwise).

Fixes: 9ad18043fb ("thermal: core: Send trip crossing notifications at init time if needed")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-30 21:16:13 +02:00
Rafael J. Wysocki
1502718a66 thermal: trip: Add missing empty code line
Add missing empty line of code to thermal_zone_trip_id().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-30 13:04:56 +02:00
Rafael J. Wysocki
bd700ba9ad thermal/debugfs: Avoid printing zero duration for mitigation events in progress
If a thermal mitigation event is in progress, its duration value has
not been updated yet, so 0 will be printed as the event duration by
tze_seq_show() which is confusing.

Avoid doing that by marking the beginning of the event with the
KTIME_MIN duration value and making tze_seq_show() compute the current
event duration on the fly, in which case '>' will be printed instead of
'=' in the event duration value field.

Similarly, for trip points that have been crossed on the down, mark
the end of mitigation with the KTIME_MAX timestamp value and make
tze_seq_show() compute the current duration on the fly for the trip
points still involved in the mitigation, in which cases the duration
value printed by it will be prepended with a '>' character.

Fixes: 7ef01f228c ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-26 15:01:56 +02:00
Rafael J. Wysocki
31a0fa0019 thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()
If cdev_dt_seq_show() runs before the first state transition of a cooling
device, it will not print any state residency information for it, even
though it might be reasonably expected to print residency information for
the initial state of the cooling device.

For this reason, rearrange the code to get the initial state of a cooling
device at the registration time and pass it to thermal_debug_cdev_add(),
so that the latter can create a duration record for that state which will
allow cdev_dt_seq_show() to print its residency information.

Fixes: 755113d767 ("thermal/debugfs: Add thermal cooling device debugfs information")
Reported-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-26 15:01:56 +02:00
Rafael J. Wysocki
f4ae18fcb6 thermal/debugfs: Create records for cdev states as they get used
Because thermal_debug_cdev_state_update() only creates a duration record
for the old state of a cooling device, if its new state is used for the
first time, there will be no record for it and cdev_dt_seq_show() will
not print the duration information for it even though it contains code
to compute the duration value in that case.

Address this by making thermal_debug_cdev_state_update() create a
duration record for the new state if there is none.

Fixes: 755113d767 ("thermal/debugfs: Add thermal cooling device debugfs information")
Reported-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-26 15:01:56 +02:00
Rafael J. Wysocki
8c882f172f Merge back earlier thermal core changes for v6.10. 2024-04-26 15:00:56 +02:00
Rafael J. Wysocki
d351eb0ab0 thermal/debugfs: Prevent use-after-free from occurring after cdev removal
Since thermal_debug_cdev_remove() does not run under cdev->lock, it can
run in parallel with thermal_debug_cdev_state_update() and it may free
the struct thermal_debugfs object used by the latter after it has been
checked against NULL.

If that happens, thermal_debug_cdev_state_update() will access memory
that has been freed already causing the kernel to crash.

Address this by using cdev->lock in thermal_debug_cdev_remove() around
the cdev->debugfs value check (in case the same cdev is removed at the
same time in two different threads) and its reset to NULL.

Fixes: 755113d767 ("thermal/debugfs: Add thermal cooling device debugfs information")
Cc :6.8+ <stable@vger.kernel.org> # 6.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-26 14:57:50 +02:00
Rafael J. Wysocki
c7f7c37271 thermal/debugfs: Fix two locking issues with thermal zone debug
With the current thermal zone locking arrangement in the debugfs code,
user space can open the "mitigations" file for a thermal zone before
the zone's debugfs pointer is set which will result in a NULL pointer
dereference in tze_seq_start().

Moreover, thermal_debug_tz_remove() is not called under the thermal
zone lock, so it can run in parallel with the other functions accessing
the thermal zone's struct thermal_debugfs object.  Then, it may clear
tz->debugfs after one of those functions has checked it and the
struct thermal_debugfs object may be freed prematurely.

To address the first problem, pass a pointer to the thermal zone's
struct thermal_debugfs object to debugfs_create_file() in
thermal_debug_tz_add() and make tze_seq_start(), tze_seq_next(),
tze_seq_stop(), and tze_seq_show() retrieve it from s->private
instead of a pointer to the thermal zone object.  This will ensure
that tz_debugfs will be valid across the "mitigations" file accesses
until thermal_debugfs_remove_id() called by thermal_debug_tz_remove()
removes that file.

To address the second problem, use tz->lock in thermal_debug_tz_remove()
around the tz->debugfs value check (in case the same thermal zone is
removed at the same time in two different threads) and its reset to NULL.

Fixes: 7ef01f228c ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Cc :6.8+ <stable@vger.kernel.org> # 6.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-26 11:11:30 +02:00
Rafael J. Wysocki
72c1afffa4 thermal/debugfs: Free all thermal zone debug memory on zone removal
Because thermal_debug_tz_remove() does not free all memory allocated for
thermal zone diagnostics, some of that memory becomes unreachable after
freeing the thermal zone's struct thermal_debugfs object.

Address this by making thermal_debug_tz_remove() free all of the memory
in question.

Fixes: 7ef01f228c ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Cc :6.8+ <stable@vger.kernel.org> # 6.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-26 11:11:00 +02:00
Rafael J. Wysocki
f831892e23 thermal: core: Introduce thermal_governor_trip_crossed()
Add a wrapper around the .trip_crossed() governor callback invocation
to reduce code duplications slightly and improve the code layout in
__thermal_zone_device_update().

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-24 20:45:06 +02:00
Rafael J. Wysocki
a6258fde8d thermal/debugfs: Make tze_seq_show() skip invalid trips and trips with no stats
Currently, tze_seq_show() output includes all of the trips in the zone
except for critical ones, including invalid trips and trips with no stats
which is confusing.

Make it skip the trips for which there is not mitigation information.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-04-24 20:43:11 +02:00
Rafael J. Wysocki
8dff6e8438 thermal/debugfs: Rename thermal_debug_update_temp() to thermal_debug_update_trip_stats()
Rename thermal_debug_update_temp() to thermal_debug_update_trip_stats()
which is a better match for the purpose of the function.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-04-24 20:43:09 +02:00
Rafael J. Wysocki
e271f9974d thermal/debugfs: Clean up thermal_debug_update_temp()
Notice that it is not necessary to compute tze in every iteration of the
for () loop in thermal_debug_update_temp() because it is the same for all
trips, so compute it once before the loop starts.

Also use a trip_stats local variable to make the code in that loop easier
to follow and move the trip_id variable definition into that loop because
it is not used elsewhere in the function.

While at it, change to order of local variable definitions in the function
to follow the reverse-xmas-tree pattern.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-04-24 20:43:06 +02:00
Rafael J. Wysocki
0a293c7758 thermal/debugfs: Avoid excessive updates of trip point statistics
Since thermal_debug_update_temp() is called before invoking
thermal_debug_tz_trip_down() for the trips that were crossed by the
zone temperature on the way up, it updates the statistics for them
as though the current zone temperature was above the low temperature
of each of them.  However, if a given trip has just been crossed on the
way down, the zone temperature is in fact below its low temperature,
but this is handled by thermal_debug_tz_trip_down() running after the
update of the trip statistics.

The remedy is to call thermal_debug_update_temp() after
thermal_debug_tz_trip_down() has been invoked for all of the
trips in question, but then thermal_debug_tz_trip_up() needs to
be adjusted, so it does not update the statistics for the trips
that has just been crossed on the way up, as that will be taken
care of by thermal_debug_update_temp() down the road.

Modify the code accordingly.

Fixes: 7ef01f228c ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-04-24 20:43:02 +02:00
Rafael J. Wysocki
2ae0998c67 thermal: core: Relocate critical and hot trip handling
Modify handle_thermal_trip() to call handle_critical_trips() only after
finding that the trip temperature has been crossed on the way up and
remove the redundant temperature check from the latter.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-04-24 20:42:39 +02:00
Rafael J. Wysocki
ad2f8bccd0 thermal: core: Drop the .throttle() governor callback
Since all of the governors in the tree have been switched over to using
the new callbacks, either .trip_crossed() or .manage(), the .throttle()
governor callback is not used any more, so drop it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-04-24 20:42:31 +02:00