linux

iv/linux

Author	SHA1	Message	Date
Dan Carpenter	fa9e6fc74d	thermal/drivers/exynos: Fix an error code in exynos_tmu_probe() commit 02d438f62c05f0d055ceeedf12a2f8796b258c08 upstream. This error path return success but it should propagate the negative error code from devm_clk_get(). Fixes: 6c247393cfdd ("thermal: exynos: Add TMU support for Exynos7 SoC") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210810084413.GA23810@kili Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-09-26 13:36:18 +02:00
Yang Yingliang	092f2c474f	thermal/core: Correct function name thermal_zone_device_unregister() [ Upstream commit a052b5118f13febac1bd901fe0b7a807b9d6b51c ] Fix the following make W=1 kernel build warning: drivers/thermal/thermal_core.c:1376: warning: expecting prototype for thermal_device_unregister(). Prototype was for thermal_zone_device_unregister() instead Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210517051020.3463536-1-yangyingliang@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-28 09:14:25 +02:00
Lukasz Luba	2da11226aa	thermal/core/fair share: Lock the thermal zone while looping over instances commit fef05776eb02238dcad8d5514e666a42572c3f32 upstream. The tz->lock must be hold during the looping over the instances in that thermal zone. This lock was missing in the governor code since the beginning, so it's hard to point into a particular commit. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210422153624.6074-2-lukasz.luba@arm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-22 10:40:33 +02:00
Tony Lindgren	2e4a2bac24	thermal: ti-soc-thermal: Fix bogus thermal shutdowns for omap4430 [ Upstream commit 30d24faba0532d6972df79a1bf060601994b5873 ] We can sometimes get bogus thermal shutdowns on omap4430 at least with droid4 running idle with a battery charger connected: thermal thermal_zone0: critical temperature reached (143 C), shutting down Dumping out the register values shows we can occasionally get a 0x7f value that is outside the TRM listed values in the ADC conversion table. And then we get a normal value when reading again after that. Reading the register multiple times does not seem help avoiding the bogus values as they stay until the next sample is ready. Looking at the TRM chapter "18.4.10.2.3 ADC Codes Versus Temperature", we should have values from 13 to 107 listed with a total of 95 values. But looking at the omap4430_adc_to_temp array, the values are off, and the end values are missing. And it seems that the 4430 ADC table is similar to omap3630 rather than omap4460. Let's fix the issue by using values based on the omap3630 table and just ignoring invalid values. Compared to the 4430 TRM, the omap3630 table has the missing values added while the TRM table only shows every second value. Note that sometimes the ADC register values within the valid table can also be way off for about 1 out of 10 values. But it seems that those just show about 25 C too low values rather than too high values. So those do not cause a bogus thermal shutdown. Fixes: 1a31270e54d7 ("staging: omap-thermal: add OMAP4 data structures") Cc: Merlijn Wajer <merlijn@wizzup.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200706183338.25622-1-tony@atomide.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-09-12 11:47:34 +02:00
Enric Balletbo i Serra	5e48553597	Revert "thermal: mediatek: fix register index error" [ Upstream commit a8f62f183021be389561570ab5f8c701a5e70298 ] This reverts commit eb9aecd90d1a39601e91cd08b90d5fee51d321a6 The above patch is supposed to fix a register index error on mt2701. It is not clear if the problem solved is a hang or just an invalid value returned, my guess is the second. The patch introduces, though, a new hang on MT8173 device making them unusable. So, seems reasonable, revert the patch because introduces a worst issue. The reason I send a revert instead of trying to fix the issue for MT8173 is because the information needed to fix the issue is in the datasheet and is not public. So I am not really able to fix it. Fixes the following bug when CONFIG_MTK_THERMAL is set on MT8173 devices. [ 2.222488] Unable to handle kernel paging request at virtual address ffff8000125f5001 [ 2.230421] Mem abort info: [ 2.233207] ESR = 0x96000021 [ 2.236261] EC = 0x25: DABT (current EL), IL = 32 bits [ 2.241571] SET = 0, FnV = 0 [ 2.244623] EA = 0, S1PTW = 0 [ 2.247762] Data abort info: [ 2.250640] ISV = 0, ISS = 0x00000021 [ 2.254473] CM = 0, WnR = 0 [ 2.257544] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041850000 [ 2.264251] [ffff8000125f5001] pgd=000000013ffff003, pud=000000013fffe003, pmd=000000013fff9003, pte=006800001100b707 [ 2.274867] Internal error: Oops: 96000021 [#1] PREEMPT SMP [ 2.280432] Modules linked in: [ 2.283483] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc6+ #162 [ 2.289914] Hardware name: Google Elm (DT) [ 2.294003] pstate: 20000005 (nzCv daif -PAN -UAO) [ 2.298792] pc : mtk_read_temp+0xb8/0x1c8 [ 2.302793] lr : mtk_read_temp+0x7c/0x1c8 [ 2.306794] sp : ffff80001003b930 [ 2.310100] x29: ffff80001003b930 x28: 0000000000000000 [ 2.315404] x27: 0000000000000002 x26: ffff0000f9550b10 [ 2.320709] x25: ffff0000f9550a80 x24: 0000000000000090 [ 2.326014] x23: ffff80001003ba24 x22: 00000000610344c0 [ 2.331318] x21: 0000000000002710 x20: 00000000000001f4 [ 2.336622] x19: 0000000000030d40 x18: ffff800011742ec0 [ 2.341926] x17: 0000000000000001 x16: 0000000000000001 [ 2.347230] x15: ffffffffffffffff x14: ffffff0000000000 [ 2.352535] x13: ffffffffffffffff x12: 0000000000000028 [ 2.357839] x11: 0000000000000003 x10: ffff800011295ec8 [ 2.363143] x9 : 000000000000291b x8 : 0000000000000002 [ 2.368447] x7 : 00000000000000a8 x6 : 0000000000000004 [ 2.373751] x5 : 0000000000000000 x4 : ffff800011295cb0 [ 2.379055] x3 : 0000000000000002 x2 : ffff8000125f5001 [ 2.384359] x1 : 0000000000000001 x0 : ffff0000f9550a80 [ 2.389665] Call trace: [ 2.392105] mtk_read_temp+0xb8/0x1c8 [ 2.395760] of_thermal_get_temp+0x2c/0x40 [ 2.399849] thermal_zone_get_temp+0x78/0x160 [ 2.404198] thermal_zone_device_update.part.0+0x3c/0x1f8 [ 2.409589] thermal_zone_device_update+0x34/0x48 [ 2.414286] of_thermal_set_mode+0x58/0x88 [ 2.418375] thermal_zone_of_sensor_register+0x1a8/0x1d8 [ 2.423679] devm_thermal_zone_of_sensor_register+0x64/0xb0 [ 2.429242] mtk_thermal_probe+0x690/0x7d0 [ 2.433333] platform_drv_probe+0x5c/0xb0 [ 2.437335] really_probe+0xe4/0x448 [ 2.440901] driver_probe_device+0xe8/0x140 [ 2.445077] device_driver_attach+0x7c/0x88 [ 2.449252] __driver_attach+0xac/0x178 [ 2.453082] bus_for_each_dev+0x78/0xc8 [ 2.456909] driver_attach+0x2c/0x38 [ 2.460476] bus_add_driver+0x14c/0x230 [ 2.464304] driver_register+0x6c/0x128 [ 2.468131] __platform_driver_register+0x50/0x60 [ 2.472831] mtk_thermal_driver_init+0x24/0x30 [ 2.477268] do_one_initcall+0x50/0x298 [ 2.481098] kernel_init_freeable+0x1ec/0x264 [ 2.485450] kernel_init+0x1c/0x110 [ 2.488931] ret_from_fork+0x10/0x1c [ 2.492502] Code: f9401081 f9400402 b8a67821 8b010042 (b9400042) [ 2.498599] ---[ end trace e43e3105ed27dc99 ]--- [ 2.503367] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b [ 2.511020] SMP: stopping secondary CPUs [ 2.514941] Kernel Offset: disabled [ 2.518421] CPU features: 0x090002,25006005 [ 2.522595] Memory Limit: none [ 2.525644] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]-- Cc: Michael Kao <michael.kao@mediatek.com> Fixes: eb9aecd90d1a ("thermal: mediatek: fix register index error") Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200707103412.1010823-1-enric.balletbo@collabora.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-07-22 09:10:51 +02:00
Matthias Kaehlcke	56fc3de955	thermal: cpu_cooling: Actually trace CPU load in thermal_power_cpu_get_power [ Upstream commit bf45ac18b78038e43af3c1a273cae4ab5704d2ce ] The CPU load values passed to the thermal_power_cpu_get_power tracepoint are zero for all CPUs, unless, unless the thermal_power_cpu_limit tracepoint is enabled too: irq/41-rockchip-98 [000] .... 290.972410: thermal_power_cpu_get_power: cpus=0000000f freq=1800000 load={{0x0,0x0,0x0,0x0}} dynamic_power=4815 vs irq/41-rockchip-96 [000] .... 95.773585: thermal_power_cpu_get_power: cpus=0000000f freq=1800000 load={{0x56,0x64,0x64,0x5e}} dynamic_power=4959 irq/41-rockchip-96 [000] .... 95.773596: thermal_power_cpu_limit: cpus=0000000f freq=408000 cdev_state=10 power=416 There seems to be no good reason for omitting the CPU load information depending on another tracepoint. My guess is that the intention was to check whether thermal_power_cpu_get_power is (still) enabled, however 'load_cpu != NULL' already indicates that it was at least enabled when cpufreq_get_requested_power() was entered, there seems little gain from omitting the assignment if the tracepoint was just disabled, so just remove the check. Fixes: 6828a4711f99 ("thermal: add trace events to the power allocator governor") Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Javi Merino <javi.merino@kernel.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-01-29 10:24:24 +01:00
Michael Kao	ffe1b01f7d	thermal: mediatek: fix register index error [ Upstream commit eb9aecd90d1a39601e91cd08b90d5fee51d321a6 ] The index of msr and adcpnp should match the sensor which belongs to the selected bank in the for loop. Fixes: b7cf0053738c ("thermal: Add Mediatek thermal driver for mt2701.") Signed-off-by: Michael Kao <michael.kao@mediatek.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-01-29 10:24:14 +01:00
Wei Wang	a7577abb95	thermal: Fix deadlock in thermal thermal_zone_device_check commit 163b00cde7cf2206e248789d2780121ad5e6a70b upstream. 1851799e1d29 ("thermal: Fix use-after-free when unregistering thermal zone device") changed cancel_delayed_work to cancel_delayed_work_sync to avoid a use-after-free issue. However, cancel_delayed_work_sync could be called insides the WQ causing deadlock. [54109.642398] c0 1162 kworker/u17:1 D 0 11030 2 0x00000000 [54109.642437] c0 1162 Workqueue: thermal_passive_wq thermal_zone_device_check [54109.642447] c0 1162 Call trace: [54109.642456] c0 1162 __switch_to+0x138/0x158 [54109.642467] c0 1162 __schedule+0xba4/0x1434 [54109.642480] c0 1162 schedule_timeout+0xa0/0xb28 [54109.642492] c0 1162 wait_for_common+0x138/0x2e8 [54109.642511] c0 1162 flush_work+0x348/0x40c [54109.642522] c0 1162 __cancel_work_timer+0x180/0x218 [54109.642544] c0 1162 handle_thermal_trip+0x2c4/0x5a4 [54109.642553] c0 1162 thermal_zone_device_update+0x1b4/0x25c [54109.642563] c0 1162 thermal_zone_device_check+0x18/0x24 [54109.642574] c0 1162 process_one_work+0x3cc/0x69c [54109.642583] c0 1162 worker_thread+0x49c/0x7c0 [54109.642593] c0 1162 kthread+0x17c/0x1b0 [54109.642602] c0 1162 ret_from_fork+0x10/0x18 [54109.643051] c0 1162 kworker/u17:2 D 0 16245 2 0x00000000 [54109.643067] c0 1162 Workqueue: thermal_passive_wq thermal_zone_device_check [54109.643077] c0 1162 Call trace: [54109.643085] c0 1162 __switch_to+0x138/0x158 [54109.643095] c0 1162 __schedule+0xba4/0x1434 [54109.643104] c0 1162 schedule_timeout+0xa0/0xb28 [54109.643114] c0 1162 wait_for_common+0x138/0x2e8 [54109.643122] c0 1162 flush_work+0x348/0x40c [54109.643131] c0 1162 __cancel_work_timer+0x180/0x218 [54109.643141] c0 1162 handle_thermal_trip+0x2c4/0x5a4 [54109.643150] c0 1162 thermal_zone_device_update+0x1b4/0x25c [54109.643159] c0 1162 thermal_zone_device_check+0x18/0x24 [54109.643167] c0 1162 process_one_work+0x3cc/0x69c [54109.643177] c0 1162 worker_thread+0x49c/0x7c0 [54109.643186] c0 1162 kthread+0x17c/0x1b0 [54109.643195] c0 1162 ret_from_fork+0x10/0x18 [54109.644500] c0 1162 cat D 0 7766 1 0x00000001 [54109.644515] c0 1162 Call trace: [54109.644524] c0 1162 __switch_to+0x138/0x158 [54109.644536] c0 1162 __schedule+0xba4/0x1434 [54109.644546] c0 1162 schedule_preempt_disabled+0x80/0xb0 [54109.644555] c0 1162 __mutex_lock+0x3a8/0x7f0 [54109.644563] c0 1162 __mutex_lock_slowpath+0x14/0x20 [54109.644575] c0 1162 thermal_zone_get_temp+0x84/0x360 [54109.644586] c0 1162 temp_show+0x30/0x78 [54109.644609] c0 1162 dev_attr_show+0x5c/0xf0 [54109.644628] c0 1162 sysfs_kf_seq_show+0xcc/0x1a4 [54109.644636] c0 1162 kernfs_seq_show+0x48/0x88 [54109.644656] c0 1162 seq_read+0x1f4/0x73c [54109.644664] c0 1162 kernfs_fop_read+0x84/0x318 [54109.644683] c0 1162 __vfs_read+0x50/0x1bc [54109.644692] c0 1162 vfs_read+0xa4/0x140 [54109.644701] c0 1162 SyS_read+0xbc/0x144 [54109.644708] c0 1162 el0_svc_naked+0x34/0x38 [54109.845800] c0 1162 D 720.000s 1->7766->7766 cat [panic] Fixes: 1851799e1d29 ("thermal: Fix use-after-free when unregistering thermal zone device") Cc: stable@vger.kernel.org Signed-off-by: Wei Wang <wvw@google.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-21 10:41:43 +01:00
Geert Uytterhoeven	00192e5ead	thermal: rcar_thermal: Prevent hardware access during system suspend [ Upstream commit 3a31386217628ffe2491695be2db933c25dde785 ] On r8a7791/koelsch, sometimes the following message is printed during system suspend: rcar_thermal e61f0000.thermal: thermal sensor was broken This happens if the workqueue runs while the device is already suspended. Fix this by using the freezable system workqueue instead, cfr. commit 51e20d0e3a60cf46 ("thermal: Prevent polling from happening during system suspend"). Fixes: e0a5172e9eec7f0d ("thermal: rcar: add interrupt support") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:39 +01:00
Ido Schimmel	3e5043ad4b	thermal: Fix use-after-free when unregistering thermal zone device [ Upstream commit 1851799e1d2978f68eea5d9dff322e121dcf59c1 ] thermal_zone_device_unregister() cancels the delayed work that polls the thermal zone, but it does not wait for it to finish. This is racy with respect to the freeing of the thermal zone device, which can result in a use-after-free [1]. Fix this by waiting for the delayed work to finish before freeing the thermal zone device. Note that thermal_zone_device_set_polling() is never invoked from an atomic context, so it is safe to call cancel_delayed_work_sync() that can block. [1] [ +0.002221] ================================================================== [ +0.000064] BUG: KASAN: use-after-free in __mutex_lock+0x1076/0x11c0 [ +0.000016] Read of size 8 at addr ffff8881e48e0450 by task kworker/1:0/17 [ +0.000023] CPU: 1 PID: 17 Comm: kworker/1:0 Not tainted 5.2.0-rc6-custom-02495-g8e73ca3be4af #1701 [ +0.000010] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016 [ +0.000016] Workqueue: events_freezable_power_ thermal_zone_device_check [ +0.000012] Call Trace: [ +0.000021] dump_stack+0xa9/0x10e [ +0.000020] print_address_description.cold.2+0x9/0x25e [ +0.000018] __kasan_report.cold.3+0x78/0x9d [ +0.000016] kasan_report+0xe/0x20 [ +0.000016] __mutex_lock+0x1076/0x11c0 [ +0.000014] step_wise_throttle+0x72/0x150 [ +0.000018] handle_thermal_trip+0x167/0x760 [ +0.000019] thermal_zone_device_update+0x19e/0x5f0 [ +0.000019] process_one_work+0x969/0x16f0 [ +0.000017] worker_thread+0x91/0xc40 [ +0.000014] kthread+0x33d/0x400 [ +0.000015] ret_from_fork+0x3a/0x50 [ +0.000020] Allocated by task 1: [ +0.000015] save_stack+0x19/0x80 [ +0.000015] __kasan_kmalloc.constprop.4+0xc1/0xd0 [ +0.000014] kmem_cache_alloc_trace+0x152/0x320 [ +0.000015] thermal_zone_device_register+0x1b4/0x13a0 [ +0.000015] mlxsw_thermal_init+0xc92/0x23d0 [ +0.000014] __mlxsw_core_bus_device_register+0x659/0x11b0 [ +0.000013] mlxsw_core_bus_device_register+0x3d/0x90 [ +0.000013] mlxsw_pci_probe+0x355/0x4b0 [ +0.000014] local_pci_probe+0xc3/0x150 [ +0.000013] pci_device_probe+0x280/0x410 [ +0.000013] really_probe+0x26a/0xbb0 [ +0.000013] driver_probe_device+0x208/0x2e0 [ +0.000013] device_driver_attach+0xfe/0x140 [ +0.000013] __driver_attach+0x110/0x310 [ +0.000013] bus_for_each_dev+0x14b/0x1d0 [ +0.000013] driver_register+0x1c0/0x400 [ +0.000015] mlxsw_sp_module_init+0x5d/0xd3 [ +0.000014] do_one_initcall+0x239/0x4dd [ +0.000013] kernel_init_freeable+0x42b/0x4e8 [ +0.000012] kernel_init+0x11/0x18b [ +0.000013] ret_from_fork+0x3a/0x50 [ +0.000015] Freed by task 581: [ +0.000013] save_stack+0x19/0x80 [ +0.000014] __kasan_slab_free+0x125/0x170 [ +0.000013] kfree+0xf3/0x310 [ +0.000013] thermal_release+0xc7/0xf0 [ +0.000014] device_release+0x77/0x200 [ +0.000014] kobject_put+0x1a8/0x4c0 [ +0.000014] device_unregister+0x38/0xc0 [ +0.000014] thermal_zone_device_unregister+0x54e/0x6a0 [ +0.000014] mlxsw_thermal_fini+0x184/0x35a [ +0.000014] mlxsw_core_bus_device_unregister+0x10a/0x640 [ +0.000013] mlxsw_devlink_core_bus_device_reload+0x92/0x210 [ +0.000015] devlink_nl_cmd_reload+0x113/0x1f0 [ +0.000014] genl_family_rcv_msg+0x700/0xee0 [ +0.000013] genl_rcv_msg+0xca/0x170 [ +0.000013] netlink_rcv_skb+0x137/0x3a0 [ +0.000012] genl_rcv+0x29/0x40 [ +0.000013] netlink_unicast+0x49b/0x660 [ +0.000013] netlink_sendmsg+0x755/0xc90 [ +0.000013] __sys_sendto+0x3de/0x430 [ +0.000013] __x64_sys_sendto+0xe2/0x1b0 [ +0.000013] do_syscall_64+0xa4/0x4d0 [ +0.000013] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ +0.000017] The buggy address belongs to the object at ffff8881e48e0008 which belongs to the cache kmalloc-2k of size 2048 [ +0.000012] The buggy address is located 1096 bytes inside of 2048-byte region [ffff8881e48e0008, ffff8881e48e0808) [ +0.000007] The buggy address belongs to the page: [ +0.000012] page:ffffea0007923800 refcount:1 mapcount:0 mapping:ffff88823680d0c0 index:0x0 compound_mapcount: 0 [ +0.000020] flags: 0x200000000010200(slab\|head) [ +0.000019] raw: 0200000000010200 ffffea0007682008 ffffea00076ab808 ffff88823680d0c0 [ +0.000016] raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000 [ +0.000007] page dumped because: kasan: bad access detected [ +0.000012] Memory state around the buggy address: [ +0.000012] ffff8881e48e0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000012] ffff8881e48e0380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000012] >ffff8881e48e0400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000008] ^ [ +0.000012] ffff8881e48e0480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000012] ffff8881e48e0500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000007] ================================================================== Fixes: b1569e99c795 ("ACPI: move thermal trip handling to generic thermal layer") Reported-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-10-17 13:42:12 -07:00
Amit Kucheria	bad5f0b768	drivers: thermal: tsens: Don't print error message on -EPROBE_DEFER [ Upstream commit fc7d18cf6a923cde7f5e7ba2c1105bb106d3e29a ] We print a calibration failure message on -EPROBE_DEFER from nvmem/qfprom as follows: [ 3.003090] qcom-tsens 4a9000.thermal-sensor: version: 1.4 [ 3.005376] qcom-tsens 4a9000.thermal-sensor: tsens calibration failed [ 3.113248] qcom-tsens 4a9000.thermal-sensor: version: 1.4 This confuses people when, in fact, calibration succeeds later when nvmem/qfprom device is available. Don't print this message on a -EPROBE_DEFER. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-06-22 08:17:13 +02:00
Peter Zijlstra	1739ba8b00	x86/cpu: Sanitize FAM6_ATOM naming commit f2c4db1bd80720cd8cb2a5aa220d9bc9f374f04e upstream. Going primarily by: https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors with additional information gleaned from other related pages; notably: - Bonnell shrink was called Saltwell - Moorefield is the Merriefield refresh which makes it Airmont The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE for i in `git grep -l FAM6_ATOM` ; do sed -i -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g' \ -e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/' \ -e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g' \ -e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g' \ -e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g' \ -e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g' \ -e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g' \ -e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g' \ -e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g' \ -e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g' \ -e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i} done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: dave.hansen@linux.intel.com Cc: len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [bwh: Backported to 4.9: - Drop changes to CPU IDs that weren't already included - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-05-14 19:19:34 +02:00
Matthew Garrett	70dd3bc327	thermal/int340x_thermal: fix mode setting [ Upstream commit 396ee4d0cd52c13b3f6421b8d324d65da5e7e409 ] int3400 only pushes the UUID into the firmware when the mode is flipped to "enable". The current code only exposes the mode flag if the firmware supports the PASSIVE_1 UUID, which not all machines do. Remove the restriction. Signed-off-by: Matthew Garrett <mjg59@google.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-04-20 09:07:48 +02:00
Matthew Garrett	14216f1827	thermal/int340x_thermal: Add additional UUIDs [ Upstream commit 16fc8eca1975358111dbd7ce65e4ce42d1a848fb ] Add more supported DPTF policies than the driver currently exposes. Signed-off-by: Matthew Garrett <mjg59@google.com> Cc: Nisha Aram <nisha.aram@intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-04-20 09:07:47 +02:00
Aaron Hill	311acb0a9f	drivers: thermal: int340x_thermal: Fix sysfs race condition [ Upstream commit 129699bb8c7572106b5bbb2407c2daee4727ccad ] Changes since V1: * Use dev_info instead of printk * Use dev_warn instead of BUG_ON Previously, sysfs_create_group was called before all initialization had fully run - specifically, before pci_set_drvdata was called. Since the sysctl group is visible to userspace as soon as sysfs_create_group returns, a small window of time existed during which a process could read from an uninitialized/partially-initialized device. This commit moves the creation of the sysctl group to after all initialized is completed. This ensures that it's impossible for userspace to read from a sysctl file before initialization has fully completed. To catch any future regressions, I've added a check to ensure that proc_thermal_emum_mode is never PROC_THERMAL_NONE when a process tries to read from a sysctl file. Previously, the aforementioned race condition could result in the 'else' branch running while PROC_THERMAL_NONE was set, leading to a null pointer deference. Signed-off-by: Aaron Hill <aa1ronham@gmail.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-03-05 17:57:05 +01:00
Dan Carpenter	26e284fab2	thermal: int340x_thermal: Fix a NULL vs IS_ERR() check [ Upstream commit 3fe931b31a4078395c1967f0495dcc9e5ec6b5e3 ] The intel_soc_dts_iosf_init() function doesn't return NULL, it returns error pointers. Fixes: 4d0dd6c1576b ("Thermal/int340x/processor_thermal: Enable auxiliary DTS for Braswell") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-03-05 17:57:04 +01:00
Eduardo Valentin	5bf9ca73e8	thermal: hwmon: inline helpers when CONFIG_THERMAL_HWMON is not set commit 03334ba8b425b2ad275c8f390cf83c7b081c3095 upstream. Avoid warnings like this: thermal_hwmon.h:29:1: warning: ‘thermal_remove_hwmon_sysfs’ defined but not used [-Wunused-function] thermal_remove_hwmon_sysfs(struct thermal_zone_device *tz) Fixes: 0dd88793aacd ("thermal: hwmon: move hwmon support to single file") Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-12 19:44:59 +01:00
Bjorn Andersson	5b4b5b9687	thermal: generic-adc: Fix adc to temp interpolation [ Upstream commit 9d216211fded20fff301d0317af3238d8383634c ] First correct the edge case to return the last element if we're outside the range, rather than at the last element, so that interpolation is not omitted for points between the two last entries in the table. Then correct the formula to perform linear interpolation based the two points surrounding the read ADC value. The indices for temp are kept as "hi" and "lo" to pair with the adc indices, but there's no requirement that the temperature is provided in descendent order. mult_frac() is used to prevent issues with overflowing the int. Cc: Laxman Dewangan <ldewangan@nvidia.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-02-12 19:44:59 +01:00
Wei Wang	a61cdb8ebc	Thermal: do not clear passive state during system sleep [ Upstream commit 964f4843a455d2ffb199512b08be8d5f077c4cac ] commit ff140fea847e ("Thermal: handle thermal zone device properly during system sleep") added PM hook to call thermal zone reset during sleep. However resetting thermal zone will also clear the passive state and thus cancel the polling queue which leads the passive cooling device state not being cleared properly after sleep. thermal_pm_notify => thermal_zone_device_reset set passive to 0 thermal_zone_trip_update will skip update passive as `old_target == instance->target'. monitor_thermal_zone => thermal_zone_device_set_polling will cancel tz->poll_queue, so the cooling device state will not be changed afterwards. Reported-by: Kame Wang <kamewang@google.com> Signed-off-by: Wei Wang <wvw@google.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-02-12 19:44:53 +01:00
Anson Huang	6ad9884758	thermal: of-thermal: disable passive polling when thermal zone is disabled [ Upstream commit 152395fd03d4ce1e535a75cdbf58105e50587611 ] When thermal zone is in passive mode, disabling its mode from sysfs is NOT taking effect at all, it is still polling the temperature of the disabled thermal zone and handling all thermal trips, it makes user confused. The disabling operation should disable the thermal zone behavior completely, for both active and passive mode, this patch clears the passive_delay when thermal zone is disabled and restores it when it is enabled. Signed-off-by: Anson Huang <Anson.Huang@nxp.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-10-03 17:01:51 -07:00
Bartlomiej Zolnierkiewicz	3221a270e2	thermal: exynos: fix setting rising_threshold for Exynos5433 [ Upstream commit 8bfc218d0ebbabcba8ed2b8ec1831e0cf1f71629 ] Add missing clearing of the previous value when setting rising temperature threshold. Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-08-03 07:55:23 +02:00
Marek Szyprowski	5d1639dae6	thermal: exynos: Propagate error value from tmu_read() commit c8da6cdef57b459ac0fd5d9d348f8460a575ae90 upstream. tmu_read() in case of Exynos4210 might return error for out of bound values. Current code ignores such value, what leads to reporting critical temperature value. Add proper error code propagation to exynos_get_temp() function. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> CC: stable@vger.kernel.org # v4.6+ Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-05-16 10:08:44 +02:00
Marek Szyprowski	3cc96a4acf	thermal: exynos: Reading temperature makes sense only when TMU is turned on commit 88fc6f73fddf64eb507b04f7b2bd01d7291db514 upstream. When thermal sensor is not yet enabled, reading temperature might return random value. This might even result in stopping system booting when such temperature is higher than the critical value. Fix this by checking if TMU has been actually enabled before reading the temperature. This change fixes booting of Exynos4210-based board with TMU enabled (for example Samsung Trats board), which was broken since v4.4 kernel release. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Fixes: 9e4249b40340 ("thermal: exynos: Fix first temperature read after registering sensor") CC: stable@vger.kernel.org # v4.6+ Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-05-16 10:08:44 +02:00
Mikhail Lappo	a84a584c08	thermal: imx: Fix race condition in imx_thermal_probe() commit cf1ba1d73a33944d8c1a75370a35434bf146b8a7 upstream. When device boots with T > T_trip_1 and requests interrupt, the race condition takes place. The interrupt comes before THERMAL_DEVICE_ENABLED is set. This leads to an attempt to reading sensor value from irq and disabling the sensor, based on the data->mode field, which expected to be THERMAL_DEVICE_ENABLED, but still stays as THERMAL_DEVICE_DISABLED. Afher this issue sensor is never re-enabled, as the driver state is wrong. Fix this problem by setting the 'data' members prior to requesting the interrupts. Fixes: 37713a1e8e4c ("thermal: imx: implement thermal alarm interrupt handling") Cc: <stable@vger.kernel.org> Signed-off-by: Mikhail Lappo <mikhail.lappo@esrlabs.com> Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Acked-by: Dong Aisheng <aisheng.dong@nxp.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-04-24 09:34:14 +02:00
Yi Zeng	8e50e9d66e	thermal: power_allocator: fix one race condition issue for thermal_instances list [ Upstream commit a5de11d67dcd268b8d0beb73dc374de5e97f0caf ] When invoking allow_maximum_power and traverse tz->thermal_instances, we should grab thermal_zone_device->lock to avoid race condition. For example, during the system reboot, if the mali GPU device implements device shutdown callback and unregister GPU devfreq cooling device, the deleted list head may be accessed to cause panic, as the following log shows: [ 33.551070] c3 25 (kworker/3:0) Unable to handle kernel paging request at virtual address dead000000000070 [ 33.566708] c3 25 (kworker/3:0) pgd = ffffffc0ed290000 [ 33.572071] c3 25 (kworker/3:0) [dead000000000070] pgd=00000001ed292003, pud=00000001ed292003, *pmd=0000000000000000 [ 33.581515] c3 25 (kworker/3:0) Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 33.599761] c3 25 (kworker/3:0) CPU: 3 PID: 25 Comm: kworker/3:0 Not tainted 4.4.35+ #912 [ 33.614137] c3 25 (kworker/3:0) Workqueue: events_freezable thermal_zone_device_check [ 33.620245] c3 25 (kworker/3:0) task: ffffffc0f32e4200 ti: ffffffc0f32f0000 task.ti: ffffffc0f32f0000 [ 33.629466] c3 25 (kworker/3:0) PC is at power_allocator_throttle+0x7c8/0x8a4 [ 33.636609] c3 25 (kworker/3:0) LR is at power_allocator_throttle+0x808/0x8a4 [ 33.643742] c3 25 (kworker/3:0) pc : [<ffffff8008683dd0>] lr : [<ffffff8008683e10>] pstate: 20000145 [ 33.652874] c3 25 (kworker/3:0) sp : ffffffc0f32f3bb0 [ 34.468519] c3 25 (kworker/3:0) Process kworker/3:0 (pid: 25, stack limit = 0xffffffc0f32f0020) [ 34.477220] c3 25 (kworker/3:0) Stack: (0xffffffc0f32f3bb0 to 0xffffffc0f32f4000) [ 34.819822] c3 25 (kworker/3:0) Call trace: [ 34.824021] c3 25 (kworker/3:0) Exception stack(0xffffffc0f32f39c0 to 0xffffffc0f32f3af0) [ 34.924993] c3 25 (kworker/3:0) [<ffffff8008683dd0>] power_allocator_throttle+0x7c8/0x8a4 [ 34.933184] c3 25 (kworker/3:0) [<ffffff80086807f4>] handle_thermal_trip.part.25+0x70/0x224 [ 34.941545] c3 25 (kworker/3:0) [<ffffff8008680a68>] thermal_zone_device_update+0xc0/0x20c [ 34.949818] c3 25 (kworker/3:0) [<ffffff8008680bd4>] thermal_zone_device_check+0x20/0x2c [ 34.957924] c3 25 (kworker/3:0) [<ffffff80080b93a4>] process_one_work+0x168/0x458 [ 34.965414] c3 25 (kworker/3:0) [<ffffff80080ba068>] worker_thread+0x13c/0x4b4 [ 34.972650] c3 25 (kworker/3:0) [<ffffff80080c0a4c>] kthread+0xe8/0xfc [ 34.979187] c3 25 (kworker/3:0) [<ffffff8008084e90>] ret_from_fork+0x10/0x40 [ 34.986244] c3 25 (kworker/3:0) Code: f9405e73 eb1302bf d102e273 54ffc460 (b9402a61) [ 34.994339] c3 25 (kworker/3:0) ---[ end trace 32057901e3b7e1db ]--- Signed-off-by: Yi Zeng <yizeng@asrmicro.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-04-13 19:48:08 +02:00
Arnd Bergmann	3bdcbc647b	thermal: fix INTEL_SOC_DTS_IOSF_CORE dependencies commit 68fd77cf8a4b045594231f07e5fc92e1a34c0a9e upstream. We get a Kconfig warning when selecting this without also enabling CONFIG_PCI: warning: (X86_INTEL_LPSS && INTEL_SOC_DTS_IOSF_CORE && SND_SST_IPC_ACPI && MMC_SDHCI_ACPI && PUNIT_ATOM_DEBUG) selects IOSF_MBI which has unmet direct dependencies (PCI) This adds a new depedency. Fixes: 3a2419f865a6 ("Thermal: Intel SoC: DTS thermal use common APIs") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-02-25 11:05:52 +01:00
Daniel Lezcano	3cff90788e	thermal/drivers/hisi: Fix multiple alarm interrupts firing commit db2b0332608c8e648ea1e44727d36ad37cdb56cb upstream. The DT specifies a threshold of 65000, we setup the register with a value in the temperature resolution for the controller, 64656. When we reach 64656, the interrupt fires, the interrupt is disabled. Then the irq thread runs and calls thermal_zone_device_update() which will call in turn hisi_thermal_get_temp(). The function will look if the temperature decreased, assuming it was more than 65000, but that is not the case because the current temperature is 64656 (because of the rounding when setting the threshold). This condition being true, we re-enable the interrupt which fires immediately after exiting the irq thread. That happens again and again until the temperature goes to more than 65000. Potentially, there is here an interrupt storm if the temperature stabilizes at this temperature. A very unlikely case but possible. In any case, it does not make sense to handle dozens of alarm interrupt for nothing. Fix this by rounding the threshold value to the controller resolution so the check against the threshold is consistent with the one set in the controller. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Leo Yan <leo.yan@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Kevin Wangtao <kevin.wangtao@hisilicon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-25 14:23:46 +01:00
Daniel Lezcano	1b2c46a6be	thermal/drivers/hisi: Simplify the temperature/step computation commit 48880b979cdc9ef5a70af020f42b8ba1e51dbd34 upstream. The step and the base temperature are fixed values, we can simplify the computation by converting the base temperature to milli celsius and use a pre-computed step value. That saves us a lot of mult + div for nothing at runtime. Take also the opportunity to change the function names to be consistent with the rest of the code. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Leo Yan <leo.yan@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Kevin Wangtao <kevin.wangtao@hisilicon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-25 14:23:46 +01:00
Daniel Lezcano	2dac559df9	thermal/drivers/hisi: Fix kernel panic on alarm interrupt commit 2cb4de785c40d4a2132cfc13e63828f5a28c3351 upstream. The threaded interrupt for the alarm interrupt is requested before the temperature controller is setup. This one can fire an interrupt immediately leading to a kernel panic as the sensor data is not initialized. In order to prevent that, move the threaded irq after the Tsensor is setup. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Leo Yan <leo.yan@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Kevin Wangtao <kevin.wangtao@hisilicon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-25 14:23:46 +01:00
Daniel Lezcano	b679b8d7ba	thermal/drivers/hisi: Fix missing interrupt enablement commit c176b10b025acee4dc8f2ab1cd64eb73b5ccef53 upstream. The interrupt for the temperature threshold is not enabled at the end of the probe function, enable it after the setup is complete. On the other side, the irq_enabled is not correctly set as we are checking if the interrupt is masked where 'yes' means irq_enabled=false. irq_get_irqchip_state(data->irq, IRQCHIP_STATE_MASKED, &data->irq_enabled); As we are always enabling the interrupt, it is pointless to check if the interrupt is masked or not, just set irq_enabled to 'true'. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Leo Yan <leo.yan@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Kevin Wangtao <kevin.wangtao@hisilicon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-25 14:23:46 +01:00
Arvind Yadav	82bf76afa8	thermal: hisilicon: Handle return value of clk_prepare_enable commit 919054fdfc8adf58c5512fe9872eb53ea0f5525d upstream. clk_prepare_enable() can fail here and we must check its return value. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Kevin Wangtao <kevin.wangtao@hisilicon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-25 14:23:46 +01:00
Daniel Lezcano	d8914530f2	thermal/drivers/step_wise: Fix temperature regulation misbehavior [ Upstream commit 07209fcf33542c1ff1e29df2dbdf8f29cdaacb10 ] There is a particular situation when the cooling device is cpufreq and the heat dissipation is not efficient enough where the temperature increases little by little until reaching the critical threshold and leading to a SoC reset. The behavior is reproducible on a hikey6220 with bad heat dissipation (eg. stacked with other boards). Running a simple C program doing while(1); for each CPU of the SoC makes the temperature to reach the passive regulation trip point and ends up to the maximum allowed temperature followed by a reset. This issue has been also reported by running the libhugetlbfs test suite. What is observed is a ping pong between two cpu frequencies, 1.2GHz and 900MHz while the temperature continues to grow. It appears the step wise governor calls get_target_state() the first time with the throttle set to true and the trend to 'raising'. The code selects logically the next state, so the cpu frequency decreases from 1.2GHz to 900MHz, so far so good. The temperature decreases immediately but still stays greater than the trip point, then get_target_state() is called again, this time with the throttle set to true and the trend to 'dropping'. From there the algorithm assumes we have to step down the state and the cpu frequency jumps back to 1.2GHz. But the temperature is still higher than the trip point, so get_target_state() is called with throttle=1 and trend='raising' again, we jump to 900MHz, then get_target_state() is called with throttle=1 and trend='dropping', we jump to 1.2GHz, etc ... but the temperature does not stabilizes and continues to increase. [ 237.922654] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1 [ 237.922678] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=1 [ 237.922690] thermal cooling_device0: cur_state=0 [ 237.922701] thermal cooling_device0: old_target=0, target=1 [ 238.026656] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=2,throttle=1 [ 238.026680] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=2,throttle=1 [ 238.026694] thermal cooling_device0: cur_state=1 [ 238.026707] thermal cooling_device0: old_target=1, target=0 [ 238.134647] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1 [ 238.134667] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=1 [ 238.134679] thermal cooling_device0: cur_state=0 [ 238.134690] thermal cooling_device0: old_target=0, target=1 In this situation the temperature continues to increase while the trend is oscillating between 'dropping' and 'raising'. We need to keep the current state untouched if the throttle is set, so the temperature can decrease or a higher state could be selected, thus preventing this oscillation. Keeping the next_target untouched when 'throttle' is true at 'dropping' time fixes the issue. The following traces show the governor does not change the next state if trend==2 (dropping) and throttle==1. [ 2306.127987] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1 [ 2306.128009] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=1 [ 2306.128021] thermal cooling_device0: cur_state=0 [ 2306.128031] thermal cooling_device0: old_target=0, target=1 [ 2306.231991] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=2,throttle=1 [ 2306.232016] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=2,throttle=1 [ 2306.232030] thermal cooling_device0: cur_state=1 [ 2306.232042] thermal cooling_device0: old_target=1, target=1 [ 2306.335982] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=0,throttle=1 [ 2306.336006] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=0,throttle=1 [ 2306.336021] thermal cooling_device0: cur_state=1 [ 2306.336034] thermal cooling_device0: old_target=1, target=1 [ 2306.439984] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=2,throttle=1 [ 2306.440008] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=2,throttle=0 [ 2306.440022] thermal cooling_device0: cur_state=1 [ 2306.440034] thermal cooling_device0: old_target=1, target=0 [ ... ] After a while, if the temperature continues to increase, the next state becomes 2 which is 720MHz on the hikey. That results in the temperature stabilizing around the trip point. [ 2455.831982] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1 [ 2455.832006] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=0 [ 2455.832019] thermal cooling_device0: cur_state=1 [ 2455.832032] thermal cooling_device0: old_target=1, target=1 [ 2455.935985] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=0,throttle=1 [ 2455.936013] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=0,throttle=0 [ 2455.936027] thermal cooling_device0: cur_state=1 [ 2455.936040] thermal cooling_device0: old_target=1, target=1 [ 2456.043984] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=0,throttle=1 [ 2456.044009] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=0,throttle=0 [ 2456.044023] thermal cooling_device0: cur_state=1 [ 2456.044036] thermal cooling_device0: old_target=1, target=1 [ 2456.148001] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1 [ 2456.148028] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=1 [ 2456.148042] thermal cooling_device0: cur_state=1 [ 2456.148055] thermal cooling_device0: old_target=1, target=2 [ 2456.252009] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=2,throttle=1 [ 2456.252041] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=2,throttle=0 [ 2456.252058] thermal cooling_device0: cur_state=2 [ 2456.252075] thermal cooling_device0: old_target=2, target=1 IOW, this change is needed to keep the state for a cooling device if the temperature trend is oscillating while the temperature increases slightly. Without this change, the situation above leads to a catastrophic crash by a hardware reset on hikey. This issue has been reported to happen on an OMAP dra7xx also. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Keerthy <j-keerthy@ti.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Tested-by: Keerthy <j-keerthy@ti.com> Reviewed-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-20 10:07:30 +01:00
Viresh Kumar	7cd7b56037	thermal: cpu_cooling: Avoid accessing potentially freed structures commit 289d72afddf83440117c35d864bf0c6309c1d011 upstream. After the lock is dropped, it is possible that the cpufreq_dev gets freed before we call get_level() and that can cause kernel to crash. Drop the lock after we are done using the structure. Fixes: 02373d7c69b4 ("thermal: cpu_cooling: fix lockdep problems in cpu_cooling") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Tested-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-07-27 15:07:55 -07:00
Johan Hovold	76572609e4	thermal: max77620: fix device-node reference imbalance commit c592fafbdbb6b1279b76a54722d1465ca77e5bde upstream. The thermal child device reuses the parent MFD-device device-tree node when registering a thermal zone, but did not take a reference to the node. This leads to a reference imbalance, and potential use-after-free, when the node reference is dropped by the platform-bus device destructor (once for the child and later again for the parent). Fix this by dropping any reference already held to a device-tree node and getting a reference to the parent's node which will be balanced on reprobe or on platform-device release, whichever comes first. Note that simply clearing the of_node pointer on probe errors and on driver unbind would not allow the use of device-managed resources as specifically thermal_zone_of_sensor_unregister() claims that a valid device-tree node pointer is needed during deregistration (even if it currently does not seem to use it). Fixes: ec4664b3fd6d ("thermal: max77620: Add thermal driver for reporting junction temp") Cc: Laxman Dewangan <ldewangan@nvidia.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-07-27 15:07:55 -07:00
Krzysztof Kozlowski	fab303ba78	thermal: hwmon: Properly report critical temperature in sysfs commit f37fabb8643eaf8e3b613333a72f683770c85eca upstream. In the critical sysfs entry the thermal hwmon was returning wrong temperature to the user-space. It was reporting the temperature of the first trip point instead of the temperature of critical trip point. For example: /sys/class/hwmon/hwmon0/temp1_crit:50000 /sys/class/thermal/thermal_zone0/trip_point_0_temp:50000 /sys/class/thermal/thermal_zone0/trip_point_0_type:active /sys/class/thermal/thermal_zone0/trip_point_3_temp:120000 /sys/class/thermal/thermal_zone0/trip_point_3_type:critical Since commit e68b16abd91d ("thermal: add hwmon sysfs I/F") the driver have been registering a sysfs entry if get_crit_temp() callback was provided. However when accessed, it was calling get_trip_temp() instead of the get_crit_temp(). Fixes: e68b16abd91d ("thermal: add hwmon sysfs I/F") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-01-09 08:32:18 +01:00
Jacob Pan	ec638db8cb	thermal/powerclamp: add back module device table Commit 3105f234e0aba43e44e277c20f9b32ee8add43d4 replaced module cpu id table with a cpu feature check, which is logically correct. But we need the module device table to allow module auto loading. Cc: stable@vger.kernel.org # 4.8 Fixes:3105f234 thermal/powerclamp: correct cpu support check Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>	2016-11-21 20:54:40 +08:00
Eric Ernst	3105f234e0	thermal/powerclamp: correct cpu support check Initial logic for checking CPU match resulted in OR of CPU features rather than the intended AND. Updated to use boot_cpu_has macro rather than x86_match_cpu. In addition, MWAIT is the only required CPU feature for idle injection to work. Drop other feature requirements since they are only needed for optimal efficiency. CC: stable@vger.kernel.org #v4.7 Signed-off-by: Eric Ernst <eric.ernst@linux.intel.com> Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>	2016-10-20 14:15:44 +08:00
Srinivas Pandruvada	33086a9a30	thermal: intel_pch_thermal: Enable Haswell PCH Added missing support for Haswell PCH thermal sensor. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>	2016-10-20 14:15:44 +08:00
Srinivas Pandruvada	aed3f249f9	thermal: intel_pch_thermal: Add an ACPI passive trip On the platforms which has an ACPI companion device associated with PCH thermal device, read passive trip temperature via ACPI _PSV control method. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>	2016-10-20 14:15:44 +08:00
Linus Torvalds	2d2474a194	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal managament updates from Zhang Rui: - Enhance thermal "userspace" governor to export the reason when a thermal event is triggered and delivered to user space. From Srinivas Pandruvada - Introduce a single TSENS thermal driver for the different versions of the TSENS IP that exist, on different qcom msm/apq SoCs'. Support for msm8916, msm8960, msm8974 and msm8996 families is also added. From Rajendra Nayak - Introduce hardware-tracked trip points support to the device tree thermal sensor framework. The framework supports an arbitrary number of trip points. Whenever the current temperature is changed, the trip points immediately below and above the current temperature are found, driver callback is invoked to program the hardware to get notified when either of the two trip points are triggered. Hardware-tracked trip points support for rockchip thermal driver is also added at the same time. From Sascha Hauer, Caesar Wang - Introduce a new thermal driver, which enables TMU (Thermal Monitor Unit) on QorIQ platform. From Jia Hongtao - Introduce a new thermal driver for Maxim MAX77620. From Laxman Dewangan - Introduce a new thermal driver for Intel platforms using WhiskeyCove PMIC. From Bin Gao - Add mt2701 chip support to MTK thermal driver. From Dawei Chien - Enhance Tegra thermal driver to enable soctherm node and set "critical", "hot" trips, for Tegra124, Tegra132, Tegra210. From Wei Ni - Add resume support for tango thermal driver. From Marc Gonzalez - several small fixes and improvements for rockchip, qcom, imx, rcar, mtk thermal drivers and thermal core code. From Caesar Wang, Keerthy, Rocky Hao, Wei Yongjun, Peter Robinson, Bui Duc Phuc, Axel Lin, Hugh Kang * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (48 commits) thermal: int3403: Process trip change notification thermal: int340x: New Interface to read trip and notify thermal: user_space gov: Add additional information in uevent thermal: Enhance thermal_zone_device_update for events arm64: tegra: set hot trips for Tegra210 arm64: tegra: set critical trips for Tegra210 arm64: tegra: add soctherm node for Tegra210 arm64: tegra: set hot trips for Tegra132 arm64: tegra: set critical trips for Tegra132 arm64: tegra: use tegra132-soctherm for Tegra132 arm: tegra: set hot trips for Tegra124 arm: tegra: set critical trips for Tegra124 thermal: tegra: add hw-throttle for Tegra132 thermal: tegra: add hw-throttle function of: Add bindings of hw throttle for Tegra soctherm thermal: mtk_thermal: Check return value of devm_thermal_zone_of_sensor_register thermal: Add Mediatek thermal driver for mt2701. dt-bindings: thermal: Add binding document for Mediatek thermal controller thermal: max77620: Add thermal driver for reporting junction temp thermal: max77620: Add DT binding doc for thermal driver ...	2016-10-12 11:05:23 -07:00
Srinivas Pandruvada	43720df960	thermal: int3403: Process trip change notification When ACPI sends notification for trip point change re-read trips and notify thermal core, so that this can be passed to user space thermal controller. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>	2016-09-27 14:37:17 +08:00
Srinivas Pandruvada	9176ae8668	thermal: int340x: New Interface to read trip and notify Separated the code for reading trip points from int340x_thermal_zone_add to a standalone function int340x_thermal_read_trips. This standlone interface to read is exported so that int340x drivers can re-read trips on ACPI notification for trip point change. Also the appropriate notification events are sent by int340x driver based on the acpi event using int340x_thermal_zone_device_update(). Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>	2016-09-27 14:37:14 +08:00
Srinivas Pandruvada	998d924b71	thermal: user_space gov: Add additional information in uevent Add additional properties: NAME= Thermal zone type TEMP= Temperature sample value TRIP= Violated trip index EVENT= The notification event (new temperature sample, trip violation trip changed) This is the additional information to what kobject_uevent already provides. So it will not impact existing user spaces. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>	2016-09-27 14:37:10 +08:00
Srinivas Pandruvada	0e70f466fb	thermal: Enhance thermal_zone_device_update for events Added one additional parameter to thermal_zone_device_update() to provide caller with an optional capability to specify reason. Currently this event is used by user space governor to trigger different processing based on event code. Also it saves an additional call to read temperature when the event is received. The following events are cuurently defined: - Unspecified event - New temperature sample - Trip point violated - Trip point changed - thermal device up and down - thermal device power capability changed Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>	2016-09-27 14:35:21 +08:00
Zhang Rui	040a3eadf0	Merge branches 'thermal-soc', 'thermal-core', 'thermal-intel' and 'thermal-tegra-hw-throttle' into next	2016-09-27 14:03:19 +08:00
Wei Ni	6c7c324570	thermal: tegra: add hw-throttle for Tegra132 Tegra132 use CCROC throttle registers to configure pulse skiper, set these registers to enable throttle function for Tegra132. Signed-off-by: Wei Ni <wni@nvidia.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>	2016-09-27 14:02:32 +08:00
Wei Ni	ce0dbf04f6	thermal: tegra: add hw-throttle function Tegra soctherm support HW throttle, when the soctherm snesors' temperature is above the throttle trip point, it will trigger pulse skiper to tune clocks accroding to the throttle depth. Add this function for Tegra124 and Tegra210. Since Tegra132 use different registers to configure pulse skiper, will support it in next patch. Signed-off-by: Wei Ni <wni@nvidia.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>	2016-09-27 14:02:32 +08:00
Axel Lin	1f6b0889d0	thermal: mtk_thermal: Check return value of devm_thermal_zone_of_sensor_register devm_thermal_zone_of_sensor_register can fail, so check it's return value. Signed-off-by: Axel Lin <axel.lin@ingics.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>	2016-09-27 14:02:16 +08:00
dawei.chien@mediatek.com	b7cf005373	thermal: Add Mediatek thermal driver for mt2701. This patch adds support for mt2701 chip to mtk_thermal, and integrate both mt8173 and mt2701 on the same driver. MT8173 has four banks and five sensors, and MT2701 has only one bank and three sensors. Signed-off-by: Dawei Chien <dawei.chien@mediatek.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>	2016-09-27 14:02:16 +08:00
Laxman Dewangan	ec4664b3fd	thermal: max77620: Add thermal driver for reporting junction temp Maxim Semiconductor Max77620 supports alarm interrupts when its die temperature crosses 120C and 140C. These threshold temperatures are not configurable. Add thermal driver to register PMIC die temperature as thermal zone sensor and capture the die temperature warning interrupts to notifying the client. Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>	2016-09-27 14:02:16 +08:00

1 2 3 4 5 ...

946 Commits