linux

iv/linux

Author	SHA1	Message	Date
Jakub Kicinski	a9fb0f1559	bnxt: don't lock the tx queue from napi poll [ Upstream commit 3c603136c9f82833813af77185618de5af67676c ] We can't take the tx lock from the napi poll routine, because netpoll can poll napi at any moment, including with the tx lock already held. The tx lock is protecting against two paths - the disable path, and (as Michael points out) the NETDEV_TX_BUSY case which may occur if NAPI completions race with start_xmit and both decide to re-enable the queue. For the disable/ifdown path use synchronize_net() to make sure closing the device does not race we restarting the queues. Annotate accesses to dev_state against data races. For the NAPI cleanup vs start_xmit path - appropriate barriers are already in place in the main spot where Tx queue is stopped but we need to do the same careful dance in the TX_BUSY case. Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:17 -04:00
Ilya Leoshkevich	1fe038030c	bpf: Clear zext_dst of dead insns [ Upstream commit 45c709f8c71b525b51988e782febe84ce933e7e0 ] "access skb fields ok" verifier test fails on s390 with the "verifier bug. zext_dst is set, but no reg is defined" message. The first insns of the test prog are ... 0: 61 01 00 00 00 00 00 00 ldxw %r0,[%r1+0] 8: 35 00 00 01 00 00 00 00 jge %r0,0,1 10: 61 01 00 08 00 00 00 00 ldxw %r0,[%r1+8] ... and the 3rd one is dead (this does not look intentional to me, but this is a separate topic). sanitize_dead_code() converts dead insns into "ja -1", but keeps zext_dst. When opt_subreg_zext_lo32_rnd_hi32() tries to parse such an insn, it sees this discrepancy and bails. This problem can be seen only with JITs whose bpf_jit_needs_zext() returns true. Fix by clearning dead insns' zext_dst. The commits that contributed to this problem are: 1. 5aa5bd14c5f8 ("bpf: add initial suite for selftests"), which introduced the test with the dead code. 2. 5327ed3d44b7 ("bpf: verifier: mark verified-insn with sub-register zext flag"), which introduced the zext_dst flag. 3. 83a2881903f3 ("bpf: Account for BPF_FETCH in insn_has_def32()"), which introduced the sanity check. 4. 9183671af6db ("bpf: Fix leakage under speculation on mispredicted branches"), which bisect points to. It's best to fix this on stable branches that contain the second one, since that's the point where the inconsistency was introduced. Fixes: 5327ed3d44b7 ("bpf: verifier: mark verified-insn with sub-register zext flag") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210812151811.184086-2-iii@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:17 -04:00
Xie Yongji	73a45f75a0	vhost: Fix the calculation in vhost_overflow() [ Upstream commit f7ad318ea0ad58ebe0e595e59aed270bb643b29b ] This fixes the incorrect calculation for integer overflow when the last address of iova range is 0xffffffff. Fixes: ec33d031a14b ("vhost: detect 32 bit integer wrap around") Reported-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210728130756.97-2-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:16 -04:00
Parav Pandit	b9a59636c4	virtio: Protect vqs list access [ Upstream commit 0e566c8f0f2e8325e35f6f97e13cde5356b41814 ] VQs may be accessed to mark the device broken while they are created/destroyed. Hence protect the access to the vqs list. Fixes: e2dcdfe95c0b ("virtio: virtio_break_device() to mark all virtqueues broken.") Signed-off-by: Parav Pandit <parav@nvidia.com> Link: https://lore.kernel.org/r/20210721142648.1525924-4-parav@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:16 -04:00
Randy Dunlap	b264e37b35	dccp: add do-while-0 stubs for dccp_pr_debug macros [ Upstream commit 86aab09a4870bb8346c9579864588c3d7f555299 ] GCC complains about empty macros in an 'if' statement, so convert them to 'do {} while (0)' macros. Fixes these build warnings: net/dccp/output.c: In function 'dccp_xmit_packet': ../net/dccp/output.c:283:71: warning: suggest braces around empty body in an 'if' statement [-Wempty-body] 283 \| dccp_pr_debug("transmit_skb() returned err=%d\n", err); net/dccp/ackvec.c: In function 'dccp_ackvec_update_old': ../net/dccp/ackvec.c:163:80: warning: suggest braces around empty body in an 'else' statement [-Wempty-body] 163 \| (unsigned long long)seqno, state); Fixes: dc841e30eaea ("dccp: Extend CCID packet dequeueing interface") Fixes: 380240864451 ("dccp ccid-2: Update code for the Ack Vector input/registration routine") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: dccp@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:16 -04:00
Marek Behún	9112ebc299	cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant [ Upstream commit 484f2b7c61b9ae58cc00c5127bcbcd9177af8dfe ] The 1.2 GHz variant of the Armada 3720 SOC is unstable with DVFS: when the SOC boots, the WTMI firmware sets clocks and AVS values that work correctly with 1.2 GHz CPU frequency, but random crashes occur once cpufreq driver starts scaling. We do not know currently what is the reason: - it may be that the voltage value for L0 for 1.2 GHz variant provided by the vendor in the OTP is simply incorrect when scaling is used, - it may be that some delay is needed somewhere, - it may be something else. The most sane solution now seems to be to simply forbid the cpufreq driver on 1.2 GHz variant. Signed-off-by: Marek Behún <kabel@kernel.org> Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:16 -04:00
Frank Wunderlich	cb9a9d5fe6	iommu: Check if group is NULL before remove device [ Upstream commit 5aa95d8834e07907e64937d792c12ffef7fb271f ] If probe_device is failing, iommu_group is not initialized because iommu_group_add_device is not reached, so freeing it will result in NULL pointer access. iommu_bus_init ->bus_iommu_probe ->probe_iommu_group in for each:/* return -22 in fail case / ->iommu_probe_device ->__iommu_probe_device / return -22 here./ -> ops->probe_device / return -22 here.*/ -> iommu_group_get_for_dev -> ops->device_group -> iommu_group_add_device //good case ->remove_iommu_group //in fail case, it will remove group ->iommu_release_device ->iommu_group_remove_device // here we don't have group In my case ops->probe_device (mtk_iommu_probe_device from mtk_iommu_v1.c) is due to failing fwspec->ops mismatch. Fixes: d72e31c93746 ("iommu: IOMMU Groups") Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Link: https://lore.kernel.org/r/20210731074737.4573-1-linux@fw-web.de Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:15 -04:00
Ole Bjørn Midtbø	911a8141ef	Bluetooth: hidp: use correct wait queue when removing ctrl_wait [ Upstream commit cca342d98bef68151a80b024f7bf5f388d1fbdea ] A different wait queue was used when removing ctrl_wait than when adding it. This effectively made the remove operation without locking compared to other operations on the wait queue ctrl_wait was part of. This caused issues like below where dead000000000100 is LIST_POISON1 and dead000000000200 is LIST_POISON2. list_add corruption. next->prev should be prev (ffffffc1b0a33a08), \ but was dead000000000200. (next=ffffffc03ac77de0). ------------[ cut here ]------------ CPU: 3 PID: 2138 Comm: bluetoothd Tainted: G O 4.4.238+ #9 ... ---[ end trace 0adc2158f0646eac ]--- Call trace: [<ffffffc000443f78>] __list_add+0x38/0xb0 [<ffffffc0000f0d04>] add_wait_queue+0x4c/0x68 [<ffffffc00020eecc>] __pollwait+0xec/0x100 [<ffffffc000d1556c>] bt_sock_poll+0x74/0x200 [<ffffffc000bdb8a8>] sock_poll+0x110/0x128 [<ffffffc000210378>] do_sys_poll+0x220/0x480 [<ffffffc0002106f0>] SyS_poll+0x80/0x138 [<ffffffc00008510c>] __sys_trace_return+0x0/0x4 Unable to handle kernel paging request at virtual address dead000000000100 ... CPU: 4 PID: 5387 Comm: kworker/u15:3 Tainted: G W O 4.4.238+ #9 ... Call trace: [<ffffffc0000f079c>] __wake_up_common+0x7c/0xa8 [<ffffffc0000f0818>] __wake_up+0x50/0x70 [<ffffffc000be11b0>] sock_def_wakeup+0x58/0x60 [<ffffffc000de5e10>] l2cap_sock_teardown_cb+0x200/0x224 [<ffffffc000d3f2ac>] l2cap_chan_del+0xa4/0x298 [<ffffffc000d45ea0>] l2cap_conn_del+0x118/0x198 [<ffffffc000d45f8c>] l2cap_disconn_cfm+0x6c/0x78 [<ffffffc000d29934>] hci_event_packet+0x564/0x2e30 [<ffffffc000d19b0c>] hci_rx_work+0x10c/0x360 [<ffffffc0000c2218>] process_one_work+0x268/0x460 [<ffffffc0000c2678>] worker_thread+0x268/0x480 [<ffffffc0000c94e0>] kthread+0x118/0x128 [<ffffffc000085070>] ret_from_fork+0x10/0x20 ---[ end trace 0adc2158f0646ead ]--- Signed-off-by: Ole Bjørn Midtbø <omidtbo@cisco.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:15 -04:00
Bing Guo	5b14c1f16e	drm/amd/display: Fix Dynamic bpp issue with 8K30 with Navi 1X [ Upstream commit 06050a0f01dbac2ca33145ef19a72041206ea983 ] Why: In DCN2x, HW doesn't automatically divide MASTER_UPDATE_LOCK_DB_X by the number of pipes ODM Combined. How: Set MASTER_UPDATE_LOCK_DB_X to the value that is adjusted by the number of pipes ODM Combined. Reviewed-by: Martin Leung <martin.leung@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Bing Guo <bing.guo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:15 -04:00
Ivan T. Ivanov	f92dc3a89d	net: usb: lan78xx: don't modify phy_device state concurrently [ Upstream commit 6b67d4d63edece1033972214704c04f36c5be89a ] Currently phy_device state could be left in inconsistent state shown by following alert message[1]. This is because phy_read_status could be called concurrently from lan78xx_delayedwork, phy_state_machine and __ethtool_get_link. Fix this by making sure that phy_device state is updated atomically. [1] lan78xx 1-1.1.1:1.0 eth0: No phy led trigger registered for speed(-1) Signed-off-by: Ivan T. Ivanov <iivanov@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:15 -04:00
Sudeep Holla	be70436799	ARM: dts: nomadik: Fix up interrupt controller node names [ Upstream commit 47091f473b364c98207c4def197a0ae386fc9af1 ] Once the new schema interrupt-controller/arm,vic.yaml is added, we get the below warnings: arch/arm/boot/dts/ste-nomadik-nhk15.dt.yaml: intc@10140000: $nodename:0: 'intc@10140000' does not match '^interrupt-controller(@[0-9a-f,]+)*$' Fix the node names for the interrupt controller to conform to the standard node name interrupt-controller@.. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Cc: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20210617210825.3064367-2-sudeep.holla@arm.com Link: https://lore.kernel.org/r/20210626000103.830184-1-linus.walleij@linaro.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:14 -04:00
lijinlin	69aa1a1a56	scsi: core: Fix capacity set to zero after offlinining device [ Upstream commit f0f82e2476f6adb9c7a0135cfab8091456990c99 ] After adding physical volumes to a volume group through vgextend, the kernel will rescan the partitions. This in turn will cause the device capacity to be queried. If the device status is set to offline through sysfs at this time, READ CAPACITY command will return a result which the host byte is DID_NO_CONNECT, and the capacity of the device will be set to zero in read_capacity_error(). After setting device status back to running, the capacity of the device will remain stuck at zero. Fix this issue by rescanning device when the device state changes to SDEV_RUNNING. Link: https://lore.kernel.org/r/20210727034455.1494960-1-lijinlin3@huawei.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: lijinlin <lijinlin3@huawei.com> Signed-off-by: Wu Bo <wubo40@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:14 -04:00
Sreekanth Reddy	935de7ec7a	scsi: core: Avoid printing an error if target_alloc() returns -ENXIO [ Upstream commit 70edd2e6f652f67d854981fd67f9ad0f1deaea92 ] Avoid printing a 'target allocation failed' error if the driver target_alloc() callback function returns -ENXIO. This return value indicates that the corresponding H:C:T:L entry is empty. Removing this error reduces the scan time if the user issues SCAN_WILD_CARD scan operation through sysfs parameter on a host with a lot of empty H:C:T:L entries. Avoiding the printk on -ENXIO matches the behavior of the other callback functions during scanning. Link: https://lore.kernel.org/r/20210726115402.1936-1-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:14 -04:00
Ye Bin	7a721a1e18	scsi: scsi_dh_rdac: Avoid crash during rdac_bus_attach() [ Upstream commit bc546c0c9abb3bb2fb46866b3d1e6ade9695a5f6 ] The following BUG_ON() was observed during RDAC scan: [595952.944297] kernel BUG at drivers/scsi/device_handler/scsi_dh_rdac.c:427! [595952.951143] Internal error: Oops - BUG: 0 [#1] SMP ...... [595953.251065] Call trace: [595953.259054] check_ownership+0xb0/0x118 [595953.269794] rdac_bus_attach+0x1f0/0x4b0 [595953.273787] scsi_dh_handler_attach+0x3c/0xe8 [595953.278211] scsi_dh_add_device+0xc4/0xe8 [595953.282291] scsi_sysfs_add_sdev+0x8c/0x2a8 [595953.286544] scsi_probe_and_add_lun+0x9fc/0xd00 [595953.291142] __scsi_scan_target+0x598/0x630 [595953.295395] scsi_scan_target+0x120/0x130 [595953.299481] fc_user_scan+0x1a0/0x1c0 [scsi_transport_fc] [595953.304944] store_scan+0xb0/0x108 [595953.308420] dev_attr_store+0x44/0x60 [595953.312160] sysfs_kf_write+0x58/0x80 [595953.315893] kernfs_fop_write+0xe8/0x1f0 [595953.319888] __vfs_write+0x60/0x190 [595953.323448] vfs_write+0xac/0x1c0 [595953.326836] ksys_write+0x74/0xf0 [595953.330221] __arm64_sys_write+0x24/0x30 Code is in check_ownership: list_for_each_entry_rcu(tmp, &h->ctlr->dh_list, node) { /* h->sdev should always be valid */ BUG_ON(!tmp->sdev); tmp->sdev->access_state = access_state; } rdac_bus_attach initialize_controller list_add_rcu(&h->node, &h->ctlr->dh_list); h->sdev = sdev; rdac_bus_detach list_del_rcu(&h->node); h->sdev = NULL; Fix the race between rdac_bus_attach() and rdac_bus_detach() where h->sdev is NULL when processing the RDAC attach. Link: https://lore.kernel.org/r/20210113063103.2698953-1-yebin10@huawei.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:14 -04:00
Harshvardhan Jha	9900e06ae6	scsi: megaraid_mm: Fix end of loop tests for list_for_each_entry() [ Upstream commit 77541f78eadfe9fdb018a7b8b69f0f2af2cf4b82 ] The list_for_each_entry() iterator, "adapter" in this code, can never be NULL. If we exit the loop without finding the correct adapter then "adapter" points invalid memory that is an offset from the list head. This will eventually lead to memory corruption and presumably a kernel crash. Link: https://lore.kernel.org/r/20210708074642.23599-1-harshvardhan.jha@oracle.com Acked-by: Sumit Saxena <sumit.saxena@broadcom.com> Signed-off-by: Harshvardhan Jha <harshvardhan.jha@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:14 -04:00
Peter Ujfalusi	e37cf26bd5	dmaengine: of-dma: router_xlate to return -EPROBE_DEFER if controller is not yet available [ Upstream commit eda97cb095f2958bbad55684a6ca3e7d7af0176a ] If the router_xlate can not find the controller in the available DMA devices then it should return with -EPORBE_DEFER in a same way as the of_dma_request_slave_channel() does. The issue can be reproduced if the event router is registered before the DMA controller itself and a driver would request for a channel before the controller is registered. In of_dma_request_slave_channel(): 1. of_dma_find_controller() would find the dma_router 2. ofdma->of_dma_xlate() would fail and returned NULL 3. -ENODEV is returned as error code with this patch we would return in this case the correct -EPROBE_DEFER and the client can try to request the channel later. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@gmail.com> Link: https://lore.kernel.org/r/20210717190021.21897-1-peter.ujfalusi@gmail.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:13 -04:00
Dave Gerlach	12d1322d93	ARM: dts: am43x-epos-evm: Reduce i2c0 bus speed for tps65218 [ Upstream commit 20a6b3fd8e2e2c063b25fbf2ee74d86b898e5087 ] Based on the latest timing specifications for the TPS65218 from the data sheet, http://www.ti.com/lit/ds/symlink/tps65218.pdf, document SLDS206 from November 2014, we must change the i2c bus speed to better fit within the minimum high SCL time required for proper i2c transfer. When running at 400khz, measurements show that SCL spends 0.8125 uS/1.666 uS high/low which violates the requirement for minimum high period of SCL provided in datasheet Table 7.6 which is 1 uS. Switching to 100khz gives us 5 uS/5 uS high/low which both fall above the minimum given values for 100 khz, 4.0 uS/4.7 uS high/low. Without this patch occasionally a voltage set operation from the kernel will appear to have worked but the actual voltage reflected on the PMIC will not have updated, causing problems especially with cpufreq that may update to a higher OPP without actually raising the voltage on DCDC2, leading to a hang. Signed-off-by: Dave Gerlach <d-gerlach@ti.com> Signed-off-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:13 -04:00
Yu Kuai	11145efd29	dmaengine: usb-dmac: Fix PM reference leak in usb_dmac_probe() [ Upstream commit 1da569fa7ec8cb0591c74aa3050d4ea1397778b4 ] pm_runtime_get_sync will increment pm usage counter even it failed. Forgetting to putting operation will result in reference leak here. Fix it by moving the error_pm label above the pm_runtime_put() in the error path. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20210706124521.1371901-1-yukuai3@huawei.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:13 -04:00
Adrian Larumbe	9c97a05392	dmaengine: xilinx_dma: Fix read-after-free bug when terminating transfers [ Upstream commit 7dd2dd4ff9f3abda601f22b9d01441a0869d20d7 ] When user calls dmaengine_terminate_sync, the driver will clean up any remaining descriptors for all the pending or active transfers that had previously been submitted. However, this might happen whilst the tasklet is invoking the DMA callback for the last finished transfer, so by the time it returns and takes over the channel's spinlock, the list of completed descriptors it was traversing is no longer valid. This leads to a read-after-free situation. Fix it by signalling whether a user-triggered termination has happened by means of a boolean variable. Signed-off-by: Adrian Larumbe <adrian.martinezlarumbe@imgtec.com> Link: https://lore.kernel.org/r/20210706234338.7696-3-adrian.martinezlarumbe@imgtec.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:13 -04:00
Alan Stern	fc566b5a21	USB: core: Avoid WARNings for 0-length descriptor requests [ Upstream commit 60dfe484cef45293e631b3a6e8995f1689818172 ] The USB core has utility routines to retrieve various types of descriptors. These routines will now provoke a WARN if they are asked to retrieve 0 bytes (USB "receive" requests must not have zero length), so avert this by checking the size argument at the start. CC: Johan Hovold <johan@kernel.org> Reported-and-tested-by: syzbot+7dbcd9ff34dc4ed45240@syzkaller.appspotmail.com Reviewed-by: Johan Hovold <johan@kernel.org> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/20210607152307.GD1768031@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:13 -04:00
Pavel Skripkin	1bd505c814	media: drivers/media/usb: fix memory leak in zr364xx_probe [ Upstream commit 9c39be40c0155c43343f53e3a439290c0fec5542 ] syzbot reported memory leak in zr364xx_probe()[1]. The problem was in invalid error handling order. All error conditions rigth after v4l2_ctrl_handler_init() must call v4l2_ctrl_handler_free(). Reported-by: syzbot+efe9aefc31ae1e6f7675@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:12 -04:00
Dan Carpenter	705660a6d9	media: zr364xx: fix memory leaks in probe() [ Upstream commit ea354b6ddd6f09be29424f41fa75a3e637fea234 ] Syzbot discovered that the probe error handling doesn't clean up the resources allocated in zr364xx_board_init(). There are several related bugs in this code so I have re-written the error handling. 1) Introduce a new function zr364xx_board_uninit() which cleans up the resources in zr364xx_board_init(). 2) In zr364xx_board_init() if the call to zr364xx_start_readpipe() fails then release the "cam->buffer.frame[i].lpvbits" memory before returning. This way every function either allocates everything successfully or it cleans up after itself. 3) Re-write the probe function so that each failure path goto frees the most recent allocation. That way we don't free anything before it has been allocated and we can also verify that everything is freed. 4) Originally, in the probe function the "cam->v4l2_dev.release" pointer was set to "zr364xx_release" near the start but I moved that assignment to the end, after everything had succeeded. The release function was never actually called during the probe cleanup process, but with this change I wanted to make it clear that we don't want to call zr364xx_release() until everything is allocated successfully. Next I re-wrote the zr364xx_release() function. Ideally this would have been a simple matter of copy and pasting the cleanup code from probe and adding an additional call to video_unregister_device(). But there are a couple quirks to note. 1) The probe function does not call videobuf_mmap_free() and I don't know where the videobuf_mmap is allocated. I left the code as-is to avoid introducing a bug in code I don't understand. 2) The zr364xx_board_uninit() has a call to zr364xx_stop_readpipe() which is a change from the original behavior with regards to unloading the driver. Calling zr364xx_stop_readpipe() on a stopped pipe is not a problem so this is safe and is potentially a bugfix. Reported-by: syzbot+b4d54814b339b5c6bbd4@syzkaller.appspotmail.com Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:12 -04:00
Evgeny Novikov	79dff2a3f4	media: zr364xx: propagate errors from zr364xx_start_readpipe() [ Upstream commit af0321a5be3e5647441eb6b79355beaa592df97a ] zr364xx_start_readpipe() can fail but callers do not care about that. This can result in various negative consequences. The patch adds missed error handling. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Evgeny Novikov <novikov@ispras.ru> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:36:12 -04:00
Andreas Persson	7305d6d407	mtd: cfi_cmdset_0002: fix crash when erasing/writing AMD cards commit 2394e628738933aa014093d93093030f6232946d upstream. Erasing an AMD linear flash card (AM29F016D) crashes after the first sector has been erased. Likewise, writing to it crashes after two bytes have been written. The reason is a missing check for a null pointer - the cmdset_priv field is not set for this type of card. Fixes: 4844ef80305d ("mtd: cfi_cmdset_0002: Add support for polling status register") Signed-off-by: Andreas Persson <andreasp56@outlook.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/DB6P189MB05830B3530B8087476C5CFE4C1159@DB6P189MB0583.EURP189.PROD.OUTLOOK.COM Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-26 08:36:12 -04:00
Jouni Malinen	23f77ad13f	ath9k: Postpone key cache entry deletion for TXQ frames reference it commit ca2848022c12789685d3fab3227df02b863f9696 upstream. Do not delete a key cache entry that is still being referenced by pending frames in TXQs. This avoids reuse of the key cache entry while a frame might still be transmitted using it. To avoid having to do any additional operations during the main TX path operations, track pending key cache entries in a new bitmap and check whether any pending entries can be deleted before every new key add/remove operation. Also clear any remaining entries when stopping the interface. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20201214172118.18100-6-jouni@codeaurora.org Cc: Pali Rohár <pali@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-26 08:36:12 -04:00
Jouni Malinen	c6feaf806d	ath: Modify ath_key_delete() to not need full key entry commit 144cd24dbc36650a51f7fe3bf1424a1432f1f480 upstream. tkip_keymap can be used internally to avoid the reference to key->cipher and with this, only the key index value itself is needed. This allows ath_key_delete() call to be postponed to be handled after the upper layer STA and key entry have already been removed. This is needed to make ath9k key cache management safer. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20201214172118.18100-5-jouni@codeaurora.org Cc: Pali Rohár <pali@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-26 08:36:11 -04:00
Jouni Malinen	b7d593705e	ath: Export ath_hw_keysetmac() commit d2d3e36498dd8e0c83ea99861fac5cf9e8671226 upstream. ath9k is going to use this for safer management of key cache entries. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20201214172118.18100-4-jouni@codeaurora.org Cc: Pali Rohár <pali@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-26 08:36:11 -04:00
Jouni Malinen	add283e251	ath9k: Clear key cache explicitly on disabling hardware commit 73488cb2fa3bb1ef9f6cf0d757f76958bd4deaca upstream. Now that ath/key.c may not be explicitly clearing keys from the key cache, clear all key cache entries when disabling hardware to make sure no keys are left behind beyond this point. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20201214172118.18100-3-jouni@codeaurora.org Cc: Pali Rohár <pali@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-26 08:36:11 -04:00
Jouni Malinen	0c049ce432	ath: Use safer key clearing with key cache entries commit 56c5485c9e444c2e85e11694b6c44f1338fc20fd upstream. It is possible for there to be pending frames in TXQs with a reference to the key cache entry that is being deleted. If such a key cache entry is cleared, those pending frame in TXQ might get transmitted without proper encryption. It is safer to leave the previously used key into the key cache in such cases. Instead, only clear the MAC address to prevent RX processing from using this key cache entry. This is needed in particularly in AP mode where the TXQs cannot be flushed on station disconnection. This change alone may not be able to address all cases where the key cache entry might get reused for other purposes immediately (the key cache entry should be released for reuse only once the TXQs do not have any remaining references to them), but this makes it less likely to get unprotected frames and the more complete changes may end up being significantly more complex. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20201214172118.18100-2-jouni@codeaurora.org Cc: Pali Rohár <pali@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-26 08:36:11 -04:00
Thomas Gleixner	172b91bbbb	x86/fpu: Make init_fpstate correct with optimized XSAVE commit f9dfb5e390fab2df9f7944bb91e7705aba14cd26 upstream. The XSAVE init code initializes all enabled and supported components with XRSTOR(S) to init state. Then it XSAVEs the state of the components back into init_fpstate which is used in several places to fill in the init state of components. This works correctly with XSAVE, but not with XSAVEOPT and XSAVES because those use the init optimization and skip writing state of components which are in init state. So init_fpstate.xsave still contains all zeroes after this operation. There are two ways to solve that: 1) Use XSAVE unconditionally, but that requires to reshuffle the buffer when XSAVES is enabled because XSAVES uses compacted format. 2) Save the components which are known to have a non-zero init state by other means. Looking deeper, #2 is the right thing to do because all components the kernel supports have all-zeroes init state except the legacy features (FP, SSE). Those cannot be hard coded because the states are not identical on all CPUs, but they can be saved with FXSAVE which avoids all conditionals. Use FXSAVE to save the legacy FP/SSE components in init_fpstate along with a BUILD_BUG_ON() which reminds developers to validate that a newly added component has all zeroes init state. As a bonus remove the now unused copy_xregs_to_kernel_booting() crutch. The XSAVE and reshuffle method can still be implemented in the unlikely case that components are added which have a non-zero init state and no other means to save them. For now, FXSAVE is just simple and good enough. [ bp: Fix a typo or two in the text. ] Fixes: 6bad06b76892 ("x86, xsave: Use xsaveopt in context-switch path when supported") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210618143444.587311343@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-26 08:36:11 -04:00
Ritesh Harjani	81d152c8da	ext4: fix EXT4_MAX_LOGICAL_BLOCK macro commit 175efa81feb8405676e0136d97b10380179c92e0 upstream. ext4 supports max number of logical blocks in a file to be 0xffffffff. (This is since ext4_extent's ee_block is __le32). This means that EXT4_MAX_LOGICAL_BLOCK should be 0xfffffffe (starting from 0 logical offset). This patch fixes this. The issue was seen when ext4 moved to iomap_fiemap API and when overlayfs was mounted on top of ext4. Since overlayfs was missing filemap_check_ranges(), so it could pass a arbitrary huge length which lead to overflow of map.m_len logic. This patch fixes that. Fixes: d3b6f23f7167 ("ext4: move ext4_fiemap to use iomap framework") Reported-by: syzbot+77fa5bdb65cc39711820@syzkaller.appspotmail.com Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20200505154324.3226743-2-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: George Kennedy <george.kennedy@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-26 08:36:11 -04:00
Greg Kroah-Hartman	c15b830f7c	Linux 5.4.142 Link: https://lore.kernel.org/r/20210816125428.198692661@linuxfoundation.org Link: https://lore.kernel.org/r/20210816171405.410986560@linuxfoundation.org Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Hulk Robot <hulkrobot@huawei.com> Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> v5.4.142	2021-08-18 08:57:05 +02:00
Maxim Levitsky	a17f2f2c89	KVM: nSVM: always intercept VMLOAD/VMSAVE when nested (CVE-2021-3656) commit c7dfa4009965a9b2d7b329ee970eb8da0d32f0bc upstream. If L1 disables VMLOAD/VMSAVE intercepts, and doesn't enable Virtual VMLOAD/VMSAVE (currently not supported for the nested hypervisor), then VMLOAD/VMSAVE must operate on the L1 physical memory, which is only possible by making L0 intercept these instructions. Failure to do so allowed the nested guest to run VMLOAD/VMSAVE unintercepted, and thus read/write portions of the host physical memory. Fixes: 89c8a4984fc9 ("KVM: SVM: Enable Virtual VMLOAD VMSAVE feature") Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-18 08:57:04 +02:00
Maxim Levitsky	7c1c96ffb6	KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl (CVE-2021-3653) commit 0f923e07124df069ba68d8bb12324398f4b6b709 upstream. * Invert the mask of bits that we pick from L2 in nested_vmcb02_prepare_control * Invert and explicitly use VIRQ related bits bitmask in svm_clear_vintr This fixes a security issue that allowed a malicious L1 to run L2 with AVIC enabled, which allowed the L2 to exploit the uninitialized and enabled AVIC to read/write the host physical memory at some offsets. Fixes: 3d6368ef580a ("KVM: SVM: Add VMRUN handler") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-18 08:57:04 +02:00
Saeed Mirzamohammadi	456fd88922	iommu/vt-d: Fix agaw for a supported 48 bit guest address width [ Upstream commit 327d5b2fee91c404a3956c324193892cf2cc9528 ] The IOMMU driver calculates the guest addressability for a DMA request based on the value of the mgaw reported from the IOMMU. However, this is a fused value and as mentioned in the spec, the guest width should be calculated based on the minimum of supported adjusted guest address width (SAGAW) and MGAW. This is from specification: "Guest addressability for a given DMA request is limited to the minimum of the value reported through this field and the adjusted guest address width of the corresponding page-table structure. (Adjusted guest address widths supported by hardware are reported through the SAGAW field)." This causes domain initialization to fail and following errors appear for EHCI PCI driver: [ 2.486393] ehci-pci 0000:01:00.4: EHCI Host Controller [ 2.486624] ehci-pci 0000:01:00.4: new USB bus registered, assigned bus number 1 [ 2.489127] ehci-pci 0000:01:00.4: DMAR: Allocating domain failed [ 2.489350] ehci-pci 0000:01:00.4: DMAR: 32bit DMA uses non-identity mapping [ 2.489359] ehci-pci 0000:01:00.4: can't setup: -12 [ 2.489531] ehci-pci 0000:01:00.4: USB bus 1 deregistered [ 2.490023] ehci-pci 0000:01:00.4: init 0000:01:00.4 fail, -12 [ 2.490358] ehci-pci: probe of 0000:01:00.4 failed with error -12 This issue happens when the value of the sagaw corresponds to a 48-bit agaw. This fix updates the calculation of the agaw based on the minimum of IOMMU's sagaw value and MGAW. This issue happens on the code path of getting a private domain for a device. A private domain was needed when the domain of an iommu group couldn't meet the requirement of a device. The IOMMU core has been evolved to eliminate the need for private domain, hence this code path has alreay been removed from the upstream since commit 327d5b2fee91c ("iommu/vt-d: Allow 32bit devices to uses DMA domain"). Instead of back porting all patches that are required for removing the private domain, this simply fixes it in the affected stable kernel between v4.16 and v5.7. [baolu: The orignal patch could be found here https://lore.kernel.org/linux-iommu/20210412202736.70765-1-saeed.mirzamohammadi@oracle.com/. I added commit message according to Greg's comments at https://lore.kernel.org/linux-iommu/YHZ%2FT9x7Xjf1r6fI@kroah.com/.] Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: stable@vger.kernel.org #v4.16+ Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com> Tested-by: Camille Lu <camille.lu@hpe.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-18 08:57:04 +02:00
Nathan Chancellor	5b5f855a79	vmlinux.lds.h: Handle clang's module.{c,d}tor sections commit 848378812e40152abe9b9baf58ce2004f76fb988 upstream. A recent change in LLVM causes module_{c,d}tor sections to appear when CONFIG_K{A,C}SAN are enabled, which results in orphan section warnings because these are not handled anywhere: ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_ctor) is being placed in '.text.asan.module_ctor' ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_dtor) is being placed in '.text.asan.module_dtor' ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.tsan.module_ctor) is being placed in '.text.tsan.module_ctor' Fangrui explains: "the function asan.module_ctor has the SHF_GNU_RETAIN flag, so it is in a separate section even with -fno-function-sections (default)". Place them in the TEXT_TEXT section so that these technologies continue to work with the newer compiler versions. All of the KASAN and KCSAN KUnit tests continue to pass after this change. Cc: stable@vger.kernel.org Link: https://github.com/ClangBuiltLinux/linux/issues/1432 Link: `7b78956224` Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Fangrui Song <maskray@google.com> Acked-by: Marco Elver <elver@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210731023107.1932981-1-nathan@kernel.org [nc: Resolve conflict due to lack of cf68fffb66d60] Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-18 08:57:04 +02:00
Jeff Layton	e9b2b2b29c	ceph: take snap_empty_lock atomically with snaprealm refcount change commit 8434ffe71c874b9c4e184b88d25de98c2bf5fe3f upstream. There is a race in ceph_put_snap_realm. The change to the nref and the spinlock acquisition are not done atomically, so you could decrement nref, and before you take the spinlock, the nref is incremented again. At that point, you end up putting it on the empty list when it shouldn't be there. Eventually __cleanup_empty_realms runs and frees it when it's still in-use. Fix this by protecting the 1->0 transition with atomic_dec_and_lock, and just drop the spinlock if we can get the rwsem. Because these objects can also undergo a 0->1 refcount transition, we must protect that change as well with the spinlock. Increment locklessly unless the value is at 0, in which case we take the spinlock, increment and then take it off the empty list if it did the 0->1 transition. With these changes, I'm removing the dout() messages from these functions, as well as in __put_snap_realm. They've always been racy, and it's better to not print values that may be misleading. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/46419 Reported-by: Mark Nelson <mnelson@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Luis Henriques <lhenriques@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-18 08:57:04 +02:00
Jeff Layton	95ff775df6	ceph: clean up locking annotation for ceph_get_snap_realm and __lookup_snap_realm commit df2c0cb7f8e8c83e495260ad86df8c5da947f2a7 upstream. They both say that the snap_rwsem must be held for write, but I don't see any real reason for it, and it's not currently always called that way. The lookup is just walking the rbtree, so holding it for read should be fine there. The "get" is bumping the refcount and (possibly) removing it from the empty list. I see no need to hold the snap_rwsem for write for that. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-18 08:57:04 +02:00
Jeff Layton	1d8c232afb	ceph: add some lockdep assertions around snaprealm handling commit a6862e6708c15995bc10614b2ef34ca35b4b9078 upstream. Turn some comments into lockdep asserts. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-18 08:57:04 +02:00
Sean Christopherson	a6ff0f3f9f	KVM: VMX: Use current VMCS to query WAITPKG support for MSR emulation commit 7b9cae027ba3aaac295ae23a62f47876ed97da73 upstream. Use the secondary_exec_controls_get() accessor in vmx_has_waitpkg() to effectively get the controls for the current VMCS, as opposed to using vmx->secondary_exec_controls, which is the cached value of KVM's desired controls for vmcs01 and truly not reflective of any particular VMCS. While the waitpkg control is not dynamic, i.e. vmcs01 will always hold the same waitpkg configuration as vmx->secondary_exec_controls, the same does not hold true for vmcs02 if the L1 VMM hides the feature from L2. If L1 hides the feature _and_ does not intercept MSR_IA32_UMWAIT_CONTROL, L2 could incorrectly read/write L1's virtual MSR instead of taking a #GP. Fixes: 6e3ba4abcea5 ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210810171952.2758100-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-18 08:57:03 +02:00
Thomas Gleixner	ec25d05e18	PCI/MSI: Protect msi_desc::masked for multi-MSI commit 77e89afc25f30abd56e76a809ee2884d7c1b63ce upstream. Multi-MSI uses a single MSI descriptor and there is a single mask register when the device supports per vector masking. To avoid reading back the mask register the value is cached in the MSI descriptor and updates are done by clearing and setting bits in the cache and writing it to the device. But nothing protects msi_desc::masked and the mask register from being modified concurrently on two different CPUs for two different Linux interrupts which belong to the same multi-MSI descriptor. Add a lock to struct device and protect any operation on the mask and the mask register with it. This makes the update of msi_desc::masked unconditional, but there is no place which requires a modification of the hardware register without updating the masked cache. msi_mask_irq() is now an empty wrapper which will be cleaned up in follow up changes. The problem goes way back to the initial support of multi-MSI, but picking the commit which introduced the mask cache is a valid cut off point (2.6.30). Fixes: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.726833414@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-18 08:57:03 +02:00
Thomas Gleixner	48d2439c6f	PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown() commit d28d4ad2a1aef27458b3383725bb179beb8d015c upstream. No point in using the raw write function from shutdown. Preparatory change to introduce proper serialization for the msi_desc::masked cache. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.674391354@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-18 08:57:03 +02:00
Thomas Gleixner	386ead1d35	PCI/MSI: Correct misleading comments commit 689e6b5351573c38ccf92a0dd8b3e2c2241e4aff upstream. The comments about preserving the cached state in pci_msi[x]_shutdown() are misleading as the MSI descriptors are freed right after those functions return. So there is nothing to restore. Preparatory change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.621609423@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-18 08:57:03 +02:00
Thomas Gleixner	76d81dec16	PCI/MSI: Do not set invalid bits in MSI mask commit 361fd37397f77578735907341579397d5bed0a2d upstream. msi_mask_irq() takes a mask and a flags argument. The mask argument is used to mask out bits from the cached mask and the flags argument to set bits. Some places invoke it with a flags argument which sets bits which are not used by the device, i.e. when the device supports up to 8 vectors a full unmask in some places sets the mask to 0xFFFFFF00. While devices probably do not care, it's still bad practice. Fixes: 7ba1930db02f ("PCI MSI: Unmask MSI if setup failed") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.568173099@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-18 08:57:03 +02:00
Thomas Gleixner	6b4bcbf133	PCI/MSI: Enforce MSI[X] entry updates to be visible commit b9255a7cb51754e8d2645b65dd31805e282b4f3e upstream. Nothing enforces the posted writes to be visible when the function returns. Flush them even if the flush might be redundant when the entry is masked already as the unmask will flush as well. This is either setup or a rare affinity change event so the extra flush is not the end of the world. While this is more a theoretical issue especially the logic in the X86 specific msi_set_affinity() function relies on the assumption that the update has reached the hardware when the function returns. Again, as this never has been enforced the Fixes tag refers to a commit in: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.515188147@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-18 08:57:03 +02:00
Thomas Gleixner	4495a41fbc	PCI/MSI: Enforce that MSI-X table entry is masked for update commit da181dc974ad667579baece33c2c8d2d1e4558d5 upstream. The specification (PCIe r5.0, sec 6.1.4.5) states: For MSI-X, a function is permitted to cache Address and Data values from unmasked MSI-X Table entries. However, anytime software unmasks a currently masked MSI-X Table entry either by clearing its Mask bit or by clearing the Function Mask bit, the function must update any Address or Data values that it cached from that entry. If software changes the Address or Data value of an entry while the entry is unmasked, the result is undefined. The Linux kernel's MSI-X support never enforced that the entry is masked before the entry is modified hence the Fixes tag refers to a commit in: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Enforce the entry to be masked across the update. There is no point in enforcing this to be handled at all possible call sites as this is just pointless code duplication and the common update function is the obvious place to enforce this. Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support") Reported-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.462096385@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-18 08:57:03 +02:00
Thomas Gleixner	1866c8f6d4	PCI/MSI: Mask all unused MSI-X entries commit 7d5ec3d3612396dc6d4b76366d20ab9fc06f399f upstream. When MSI-X is enabled the ordering of calls is: msix_map_region(); msix_setup_entries(); pci_msi_setup_msi_irqs(); msix_program_entries(); This has a few interesting issues: 1) msix_setup_entries() allocates the MSI descriptors and initializes them except for the msi_desc:masked member which is left zero initialized. 2) pci_msi_setup_msi_irqs() allocates the interrupt descriptors and sets up the MSI interrupts which ends up in pci_write_msi_msg() unless the interrupt chip provides its own irq_write_msi_msg() function. 3) msix_program_entries() does not do what the name suggests. It solely updates the entries array (if not NULL) and initializes the masked member for each MSI descriptor by reading the hardware state and then masks the entry. Obviously this has some issues: 1) The uninitialized masked member of msi_desc prevents the enforcement of masking the entry in pci_write_msi_msg() depending on the cached masked bit. Aside of that half initialized data is a NONO in general 2) msix_program_entries() only ensures that the actually allocated entries are masked. This is wrong as experimentation with crash testing and crash kernel kexec has shown. This limited testing unearthed that when the production kernel had more entries in use and unmasked when it crashed and the crash kernel allocated a smaller amount of entries, then a full scan of all entries found unmasked entries which were in use in the production kernel. This is obviously a device or emulation issue as the device reset should mask all MSI-X table entries, but obviously that's just part of the paper specification. Cure this by: 1) Masking all table entries in hardware 2) Initializing msi_desc::masked in msix_setup_entries() 3) Removing the mask dance in msix_program_entries() 4) Renaming msix_program_entries() to msix_update_entries() to reflect the purpose of that function. As the masking of unused entries has never been done the Fixes tag refers to a commit in: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.403833459@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-18 08:57:02 +02:00
Thomas Gleixner	3b4220c2bf	PCI/MSI: Enable and mask MSI-X early commit 438553958ba19296663c6d6583d208dfb6792830 upstream. The ordering of MSI-X enable in hardware is dysfunctional: 1) MSI-X is disabled in the control register 2) Various setup functions 3) pci_msi_setup_msi_irqs() is invoked which ends up accessing the MSI-X table entries 4) MSI-X is enabled and masked in the control register with the comment that enabling is required for some hardware to access the MSI-X table Step #4 obviously contradicts #3. The history of this is an issue with the NIU hardware. When #4 was introduced the table access actually happened in msix_program_entries() which was invoked after enabling and masking MSI-X. This was changed in commit d71d6432e105 ("PCI/MSI: Kill redundant call of irq_set_msi_desc() for MSI-X interrupts") which removed the table write from msix_program_entries(). Interestingly enough nobody noticed and either NIU still works or it did not get any testing with a kernel 3.19 or later. Nevertheless this is inconsistent and there is no reason why MSI-X can't be enabled and masked in the control register early on, i.e. move step #4 above to step #1. This preserves the NIU workaround and has no side effects on other hardware. Fixes: d71d6432e105 ("PCI/MSI: Kill redundant call of irq_set_msi_desc() for MSI-X interrupts") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.344136412@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-18 08:57:02 +02:00
Ben Dai	0c8dea3fd5	genirq/timings: Prevent potential array overflow in __irq_timings_store() commit b9cc7d8a4656a6e815852c27ab50365009cb69c1 upstream. When the interrupt interval is greater than 2 ^ PREDICTION_BUFFER_SIZE * PREDICTION_FACTOR us and less than 1s, the calculated index will be greater than the length of irqs->ema_time[]. Check the calculated index before using it to prevent array overflow. Fixes: 23aa3b9a6b7d ("genirq/timings: Encapsulate storing function") Signed-off-by: Ben Dai <ben.dai@unisoc.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210425150903.25456-1-ben.dai9703@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-18 08:57:02 +02:00
Bixuan Cui	4dfe809271	genirq/msi: Ensure deactivation on teardown commit dbbc93576e03fbe24b365fab0e901eb442237a8a upstream. msi_domain_alloc_irqs() invokes irq_domain_activate_irq(), but msi_domain_free_irqs() does not enforce deactivation before tearing down the interrupts. This happens when PCI/MSI interrupts are set up and never used before being torn down again, e.g. in error handling pathes. The only place which cleans that up is the error handling path in msi_domain_alloc_irqs(). Move the cleanup from msi_domain_alloc_irqs() into msi_domain_free_irqs() to cure that. Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early") Signed-off-by: Bixuan Cui <cuibixuan@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210518033117.78104-1-cuibixuan@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-18 08:57:02 +02:00

1 2 3 4 5 ...

888294 Commits