linux

iv/linux

Author	SHA1	Message	Date
Fedor Pchelkin	cd9c790de2	can: j1939: change j1939_netdev_lock type to mutex It turns out access to j1939_can_rx_register() needs to be serialized, otherwise j1939_priv can be corrupted when parallel threads call j1939_netdev_start() and j1939_can_rx_register() fails. This issue is thoroughly covered in other commit which serializes access to j1939_can_rx_register(). Change j1939_netdev_lock type to mutex so that we do not need to remove GFP_KERNEL from can_rx_register(). j1939_netdev_lock seems to be used in normal contexts where mutex usage is not prohibited. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: `9d71dd0c70` ("can: add support of SAE J1939 protocol") Suggested-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20230526171910.227615-2-pchelkin@ispras.ru Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-06-05 08:26:31 +02:00
Oleksij Rempel	2a84aea80e	can: j1939: j1939_sk_send_loop_abort(): improved error queue handling in J1939 Socket This patch addresses an issue within the j1939_sk_send_loop_abort() function in the j1939/socket.c file, specifically in the context of Transport Protocol (TP) sessions. Without this patch, when a TP session is initiated and a Clear To Send (CTS) frame is received from the remote side requesting one data packet, the kernel dispatches the first Data Transport (DT) frame and then waits for the next CTS. If the remote side doesn't respond with another CTS, the kernel aborts due to a timeout. This leads to the user-space receiving an EPOLLERR on the socket, and the socket becomes active. However, when trying to read the error queue from the socket with sock.recvmsg(, , socket.MSG_ERRQUEUE), it returns -EAGAIN, given that the socket is non-blocking. This situation results in an infinite loop: the user-space repeatedly calls epoll(), epoll() returns the socket file descriptor with EPOLLERR, but the socket then blocks on the recv() of ERRQUEUE. This patch introduces an additional check for the J1939_SOCK_ERRQUEUE flag within the j1939_sk_send_loop_abort() function. If the flag is set, it indicates that the application has subscribed to receive error queue messages. In such cases, the kernel can communicate the current transfer state via the error queue. This allows for the function to return early, preventing the unnecessary setting of the socket into an error state, and breaking the infinite loop. It is crucial to note that a socket error is only needed if the application isn't using the error queue, as, without it, the application wouldn't be aware of transfer issues. Fixes: `9d71dd0c70` ("can: add support of SAE J1939 protocol") Reported-by: David Jander <david@protonic.nl> Tested-by: David Jander <david@protonic.nl> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20230526081946.715190-1-o.rempel@pengutronix.de Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-06-05 08:20:36 +02:00
Dave Chinner	d4d12c02bf	xfs: collect errors from inodegc for unlinked inode recovery Unlinked list recovery requires errors removing the inode the from the unlinked list get fed back to the main recovery loop. Now that we offload the unlinking to the inodegc work, we don't get errors being fed back when we trip over a corruption that prevents the inode from being removed from the unlinked list. This means we never clear the corrupt unlinked list bucket, resulting in runtime operations eventually tripping over it and shutting down. Fix this by collecting inodegc worker errors and feed them back to the flush caller. This is largely best effort - the only context that really cares is log recovery, and it only flushes a single inode at a time so we don't need complex synchronised handling. Essentially the inodegc workers will capture the first error that occurs and the next flush will gather them and clear them. The flush itself will only report the first gathered error. In the cases where callers can return errors, propagate the collected inodegc flush error up the error handling chain. In the case of inode unlinked list recovery, there are several superfluous calls to flush queued unlinked inodes - xlog_recover_iunlink_bucket() guarantees that it has flushed the inodegc and collected errors before it returns. Hence nothing in the calling path needs to run a flush, even when an error is returned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>	2023-06-05 14:48:15 +10:00
Dave Chinner	7dfee17b13	xfs: validate block number being freed before adding to xefi Bad things happen in defered extent freeing operations if it is passed a bad block number in the xefi. This can come from a bogus agno/agbno pair from deferred agfl freeing, or just a bad fsbno being passed to __xfs_free_extent_later(). Either way, it's very difficult to diagnose where a null perag oops in EFI creation is coming from when the operation that queued the xefi has already been completed and there's no longer any trace of it around.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>	2023-06-05 14:48:15 +10:00
Dave Chinner	3148ebf2c0	xfs: validity check agbnos on the AGFL If the agfl or the indexing in the AGF has been corrupted, getting a block form the AGFL could return an invalid block number. If this happens, bad things happen. Check the agbno we pull off the AGFL and return -EFSCORRUPTED if we find somethign bad. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>	2023-06-05 14:48:15 +10:00
Dave Chinner	e0a8de7da3	xfs: fix agf/agfl verification on v4 filesystems When a v4 filesystem has fl_last - fl_first != fl_count, we do not not detect the corruption and allow the AGF to be used as it if was fully valid. On V5 filesystems, we reset the AGFL to empty in these cases and avoid the corruption at a small cost of leaked blocks. If we don't catch the corruption on V4 filesystems, bad things happen later when an allocation attempts to trim the free list and either double-frees stale entries in the AGFl or tries to free NULLAGBNO entries. Either way, this is bad. Prevent this from happening by using the AGFL_NEED_RESET logic for v4 filesysetms, too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>	2023-06-05 14:48:15 +10:00
Dave Chinner	1e473279f4	xfs: fix double xfs_perag_rele() in xfs_filestream_pick_ag() xfs_bmap_longest_free_extent() can return an error when accessing the AGF fails. In this case, the behaviour of xfs_filestream_pick_ag() is conditional on the error. We may continue the loop, or break out of it. The error handling after the loop cleans up the perag reference held when the break occurs. If we continue, the next loop iteration handles cleaning up the perag reference. EIther way, we don't need to release the active perag reference when xfs_bmap_longest_free_extent() fails. Doing so means we do a double decrement on the active reference count, and this causes tha active reference count to fall to zero. At this point, new active references will fail. This leads to unmount hanging because it tries to grab active references to that perag, only for it to fail. This happens inside a loop that retries until a inode tree radix tree tag is cleared, which cannot happen because we can't get an active reference to the perag. The unmount livelocks in this path: xfs_reclaim_inodes+0x80/0xc0 xfs_unmount_flush_inodes+0x5b/0x70 xfs_unmountfs+0x5b/0x1a0 xfs_fs_put_super+0x49/0x110 generic_shutdown_super+0x7c/0x1a0 kill_block_super+0x27/0x50 deactivate_locked_super+0x30/0x90 deactivate_super+0x3c/0x50 cleanup_mnt+0xc2/0x160 __cleanup_mnt+0x12/0x20 task_work_run+0x5e/0xa0 exit_to_user_mode_prepare+0x1bc/0x1c0 syscall_exit_to_user_mode+0x16/0x40 do_syscall_64+0x40/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Reported-by: Pengfei Xu <pengfei.xu@intel.com> Fixes: `eb70aa2d8e` ("xfs: use for_each_perag_wrap in xfs_filestream_pick_ag") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>	2023-06-05 14:48:15 +10:00
Darrick J. Wong	6be73cecb5	xfs: fix broken logic when detecting mergeable bmap records Commit 6bc6c99a944c was a well-intentioned effort to initiate consolidation of adjacent bmbt mapping records by setting the PREEN flag. Consolidation can only happen if the length of the combined record doesn't overflow the 21-bit blockcount field of the bmbt recordset. Unfortunately, the length test is inverted, leading to it triggering on data forks like these: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL 0: [0..16777207]: 76110848..92888055 0 (76110848..92888055) 16777208 1: [16777208..20639743]: 92888056..96750591 0 (92888056..96750591) `3862536` Note that record 0 has a length of 16777208 512b blocks. This corresponds to 2097151 4k fsblocks, which is the maximum. Hence the two records cannot be merged. However, the logic is still wrong even if we change the in-loop comparison, because the scope of our examination isn't broad enough inside the loop to detect mappings like this: 0: [0..9]: 76110838..76110847 0 (76110838..76110847) 10 1: [10..16777217]: 76110848..92888055 0 (76110848..92888055) 16777208 2: [16777218..20639753]: 92888056..96750591 0 (92888056..96750591) `3862536` These three records could be merged into two, but one cannot determine this purely from looking at records 0-1 or 1-2 in isolation. Hoist the mergability detection outside the loop, and base its decision making on whether or not a merged mapping could be expressed in fewer bmbt records. While we're at it, fix the incorrect return type of the iter function. Fixes: `336642f792` ("xfs: alert the user about data/attr fork mappings that could be merged") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2023-06-05 14:48:12 +10:00
Geert Uytterhoeven	4320f34666	xfs: Fix undefined behavior of shift into sign bit With gcc-5: In file included from ./include/trace/define_trace.h:102:0, from ./fs/xfs/scrub/trace.h:988, from fs/xfs/scrub/trace.c:40: ./fs/xfs/./scrub/trace.h: In function ‘trace_raw_output_xchk_fsgate_class’: ./fs/xfs/scrub/scrub.h:111:28: error: initializer element is not constant #define XREP_ALREADY_FIXED (1 << 31) /* checking our repair work */ ^ Shifting the (signed) value 1 into the sign bit is undefined behavior. Fix this for all definitions in the file by shifting "1U" instead of "1". This was exposed by the first user added in commit `466c525d6d` ("xfs: minimize overhead of drain wakeups by using jump labels"). Fixes: `160b5a7845` ("xfs: hoist the already_fixed variable to the scrub context") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>	2023-06-05 04:09:27 +10:00
Dave Chinner	82842fee6e	xfs: fix AGF vs inode cluster buffer deadlock Lock order in XFS is AGI -> AGF, hence for operations involving inode unlinked list operations we always lock the AGI first. Inode unlinked list operations operate on the inode cluster buffer, so the lock order there is AGI -> inode cluster buffer. For O_TMPFILE operations, this now means the lock order set down in xfs_rename and xfs_link is AGI -> inode cluster buffer -> AGF as the unlinked ops are done before the directory modifications that may allocate space and lock the AGF. Unfortunately, we also now lock the inode cluster buffer when logging an inode so that we can attach the inode to the cluster buffer and pin it in memory. This creates a lock order of AGF -> inode cluster buffer in directory operations as we have to log the inode after we've allocated new space for it. This creates a lock inversion between the AGF and the inode cluster buffer. Because the inode cluster buffer is shared across multiple inodes, the inversion is not specific to individual inodes but can occur when inodes in the same cluster buffer are accessed in different orders. To fix this we need move all the inode log item cluster buffer interactions to the end of the current transaction. Unfortunately, xfs_trans_log_inode() calls are littered throughout the transactions with no thought to ordering against other items or locking. This makes it difficult to do anything that involves changing the call sites of xfs_trans_log_inode() to change locking orders. However, we do now have a mechanism that allows is to postpone dirty item processing to just before we commit the transaction: the ->iop_precommit method. This will be called after all the modifications are done and high level objects like AGI and AGF buffers have been locked and modified, thereby providing a mechanism that guarantees we don't lock the inode cluster buffer before those high level objects are locked. This change is largely moving the guts of xfs_trans_log_inode() to xfs_inode_item_precommit() and providing an extra flag context in the inode log item to track the dirty state of the inode in the current transaction. This also means we do a lot less repeated work in xfs_trans_log_inode() by only doing it once per transaction when all the work is done. Fixes: `298f7bec50` ("xfs: pin inode backing buffer to the inode log item") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>	2023-06-05 04:08:27 +10:00
Dave Chinner	cb04211748	xfs: defered work could create precommits To fix a AGI-AGF-inode cluster buffer deadlock, we need to move inode cluster buffer operations to the ->iop_precommit() method. However, this means that deferred operations can require precommits to be run on the final transaction that the deferred ops pass back to xfs_trans_commit() context. This will be exposed by attribute handling, in that the last changes to the inode in the attr set state machine "disappear" because the precommit operation is not run. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2023-06-05 04:07:27 +10:00
Dave Chinner	00dcd17cfa	xfs: restore allocation trylock iteration It was accidentally dropped when refactoring the allocation code, resulting in the AG iteration always doing blocking AG iteration. This results in a small performance regression for a specific fsmark test that runs more user data writer threads than there are AGs. Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: `2edf06a50f` ("xfs: factor xfs_alloc_vextent_this_ag() for _iterate_ags()") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2023-06-05 04:06:27 +10:00
Dave Chinner	89a4bf0dc3	xfs: buffer pins need to hold a buffer reference When a buffer is unpinned by xfs_buf_item_unpin(), we need to access the buffer after we've dropped the buffer log item reference count. This opens a window where we can have two racing unpins for the buffer item (e.g. shutdown checkpoint context callback processing racing with journal IO iclog completion processing) and both attempt to access the buffer after dropping the BLI reference count. If we are unlucky, the "BLI freed" context wins the race and frees the buffer before the "BLI still active" case checks the buffer pin count. This results in a use after free that can only be triggered in active filesystem shutdown situations. To fix this, we need to ensure that buffer existence extends beyond the BLI reference count checks and until the unpin processing is complete. This implies that a buffer pin operation must also take a buffer reference to ensure that the buffer cannot be freed until the buffer unpin processing is complete. Reported-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2023-06-05 04:05:27 +10:00
Linus Torvalds	9561de3a55	Linux 6.4-rc5	2023-06-04 14:04:27 -04:00
Linus Torvalds	6f64a5ebe1	- Fix open firmware quirks validation so that they don't get applied wrongly -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmR8qTcACgkQEsHwGGHe VUqEsQ//UH2BSu+aSCTZWTOMLHnrsCToVOURKbUV6GD/+in8VrOxE1tXqdNCG5u8 wnHGi+hyZGxK1fWHrAObfj1OT0O3cLbdxSpNIufESihny+eoganlVngweCFZVGcK JymC1bAlDYEUIdBsJlRdBt/uSWYBw9CFCdHkTsdku6CUf2BLr8NkMHCcYg11nvaK iAKP96pFIxUmeI0R+l1Wc6qIE1plqeiKYYR+pGO+ubyQAC0OEGR/robCZkTeCHth 9FhmtMSsVghqmXfUfjbRY+fiDsSdnwn7Iw8qSLl4zmsD+CIsy/CRTag0gDljyZ+m WeBx1ripUU3hVGv3CGpfq4MIdkILYqFORAcuhYVgiEAmOHa4V28S+eEBWAlct7Bc NIs3UVP8BItbsdHiqOCAyXJs6dAT3Ja+PDr2a7WelrpLOXsb1u8ffjhr4UTWYU0a VlAnohKTBJifl9bAD5dvgOAfnDCmHAibVqdB6ylNQyDPB4l8JzIf7q0ZuO2gBsoa UNVKaYNVbeIl/DEEW2Qct6prXj8ekfUXdiWzjOe3JzP2QuDOZDVTxO1fyxWfyyGE kpLWhqUcqOHSk8BYPH7BzaEkKY3oojwJUlTnWqJvf0cGYNwOmA3xfL1iphyl/F5p Tpj9CkwnSD1sNdw6Rh6NNZWkmk0Tz7qAh3ywFsni3vx6p+7aEpw= =Twrd -----END PGP SIGNATURE----- Merge tag 'irq_urgent_for_v6.4_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Borislav Petkov: - Fix open firmware quirks validation so that they don't get applied wrongly * tag 'irq_urgent_for_v6.4_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic: Correctly validate OF quirk descriptors	2023-06-04 11:57:38 -04:00
Min-Hua Chen	8cde87b007	net: sched: wrap tc_skip_wrapper with CONFIG_RETPOLINE This patch fixes the following sparse warning: net/sched/sch_api.c:2305:1: sparse: warning: symbol 'tc_skip_wrapper' was not declared. Should it be static? No functional change intended. Signed-off-by: Min-Hua Chen <minhuadotchen@gmail.com> Acked-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-06-04 15:49:06 +01:00
David S. Miller	3d5f4d29f6	Merge branch 'enetc-fixes' Wei Fang says: ==================== net: enetc: correct the statistics of rx bytes The purpose of this patch set is to fix the issue of rx bytes statistics. The first patch corrects the rx bytes statistics of normal kernel protocol stack path, and the second patch is used to correct the rx bytes statistics of XDP. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-06-04 15:43:45 +01:00
Wei Fang	fdebd850cc	net: enetc: correct rx_bytes statistics of XDP The rx_bytes statistics of XDP are always zero, because rx_byte_cnt is not updated after it is initialized to 0. So fix it. Fixes: `d1b15102dd` ("net: enetc: add support for XDP_DROP and XDP_PASS") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-06-04 15:43:45 +01:00
Wei Fang	7190d0ff0e	net: enetc: correct the statistics of rx bytes The rx_bytes of struct net_device_stats should count the length of ethernet frames excluding the FCS. However, there are two problems with the rx_bytes statistics of the current enetc driver. one is that the length of VLAN header is not counted if the VLAN extraction feature is enabled. The other is that the length of L2 header is not counted, because eth_type_trans() is invoked before updating rx_bytes which will subtract the length of L2 header from skb->len. BTW, the rx_bytes statistics of XDP path also have similar problem, I will fix it in another patch. Fixes: `a800abd3ec` ("net: enetc: move skb creation into enetc_build_skb") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-06-04 15:43:45 +01:00
Linus Torvalds	5e89d62ec1	media fixes for v6.4-rc5 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAmR8hVgACgkQCF8+vY7k 4RXDRg//SxDkN3JSKKUuNzBBYI7KJkEXE/G7rsA0DZmgBS90ri8lop6psw1dq7pY PgffWc01YKJqWK9Tz6V8ejn9jFo7YAAiTwYqitwOxEfsF5r+2yLV0lfSGU7OhYAP m/CwSbsL78RU8YAcAXJm1K8UJu/NDHKcJQiroCDAJanw5W1dvKx42flE0kw6g9YY CqanR3XuiqUxq4XDzUoN86VHIUk97AhRDeCi9E4hpYJgMHuxQPoRd71/vuA15KYD H2d/Xh/jJ+qNtPnO/Ivgy43Ueb6qVvbjr5uNevFtPghJ8ATsP+a/dwBeprqMuuX5 k5jWfNTNiT0VHWVG0ruOsGMpq6NCUXXVt5IHAaLWiuGh8RQrnn1JfEkMvYqiu5ar /4Z55Fl6qU2760N/PVLUwskcDnGNOKSTAKSPBZg3hj4jn5eCwQIkAysEt8BULiLs SdyOODiqH8r+g2j6JXFqRWl9sV7jH6cV+ZaNW6mbfCyRIJdJ25W1C3yKIDK/G3dG qBj1dm0uLd7ufvdSwgNW2LwLFH4a8sHXELfij603K3ysO/NZfdlzgY+6rDuv3w2P OiHNtMTig0O4TImIELjJlxOsb3bSsaM3tPBSdCl0KC0kkMT8U+7rqXZFfCvYqKUV uwvUnfmu6dx2CfVIPEep90Vnsr1rtIL+ZPqut1x5LfFrVYuTYrE= =hLQn -----END PGP SIGNATURE----- Merge tag 'media/v6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "Some driver fixes: - a regression fix for the verisilicon driver - uvcvideo: don't expose unsupported video formats to userspace - camss-video: don't zero subdev format after init - mediatek: some fixes for 4K decoder formats - fix a Sphinx build warning (missing doc for client_caps) - some fixes for imx and atomisp staging drivers And two CEC core fixes: - don't set last_initiator if TX in progress - disable adapter in cec_devnode_unregister" * tag 'media/v6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: uvcvideo: Don't expose unsupported formats to userspace media: v4l2-subdev: Fix missing kerneldoc for client_caps media: staging: media: imx: initialize hs_settle to avoid warning media: v4l2-mc: Drop subdev check in v4l2_create_fwnode_links_to_pad() media: staging: media: atomisp: init high & low vars media: cec: core: don't set last_initiator if tx in progress media: cec: core: disable adapter in cec_devnode_unregister media: mediatek: vcodec: Only apply 4K frame sizes on decoder formats media: camss: camss-video: Don't zero subdev format again after initialization media: verisilicon: Additional fix for the crash when opening the driver	2023-06-04 09:10:43 -04:00
Linus Torvalds	209835e8ec	Char/Misc driver fixes for 6.4-rc5 Here are a bunch of tiny char/misc/other driver fixes for 6.4-rc5 that resolve a number of reported issues. Included in here are: - iio driver fixes - fpga driver fixes - test_firmware bugfixes - fastrpc driver tiny bugfixes - MAINTAINERS file updates for some subsystems All of these have been in linux-next this past week with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZHxDNg8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+yl1ywCg0uz+E/GYKx5cP9chPFmbbaFwxH4AnRpn/kIH xz6nbAqSf7CBbtxmED11 =4J1c -----END PGP SIGNATURE----- Merge tag 'char-misc-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are a bunch of tiny char/misc/other driver fixes for 6.4-rc5 that resolve a number of reported issues. Included in here are: - iio driver fixes - fpga driver fixes - test_firmware bugfixes - fastrpc driver tiny bugfixes - MAINTAINERS file updates for some subsystems All of these have been in linux-next this past week with no reported issues" * tag 'char-misc-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (34 commits) test_firmware: fix the memory leak of the allocated firmware buffer test_firmware: fix a memory leak with reqs buffer test_firmware: prevent race conditions by a correct implementation of locking firmware_loader: Fix a NULL vs IS_ERR() check MAINTAINERS: Vaibhav Gupta is the new ipack maintainer dt-bindings: fpga: replace Ivan Bornyakov maintainership MAINTAINERS: update Microchip MPF FPGA reviewers misc: fastrpc: reject new invocations during device removal misc: fastrpc: return -EPIPE to invocations on device removal misc: fastrpc: Reassign memory ownership only for remote heap misc: fastrpc: Pass proper scm arguments for secure map request iio: imu: inv_icm42600: fix timestamp reset iio: adc: ad_sigma_delta: Fix IRQ issue by setting IRQ_DISABLE_UNLAZY flag dt-bindings: iio: adc: renesas,rcar-gyroadc: Fix adi,ad7476 compatible value iio: dac: mcp4725: Fix i2c_master_send() return value handling iio: accel: kx022a fix irq getting iio: bu27034: Ensure reset is written iio: dac: build ad5758 driver when AD5758 is selected iio: addac: ad74413: fix resistance input processing iio: light: vcnl4035: fixed chip ID check ...	2023-06-04 08:32:30 -04:00
Adam Ford	9bf2e53431	arm64: dts: imx8mn-beacon: Fix SPI CS pinmux The final production baseboard had a different chip select than earlier prototype boards. When the newer board was released, the SPI stopped working because the wrong pin was used in the device tree and conflicted with the UART RTS. Fix the pinmux for production boards. Fixes: `36ca3c8ccb` ("arm64: dts: imx: Add Beacon i.MX8M Nano development kit") Signed-off-by: Adam Ford <aford173@gmail.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>	2023-06-04 20:31:58 +08:00
Linus Torvalds	41f3ab2d5d	Driver core fixes for 6.4-rc5 Here are 2 small driver core cacheinfo fixes for 6.4-rc5 that resolve a number of reported issues with that file. These changes have been in linux-next this past week with no reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZHxChg8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ykrLACeJBLCDThdooct8G/7MzfpJhFcjSYAn1/EhJDA GxgOmZrsB1HcO3Bo587a =Cucq -----END PGP SIGNATURE----- Merge tag 'driver-core-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are two small driver core cacheinfo fixes for 6.4-rc5 that resolve a number of reported issues with that file. These changes have been in linux-next this past week with no reported problems" * tag 'driver-core-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: drivers: base: cacheinfo: Update cpu_map_populated during CPU Hotplug drivers: base: cacheinfo: Fix shared_cpu_map changes in event of CPU hotplug	2023-06-04 08:02:25 -04:00
Linus Torvalds	12c2f77b32	TTY/Serial driver fixes for 6.4-rc5 Here are some small tty/serial driver fixes for 6.4-rc5 that have all been in linux-next this past week with no reported problems. Included in here are: - 8250_tegra driver bugfix - fsl uart driver bugfixes - Kconfig fix for dependancy issue - dt-bindings fix for the 8250_omap driver Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZHxD4w8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ykcTQCdGohhrEfOmNVDGnYHTTCZ7NXgjX4AoJkqRjsT pp6mxqTNLHy/NQqjboUR =O/xg -----END PGP SIGNATURE----- Merge tag 'tty-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver fixes from Greg KH: "Here are some small tty/serial driver fixes for 6.4-rc5 that have all been in linux-next this past week with no reported problems. Included in here are: - 8250_tegra driver bugfix - fsl uart driver bugfixes - Kconfig fix for dependancy issue - dt-bindings fix for the 8250_omap driver" * tag 'tty-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: dt-bindings: serial: 8250_omap: add rs485-rts-active-high serial: cpm_uart: Fix a COMPILE_TEST dependency soc: fsl: cpm1: Fix TSA and QMC dependencies in case of COMPILE_TEST tty: serial: fsl_lpuart: use UARTCTRL_TXINV to send break instead of UARTCTRL_SBK serial: 8250_tegra: Fix an error handling path in tegra_uart_probe()	2023-06-04 07:51:33 -04:00
Linus Torvalds	8b435e4025	USB fixes for 6.4-rc5 Here are some USB driver and core fixes for 6.4-rc5. Most of these are tiny driver fixes, including: - udc driver bugfix - f_fs gadget driver bugfix - cdns3 driver bugfix - typec bugfixes But the "big" thing in here is a fix yet-again for how the USB buffers are handled from userspace when dealing with DMA issues. The changes were discussed a lot, and tested a lot, on the list, and acked by the relevant mm maintainers and have been in linux-next all this past week with no reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZHxFGA8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ykd0wCgwHMYaXa8jJCGgG+e4o/rFvBucK8AoJdmHc8M hoeLOGdBuxJItXNOnMac =uUEx -----END PGP SIGNATURE----- Merge tag 'usb-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some USB driver and core fixes for 6.4-rc5. Most of these are tiny driver fixes, including: - udc driver bugfix - f_fs gadget driver bugfix - cdns3 driver bugfix - typec bugfixes But the "big" thing in here is a fix yet-again for how the USB buffers are handled from userspace when dealing with DMA issues. The changes were discussed a lot, and tested a lot, on the list, and acked by the relevant mm maintainers and have been in linux-next all this past week with no reported problems" * tag 'usb-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: typec: tps6598x: Fix broken polling mode after system suspend/resume mm: page_table_check: Ensure user pages are not slab pages mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM usb: usbfs: Use consistent mmap functions usb: usbfs: Enforce page requirements for mmap dt-bindings: usb: snps,dwc3: Fix "snps,hsphy_interface" type usb: gadget: udc: fix NULL dereference in remove() usb: gadget: f_fs: Add unbind event before functionfs_unbind usb: cdns3: fix NCM gadget RX speed 20x slow than expection at iMX8QM	2023-06-04 07:31:48 -04:00
Linus Torvalds	b066935bf8	ARM: * Address some fallout of the locking rework, this time affecting the way the vgic is configured * Fix an issue where the page table walker frees a subtree and then proceeds with walking what it has just freed... * Check that a given PA donated to the guest is actually memory (only affecting pKVM) * Correctly handle MTE CMOs by Set/Way * Fix the reported address of a watchpoint forwarded to userspace * Fix the freeing of the root of stage-2 page tables * Stop creating spurious PMU events to perform detection of the default PMU and use the existing PMU list instead. x86: * Fix a memslot lookup bug in the NX recovery thread that could theoretically let userspace bypass the NX hugepage mitigation * Fix a s/BLOCKING/PENDING bug in SVM's vNMI support * Account exit stats for fastpath VM-Exits that never leave the super tight run-loop * Fix an out-of-bounds bug in the optimized APIC map code, and add a regression test for the race. -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmR7k1QUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroNblwf/faUVOBMv7mQBGsGa7FNcmaNhYeIT U1k4pFNlo7dNNuNJrGdpo+sOGP5A8CRLNSVvlyjgCHF1Qc9gVtXNvZ9PnA6nAYmB qqvUz/TDw9/NLTlJEkbSs05B4am4yfd5pV6R/32jrPIbXOW++6ae2LpILS/NPBrB y0tGiVUJrO3zVXdBKa4PFmlO8jsXPmMEiicEJa5v2Boeo5SFyFfErw9zDNwSMsQc 27bzbs3O2daXTNMFnwVCCpWUxt1EqWYUXGvBjsChAUI0K10F2/GW9f6YeFsGXqKI d8g1QuCukSt/CvN0pT+g/540mR6i0Azpek1myQfuCu2IhQ1jCJaSWOjoEw== =8VrO -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "ARM: - Address some fallout of the locking rework, this time affecting the way the vgic is configured - Fix an issue where the page table walker frees a subtree and then proceeds with walking what it has just freed... - Check that a given PA donated to the guest is actually memory (only affecting pKVM) - Correctly handle MTE CMOs by Set/Way - Fix the reported address of a watchpoint forwarded to userspace - Fix the freeing of the root of stage-2 page tables - Stop creating spurious PMU events to perform detection of the default PMU and use the existing PMU list instead x86: - Fix a memslot lookup bug in the NX recovery thread that could theoretically let userspace bypass the NX hugepage mitigation - Fix a s/BLOCKING/PENDING bug in SVM's vNMI support - Account exit stats for fastpath VM-Exits that never leave the super tight run-loop - Fix an out-of-bounds bug in the optimized APIC map code, and add a regression test for the race" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: selftests: Add test for race in kvm_recalculate_apic_map() KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-bounds KVM: x86: Account fastpath-only VM-Exits in vCPU stats KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK KVM: x86/mmu: Grab memslot for correct address space in NX recovery worker KVM: arm64: Document default vPMU behavior on heterogeneous systems KVM: arm64: Iterate arm_pmus list to probe for default PMU KVM: arm64: Drop last page ref in kvm_pgtable_stage2_free_removed() KVM: arm64: Populate fault info for watchpoint KVM: arm64: Reload PTE after invoking walker callback on preorder traversal KVM: arm64: Handle trap of tagged Set/Way CMOs arm64: Add missing Set/Way CMO encodings KVM: arm64: Prevent unconditional donation of unmapped regions from the host KVM: arm64: vgic: Fix a comment KVM: arm64: vgic: Fix locking comment KVM: arm64: vgic: Wrap vgic_its_create() with config_lock KVM: arm64: vgic: Fix a circular locking issue	2023-06-04 07:16:53 -04:00
Linus Torvalds	9455b4b6db	powerpc fixes for 6.4 #4 - Fix link errors in new aes-gcm-p10 code when built-in with other drivers. - Limit number of TCEs passed to H_STUFF_TCE hcall as per spec. - Use KSYM_NAME_LEN in xmon array size to avoid possible OOB write. Thanks to: Gaurav Batra, Maninder Singh Vishal Chourasia. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmR70UsTHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgCdcEACJQ7GOV3MuV7oSAivumF81AmOG/86Y eN1wqI25nPyhH0sUOYjM97A8e2vPvEVJPFCNnDHe1fcICaRR+X6rW0cnfrE3NCI6 JU+Qu1zEDC/JVd+AXh2vjHPUyi91rCNAXuao0Y+IHu+ViTjmKLd1bEa5hhFS0vxj WkEWWatWxtWnJV9mfS29v+leGmFgX2wX04IuIFzA4OafMU2eaBDYDXMvvqXkIpLj CGmA5mRGYsSyPZIG2CITFcSOrQ5hSd8w2M5zenDth6lwIMXJLsi1f2cfn0GEueQF lp2e8cF96D20M+oOuP7+35hZ/Iq9haQkLUR3m55ai+RK1MhpyXSJPLUkMg6/M5BN n8P7x+BuCJh148YS+qdb7FEyMLK7Zjjr+j4yR0LVmL+HBQL8/BklX5HhkpMA4UCh l9MBDIvqzMVGpKwoR/vdTuMH+g4Y6tDWV9yR2Oz4zOXrYb5nR4KHvhCcax5SfC11 bVC3tP2hMgMalfTlm7J+iSdukwkLUZT3aubJoAi7r4iyjIwFJkCySJWO+GI7IGAd OIyg2RQtObIy8evOL+0RIomuek9UBASQymA3N0EP8QmxZocoqJU7FAOiMIFhP86/ yr6fmcW8Mov+aV0fZtOzOwtnCXn96j4xgVPdznKz/vWZkNzerRmOOHOqLHcfN5Cf tFEceNQLrjpGdg== =EvgV -----END PGP SIGNATURE----- Merge tag 'powerpc-6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix link errors in new aes-gcm-p10 code when built-in with other drivers - Limit number of TCEs passed to H_STUFF_TCE hcall as per spec - Use KSYM_NAME_LEN in xmon array size to avoid possible OOB write Thanks to Gaurav Batra and Maninder Singh Vishal Chourasia. * tag 'powerpc-6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/xmon: Use KSYM_NAME_LEN in array size powerpc/iommu: Limit number of TCEs to 512 for H_STUFF_TCE hcall powerpc/crypto: Fix aes-gcm-p10 link errors	2023-06-04 07:11:13 -04:00
Tian Lan	ddad59331a	blk-mq: fix blk_mq_hw_ctx active request accounting The nr_active counter continues to increase over time which causes the blk_mq_get_tag to hang until the thread is rescheduled to a different core despite there are still tags available. kernel-stack INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:inboundIOReacto state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000 Call Trace: <TASK> __schedule+0x351/0xa20 scheduler+0x5d/0xe0 io_schedule+0x42/0x70 blk_mq_get_tag+0x11a/0x2a0 ? dequeue_task_stop+0x70/0x70 __blk_mq_alloc_requests+0x191/0x2e0 kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before __blk_mq_free_request being called. 320 320 kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1 b'__blk_mq_free_request+0x1 [kernel]' b'bt_iter+0x50 [kernel]' b'blk_mq_queue_tag_busy_iter+0x318 [kernel]' b'blk_mq_timeout_work+0x7c [kernel]' b'process_one_work+0x1c4 [kernel]' b'worker_thread+0x4d [kernel]' b'kthread+0xe6 [kernel]' b'ret_from_fork+0x1f [kernel]' Signed-off-by: Tian Lan <tian.lan@twosigma.com> Fixes: `2e315dc07d` ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter") Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230513221227.497327-1-tilan7663@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-06-03 17:20:00 -06:00
Wen Gu	c308e9ec00	net/smc: Avoid to access invalid RMBs' MRs in SMCRv1 ADD LINK CONT SMCRv1 has a similar issue to SMCRv2 (see link below) that may access invalid MRs of RMBs when construct LLC ADD LINK CONT messages. BUG: kernel NULL pointer dereference, address: 0000000000000014 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 5 PID: 48 Comm: kworker/5:0 Kdump: loaded Tainted: G W E 6.4.0-rc3+ #49 Workqueue: events smc_llc_add_link_work [smc] RIP: 0010:smc_llc_add_link_cont+0x160/0x270 [smc] RSP: 0018:ffffa737801d3d50 EFLAGS: 00010286 RAX: ffff964f82144000 RBX: ffffa737801d3dd8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff964f81370c30 RBP: ffffa737801d3dd4 R08: ffff964f81370000 R09: ffffa737801d3db0 R10: 0000000000000001 R11: 0000000000000060 R12: ffff964f82e70000 R13: ffff964f81370c38 R14: ffffa737801d3dd3 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff9652bfd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000014 CR3: 000000008fa20004 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> smc_llc_srv_rkey_exchange+0xa7/0x190 [smc] smc_llc_srv_add_link+0x3ae/0x5a0 [smc] smc_llc_add_link_work+0xb8/0x140 [smc] process_one_work+0x1e5/0x3f0 worker_thread+0x4d/0x2f0 ? __pfx_worker_thread+0x10/0x10 kthread+0xe5/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> When an alernate RNIC is available in system, SMC will try to add a new link based on the RNIC for resilience. All the RMBs in use will be mapped to the new link. Then the RMBs' MRs corresponding to the new link will be filled into LLC messages. For SMCRv1, they are ADD LINK CONT messages. However smc_llc_add_link_cont() may mistakenly access to unused RMBs which haven't been mapped to the new link and have no valid MRs, thus causing a crash. So this patch fixes it. Fixes: `87f88cda21` ("net/smc: rkey processing for a new link as SMC client") Link: https://lore.kernel.org/r/1685101741-74826-3-git-send-email-guwen@linux.alibaba.com Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-06-03 20:51:04 +01:00
Weihao Gao	02a7eee1eb	Fix gitignore for recently added usptream self tests This resolves the issue that generated binary is showing up as an untracked git file after every build on the kernel. Signed-off-by: Weihao Gao <weihaogao@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-06-03 20:44:58 +01:00
Paolo Bonzini	f211b45057	KVM x86 fixes for 6.4 - Fix a memslot lookup bug in the NX recovery thread that could theoretically let userspace bypass the NX hugepage mitigation - Fix a s/BLOCKING/PENDING bug in SVM's vNMI support - Account exit stats for fastpath VM-Exits that never leave the super tight run-loop - Fix an out-of-bounds bug in the optimized APIC map code, and add a regression test for the race. -----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmR6id4SHHNlYW5qY0Bn b29nbGUuY29tAAoJEGCRIgFNDBL5i8UP/0QqgBeVHluyG+0v+x1pXNsB1lEKjkTX 7W0OLkEPB2p4JVa0TG4zcBrs+JTCRGPnC3aL86Xd5LLQq51aZuEsFYRHJ6Ngs5jR Sy/EOHGwgL0Ol83KBPpRsAwX9TXspRq/fWDKqlNCFa90tJFqTioTaaG5T4YNqdR9 NcxmtvOi3GnqSUU+9MV0/DL4vOthhstQdj4JPptgP5hS/FH1F7ZdFSFC46AS0C60 bXJglcmv4l8tEGnIo2VtJAAK8MxCJp0m/4Ow+AXqAqIFDxP3F66KG60l4Jrty0GE URuXXFUtwYMhJaeZlnHM1po6RKX/E1etYEA4E0zHD/fntXg8ciE9j9XwQadXT44Q 8WIGmSt52OH3zRqbuMcOpdkuW9LepkrA2A8POpJuN9j9ELoNBHzMC+L1pe9FMEd+ VT2lfoAfDyQzIuwzOwyqc0/axUx0YeyMJy/3mVuk10hI4L32YRDvc97IVXf6MUQA DaAu+mgEjU6J+MVsWcpHHEq31VtpZXAaaluub2JlBOaPANMIM9Qfur2XtwFIuL3u sOZAiyBWQyUUzMEqdOAVpB1HB1QZ2Tb1NmhUXGLh5BrpkWxqEf89vtM6xN+sXmip lQDyxf1NDFxiAVfcnG0I0q0285mDr2mbGHTawu1vM5o46MWyqmXiSQMdAEbnyIUm uEF1Rppg4Hkc =ztue -----END PGP SIGNATURE----- Merge tag 'kvm-x86-fixes-6.4' of https://github.com/kvm-x86/linux into HEAD KVM x86 fixes for 6.4 - Fix a memslot lookup bug in the NX recovery thread that could theoretically let userspace bypass the NX hugepage mitigation - Fix a s/BLOCKING/PENDING bug in SVM's vNMI support - Account exit stats for fastpath VM-Exits that never leave the super tight run-loop - Fix an out-of-bounds bug in the optimized APIC map code, and add a regression test for the race.	2023-06-03 15:16:58 -04:00
Paolo Bonzini	49661a52a4	KVM/arm64 fixes for 6.4, take #3 - Fix the reported address of a watchpoint forwarded to userspace - Fix the freeing of the root of stage-2 page tables - Stop creating spurious PMU events to perform detection of the default PMU and use the existing PMU list instead. -----BEGIN PGP SIGNATURE----- iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmR3F3EPHG1hekBrZXJu ZWwub3JnAAoJECPQ0LrRPXpDPdQP/3dnsiOsA6uHMuRvSa6/+IMAjAoe6CNSQ+iD CE8AvVGuHD4/99JFxrBZJrFXJiB3VIhJS2NCxzcXu26ZKwbGx9ZhEuBcWa5O2yo5 Aa9EfPuFzTFdB207kkzk/TMeLbnT2jy8d9ZYeop/h1x+xsVQxgqllAA6NCooBCYA +BcYiT+HKGklqzKpBrbPqjA1lHrsSmjccf64+3xGYMaTnlCsjav3K3fc1a9jGipy k2LnVlh/Bv1aw1qj7Aqf3kZyhD/1F8U7QuuaSpCYXxHJPwlrQRmmRC5iL31BVlvy tVfgVEloi1sas/rqZYI9UKhi6S/Z9Hx3AVIW4ehgJkjj35hrre7ZvQjqjEj8XUBz 9P9XVz1TKt5GjTIycaYvopQiXnnE4J7GAC80vTLnGtTFXGCpXT3DVDCpEIjOpKj4 8CECdBaY7XQGQGupNspUOaWvlDuBUSIARvccBH6Z2y3hcdF9eXf05T9j3Pw6Eexf M0AsQAAgj5C00dTzfV/R9A1uNdL2x3M1eiZgXgttbLEyzd+slvvuUfpJA5k+c9+/ 9g3x+u0qFmIRQMFtYYnDNtMhYtjtgVJQunspUwXmij6dzAz/dXU7v+QoB2/KZLY+ fpfgpGuDZH9OBC/TmzKzjT+onR+hYSPdrzx3WTJwzsKGr/4WtIBiiykjKVF6c5yY GyxT00p3 =63kl -----END PGP SIGNATURE----- Merge tag 'kvmarm-fixes-6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.4, take #3 - Fix the reported address of a watchpoint forwarded to userspace - Fix the freeing of the root of stage-2 page tables - Stop creating spurious PMU events to perform detection of the default PMU and use the existing PMU list instead.	2023-06-03 15:15:49 -04:00
Paolo Bonzini	26f3149880	KVM/arm64 fixes for 6.4, take #2 - Address some fallout of the locking rework, this time affecting the way the vgic is configured - Fix an issue where the page table walker frees a subtree and then proceeds with walking what it has just freed... - Check that a given PA donated to the gues is actually memory (only affecting pKVM) - Correctly handle MTE CMOs by Set/Way -----BEGIN PGP SIGNATURE----- iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmRuCOgPHG1hekBrZXJu ZWwub3JnAAoJECPQ0LrRPXpD08wP/1g00EJei1G0ZHDhoWI/F7axx2ibmoEU1dkn BYVgwC9y51O4h+sfMZw1nLLtOw65EPCMspAxgSxMXMBW09A2wRvj9HbKnlrtLi1W Y3pkTuJwN4o8YX/R5pdMtt0gNI2odlq7YokXpYB5+CbmTZa1P12eZz21t3e+CQh9 dkSW/zPP4ihnSYtB/47OrRGYpyv6HKwPjU9VtRbKcbomO0SYPAiizyEy2IE3UABK nBvt7mMXnlPpkLmhWwKYj4QuOoB3H48xXf059hbZLNcGWhqtQloE7TvvqILQ7PJW n68r6cnmZpW/+so9Wn8+/iznFf5C/YyXUEx4NiZtm395VyJRe1LCCP07soBvlwHh x+wPm2xDrHasFJeEg2f8wXpqM4LByHfyIV4KMdPg2dUMW8jkglETDyFWd4i0QzbH Sovfkt2AnSJ/soUXlkZA+f5bf023Q5zXDBBlNZyxXf24fynglVdaKiRXbzutqt+s pKAniy2UxHf9Yz0M9hR+LeJ8Cvchnq+7+ToEo5B1em9g17ngGnKzrWt22F57b8Ro U3ZSLQedob5G8ykt4jEJuBlKwf6bDijwRlpwo8aCfMzs2iqthOJx/wjcj8HMONRF A0stENROnCF2sz5ThJXfXNtLZk2gLrTk0p9EPhU7qDquKzTURauzI/uyFBsgeRxD 1RcxKUMd =Uwqe -----END PGP SIGNATURE----- Merge tag 'kvmarm-fixes-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.4, take #2 - Address some fallout of the locking rework, this time affecting the way the vgic is configured - Fix an issue where the page table walker frees a subtree and then proceeds with walking what it has just freed... - Check that a given PA donated to the gues is actually memory (only affecting pKVM) - Correctly handle MTE CMOs by Set/Way	2023-06-03 15:14:18 -04:00
Linus Torvalds	e5282a7d8f	SCSI fixes on 20230603 Five fixes, all in drivers. The most extensive is the target change to fix the hang in the login code, which involves changing timers from per login to per connection. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZHt7YCYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishSh+AP9iwKp2 MJJMB5GeijKd0TxlOp5gigcTAaqYgkco5xl/wAD+ItgItZ/Fbcn4t5ScMzbOQddb Z4QNYUVhplUr+cBel1Y= =eHo1 -----END PGP SIGNATURE----- Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Five fixes, all in drivers. The most extensive is the target change to fix the hang in the login code, which involves changing timers from per login to per connection" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: stex: Fix gcc 13 warnings scsi: qla2xxx: Fix NULL pointer dereference in target mode scsi: target: iscsi: Prevent login threads from racing between each other scsi: target: iscsi: Remove unused transport_timer scsi: target: iscsi: Fix hang in the iSCSI login code	2023-06-03 13:52:24 -04:00
Linus Torvalds	d1b65edf4d	LEDs fixes for 6.4-rc5 Here's a fix for a regression in 6.4-rc1 which broke the backlight on machines such as the Lenovo ThinkPad X13s. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQHbPq+cpGvN/peuzMLxc3C7H1lCAUCZHtacAAKCRALxc3C7H1l CCV+APsExsBg7Pk+mSkSAqVCcWAj6kj2y6VYveevXrmZnhm2bgD9HQJFSzJreTEG BmdLH9uAZ8pEHlb45wQM+Br9Iw0wHAY= =1kor -----END PGP SIGNATURE----- Merge tag 'leds-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/johan/linux Pull LED fix from Johan Hovold: "Here's a fix for a regression in 6.4-rc1 which broke the backlight on machines such as the Lenovo ThinkPad X13s" Acked-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/lkml/20230602091928.GR449117@google.com/ * tag 'leds-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/johan/linux: leds: qcom-lpg: Fix PWM period limits	2023-06-03 13:46:11 -04:00
Bjorn Andersson	b05d39466b	leds: qcom-lpg: Fix PWM period limits The introduction of high resolution PWM support changed the order of the operations in the calculation of min and max period. The result in both divisions is in most cases a truncation to 0, which limits the period to the range of [0, 0]. Both numerators (and denominators) are within 64 bits, so the whole expression can be put directly into the div64_u64, instead of doing it partially. Fixes: `b00d2ed376` ("leds: rgb: leds-qcom-lpg: Add support for high resolution PWM") Reviewed-by: Caleb Connolly <caleb.connolly@linaro.org> Tested-by: Steev Klimaszewski <steev@kali.org> Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com> Acked-by: Lee Jones <lee@kernel.org> Tested-by: Johan Hovold <johan+linaro@kernel.org> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD Link: https://lore.kernel.org/r/20230515162604.649203-1-quic_bjorande@quicinc.com Signed-off-by: Johan Hovold <johan@kernel.org>	2023-06-03 17:00:28 +02:00
Linus Torvalds	51f269a6ec	Probes fixes for 6.4-rc4: - Return NULL if the trace_probe list on trace_probe_event is empty. - selftests/ftrace: Choose testing symbol name for filtering feature from sample data instead of fixed symbol. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmR640AACgkQ2/sHvwUr PxugGgf/YwwocmUqiEtTukTB7fzoAjYyQXr0YaJM+DjeZXMqAJ4dl9tV1/AmAL4j iWtZd53aolTym/3P2VADfSc4xiyWjFdkYv7zRPjpqfMg3XsELJgshwz+12dmmMdx 0uco1l2/Ge3JNPK6BuWaO3V44QjoPSgiRsmxxKLh5K7M9V5swL7fadoLtins1B0r TVVqdyEHQkZLTByexg7wHYd/ro+4lexv1yhvyP4rEmYRPDoR56eOF2zwcQMHPvaY qstdP2ce6m5rG0gp4TsY7oRkezb64y903hNQuumoU6VR9nI3IK4PZjuX5/xns2By G9mRaOqb02+UmP+HhX4QGmr92G9Vyw== =o07w -----END PGP SIGNATURE----- Merge tag 'probes-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixes from Masami Hiramatsu: - Return NULL if the trace_probe list on trace_probe_event is empty - selftests/ftrace: Choose testing symbol name for filtering feature from sample data instead of fixed symbol * tag 'probes-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: selftests/ftrace: Choose target function for filter test from samples tracing/probe: trace_probe_primary_from_call(): checked list_first_entry	2023-06-03 08:23:16 -04:00
Russell King (Oracle)	03c44a21d0	net: phylink: actually fix ksettings_set() ethtool call Raju Lakkaraju reported that the below commit caused a regression with Lan743x drivers and a 2.5G SFP. Sadly, this is because the commit was utterly wrong. Let's fix this properly by not moving the linkmode_and(), but instead copying the link ksettings and then modifying the advertising mask before passing the modified link ksettings to phylib. Fixes: `df0acdc59b` ("net: phylink: fix ksettings_set() ethtool call") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1q4eLm-00Ayxk-GZ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-02 23:49:56 -07:00
Masami Hiramatsu (Google)	eb50d0f250	selftests/ftrace: Choose target function for filter test from samples Since the event-filter-function.tc expects the 'exit_mmap()' directly calls 'kmem_cache_free()', this is vulnerable to code modifications. Choose the target function for the filter test from the sample event data so that it can keep test running correctly even if the caller function name will be changed. Link: https://lore.kernel.org/linux-trace-kernel/167919441260.1922645.18355804179347364057.stgit@mhiramat.roam.corp.google.com/ Link: https://lore.kernel.org/all/CA+G9fYtF-XEKi9YNGgR=Kf==7iRb2FrmEC7qtwAeQbfyah-UhA@mail.gmail.com/ Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Fixes: `7f09d639b8` ("tracing/selftests: Add test for event filtering on function name") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2023-06-03 15:48:22 +09:00
Jakub Kicinski	3696e140c2	Merge branch 'net-ipv6-skip_notify_on_dev_down-fix' Eric Dumazet says: ==================== net/ipv6: skip_notify_on_dev_down fix While reviewing Matthieu Baerts recent patch [1], I found it copied/pasted an existing bug around skip_notify_on_dev_down. First patch is a stable candidate, and second one can simply land in net tree. https://lore.kernel.org/lkml/20230601-net-next-skip_print_link_becomes_ready-v1-1-c13e64c14095@tessares.net/ ==================== Link: https://lore.kernel.org/r/20230601160445.1480257-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-02 22:55:45 -07:00
Eric Dumazet	ef62c0ae6d	net/ipv6: convert skip_notify_on_dev_down sysctl to u8 Save a bit a space, and could help future sysctls to use the same pattern. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-02 22:55:43 -07:00
Eric Dumazet	edf2e1d201	net/ipv6: fix bool/int mismatch for skip_notify_on_dev_down skip_notify_on_dev_down ctl table expects this field to be an int (4 bytes), not a bool (1 byte). Because proc_dou8vec_minmax() was added in 5.13, this patch converts skip_notify_on_dev_down to an int. Following patch then converts the field to u8 and use proc_dou8vec_minmax(). Fixes: `7c6bb7d2fa` ("net/ipv6: Add knob to skip DELROUTE message on device down") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-02 22:55:43 -07:00
Michal Luczaj	47d2804bc9	KVM: selftests: Add test for race in kvm_recalculate_apic_map() Keep switching between LAPIC_MODE_X2APIC and LAPIC_MODE_DISABLED during APIC map construction to hunt for TOCTOU bugs in KVM. KVM's optimized map recalc makes multiple passes over the list of vCPUs, and the calculations ignore vCPU's whose APIC is hardware-disabled, i.e. there's a window where toggling LAPIC_MODE_DISABLED is quite interesting. Signed-off-by: Michal Luczaj <mhal@rbox.co> Co-developed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20230602233250.1014316-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-06-02 17:21:06 -07:00
Sean Christopherson	4364b28798	KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-bounds Bail from kvm_recalculate_phys_map() and disable the optimized map if the target vCPU's x2APIC ID is out-of-bounds, i.e. if the vCPU was added and/or enabled its local APIC after the map was allocated. This fixes an out-of-bounds access bug in the !x2apic_format path where KVM would write beyond the end of phys_map. Check the x2APIC ID regardless of whether or not x2APIC is enabled, as KVM's hardcodes x2APIC ID to be the vCPU ID, i.e. it can't change, and the map allocation in kvm_recalculate_apic_map() doesn't check for x2APIC being enabled, i.e. the check won't get false postivies. Note, this also affects the x2apic_format path, which previously just ignored the "x2apic_id > new->max_apic_id" case. That too is arguably a bug fix, as ignoring the vCPU meant that KVM would not send interrupts to the vCPU until the next map recalculation. In practice, that "bug" is likely benign as a newly present vCPU/APIC would immediately trigger a recalc. But, there's no functional downside to disabling the map, and a future patch will gracefully handle the -E2BIG case by retrying instead of simply disabling the optimized map. Opportunistically add a sanity check on the xAPIC ID size, along with a comment explaining why the xAPIC ID is guaranteed to be "good". Reported-by: Michal Luczaj <mhal@rbox.co> Fixes: `5b84b02917` ("KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230602233250.1014316-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-06-02 17:20:50 -07:00
Martin KaFai Lau	23509e92cf	Merge branch 'Fix elem_size not being set for inner maps' Rhys Rustad-Elliott says: ==================== Commit `d937bc3449` ("bpf: make uniform use of array->elem_size everywhere in arraymap.c") changed array_map_gen_lookup to use array->elem_size instead of round_up(map->value_size, 8) as the element size when generating code to access a value in an array map. array->elem_size, however, is not set by bpf_map_meta_alloc when initializing an BPF_MAP_TYPE_ARRAY_OF_MAPS or BPF_MAP_TYPE_HASH_OF_MAPS. This results in array_map_gen_lookup incorrectly outputting code that always accesses index 0 in the array (as the index will be calculated via a multiplication with the element size, which is incorrectly set to 0). This patchset sets elem_size on the bpf_array object when allocating an array or hash of maps to fix this and adds a selftest that accesses an array map nested within a hash of maps at a nonzero index to prevent regressions. v1: https://lore.kernel.org/bpf/95b5da7c-ee52-3ecb-0a4e-f6a7a114f269@linux.dev/ Changelog: v1 -> v2: Address comments by Martin KaFai Lau: - Directly use inner_array->elem_size instead of using round_up - Move selftests to a new patch - Use ASSERT_* macros instead of CHECK and remove duration - Remove unnecessary usleep - Shorten selftest name ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-06-02 17:04:25 -07:00
Rhys Rustad-Elliott	1022b67b89	selftests/bpf: Add access_inner_map selftest Add a selftest that accesses a BPF_MAP_TYPE_ARRAY (at a nonzero index) nested within a BPF_MAP_TYPE_HASH_OF_MAPS to flex a previously buggy case. Signed-off-by: Rhys Rustad-Elliott <me@rhysre.net> Link: https://lore.kernel.org/r/20230602190110.47068-3-me@rhysre.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-06-02 17:04:22 -07:00
Tom Lendacky	a37f2699c3	x86/head/64: Switch to KERNEL_CS as soon as new GDT is installed The call to startup_64_setup_env() will install a new GDT but does not actually switch to using the KERNEL_CS entry until returning from the function call. Commit `bcce829083` ("x86/sev: Detect/setup SEV/SME features earlier in boot") moved the call to sme_enable() earlier in the boot process and in between the call to startup_64_setup_env() and the switch to KERNEL_CS. An SEV-ES or an SEV-SNP guest will trigger #VC exceptions during the call to sme_enable() and if the CS pushed on the stack as part of the exception and used by IRETQ is not mapped by the new GDT, then problems occur. Today, the current CS when entering startup_64 is the kernel CS value because it was set up by the decompressor code, so no issue is seen. However, a recent patchset that looked to avoid using the legacy decompressor during an EFI boot exposed this bug. At entry to startup_64, the CS value is that of EFI and is not mapped in the new kernel GDT. So when a #VC exception occurs, the CS value used by IRETQ is not valid and the guest boot crashes. Fix this issue by moving the block that switches to the KERNEL_CS value to be done immediately after returning from startup_64_setup_env(). Fixes: `bcce829083` ("x86/sev: Detect/setup SEV/SME features earlier in boot") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/all/6ff1f28af2829cc9aea357ebee285825f90a431f.1684340801.git.thomas.lendacky%40amd.com	2023-06-02 16:59:57 -07:00
Sean Christopherson	8b703a49c9	KVM: x86: Account fastpath-only VM-Exits in vCPU stats Increment vcpu->stat.exits when handling a fastpath VM-Exit without going through any part of the "slow" path. Not bumping the exits stat can result in wildly misleading exit counts, e.g. if the primary reason the guest is exiting is to program the TSC deadline timer. Fixes: `404d5d7bff` ("KVM: X86: Introduce more exit_fastpath_completion enum values") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230602011920.787844-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-06-02 16:37:49 -07:00
Maciej S. Szmigiero	b2ce899788	KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK While testing Hyper-V enabled Windows Server 2019 guests on Zen4 hardware I noticed that with vCPU count large enough (> 16) they sometimes froze at boot. With vCPU count of 64 they never booted successfully - suggesting some kind of a race condition. Since adding "vnmi=0" module parameter made these guests boot successfully it was clear that the problem is most likely (v)NMI-related. Running kvm-unit-tests quickly showed failing NMI-related tests cases, like "multiple nmi" and "pending nmi" from apic-split, x2apic and xapic tests and the NMI parts of eventinj test. The issue was that once one NMI was being serviced no other NMI was allowed to be set pending (NMI limit = 0), which was traced to svm_is_vnmi_pending() wrongly testing for the "NMI blocked" flag rather than for the "NMI pending" flag. Fix this by testing for the right flag in svm_is_vnmi_pending(). Once this is done, the NMI-related kvm-unit-tests pass successfully and the Windows guest no longer freezes at boot. Fixes: `fa4c027a79` ("KVM: x86: Add support for SVM's Virtual NMI") Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/be4ca192eb0c1e69a210db3009ca984e6a54ae69.1684495380.git.maciej.szmigiero@oracle.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-06-02 16:34:20 -07:00
Sean Christopherson	817fa99836	KVM: x86/mmu: Grab memslot for correct address space in NX recovery worker Factor in the address space (non-SMM vs. SMM) of the target shadow page when recovering potential NX huge pages, otherwise KVM will retrieve the wrong memslot when zapping shadow pages that were created for SMM. The bug most visibly manifests as a WARN on the memslot being non-NULL, but the worst case scenario is that KVM could unaccount the shadow page without ensuring KVM won't install a huge page, i.e. if the non-SMM slot is being dirty logged, but the SMM slot is not. ------------[ cut here ]------------ WARNING: CPU: 1 PID: 3911 at arch/x86/kvm/mmu/mmu.c:7015 kvm_nx_huge_page_recovery_worker+0x38c/0x3d0 [kvm] CPU: 1 PID: 3911 Comm: kvm-nx-lpage-re RIP: 0010:kvm_nx_huge_page_recovery_worker+0x38c/0x3d0 [kvm] RSP: 0018:ffff99b284f0be68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff99b284edd000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff9271397024e0 R08: 0000000000000000 R09: ffff927139702450 R10: 0000000000000000 R11: 0000000000000001 R12: ffff99b284f0be98 R13: 0000000000000000 R14: ffff9270991fcd80 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff927f9f640000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0aacad3ae0 CR3: 000000088fc2c005 CR4: 00000000003726e0 Call Trace: <TASK> __pfx_kvm_nx_huge_page_recovery_worker+0x10/0x10 [kvm] kvm_vm_worker_thread+0x106/0x1c0 [kvm] kthread+0xd9/0x100 ret_from_fork+0x2c/0x50 </TASK> ---[ end trace 0000000000000000 ]--- This bug was exposed by commit `edbdb43fc9` ("KVM: x86: Preserve TDP MMU roots until they are explicitly invalidated"), which allowed KVM to retain SMM TDP MMU roots effectively indefinitely. Before commit `edbdb43fc9`, KVM would zap all SMM TDP MMU roots and thus all SMM TDP MMU shadow pages once all vCPUs exited SMM, which made the window where this bug (recovering an SMM NX huge page) could be encountered quite tiny. To hit the bug, the NX recovery thread would have to run while at least one vCPU was in SMM. Most VMs typically only use SMM during boot, and so the problematic shadow pages were gone by the time the NX recovery thread ran. Now that KVM preserves TDP MMU roots until they are explicitly invalidated (e.g. by a memslot deletion), the window to trigger the bug is effectively never closed because most VMMs don't delete memslots after boot (except for a handful of special scenarios). Fixes: `eb29860570` ("KVM: x86/mmu: Do not recover dirty-tracked NX Huge Pages") Reported-by: Fabio Coatti <fabio.coatti@gmail.com> Closes: https://lore.kernel.org/all/CADpTngX9LESCdHVu_2mQkNGena_Ng2CphWNwsRGSMxzDsTjU2A@mail.gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230602010137.784664-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-06-02 16:34:10 -07:00

... 3 4 5 6 7 ...

1186775 Commits