73952 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Linus Torvalds
|
8efd0d9c31 |
Networking changes for 5.17.
Core ---- - Defer freeing TCP skbs to the BH handler, whenever possible, or at least perform the freeing outside of the socket lock section to decrease cross-CPU allocator work and improve latency. - Add netdevice refcount tracking to locate sources of netdevice and net namespace refcount leaks. - Make Tx watchdog less intrusive - avoid pausing Tx and restarting all queues from a single CPU removing latency spikes. - Various small optimizations throughout the stack from Eric Dumazet. - Make netdev->dev_addr[] constant, force modifications to go via appropriate helpers to allow us to keep addresses in ordered data structures. - Replace unix_table_lock with per-hash locks, improving performance of bind() calls. - Extend skb drop tracepoint with a drop reason. - Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW. BPF --- - New helpers: - bpf_find_vma(), find and inspect VMAs for profiling use cases - bpf_loop(), runtime-bounded loop helper trading some execution time for much faster (if at all converging) verification - bpf_strncmp(), improve performance, avoid compiler flakiness - bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt() for tracing programs, all inlined by the verifier - Support BPF relocations (CO-RE) in the kernel loader. - Further the support for BTF_TYPE_TAG annotations. - Allow access to local storage in sleepable helpers. - Convert verifier argument types to a composable form with different attributes which can be shared across types (ro, maybe-null). - Prepare libbpf for upcoming v1.0 release by cleaning up APIs, creating new, extensible ones where missing and deprecating those to be removed. Protocols --------- - WiFi (mac80211/cfg80211): - notify user space about long "come back in N" AP responses, allow it to react to such temporary rejections - allow non-standard VHT MCS 10/11 rates - use coarse time in airtime fairness code to save CPU cycles - Bluetooth: - rework of HCI command execution serialization to use a common queue and work struct, and improve handling errors reported in the middle of a batch of commands - rework HCI event handling to use skb_pull_data, avoiding packet parsing pitfalls - support AOSP Bluetooth Quality Report - SMC: - support net namespaces, following the RDMA model - improve connection establishment latency by pre-clearing buffers - introduce TCP ULP for automatic redirection to SMC - Multi-Path TCP: - support ioctls: SIOCINQ, OUTQ, and OUTQNSD - support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT, IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY - support cmsgs: TCP_INQ - improvements in the data scheduler (assigning data to subflows) - support fastclose option (quick shutdown of the full MPTCP connection, similar to TCP RST in regular TCP) - MCTP (Management Component Transport) over serial, as defined by DMTF spec DSP0253 - "MCTP Serial Transport Binding". Driver API ---------- - Support timestamping on bond interfaces in active/passive mode. - Introduce generic phylink link mode validation for drivers which don't have any quirks and where MAC capability bits fully express what's supported. Allow PCS layer to participate in the validation. Convert a number of drivers. - Add support to set/get size of buffers on the Rx rings and size of the tx copybreak buffer via ethtool. - Support offloading TC actions as first-class citizens rather than only as attributes of filters, improve sharing and device resource utilization. - WiFi (mac80211/cfg80211): - support forwarding offload (ndo_fill_forward_path) - support for background radar detection hardware - SA Query Procedures offload on the AP side New hardware / drivers ---------------------- - tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with real-time requirements for isochronous communication with protocols like OPC UA Pub/Sub. - Qualcomm BAM-DMUX WWAN - driver for data channels of modems integrated into many older Qualcomm SoCs, e.g. MSM8916 or MSM8974 (qcom_bam_dmux). - Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch driver with support for bridging, VLANs and multicast forwarding (lan966x). - iwlmei driver for co-operating between Intel's WiFi driver and Intel's Active Management Technology (AMT) devices. - mse102x - Vertexcom MSE102x Homeplug GreenPHY chips - Bluetooth: - MediaTek MT7921 SDIO devices - Foxconn MT7922A - Realtek RTL8852AE Drivers ------- - Significantly improve performance in the datapaths of: lan78xx, ax88179_178a, lantiq_xrx200, bnxt. - Intel Ethernet NICs: - igb: support PTP/time PEROUT and EXTTS SDP functions on 82580/i354/i350 adapters - ixgbevf: new PF -> VF mailbox API which avoids the risk of mailbox corruption with ESXi - iavf: support configuration of VLAN features of finer granularity, stacked tags and filtering - ice: PTP support for new E822 devices with sub-ns precision - ice: support firmware activation without reboot - Mellanox Ethernet NICs (mlx5): - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool - support TC forwarding when tunnel encap and decap happen between two ports of the same NIC - dynamically size and allow disabling various features to save resources for running in embedded / SmartNIC scenarios - Broadcom Ethernet NICs (bnxt): - use page frag allocator to improve Rx performance - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool - Other Ethernet NICs: - amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support - Microsoft cloud/virtual NIC (mana): - add XDP support (PASS, DROP, TX) - Mellanox Ethernet switches (mlxsw): - initial support for Spectrum-4 ASICs - VxLAN with IPv6 underlay - Marvell Ethernet switches (prestera): - support flower flow templates - add basic IP forwarding support - NXP embedded Ethernet switches (ocelot & felix): - support Per-Stream Filtering and Policing (PSFP) - enable cut-through forwarding between ports by default - support FDMA to improve packet Rx/Tx to CPU - Other embedded switches: - hellcreek: improve trapping management (STP and PTP) packets - qca8k: support link aggregation and port mirroring - Qualcomm 802.11ax WiFi (ath11k): - qca6390, wcn6855: enable 802.11 power save mode in station mode - BSS color change support - WCN6855 hw2.1 support - 11d scan offload support - scan MAC address randomization support - full monitor mode, only supported on QCN9074 - qca6390/wcn6855: report signal and tx bitrate - qca6390: rfkill support - qca6390/wcn6855: regdb.bin support - Intel WiFi (iwlwifi): - support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS) in cooperation with the BIOS - support for Optimized Connectivity Experience (OCE) scan - support firmware API version 68 - lots of preparatory work for the upcoming Bz device family - MediaTek WiFi (mt76): - Specific Absorption Rate (SAR) support - mt7921: 160 MHz channel support - RealTek WiFi (rtw88): - Specific Absorption Rate (SAR) support - scan offload - Other WiFi NICs - ath10k: support fetching (pre-)calibration data from nvmem - brcmfmac: configure keep-alive packet on suspend - wcn36xx: beacon filter support Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmHbkZAACgkQMUZtbf5S IruYkQ//XX7BggcwBfukPK83j0dONolClijqKcKR08g4vB5L8GXvv6OErKIWrh4k h8JanCH352ZkbCSw3MvFdm825UYQv8vPMd6Qks/LJ4aSKqCuy4MIlAo+yOw4Km3O i7++lRfma6DqHHI59wvLjWoxZSPu8lL+rI8UsZ5qMOlnNlGAOXsNrzRjaqQ3FddY AMxZeBUtrPqUCCQZFq3U8apkYzUp7CA/3XR9zRcja3uPbrtOV2G+4whRF90qGNWz Tm/QvJ9F/Ab292cbhxR4KuaQ3hUhaCQyDjbZk3+FZzZpAVhYTVqcNjny6+yXmbiP NXRtwemnl1NlWKMnJM8lEeY48u626tRIkxA/Wtd61uoO5uKUSxfGP+UpUi+DfXbF yIw50VQ7L2bpxXP/HjtmhVgZDaWKYyh22Zw4Hp/muMJz0hgUB0KODY3tf2jUWbjJ 0oEgocWyzhhwMQKqupTDCIaRgIs2ewYr4ZrFDhI3HnHC/vv1VjoPRUPIyxwppD2N cXvZb3B1sWK8iX5gCbISGzyU4bB7I0rvJSTU42ueti7n6NqRFZ79qHQpYnnY+JdO z1qOwY/d/yWfBoXVKRtRg2qz6CdEt5BQklwAgVEBgrFpf58gp694EwGMb1htY14J r/k9bVpmyIFpUnBH2CPMRfBVA3tUTqzyzzFV4AMw40NYLKmhLdo= =KLm3 -----END PGP SIGNATURE----- Merge tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core ---- - Defer freeing TCP skbs to the BH handler, whenever possible, or at least perform the freeing outside of the socket lock section to decrease cross-CPU allocator work and improve latency. - Add netdevice refcount tracking to locate sources of netdevice and net namespace refcount leaks. - Make Tx watchdog less intrusive - avoid pausing Tx and restarting all queues from a single CPU removing latency spikes. - Various small optimizations throughout the stack from Eric Dumazet. - Make netdev->dev_addr[] constant, force modifications to go via appropriate helpers to allow us to keep addresses in ordered data structures. - Replace unix_table_lock with per-hash locks, improving performance of bind() calls. - Extend skb drop tracepoint with a drop reason. - Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW. BPF --- - New helpers: - bpf_find_vma(), find and inspect VMAs for profiling use cases - bpf_loop(), runtime-bounded loop helper trading some execution time for much faster (if at all converging) verification - bpf_strncmp(), improve performance, avoid compiler flakiness - bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt() for tracing programs, all inlined by the verifier - Support BPF relocations (CO-RE) in the kernel loader. - Further the support for BTF_TYPE_TAG annotations. - Allow access to local storage in sleepable helpers. - Convert verifier argument types to a composable form with different attributes which can be shared across types (ro, maybe-null). - Prepare libbpf for upcoming v1.0 release by cleaning up APIs, creating new, extensible ones where missing and deprecating those to be removed. Protocols --------- - WiFi (mac80211/cfg80211): - notify user space about long "come back in N" AP responses, allow it to react to such temporary rejections - allow non-standard VHT MCS 10/11 rates - use coarse time in airtime fairness code to save CPU cycles - Bluetooth: - rework of HCI command execution serialization to use a common queue and work struct, and improve handling errors reported in the middle of a batch of commands - rework HCI event handling to use skb_pull_data, avoiding packet parsing pitfalls - support AOSP Bluetooth Quality Report - SMC: - support net namespaces, following the RDMA model - improve connection establishment latency by pre-clearing buffers - introduce TCP ULP for automatic redirection to SMC - Multi-Path TCP: - support ioctls: SIOCINQ, OUTQ, and OUTQNSD - support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT, IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY - support cmsgs: TCP_INQ - improvements in the data scheduler (assigning data to subflows) - support fastclose option (quick shutdown of the full MPTCP connection, similar to TCP RST in regular TCP) - MCTP (Management Component Transport) over serial, as defined by DMTF spec DSP0253 - "MCTP Serial Transport Binding". Driver API ---------- - Support timestamping on bond interfaces in active/passive mode. - Introduce generic phylink link mode validation for drivers which don't have any quirks and where MAC capability bits fully express what's supported. Allow PCS layer to participate in the validation. Convert a number of drivers. - Add support to set/get size of buffers on the Rx rings and size of the tx copybreak buffer via ethtool. - Support offloading TC actions as first-class citizens rather than only as attributes of filters, improve sharing and device resource utilization. - WiFi (mac80211/cfg80211): - support forwarding offload (ndo_fill_forward_path) - support for background radar detection hardware - SA Query Procedures offload on the AP side New hardware / drivers ---------------------- - tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with real-time requirements for isochronous communication with protocols like OPC UA Pub/Sub. - Qualcomm BAM-DMUX WWAN - driver for data channels of modems integrated into many older Qualcomm SoCs, e.g. MSM8916 or MSM8974 (qcom_bam_dmux). - Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch driver with support for bridging, VLANs and multicast forwarding (lan966x). - iwlmei driver for co-operating between Intel's WiFi driver and Intel's Active Management Technology (AMT) devices. - mse102x - Vertexcom MSE102x Homeplug GreenPHY chips - Bluetooth: - MediaTek MT7921 SDIO devices - Foxconn MT7922A - Realtek RTL8852AE Drivers ------- - Significantly improve performance in the datapaths of: lan78xx, ax88179_178a, lantiq_xrx200, bnxt. - Intel Ethernet NICs: - igb: support PTP/time PEROUT and EXTTS SDP functions on 82580/i354/i350 adapters - ixgbevf: new PF -> VF mailbox API which avoids the risk of mailbox corruption with ESXi - iavf: support configuration of VLAN features of finer granularity, stacked tags and filtering - ice: PTP support for new E822 devices with sub-ns precision - ice: support firmware activation without reboot - Mellanox Ethernet NICs (mlx5): - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool - support TC forwarding when tunnel encap and decap happen between two ports of the same NIC - dynamically size and allow disabling various features to save resources for running in embedded / SmartNIC scenarios - Broadcom Ethernet NICs (bnxt): - use page frag allocator to improve Rx performance - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool - Other Ethernet NICs: - amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support - Microsoft cloud/virtual NIC (mana): - add XDP support (PASS, DROP, TX) - Mellanox Ethernet switches (mlxsw): - initial support for Spectrum-4 ASICs - VxLAN with IPv6 underlay - Marvell Ethernet switches (prestera): - support flower flow templates - add basic IP forwarding support - NXP embedded Ethernet switches (ocelot & felix): - support Per-Stream Filtering and Policing (PSFP) - enable cut-through forwarding between ports by default - support FDMA to improve packet Rx/Tx to CPU - Other embedded switches: - hellcreek: improve trapping management (STP and PTP) packets - qca8k: support link aggregation and port mirroring - Qualcomm 802.11ax WiFi (ath11k): - qca6390, wcn6855: enable 802.11 power save mode in station mode - BSS color change support - WCN6855 hw2.1 support - 11d scan offload support - scan MAC address randomization support - full monitor mode, only supported on QCN9074 - qca6390/wcn6855: report signal and tx bitrate - qca6390: rfkill support - qca6390/wcn6855: regdb.bin support - Intel WiFi (iwlwifi): - support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS) in cooperation with the BIOS - support for Optimized Connectivity Experience (OCE) scan - support firmware API version 68 - lots of preparatory work for the upcoming Bz device family - MediaTek WiFi (mt76): - Specific Absorption Rate (SAR) support - mt7921: 160 MHz channel support - RealTek WiFi (rtw88): - Specific Absorption Rate (SAR) support - scan offload - Other WiFi NICs - ath10k: support fetching (pre-)calibration data from nvmem - brcmfmac: configure keep-alive packet on suspend - wcn36xx: beacon filter support" * tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2048 commits) tcp: tcp_send_challenge_ack delete useless param `skb` net/qla3xxx: Remove useless DMA-32 fallback configuration rocker: Remove useless DMA-32 fallback configuration hinic: Remove useless DMA-32 fallback configuration lan743x: Remove useless DMA-32 fallback configuration net: enetc: Remove useless DMA-32 fallback configuration cxgb4vf: Remove useless DMA-32 fallback configuration cxgb4: Remove useless DMA-32 fallback configuration cxgb3: Remove useless DMA-32 fallback configuration bnx2x: Remove useless DMA-32 fallback configuration et131x: Remove useless DMA-32 fallback configuration be2net: Remove useless DMA-32 fallback configuration vmxnet3: Remove useless DMA-32 fallback configuration bna: Simplify DMA setting net: alteon: Simplify DMA setting myri10ge: Simplify DMA setting qlcnic: Simplify DMA setting net: allwinner: Fix print format page_pool: remove spinlock in page_pool_refill_alloc_cache() amt: fix wrong return type of amt_send_membership_update() ... |
||
Linus Torvalds
|
404dbad382 |
pstore update for v5.17-rc1
- Add boot param for early ftrace recording in pstore (Uwe Kleine-König) -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmHV0TIWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJveID/0a/qRLfwNyjjLZ2MNtZLfL8ejs Qc/3wXJy7IoMe6OagbwS+boJ7PTOa3EMYh0h7GUGc8EjxFai+N2mGqEDMX/JZFRy zk1eWfNDpmCBrzo3VlOXrMM0Zb5cj3y9bfsLK31FcLN7gu+ivSXrA6BlPA513E+s SISPfK04tXJsLMBg12kKDkp8OcQ9ybKYDxUnScCFg+ef50pFdgzU6K79fUMGac8P M0n/ZDZRadUK8fCI2CChlOghWe09gWjWZIxstfn4qY+3C6R97ueB7IVCVgn/fbzX JIhsVmxjubQhd6AP/t5Ib5yzRFQ1O+p60LBGuwG8uA25DvnH3MfYhNRqpOB9R0a8 3IygXuMcCGDl5LCBDteKrktdIKKCA8gLt0IydWD1xcVt8z7ux52/KTPyQZYYCPn4 LvzACEL6O1qoPEod5k//BbF8p4A8JRBUPUCy55xtdQbOw/kg5H1fWg6xjGYW20Vs GpmuKKtDrAKX4OxIdLnaA5p3eBaF98OEo/d/H9PEUkYVc6xoPIGqwvepbt+N5Zpc vQ8FRqLTG+KvS3uUyXFlIdR5Rt/qwF0OoFxrMEEp0X/5aKwJHjW9v5qSkKehfQDI Ly4QsAaIZJx7w8FckGCBQzvTxfteMp5yivHW0ccOmnqxDLK04keoCxNSFBfP4FLQ 4puMlfhXcL2ZO640bA== =Ux14 -----END PGP SIGNATURE----- Merge tag 'pstore-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore update from Kees Cook: - Add boot param for early ftrace recording in pstore (Uwe Kleine-König) * tag 'pstore-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore/ftrace: Allow immediate recording |
||
Linus Torvalds
|
622e42a674 |
Fixes for 5.16-rc8:
- Make the old ALLOCSP ioctl behave in a consistent manner with newer syscalls like fallocate. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmHDpfcACgkQ+H93GTRK tOuI9A/9HAdOYymXZMcYrFxbxKrzgjnX+2dYAJHyko9ZT2tFl512xfGE7hA50V1w r6FIPCOHXmuIT82kOMs+0MwqY61tJAMzkGB3N2xX1o99bl4enQ7/gqkAaMWx/0YS H11h619EdcnZPOZhgH9vtLkWdAW4muhvmmNHnLvsv8XHinDkIOLR4fepa5zbbY/b fkGH/OHuJ8an2scdvgPQzov8jd76jPmLT8fd1hu34WFNkReBmpllU1ATAz1pUHpZ EvIHfUmrlHTtGrGaa5Ti8bgY0ppKIfrv/7DxRmCaSmc8K8cqBtjSM4FbCOZJKeiN Ajn4Rru86KMD0Ker+1aLRZchY+j0S4NQH2blCqeUL3DfIcZ+qrnOEqOo8VG8slBV GD7GKAuGL9YPQazLnrENVnl2v3r9IR83WXpO5htDom+wCcwKkI7lXLKiyrHQOYM6 spcLMfa+XUudyLyFkXZc4hCHh234B+CeEBOWQtcEOZsC+HhILLTowrTJ9m53H7v2 AjCON+6CYArP/6WlW1WaeGeZhZFq0y+HNplcu46R8ccnrZhv8UnbuUH5UQC65ktj nWwD3T+o9toJqzsz02Kg9zseAtM4ungfrc/QMMYKIUQH7XMYAXePhr7t890NU5sZ HPXfGSAxzV7XefG/rqe8yu0ZLR+OtKb16d1Cl/B4Py6TkLVj4G0= =8ha9 -----END PGP SIGNATURE----- Merge tag 'xfs-5.16-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fix from Darrick Wong: - Make the old ALLOCSP ioctl behave in a consistent manner with newer syscalls like fallocate. * tag 'xfs-5.16-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate |
||
Jakub Kicinski
|
b9adba350a |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
David S. Miller
|
e63a023489 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-12-30 The following pull-request contains BPF updates for your *net-next* tree. We've added 72 non-merge commits during the last 20 day(s) which contain a total of 223 files changed, 3510 insertions(+), 1591 deletions(-). The main changes are: 1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii. 2) Beautify and de-verbose verifier logs, from Christy. 3) Composable verifier types, from Hao. 4) bpf_strncmp helper, from Hou. 5) bpf.h header dependency cleanup, from Jakub. 6) get_func_[arg|ret|arg_cnt] helpers, from Jiri. 7) Sleepable local storage, from KP. 8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Christian Brauner
|
012e332286 |
fs/mount_setattr: always cleanup mount_kattr
Make sure that finish_mount_kattr() is called after mount_kattr was succesfully built in both the success and failure case to prevent leaking any references we took when we built it. We returned early if path lookup failed thereby risking to leak an additional reference we took when building mount_kattr when an idmapped mount was requested. Cc: linux-fsdevel@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 9caccd41541a ("fs: introduce MOUNT_ATTR_IDMAP") Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Jakub Kicinski
|
aec53e60e0 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c commit 077cdda764c7 ("net/mlx5e: TC, Fix memory leak with rules with internal port") commit 31108d142f36 ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'") commit 4390c6edc0fb ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'") https://lore.kernel.org/all/20211229065352.30178-1-saeed@kernel.org/ net/smc/smc_wr.c commit 49dc9013e34b ("net/smc: Use the bitmap API when applicable") commit 349d43127dac ("net/smc: fix kernel panic caused by race of smc_sock") bitmap_zero()/memset() is removed by the fix Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Jakub Kicinski
|
b6459415b3 |
net: Don't include filter.h from net/sock.h
sock.h is pretty heavily used (5k objects rebuilt on x86 after it's touched). We can drop the include of filter.h from it and add a forward declaration of struct sk_filter instead. This decreases the number of rebuilt objects when bpf.h is touched from ~5k to ~1k. There's a lot of missing includes this was masking. Primarily in networking tho, this time. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org |
||
Linus Torvalds
|
7a29b11da9 |
three ksmbd fixes, all for stable as well. Two fix potential unitialized memory and one fixes a security problem where encryption is unitentionally disabled from some clients
-----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmG+uVcACgkQiiy9cAdy T1EbsQv+NRGXy7WoI4V0QQSY6Q3uWVZwKXwe8MO4+wHex7t5ZDaGDTUboExZ9gnK W63t/pq+hsb4Z8JI//vWdcVdVgnUexrcbHt3ANKGPPwZz1/uttIismyqLGkw+Ntc c99NKve+Q6LNuU1TEF+RRDd3wgrT3ZD3odlRW+xN0w6ZQ7BPMz38OlemSmUaYqUc fh2S3W0Vp3/gxLNfjp1GMrOqjQ8qtTZIzHtLLXHzJKGUYxtq3d77+jP36BAA2Wvn ufBNFO/Y94gRekKyw+KiW8+E5ceDHNits4SCI/C97txyBnBCbo7gu4TKi+ChDcek auDqiLrFp/GuuU2D5TuOw8GeFPNQo08O6IAkcR3ftnxcTM8rvqYPdvRNnEkgkHKH Fu/FWk8wXJLGlymW7bMjeA4bDs0e8z6HjqFWd8d0xeYOismcdO7OZuY4iaC73lCy Nv/2shlzocO68nNd0kxU7gkSPkv1LynVE5IV55DQaJpryUWe71gClldiNS/HTm4c OTnxbAfN =tOhf -----END PGP SIGNATURE----- Merge tag '5.16-rc5-ksmbd-fixes' of git://git.samba.org/ksmbd Pull ksmbd fixes from Steve French: "Three ksmbd fixes, all for stable as well. Two fix potential unitialized memory and one fixes a security problem where encryption is unitentionally disabled from some clients" * tag '5.16-rc5-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: disable SMB2_GLOBAL_CAP_ENCRYPTION for SMB 3.1.1 ksmbd: fix uninitialized symbol 'pntsd_size' ksmbd: fix error code in ndr_read_int32() |
||
Linus Torvalds
|
a026fa5404 |
io_uring-5.16-2021-12-23
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmHEmDcQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpu1EEACnW4afkADzsu5L0DOYV0k2171cBQFaIeNw FJb3ebp6iHbWS0BBnseKKqdpFOQoGF6q8lcDdyx/cmUH7217cZTRDt1jxxXPwmGP i3Zy2HMh6Dq/RX2t0Be7oMEFZeRyB+FIQySvsR4JSfhOeCF+x0kg1Rvwljd4u/t/ sEM4BEPl1ssm1YO5ZXvhnxCoX7eEhS3pbnJj5ZV/y46YTLr9crJQBYfeOHOflEvp iyTfK6NGy/CDR8a9DTXrYaGRozd7J8ArO+5K4qdLtNP7Tc6+bMrV/CE5l7gwWdlU 06eeRxGt/4NQwfWD2aphRaRM5Qa/VcssMy5ikKf98UKvalzwbaAl7Hk5ywbxdQPV cgiHTQa1gwpY7FAw+KQwE2m0iAqaLs4eXfFNFt68yJ2NKc3tVWsMXtFskuBdz/E7 TeoXQTEb5PwDsjHmmFbFlW2OuJajEeIJfRG0pXellOgqm8VDruvjemNDFlg80QmC TpF5y/dbhIht9BHTt+K5mniRZXlVfjePok04XWQgYaIvkYATbGzC4vXMAQdwfW6y kO6Ez7zaDjQJShDKZvni5VlXhWwWQEjm01f1tLmljPiLmZH+bwGQVin1jHgYpskW 4mUZD2UpwNY5N5QbEfKeQ/xCxl7EQAN8WPY4HoCssK139WFHczoGTECl6/DFC4sB TrTMi9rRaA== =CQfA -----END PGP SIGNATURE----- Merge tag 'io_uring-5.16-2021-12-23' of git://git.kernel.dk/linux-block Pull io_uring fix from Jens Axboe: "Single fix for not clearing kiocb->ki_pos back to 0 for a stream, destined for stable as well" * tag 'io_uring-5.16-2021-12-23' of git://git.kernel.dk/linux-block: io_uring: zero iocb->ki_pos for stream file types |
||
Jens Axboe
|
7b9762a5e8 |
io_uring: zero iocb->ki_pos for stream file types
io_uring supports using offset == -1 for using the current file position, and we read that in as part of read/write command setup. For the non-iter read/write types we pass in NULL for the position pointer, but for the iter types we should not be passing any anything but 0 for the position for a stream. Clear kiocb->ki_pos if the file is a stream, don't leave it as -1. If we do, then the request will error with -ESPIPE. Fixes: ba04291eb66e ("io_uring: allow use of offset == -1 to mean file position") Link: https://github.com/axboe/liburing/discussions/501 Reported-by: Samuel Williams <samuel.williams@oriontransfer.co.nz> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Darrick J. Wong
|
983d8e60f5 |
xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate
The old ALLOCSP/FREESP ioctls in XFS can be used to preallocate space at the end of files, just like fallocate and RESVSP. Make the behavior consistent with the other ioctls. Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> |
||
Linus Torvalds
|
5dbdc4c565 |
Fixes:
- Address a buffer overrun reported by Anatoly Trosinenko -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmHArD0ACgkQM2qzM29m f5ddQRAAnPQnaLHGzvXkLKP76EMkFWlCPZk4pkNy1LuHRyU3F+D2Pj1qZBJpqFCa gzRh2N1U+zPguwsoKlPuPjHUqqU4D2Wf0DHo9xsU0gvY5B86m4bebJgpK3zXpD3m 37gyM/UMe744D5A2Kh/yyKCEWR+8STh5/a956dCi22Z7qyEhPQDkrbk9yLBsDo9t NIb2rV/tvdvQvjzBd5om4Sm8QAXrdqoChK619b/T3v46shj/OX2Rqneid1deJi0U czrz19g/hzzvaTlrdXmFT0w9qZQ3Md/T2wDtiKtc/XoTF7ZsBHZFOwLhcDRuTXJO UAfXzXkf1WmhrQZOqJqHCuWFf01vdc3++8L01PXyxVupwGYS/uvdy0VNdO/u5hDr ibYjkrMpTKT14Q7iyPU7CCPolWipVpKt3pMwXuusCz8ky8oMQQSmpjdkofXutVnP NNuzx8iqW8N4Vo86Itoau7qEFM0FcxWR0Ut2F0EKiOjiS63Ccg9wxNbu3rJe/Wpb gpb3ICpPgyOaSUI1D0NEEibbRyiOAE+ldiDdpGHyOGgcK672jaD/5NAvD3s18FRp 2V7Xf/vY3Qiggakopk+hEKYNh2aFXkhZmUf+rNNbhEHpgJGQ6HU1rlSw4+SfrdPK vKx/uDuUr5aahnjrZDaLv1sGp4Nq+m3KEjhlQGKSVi25n4xXgtc= =FIJ5 -----END PGP SIGNATURE----- Merge tag 'nfsd-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: "Address a buffer overrun reported by Anatoly Trosinenko" * tag 'nfsd-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: Fix READDIR buffer overflow |
||
Linus Torvalds
|
9273d6cb99 |
two cifs/smb3 fixes, one fscache related, and one mount parsing related for stable
-----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmG9V/0ACgkQiiy9cAdy T1GbQAv+N/2p7mMzhxcB91xlU6DSo2kZhC1FcmiXld3BEiMzkef6Rtb/lzW1AtBP kNgqjVU7MMmdFQNKLhrHUV2BNCtLWKOC2I0xTZqr+nBlGMJkiNUBzUWW8DcnLd1e zGky7Wck8HHu7LdnZA3TVlPhNV9KjdlHMEYBBJFlOc779a/Q0KyHLdwhP9mpSa2d QrVRHkBMylNUqF4WiLKQ0I8eJFhKtd7pJNAW1qGeKe2YDV6kZQiXwHOLIZb7SHET 9Qtj83IwsQ13cLTmb7tGibj0Fn+GEEqgBv40PjSMuTCoiCpfhcLrx0lWT5hxlxao 9w3Yw1pPPaJXbxCmL9b3Bqm3vnNqMaN7HqxsuGIFwnulGFB4GJY2l+ZQPcTHzWG0 oCHlimlixH7RKVEfjIDhidC1C0ReZI8/1XuHPGE17Ysxt6L7dH/a7J2y7SlIdYTZ ZVvVT0Kib74I7ZUJH1ymc4MdAVnBIkiTGOrQk/FU+cHOyzMUpNcwXiCqkvF77QUa hDW4hkzE =MW5I -----END PGP SIGNATURE----- Merge tag '5.16-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "Two cifs/smb3 fixes, one fscache related, and one mount parsing related for stable" * tag '5.16-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: sanitize multiple delimiters in prepath cifs: ignore resource_id while getting fscache super cookie |
||
Chuck Lever
|
53b1119a6e |
NFSD: Fix READDIR buffer overflow
If a client sends a READDIR count argument that is too small (say, zero), then the buffer size calculation in the new init_dirlist helper functions results in an underflow, allowing the XDR stream functions to write beyond the actual buffer. This calculation has always been suspect. NFSD has never sanity- checked the READDIR count argument, but the old entry encoders managed the problem correctly. With the commits below, entry encoding changed, exposing the underflow to the pointer arithmetic in xdr_reserve_space(). Modern NFS clients attempt to retrieve as much data as possible for each READDIR request. Also, we have no unit tests that exercise the behavior of READDIR at the lower bound of @count values. Thus this case was missed during testing. Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Fixes: f5dcccd647da ("NFSD: Update the NFSv2 READDIR entry encoder to use struct xdr_stream") Fixes: 7f87fc2d34d4 ("NFSD: Update NFSv3 READDIR entry encoders to use struct xdr_stream") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> |
||
Linus Torvalds
|
1887bf5cc4 |
zonefs fixes for 5.16-rc6
One fix and one trivial update for rc6: * Add MODULE_ALIAS_FS to get automatic module loading on mount (from Naohiro) * Update Damien's email address in the MAINTAINERS file (from me). -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCYb0hkgAKCRDdoc3SxdoY dlciAP4lGpsiFcO7TdLY2W64EHgIkcstGx1UitsqBTR5iZnp6QD/bAfaHOaTNQDG Nr8GznsB3di4WRbFoV7BhBsKFcC4cgA= =GWY0 -----END PGP SIGNATURE----- Merge tag 'zonefs-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs fixes from Damien Le Moal: "One fix and one trivial update for rc6: - Add MODULE_ALIAS_FS to get automatic module loading on mount (Naohiro) - Update Damien's email address in the MAINTAINERS file (me)" * tag 'zonefs-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: MAITAINERS: Change zonefs maintainer email address zonefs: add MODULE_ALIAS_FS |
||
Marcos Del Sol Vives
|
83912d6d55 |
ksmbd: disable SMB2_GLOBAL_CAP_ENCRYPTION for SMB 3.1.1
According to the official Microsoft MS-SMB2 document section 3.3.5.4, this flag should be used only for 3.0 and 3.0.2 dialects. Setting it for 3.1.1 is a violation of the specification. This causes my Windows 10 client to detect an anomaly in the negotiation, and disable encryption entirely despite being explicitly enabled in ksmbd, causing all data transfers to go in plain text. Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org # v5.15 Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Marcos Del Sol Vives <marcos@orca.pet> Signed-off-by: Steve French <stfrench@microsoft.com> |
||
Thiago Rafael Becker
|
a31080899d |
cifs: sanitize multiple delimiters in prepath
mount.cifs can pass a device with multiple delimiters in it. This will cause rename(2) to fail with ENOENT. V2: - Make sanitize_path more readable. - Fix multiple delimiters between UNC and prepath. - Avoid a memory leak if a bad user starts putting a lot of delimiters in the path on purpose. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=2031200 Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api") Cc: stable@vger.kernel.org # 5.11+ Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Thiago Rafael Becker <trbecker@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com> |
||
Shyam Prasad N
|
b774302e88 |
cifs: ignore resource_id while getting fscache super cookie
We have a cyclic dependency between fscache super cookie and root inode cookie. The super cookie relies on tcon->resource_id, which gets populated from the root inode number. However, fetching the root inode initializes inode cookie as a child of super cookie, which is yet to be populated. resource_id is only used as auxdata to check the validity of super cookie. We can completely avoid setting resource_id to remove the circular dependency. Since vol creation time and vol serial numbers are used for auxdata, we should be fine. Additionally, there will be auxiliary data check for each inode cookie as well. Fixes: 5bf91ef03d98 ("cifs: wait for tcon resource_id before getting fscache super") CC: David Howells <dhowells@redhat.com> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> |
||
Linus Torvalds
|
9609134186 |
for-5.16-rc5-tag
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmG8+tEACgkQxWXV+ddt WDuuGA/9E75ZMqsMLW5az7z8Rt5voBjPeweyRHmGCLZKpgfaj0QjrJRvu0CTKU/W zCSQf+ShTTY2D3cmh1eEwKyX/waKQ71qBrMX/SgIeA0OjmlhK1UGB18MF5sAVGCB mymVYJh7IntYJE7S7OiMUL/yILmIWZYrYT+iaPZlIc9M6h0b1gjMIsE0VEmxJMCN X8RAQ4CfL9bpTTKItNehSyXx+J7TB5yamh5AspaiB/ivyN1DcUcsFf3AoaWXeh2D YIBzq4WbGnDMfUdWXKE2rqDfQgaTXtN9ffGUvphJnegg8Tqfp29LyLZ1GU0qGSXc /K8g5QNmM3nhubXq2MG5zfbHPJ1H2CgnvkDqiCcyeop+09yj/ugxTt+ULaIbJL76 pKSpcuIFXTmoW2Z7ZwijIEX4H5Dgk2l2DbE8SkJT4LjJybgpHfBT1KDQrj5iQdx+ XgmG/CbRELuGGltJNuldp0SqIyMNRgDuv6Rheg9N9H73m9epwfH5oiM0Fj/FYyQ6 lfgle6DQCP4xaDmk1zA9zrJHTUqi8Caeyg+tQYT6AbkoeCoXnvEAPgv9OOGe1M+C Ks7zeAseWs3A/j/+wCdiCKombOfR+AY3RGkPzlodUJj4YYOTyXrigtb5yhTz6Zdv ozVBZ71LUMMOf0NV45mGtqsiLqyfe3cnlqj1XtHQKaajyjgHvW8= =G7CE -----END PGP SIGNATURE----- Merge tag 'for-5.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more fixes, almost all error handling one-liners and for stable. - regression fix in directory logging items - regression fix of extent buffer status bits handling after an error - fix memory leak in error handling path in tree-log - fix freeing invalid anon device number when handling errors during subvolume creation - fix warning when freeing leaf after subvolume creation failure - fix missing blkdev put in device scan error handling - fix invalid delayed ref after subvolume creation failure" * tag 'for-5.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix missing blkdev_put() call in btrfs_scan_one_device() btrfs: fix warning when freeing leaf after subvolume creation failure btrfs: fix invalid delayed ref after subvolume creation failure btrfs: check WRITE_ERR when trying to read an extent buffer btrfs: fix missing last dir item offset update when logging directory btrfs: fix double free of anon_dev after failure to create subvolume btrfs: fix memory leak in __add_inode_ref() |
||
Linus Torvalds
|
cb29eee3b2 |
io_uring-5.16-2021-12-17
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmG8wLIQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpncuD/wI9UnDapmyWodpXCw+f/GN0ZfuGzKAblkR yYJcA3spTIYHHZiR4KCKke1vM+j8PoPBTGwCuDX9EhHFq9EIEAAAFqQwbta85Rcu c3iKhb7qWGN2VQwqdmQDZtegdblM5usNsY3fNUA3iaGeQy+7qvYEyk0u5CDSIKCQ oR0NQpE/cHYlyoqUbITlCvOwD32QvcuHK2oCBnGW02FYiKq3gddJ3EPxnBkLcTUI zeHRCtslZTFrZexapQ/w4rAJ0O6bMR/08TYjtRDxLB4bTpYR34Hx23tvEsZk4jqE olpUlJDwV2B+vZ1aAKqCrabbkNQpiws7Y1EMSwM1pqqZYaPhpVeEmcqkpDNsSgY2 s764iXPKjJnKB6XWUuuShobOjiwwk0rAyNdWKVdvzNVkEnhpNnFVmfllChOiphqM jHua88LHpRJpmHSDdfJMO9qT1uffixv1AIfhEoc0hBCddwp9RwqhijelbJDYIjou KKE7aip0rzwK90tOTFGP9lD7cUl3aFig3yQNLaJpXybm2Qa+xE//hdUk4lm1i57c ciEYtz8S827lCx/CzhpWoTszG11cpT249IMXyxtspoZZh44VTXYC83EeYI2cyG24 q8zfmgMLNMYn6dc4dP5oAA/d0r7s9AY9GvyblhwmLGpjO5mANqJPZAtl2PpSYWg6 jVh1NDEyKw== =IY1e -----END PGP SIGNATURE----- Merge tag 'io_uring-5.16-2021-12-17' of git://git.kernel.dk/linux-block Pull io_uring fix from Jens Axboe: "Just a single fix, fixing an issue with the worker creation change that was merged last week" * tag 'io_uring-5.16-2021-12-17' of git://git.kernel.dk/linux-block: io-wq: drop wqe lock before creating new worker |
||
Naohiro Aota
|
8ffea2599f |
zonefs: add MODULE_ALIAS_FS
Add MODULE_ALIAS_FS() to load the module automatically when you do "mount -t zonefs". Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system") Cc: stable <stable@vger.kernel.org> # 5.6+ Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Johannes Thumshirn <jth@kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> |
||
Jakub Kicinski
|
7cd2802d74 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Namjae Jeon
|
f2e78affc4 |
ksmbd: fix uninitialized symbol 'pntsd_size'
No check for if "rc" is an error code for build_sec_desc(). This can cause problems with using uninitialized pntsd_size. Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org # v5.15 Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> |
||
Dan Carpenter
|
ef399469d9 |
ksmbd: fix error code in ndr_read_int32()
This is a failure path and it should return -EINVAL instead of success. Otherwise it could result in the caller using uninitialized memory. Fixes: 303fff2b8c77 ("ksmbd: add validation for ndr read/write functions") Cc: stable@vger.kernel.org # v5.15 Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <stfrench@microsoft.com> |
||
David Howells
|
1744a22ae9 |
afs: Fix mmap
Fix afs_add_open_map() to check that the vnode isn't already on the list when it adds it. It's possible that afs_drop_open_mmap() decremented the cb_nr_mmap counter, but hadn't yet got into the locked section to remove it. Also vnode->cb_mmap_link should be initialised, so fix that too. Fixes: 6e0e99d58a65 ("afs: Fix mmap coherency vs 3rd-party changes") Reported-by: kafs-testing+fedora34_64checkkafs-build-300@auristor.com Suggested-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: kafs-testing+fedora34_64checkkafs-build-300@auristor.com cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/686465.1639435380@warthog.procyon.org.uk/ # v1 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
2b14864acb |
An SGID directory handling fix (marked for stable), a metrics
accounting fix and two fixups to appease static checkers. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmG54aMTHGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzixPBB/4rf0d+oJbBNQQIjTVSO09G9A5pxnU1 CY++W5xppv2hQQX8Czh8iLLYSP3tRh+1xpAlKT+DFsRI3GTh5tXiPf7Mpu76noo3 UAKxiz5NmtqiwrHKZtpsQEJlmwtAFa+qFVJIOiqq1l+gFeiYMueoFEGDgdSCLJrE P6TL9CZx06Yf+fef7hJnXKR3YM0qras1jV6Adpvu+2g/ciaThzJF3YdyAFYFp7nl sGnPTXlJ1DDIsiOAgj0oGeQICpjQJOSGNZtxVjhJx5dJjqgKjim0V4MIShqsW/tg n8LzjrdGh/lR2QfCgOfYmvFrVTeb9QQyF57sT034rGrP6VXlzCkALPi3 =FQag -----END PGP SIGNATURE----- Merge tag 'ceph-for-5.16-rc6' of git://github.com/ceph/ceph-client Pull ceph fixes from Ilya Dryomov: "An SGID directory handling fix (marked for stable), a metrics accounting fix and two fixups to appease static checkers" * tag 'ceph-for-5.16-rc6' of git://github.com/ceph/ceph-client: ceph: fix up non-directory creation in SGID directories ceph: initialize pathlen variable in reconnect_caps_cb ceph: initialize i_size variable in ceph_sync_read ceph: fix duplicate increment of opened_inodes metric |
||
Shin'ichiro Kawasaki
|
4989d4a0ae |
btrfs: fix missing blkdev_put() call in btrfs_scan_one_device()
The function btrfs_scan_one_device() calls blkdev_get_by_path() and blkdev_put() to get and release its target block device. However, when btrfs_sb_log_location_bdev() fails, blkdev_put() is not called and the block device is left without clean up. This triggered failure of fstests generic/085. Fix the failure path of btrfs_sb_log_location_bdev() to call blkdev_put(). Fixes: 12659251ca5df ("btrfs: implement log-structured superblock for ZONED mode") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
Filipe Manana
|
212a58fda9 |
btrfs: fix warning when freeing leaf after subvolume creation failure
When creating a subvolume, at ioctl.c:create_subvol(), if we fail to insert the root item for the new subvolume into the root tree, we can trigger the following warning: [78961.741046] WARNING: CPU: 0 PID: 4079814 at fs/btrfs/extent-tree.c:3357 btrfs_free_tree_block+0x2af/0x310 [btrfs] [78961.743344] Modules linked in: [78961.749440] dm_snapshot dm_thin_pool (...) [78961.773648] CPU: 0 PID: 4079814 Comm: fsstress Not tainted 5.16.0-rc4-btrfs-next-108 #1 [78961.775198] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [78961.777266] RIP: 0010:btrfs_free_tree_block+0x2af/0x310 [btrfs] [78961.778398] Code: 17 00 48 85 (...) [78961.781067] RSP: 0018:ffffaa4001657b28 EFLAGS: 00010202 [78961.781877] RAX: 0000000000000213 RBX: ffff897f8a796910 RCX: 0000000000000000 [78961.782780] RDX: 0000000000000000 RSI: 0000000011004000 RDI: 00000000ffffffff [78961.783764] RBP: ffff8981f490e800 R08: 0000000000000001 R09: 0000000000000000 [78961.784740] R10: 0000000000000000 R11: 0000000000000001 R12: ffff897fc963fcc8 [78961.785665] R13: 0000000000000001 R14: ffff898063548000 R15: ffff898063548000 [78961.786620] FS: 00007f31283c6b80(0000) GS:ffff8982ace00000(0000) knlGS:0000000000000000 [78961.787717] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [78961.788598] CR2: 00007f31285c3000 CR3: 000000023fcc8003 CR4: 0000000000370ef0 [78961.789568] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [78961.790585] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [78961.791684] Call Trace: [78961.792082] <TASK> [78961.792359] create_subvol+0x5d1/0x9a0 [btrfs] [78961.793054] btrfs_mksubvol+0x447/0x4c0 [btrfs] [78961.794009] ? preempt_count_add+0x49/0xa0 [78961.794705] __btrfs_ioctl_snap_create+0x123/0x190 [btrfs] [78961.795712] ? _copy_from_user+0x66/0xa0 [78961.796382] btrfs_ioctl_snap_create_v2+0xbb/0x140 [btrfs] [78961.797392] btrfs_ioctl+0xd1e/0x35c0 [btrfs] [78961.798172] ? __slab_free+0x10a/0x360 [78961.798820] ? rcu_read_lock_sched_held+0x12/0x60 [78961.799664] ? lock_release+0x223/0x4a0 [78961.800321] ? lock_acquired+0x19f/0x420 [78961.800992] ? rcu_read_lock_sched_held+0x12/0x60 [78961.801796] ? trace_hardirqs_on+0x1b/0xe0 [78961.802495] ? _raw_spin_unlock_irqrestore+0x3e/0x60 [78961.803358] ? kmem_cache_free+0x321/0x3c0 [78961.804071] ? __x64_sys_ioctl+0x83/0xb0 [78961.804711] __x64_sys_ioctl+0x83/0xb0 [78961.805348] do_syscall_64+0x3b/0xc0 [78961.805969] entry_SYSCALL_64_after_hwframe+0x44/0xae [78961.806830] RIP: 0033:0x7f31284bc957 [78961.807517] Code: 3c 1c 48 f7 d8 (...) This is because we are calling btrfs_free_tree_block() on an extent buffer that is dirty. Fix that by cleaning the extent buffer, with btrfs_clean_tree_block(), before freeing it. This was triggered by test case generic/475 from fstests. Fixes: 67addf29004c5b ("btrfs: fix metadata extent leak after failure to create subvolume") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
Filipe Manana
|
7a1636089a |
btrfs: fix invalid delayed ref after subvolume creation failure
When creating a subvolume, at ioctl.c:create_subvol(), if we fail to insert the new root's root item into the root tree, we are freeing the metadata extent we reserved for the new root to prevent a metadata extent leak, as we don't abort the transaction at that point (since there is nothing at that point that is irreversible). However we allocated the metadata extent for the new root which we are creating for the new subvolume, so its delayed reference refers to the ID of this new root. But when we free the metadata extent we pass the root of the subvolume where the new subvolume is located to btrfs_free_tree_block() - this is incorrect because this will generate a delayed reference that refers to the ID of the parent subvolume's root, and not to ID of the new root. This results in a failure when running delayed references that leads to a transaction abort and a trace like the following: [3868.738042] RIP: 0010:__btrfs_free_extent+0x709/0x950 [btrfs] [3868.739857] Code: 68 0f 85 e6 fb ff (...) [3868.742963] RSP: 0018:ffffb0e9045cf910 EFLAGS: 00010246 [3868.743908] RAX: 00000000fffffffe RBX: 00000000fffffffe RCX: 0000000000000002 [3868.745312] RDX: 00000000fffffffe RSI: 0000000000000002 RDI: ffff90b0cd793b88 [3868.746643] RBP: 000000000e5d8000 R08: 0000000000000000 R09: ffff90b0cd793b88 [3868.747979] R10: 0000000000000002 R11: 00014ded97944d68 R12: 0000000000000000 [3868.749373] R13: ffff90b09afe4a28 R14: 0000000000000000 R15: ffff90b0cd793b88 [3868.750725] FS: 00007f281c4a8b80(0000) GS:ffff90b3ada00000(0000) knlGS:0000000000000000 [3868.752275] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [3868.753515] CR2: 00007f281c6a5000 CR3: 0000000108a42006 CR4: 0000000000370ee0 [3868.754869] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [3868.756228] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [3868.757803] Call Trace: [3868.758281] <TASK> [3868.758655] ? btrfs_merge_delayed_refs+0x178/0x1c0 [btrfs] [3868.759827] __btrfs_run_delayed_refs+0x2b1/0x1250 [btrfs] [3868.761047] btrfs_run_delayed_refs+0x86/0x210 [btrfs] [3868.762069] ? lock_acquired+0x19f/0x420 [3868.762829] btrfs_commit_transaction+0x69/0xb20 [btrfs] [3868.763860] ? _raw_spin_unlock+0x29/0x40 [3868.764614] ? btrfs_block_rsv_release+0x1c2/0x1e0 [btrfs] [3868.765870] create_subvol+0x1d8/0x9a0 [btrfs] [3868.766766] btrfs_mksubvol+0x447/0x4c0 [btrfs] [3868.767669] ? preempt_count_add+0x49/0xa0 [3868.768444] __btrfs_ioctl_snap_create+0x123/0x190 [btrfs] [3868.769639] ? _copy_from_user+0x66/0xa0 [3868.770391] btrfs_ioctl_snap_create_v2+0xbb/0x140 [btrfs] [3868.771495] btrfs_ioctl+0xd1e/0x35c0 [btrfs] [3868.772364] ? __slab_free+0x10a/0x360 [3868.773198] ? rcu_read_lock_sched_held+0x12/0x60 [3868.774121] ? lock_release+0x223/0x4a0 [3868.774863] ? lock_acquired+0x19f/0x420 [3868.775634] ? rcu_read_lock_sched_held+0x12/0x60 [3868.776530] ? trace_hardirqs_on+0x1b/0xe0 [3868.777373] ? _raw_spin_unlock_irqrestore+0x3e/0x60 [3868.778280] ? kmem_cache_free+0x321/0x3c0 [3868.779011] ? __x64_sys_ioctl+0x83/0xb0 [3868.779718] __x64_sys_ioctl+0x83/0xb0 [3868.780387] do_syscall_64+0x3b/0xc0 [3868.781059] entry_SYSCALL_64_after_hwframe+0x44/0xae [3868.781953] RIP: 0033:0x7f281c59e957 [3868.782585] Code: 3c 1c 48 f7 d8 4c (...) [3868.785867] RSP: 002b:00007ffe1f83e2b8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [3868.787198] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f281c59e957 [3868.788450] RDX: 00007ffe1f83e2c0 RSI: 0000000050009418 RDI: 0000000000000003 [3868.789748] RBP: 00007ffe1f83f300 R08: 0000000000000000 R09: 00007ffe1f83fe36 [3868.791214] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000003 [3868.792468] R13: 0000000000000003 R14: 00007ffe1f83e2c0 R15: 00000000000003cc [3868.793765] </TASK> [3868.794037] irq event stamp: 0 [3868.794548] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [3868.795670] hardirqs last disabled at (0): [<ffffffff98294214>] copy_process+0x934/0x2040 [3868.797086] softirqs last enabled at (0): [<ffffffff98294214>] copy_process+0x934/0x2040 [3868.798309] softirqs last disabled at (0): [<0000000000000000>] 0x0 [3868.799284] ---[ end trace be24c7002fe27747 ]--- [3868.799928] BTRFS info (device dm-0): leaf 241188864 gen 1268 total ptrs 214 free space 469 owner 2 [3868.801133] BTRFS info (device dm-0): refs 2 lock_owner 225627 current 225627 [3868.802056] item 0 key (237436928 169 0) itemoff 16250 itemsize 33 [3868.802863] extent refs 1 gen 1265 flags 2 [3868.803447] ref#0: tree block backref root 1610 (...) [3869.064354] item 114 key (241008640 169 0) itemoff 12488 itemsize 33 [3869.065421] extent refs 1 gen 1268 flags 2 [3869.066115] ref#0: tree block backref root 1689 (...) [3869.403834] BTRFS error (device dm-0): unable to find ref byte nr 241008640 parent 0 root 1622 owner 0 offset 0 [3869.405641] BTRFS: error (device dm-0) in __btrfs_free_extent:3076: errno=-2 No such entry [3869.407138] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:2159: errno=-2 No such entry Fix this by passing the new subvolume's root ID to btrfs_free_tree_block(). This requires changing the root argument of btrfs_free_tree_block() from struct btrfs_root * to a u64, since at this point during the subvolume creation we have not yet created the struct btrfs_root for the new subvolume, and btrfs_free_tree_block() only needs a root ID and nothing else from a struct btrfs_root. This was triggered by test case generic/475 from fstests. Fixes: 67addf29004c5b ("btrfs: fix metadata extent leak after failure to create subvolume") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
Josef Bacik
|
651740a502 |
btrfs: check WRITE_ERR when trying to read an extent buffer
Filipe reported a hang when we have errors on btrfs. This turned out to be a side-effect of my fix c2e39305299f01 ("btrfs: clear extent buffer uptodate when we fail to write it") which made it so we clear EXTENT_BUFFER_UPTODATE on an eb when we fail to write it out. Below is a paste of Filipe's analysis he got from using drgn to debug the hang """ btree readahead code calls read_extent_buffer_pages(), sets ->io_pages to a value while writeback of all pages has not yet completed: --> writeback for the first 3 pages finishes, we clear EXTENT_BUFFER_UPTODATE from eb on the first page when we get an error. --> at this point eb->io_pages is 1 and we cleared Uptodate bit from the first 3 pages --> read_extent_buffer_pages() does not see EXTENT_BUFFER_UPTODATE() so it continues, it's able to lock the pages since we obviously don't hold the pages locked during writeback --> read_extent_buffer_pages() then computes 'num_reads' as 3, and sets eb->io_pages to 3, since only the first page does not have Uptodate bit set at this point --> writeback for the remaining page completes, we ended decrementing eb->io_pages by 1, resulting in eb->io_pages == 2, and therefore never calling end_extent_buffer_writeback(), so EXTENT_BUFFER_WRITEBACK remains in the eb's flags --> of course, when the read bio completes, it doesn't and shouldn't call end_extent_buffer_writeback() --> we should clear EXTENT_BUFFER_UPTODATE only after all pages of the eb finished writeback? or maybe make the read pages code wait for writeback of all pages of the eb to complete before checking which pages need to be read, touch ->io_pages, submit read bio, etc writeback bit never cleared means we can hang when aborting a transaction, at: btrfs_cleanup_one_transaction() btrfs_destroy_marked_extents() wait_on_extent_buffer_writeback() """ This is a problem because our writes are not synchronized with reads in any way. We clear the UPTODATE flag and then we can easily come in and try to read the EB while we're still waiting on other bio's to complete. We have two options here, we could lock all the pages, and then check to see if eb->io_pages != 0 to know if we've already got an outstanding write on the eb. Or we can simply check to see if we have WRITE_ERR set on this extent buffer. We set this bit _before_ we clear UPTODATE, so if the read gets triggered because we aren't UPTODATE because of a write error we're guaranteed to have WRITE_ERR set, and in this case we can simply return -EIO. This will fix the reported hang. Reported-by: Filipe Manana <fdmanana@suse.com> Fixes: c2e39305299f01 ("btrfs: clear extent buffer uptodate when we fail to write it") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
Filipe Manana
|
1b2e5e5c7f |
btrfs: fix missing last dir item offset update when logging directory
When logging a directory, once we finish processing a leaf that is full of dir items, if we find the next leaf was not modified in the current transaction, we grab the first key of that next leaf and log it as to mark the end of a key range boundary. However we did not update the value of ctx->last_dir_item_offset, which tracks the offset of the last logged key. This can result in subsequent logging of the same directory in the current transaction to not realize that key was already logged, and then add it to the middle of a batch that starts with a lower key, resulting later in a leaf with one key that is duplicated and at non-consecutive slots. When that happens we get an error later when writing out the leaf, reporting that there is a pair of keys in wrong order. The report is something like the following: Dec 13 21:44:50 kernel: BTRFS critical (device dm-0): corrupt leaf: root=18446744073709551610 block=118444032 slot=21, bad key order, prev (704687 84 4146773349) current (704687 84 1063561078) Dec 13 21:44:50 kernel: BTRFS info (device dm-0): leaf 118444032 gen 91449 total ptrs 39 free space 546 owner 18446744073709551610 Dec 13 21:44:50 kernel: item 0 key (704687 1 0) itemoff 3835 itemsize 160 Dec 13 21:44:50 kernel: inode generation 35532 size 1026 mode 40755 Dec 13 21:44:50 kernel: item 1 key (704687 12 704685) itemoff 3822 itemsize 13 Dec 13 21:44:50 kernel: item 2 key (704687 24 3817753667) itemoff 3736 itemsize 86 Dec 13 21:44:50 kernel: item 3 key (704687 60 0) itemoff 3728 itemsize 8 Dec 13 21:44:50 kernel: item 4 key (704687 72 0) itemoff 3720 itemsize 8 Dec 13 21:44:50 kernel: item 5 key (704687 84 140445108) itemoff 3666 itemsize 54 Dec 13 21:44:50 kernel: dir oid 704793 type 1 Dec 13 21:44:50 kernel: item 6 key (704687 84 298800632) itemoff 3599 itemsize 67 Dec 13 21:44:50 kernel: dir oid 707849 type 2 Dec 13 21:44:50 kernel: item 7 key (704687 84 476147658) itemoff 3532 itemsize 67 Dec 13 21:44:50 kernel: dir oid 707901 type 2 Dec 13 21:44:50 kernel: item 8 key (704687 84 633818382) itemoff 3471 itemsize 61 Dec 13 21:44:50 kernel: dir oid 704694 type 2 Dec 13 21:44:50 kernel: item 9 key (704687 84 654256665) itemoff 3403 itemsize 68 Dec 13 21:44:50 kernel: dir oid 707841 type 1 Dec 13 21:44:50 kernel: item 10 key (704687 84 995843418) itemoff 3331 itemsize 72 Dec 13 21:44:50 kernel: dir oid 2167736 type 1 Dec 13 21:44:50 kernel: item 11 key (704687 84 1063561078) itemoff 3278 itemsize 53 Dec 13 21:44:50 kernel: dir oid 704799 type 2 Dec 13 21:44:50 kernel: item 12 key (704687 84 1101156010) itemoff 3225 itemsize 53 Dec 13 21:44:50 kernel: dir oid 704696 type 1 Dec 13 21:44:50 kernel: item 13 key (704687 84 2521936574) itemoff 3173 itemsize 52 Dec 13 21:44:50 kernel: dir oid 704704 type 2 Dec 13 21:44:50 kernel: item 14 key (704687 84 2618368432) itemoff 3112 itemsize 61 Dec 13 21:44:50 kernel: dir oid 704738 type 1 Dec 13 21:44:50 kernel: item 15 key (704687 84 2676316190) itemoff 3046 itemsize 66 Dec 13 21:44:50 kernel: dir oid 2167729 type 1 Dec 13 21:44:50 kernel: item 16 key (704687 84 3319104192) itemoff 2986 itemsize 60 Dec 13 21:44:50 kernel: dir oid 704745 type 2 Dec 13 21:44:50 kernel: item 17 key (704687 84 3908046265) itemoff 2929 itemsize 57 Dec 13 21:44:50 kernel: dir oid 2167734 type 1 Dec 13 21:44:50 kernel: item 18 key (704687 84 3945713089) itemoff 2857 itemsize 72 Dec 13 21:44:50 kernel: dir oid 2167730 type 1 Dec 13 21:44:50 kernel: item 19 key (704687 84 4077169308) itemoff 2795 itemsize 62 Dec 13 21:44:50 kernel: dir oid 704688 type 1 Dec 13 21:44:50 kernel: item 20 key (704687 84 4146773349) itemoff 2727 itemsize 68 Dec 13 21:44:50 kernel: dir oid 707892 type 1 Dec 13 21:44:50 kernel: item 21 key (704687 84 1063561078) itemoff 2674 itemsize 53 Dec 13 21:44:50 kernel: dir oid 704799 type 2 Dec 13 21:44:50 kernel: item 22 key (704687 96 2) itemoff 2612 itemsize 62 Dec 13 21:44:50 kernel: item 23 key (704687 96 6) itemoff 2551 itemsize 61 Dec 13 21:44:50 kernel: item 24 key (704687 96 7) itemoff 2498 itemsize 53 Dec 13 21:44:50 kernel: item 25 key (704687 96 12) itemoff 2446 itemsize 52 Dec 13 21:44:50 kernel: item 26 key (704687 96 14) itemoff 2385 itemsize 61 Dec 13 21:44:50 kernel: item 27 key (704687 96 18) itemoff 2325 itemsize 60 Dec 13 21:44:50 kernel: item 28 key (704687 96 24) itemoff 2271 itemsize 54 Dec 13 21:44:50 kernel: item 29 key (704687 96 28) itemoff 2218 itemsize 53 Dec 13 21:44:50 kernel: item 30 key (704687 96 62) itemoff 2150 itemsize 68 Dec 13 21:44:50 kernel: item 31 key (704687 96 66) itemoff 2083 itemsize 67 Dec 13 21:44:50 kernel: item 32 key (704687 96 75) itemoff 2015 itemsize 68 Dec 13 21:44:50 kernel: item 33 key (704687 96 79) itemoff 1948 itemsize 67 Dec 13 21:44:50 kernel: item 34 key (704687 96 82) itemoff 1882 itemsize 66 Dec 13 21:44:50 kernel: item 35 key (704687 96 83) itemoff 1810 itemsize 72 Dec 13 21:44:50 kernel: item 36 key (704687 96 85) itemoff 1753 itemsize 57 Dec 13 21:44:50 kernel: item 37 key (704687 96 87) itemoff 1681 itemsize 72 Dec 13 21:44:50 kernel: item 38 key (704694 1 0) itemoff 1521 itemsize 160 Dec 13 21:44:50 kernel: inode generation 35534 size 30 mode 40755 Dec 13 21:44:50 kernel: BTRFS error (device dm-0): block=118444032 write time tree block corruption detected So fix that by adding the missing update of ctx->last_dir_item_offset with the offset of the boundary key. Reported-by: Chris Murphy <lists@colorremedies.com> Link: https://lore.kernel.org/linux-btrfs/CAJCQCtT+RSzpUjbMq+UfzNUMe1X5+1G+DnAGbHC=OZ=iRS24jg@mail.gmail.com/ Fixes: dc2872247ec0ca ("btrfs: keep track of the last logged keys when logging a directory") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
Filipe Manana
|
33fab97249 |
btrfs: fix double free of anon_dev after failure to create subvolume
When creating a subvolume, at create_subvol(), we allocate an anonymous device and later call btrfs_get_new_fs_root(), which in turn just calls btrfs_get_root_ref(). There we call btrfs_init_fs_root() which assigns the anonymous device to the root, but if after that call there's an error, when we jump to 'fail' label, we call btrfs_put_root(), which frees the anonymous device and then returns an error that is propagated back to create_subvol(). Than create_subvol() frees the anonymous device again. When this happens, if the anonymous device was not reallocated after the first time it was freed with btrfs_put_root(), we get a kernel message like the following: (...) [13950.282466] BTRFS: error (device dm-0) in create_subvol:663: errno=-5 IO failure [13950.283027] ida_free called for id=65 which is not allocated. [13950.285974] BTRFS info (device dm-0): forced readonly (...) If the anonymous device gets reallocated by another btrfs filesystem or any other kernel subsystem, then bad things can happen. So fix this by setting the root's anonymous device to 0 at btrfs_get_root_ref(), before we call btrfs_put_root(), if an error happened. Fixes: 2dfb1e43f57dd3 ("btrfs: preallocate anon block device at first phase of snapshot creation") CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
Jianglei Nie
|
f35838a693 |
btrfs: fix memory leak in __add_inode_ref()
Line 1169 (#3) allocates a memory chunk for victim_name by kmalloc(), but when the function returns in line 1184 (#4) victim_name allocated by line 1169 (#3) is not freed, which will lead to a memory leak. There is a similar snippet of code in this function as allocating a memory chunk for victim_name in line 1104 (#1) as well as releasing the memory in line 1116 (#2). We should kfree() victim_name when the return value of backref_in_log() is less than zero and before the function returns in line 1184 (#4). 1057 static inline int __add_inode_ref(struct btrfs_trans_handle *trans, 1058 struct btrfs_root *root, 1059 struct btrfs_path *path, 1060 struct btrfs_root *log_root, 1061 struct btrfs_inode *dir, 1062 struct btrfs_inode *inode, 1063 u64 inode_objectid, u64 parent_objectid, 1064 u64 ref_index, char *name, int namelen, 1065 int *search_done) 1066 { 1104 victim_name = kmalloc(victim_name_len, GFP_NOFS); // #1: kmalloc (victim_name-1) 1105 if (!victim_name) 1106 return -ENOMEM; 1112 ret = backref_in_log(log_root, &search_key, 1113 parent_objectid, victim_name, 1114 victim_name_len); 1115 if (ret < 0) { 1116 kfree(victim_name); // #2: kfree (victim_name-1) 1117 return ret; 1118 } else if (!ret) { 1169 victim_name = kmalloc(victim_name_len, GFP_NOFS); // #3: kmalloc (victim_name-2) 1170 if (!victim_name) 1171 return -ENOMEM; 1180 ret = backref_in_log(log_root, &search_key, 1181 parent_objectid, victim_name, 1182 victim_name_len); 1183 if (ret < 0) { 1184 return ret; // #4: missing kfree (victim_name-2) 1185 } else if (!ret) { 1241 return 0; 1242 } Fixes: d3316c8233bb ("btrfs: Properly handle backref_in_log retval") CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Jianglei Nie <niejianglei2021@163.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
Linus Torvalds
|
e386dfc56f |
fget: clarify and improve __fget_files() implementation
Commit 054aa8d439b9 ("fget: check that the fd still exists after getting a ref to it") fixed a race with getting a reference to a file just as it was being closed. It was a fairly minimal patch, and I didn't think re-checking the file pointer lookup would be a measurable overhead, since it was all right there and cached. But I was wrong, as pointed out by the kernel test robot. The 'poll2' case of the will-it-scale.per_thread_ops benchmark regressed quite noticeably. Admittedly it seems to be a very artificial test: doing "poll()" system calls on regular files in a very tight loop in multiple threads. That means that basically all the time is spent just looking up file descriptors without ever doing anything useful with them (not that doing 'poll()' on a regular file is useful to begin with). And as a result it shows the extra "re-check fd" cost as a sore thumb. Happily, the regression is fixable by just writing the code to loook up the fd to be better and clearer. There's still a cost to verify the file pointer, but now it's basically in the noise even for that benchmark that does nothing else - and the code is more understandable and has better comments too. [ Side note: this patch is also a classic case of one that looks very messy with the default greedy Myers diff - it's much more legible with either the patience of histogram diff algorithm ] Link: https://lore.kernel.org/lkml/20211210053743.GA36420@xsang-OptiPlex-9020/ Link: https://lore.kernel.org/lkml/20211213083154.GA20853@linux.intel.com/ Reported-by: kernel test robot <oliver.sang@intel.com> Tested-by: Carel Si <beibei.si@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Jens Axboe
|
d800c65c2d |
io-wq: drop wqe lock before creating new worker
We have two io-wq creation paths: - On queue enqueue - When a worker goes to sleep The latter invokes worker creation with the wqe->lock held, but that can run into problems if we end up exiting and need to cancel the queued work. syzbot caught this: ============================================ WARNING: possible recursive locking detected 5.16.0-rc4-syzkaller #0 Not tainted -------------------------------------------- iou-wrk-6468/6471 is trying to acquire lock: ffff88801aa98018 (&wqe->lock){+.+.}-{2:2}, at: io_worker_cancel_cb+0xb7/0x210 fs/io-wq.c:187 but task is already holding lock: ffff88801aa98018 (&wqe->lock){+.+.}-{2:2}, at: io_wq_worker_sleeping+0xb6/0x140 fs/io-wq.c:700 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&wqe->lock); lock(&wqe->lock); *** DEADLOCK *** May be due to missing lock nesting notation 1 lock held by iou-wrk-6468/6471: #0: ffff88801aa98018 (&wqe->lock){+.+.}-{2:2}, at: io_wq_worker_sleeping+0xb6/0x140 fs/io-wq.c:700 stack backtrace: CPU: 1 PID: 6471 Comm: iou-wrk-6468 Not tainted 5.16.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1dc/0x2d8 lib/dump_stack.c:106 print_deadlock_bug kernel/locking/lockdep.c:2956 [inline] check_deadlock kernel/locking/lockdep.c:2999 [inline] validate_chain+0x5984/0x8240 kernel/locking/lockdep.c:3788 __lock_acquire+0x1382/0x2b00 kernel/locking/lockdep.c:5027 lock_acquire+0x19f/0x4d0 kernel/locking/lockdep.c:5637 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:154 io_worker_cancel_cb+0xb7/0x210 fs/io-wq.c:187 io_wq_cancel_tw_create fs/io-wq.c:1220 [inline] io_queue_worker_create+0x3cf/0x4c0 fs/io-wq.c:372 io_wq_worker_sleeping+0xbe/0x140 fs/io-wq.c:701 sched_submit_work kernel/sched/core.c:6295 [inline] schedule+0x67/0x1f0 kernel/sched/core.c:6323 schedule_timeout+0xac/0x300 kernel/time/timer.c:1857 wait_woken+0xca/0x1b0 kernel/sched/wait.c:460 unix_msg_wait_data net/unix/unix_bpf.c:32 [inline] unix_bpf_recvmsg+0x7f9/0xe20 net/unix/unix_bpf.c:77 unix_stream_recvmsg+0x214/0x2c0 net/unix/af_unix.c:2832 sock_recvmsg_nosec net/socket.c:944 [inline] sock_recvmsg net/socket.c:962 [inline] sock_read_iter+0x3a7/0x4d0 net/socket.c:1035 call_read_iter include/linux/fs.h:2156 [inline] io_iter_do_read fs/io_uring.c:3501 [inline] io_read fs/io_uring.c:3558 [inline] io_issue_sqe+0x144c/0x9590 fs/io_uring.c:6671 io_wq_submit_work+0x2d8/0x790 fs/io_uring.c:6836 io_worker_handle_work+0x808/0xdd0 fs/io-wq.c:574 io_wqe_worker+0x395/0x870 fs/io-wq.c:630 ret_from_fork+0x1f/0x30 We can safely drop the lock before doing work creation, making the two contexts the same in that regard. Reported-by: syzbot+b18b8be69df33a3918e9@syzkaller.appspotmail.com Fixes: 71a85387546e ("io-wq: check for wq exit after adding new worker task_work") Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Linus Torvalds
|
e034d9cbf9 |
Fixes for 5.16-rc4:
- Fix a data corruption vector that can result from the ro remount process failing to clear all speculative preallocations from files and the rw remount process not noticing the incomplete cleanup. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmGyQ6UACgkQ+H93GTRK tOskJA/+Jg+fn+Vs7dHwBN+SaHCBH81RT+YD9gbI7Lq5xcK16K7BWz3Hf360J9uJ BpVn0fv6/+3bNg0/QMWUE67CotB2S43CzWnltUN0ymiGgAEO1eMKL1JHRtZ/89d4 pG2Lr9W1QsxMqP9L/TgxqHb5WC2xtromDLMWa7qYL8aH4qp4Dx6hv++u3qhbdTis sfvTOMsDKTwK5RC+7yB2qxr/Txn7viIlBLjTWR3UvvO0K1LxjuBTXpBsiVeQE92T ZqLgAhWDBu/wZFBibqtMsnLF0Wmn5oIZJgBoxp/+Ze8saUsuYdAI1X65ESdVmw7E ZJtSyZObDHOlRhvDn7Vlb8BNF7QY4l4n2eeGFB4zoPdy0jjz/7QKCqMSm7YCMXqc WycjeEIo1H+ZEk0AWu7KrTUAaO3zzqHB/DJt4zQA/w4QdSD14bAsyp8ZWyUEMnWy 7M0Pc0q2ptzOOp1rjd2J+0s9YSc24/dceW1lCDpIDSkOuxO82HdCxlWQ/YPEt2Lx gGv2hll8Cg0j18VThB++XZ3ozImFu+sZhkQPPCM/kKY0tWZvd4Yby1bWLNzS+ppE RIKh+hyhFTXmr0C2iJ8fGNFu/KA3V0efztvCIATHDmGcC6f4WePUQyLzZGFgS0Yw uBPmvqWeUWySgfauIvqW7Oosf0pbKvYVEd+1fcLVdh0sgr2MUPE= =N9Wf -----END PGP SIGNATURE----- Merge tag 'xfs-5.16-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fix from Darrick Wong: "This fixes a race between a readonly remount process and other processes that hold a file IOLOCK on files that previously experienced copy on write, that could result in severe filesystem corruption if the filesystem is then remounted rw. I think this is fairly rare (since the only reliable reproducer I have that fits the second criteria is the experimental xfs_scrub program), but the race is clear, so we still need to fix this. Summary: - Fix a data corruption vector that can result from the ro remount process failing to clear all speculative preallocations from files and the rw remount process not noticing the incomplete cleanup" * tag 'xfs-5.16-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: remove all COW fork extents when remounting readonly |
||
Linus Torvalds
|
f152165ada |
io_uring-5.16-2021-12-10
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmG0PwYQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgphbiEACv95PrGVuOzgXwC1Ydk4Km5Yyt8vANKcxA LFzeScrRhGG28vHCLZuPSxtFn9nGjBJPDKscDWQE0tpq8NwB6BAjr3Ps2AR4tI3M 2bn1ssHIHADdkaV1ixwTtQynHgCvOtKSTmLJ+TTDIYdgwAyRVlRBtpke/nXn5VCR kUdQ/3hHSPbEY+WvrDz7yAvDIMxTrBHmt8TbDiI1FXzNf8AOd32ekpBNzwe2hTQi OLhpaQ60S74UG78c4OGg6SV+twW0nkp1DyTf8MhC3WoBBnTwQdc/BpLNJbnUwH57 lJVJTz0sknbnQCtifJ8/Wjzn6r95cOwbYvaGgXyi7KtTGn0QXLcK+FohNVizFR0x qTMWlFJ5aPmzqed6oMZ2m8iaGxR9813I0GP54Hsqw4PmoPb1GDN5FvZIN+MKdaJd PSp9YH7RDHsusYZjjaUtHIcxhxztblf5wNVAkmkfrFZzNArnwsw0k9gX7DI0f+GX +TazZzX8gLRonAskBHgoSDT+8WOzDOHYYIVVe0g/8TR+jW8oGDd9D0sJbH74dV1Y 3C4WLbJCquBzatotmcmgkt9CJzZz9y373cVVNu4HZgH49qjEZCJuFgBaTpanvwHk Qk5KTl9yIfjs+H2FDWa2Pd6LnsVvuIpwOd4R37RDLGM2ZiX100eX1VbD6RxtIehQ PPKP13kYAQ== =BUVd -----END PGP SIGNATURE----- Merge tag 'io_uring-5.16-2021-12-10' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "A few fixes that are all bound for stable: - Two syzbot reports for io-wq that turned out to be separate fixes, but ultimately very closely related - io_uring task_work running on cancelations" * tag 'io_uring-5.16-2021-12-10' of git://git.kernel.dk/linux-block: io-wq: check for wq exit after adding new worker task_work io_uring: ensure task_work gets run as part of cancelations io-wq: remove spurious bit clear on task_work addition |
||
Linus Torvalds
|
6f51352929 |
for-5.16-rc4-tag
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmGwxl8ACgkQxWXV+ddt WDvSeQ//T+mv4o7ucldQZCVdN0TTVCkQUhia+ZdMwBcPty2/ZEdap+KEIVmfCV/v OLRmSNkIPDhHcIc/O3zJ1/AY0DFbb9brYGkMD/qidgPbRArhDZSrDIr+xnrKJ0iq HFxM01B54l8hJe6GWIGFuuOz+nXUP1o9SfiDOwMDTkqgzz1JSvPec70RKMxG8pTC 4plVrGaUXkKTC8WyBXnSvkP2gvjfJxqnEKv2Ru1eP1t7Bq65aed0+Z32Mogzl2ip ZSHVlRergeB03xF9YErWSfgofVWLEIXgLCh/Wfq73sMnHUHUGM/dpEKxP911MI00 A5TuTl3I25cVnDv3qMayBPECMCCGZc2vUyXTLCWUNYLb/vEUTI36QBu/KglRURmO Zx20B+3/7Yu9pFeZ23S+nlNdGDADmjkOfvZIZDoYDzqBAKWM6kZ/9oWiib21Uwi6 ql5oZNl7G6UXLPvgJoq8dqZIj/HYeLEHeqwf/tepSVQLXYzQyWYDMp686z958XI1 K/A/TKaKk19nn1Dhsz4KJeb3xhMFlFN30K7BNjdH3XRH1RrwJP6pLF/WUduJq2qn rAvJoODLHkby1D9+HqSKW/RToeR5NbBYYdfB0Q8zpy9zF/gZX7wgFc21/4B7qtib hkKfNSfAVjedOJJWTFCUDnZwwrqtcrKxYzUpdcls/O1AjifK7+U= =Mh19 -----END PGP SIGNATURE----- Merge tag 'for-5.16-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more regression fixes and stable patches, mostly one-liners. Regression fixes: - fix pointer/ERR_PTR mismatch returned from memdup_user - reset dedicated zoned mode relocation block group to avoid using it and filling it without any recourse Fixes: - handle a case to FITRIM range (also to make fstests/generic/260 work) - fix warning when extent buffer state and pages get out of sync after an IO error - fix transaction abort when syncing due to missing mapping error set on metadata inode after inlining a compressed file - fix transaction abort due to tree-log and zoned mode interacting in an unexpected way - fix memory leak of additional extent data when qgroup reservation fails - do proper handling of slot search call when deleting root refs" * tag 'for-5.16-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: replace the BUG_ON in btrfs_del_root_ref with proper error handling btrfs: zoned: clear data relocation bg on zone finish btrfs: free exchange changeset on failures btrfs: fix re-dirty process of tree-log nodes btrfs: call mapping_set_error() on btree inode with a write error btrfs: clear extent buffer uptodate when we fail to write it btrfs: fail if fstrim_range->start == U64_MAX btrfs: fix error pointer dereference in btrfs_ioctl_rm_dev_v2() |
||
Linus Torvalds
|
e1b96811e2 |
two cifs/smb3 fixes, one for stable, the other fixes a recently reported NTLMSSP auth problem
-----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmGzuJoACgkQiiy9cAdy T1H9MAv/cOAk4iUhgDUsa+HYsBNiAQ+IOOu/WOZI56jEV/qVP0I3cpkiIJenG5gu ZurivI4smsgNokHOFoT3vjtLXTfTl0OdHHY/mftd5IGIPG+KnXcg3+ZaE3T+fUV3 uHX0cH3a3Azo3RGf2rRiTfW6u9FXJnb9aAUTif7UDVwsU37wUAQrKEmcazaUXdaT +j3KpqwGhCgbkneKuAd/FDTAg4wciJgRg3aE/4W2s2ovFiF6vsUcrmhQD1zP1EZh sPWdx4/U+WpeV02RfKBLQlXi6ofqRF5qRT4HkD07G5Zhz8OOZcm06wYFNl/+aYhF lTOTa9KAoTPfV2tFlTGZVEy07ggEFd3wDxeAPulDrqY8etwWSwRAHHRi5HstMlIX iYz06DLS+LSPyriy6GV6CIL01OPg8vkeLPbrLFbs8oLSVQgmx74UDS339oxJMdFe kkhLgtfRr5DfxS3uD0u17aBDL4ullH84dWdlsUKiUfGvVsBcoVSCPIQeWqkB8xjA OTKih3J1 =+1L9 -----END PGP SIGNATURE----- Merge tag '5.16-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "Two cifs/smb3 fixes - one for stable, the other fixes a recently reported NTLMSSP auth problem" * tag '5.16-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix ntlmssp auth when there is no key exchange cifs: Fix crash on unload of cifs_arc4.ko |
||
Linus Torvalds
|
e80bdc5ed0 |
Fix a race on startup and another in the delegation code. The latter
has been around for years, but I suspect recent changes may have widened the race window a little, so I'd like to go ahead and get it in. -----BEGIN PGP SIGNATURE----- iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAmGzuB8VHGJmaWVsZHNA ZmllbGRzZXMub3JnAAoJECebzXlCjuG+rYQP/1N61E3rf2MOfF4sluubaGGi0fVR cx3w8pTDSALfxAiz0ZTA6hJbZN0SR7MZuXMhWDHT/RiG7jFfuMJACU81BWn0cCPJ zuJziE3Y6fElGi5Ifp71yMLABqTlFPOUHBZkpzaADqXGwmZk4fecXmqKbtqa9kCU 1mCQJlZDGU1ljGVYuXU4SaBOzsrCpPfbxby2ZWFDs8bu5cGKDTq9OSVF1LaGzjMN 2+v0y2Qni+dRfTBCGAmxF8AukRa2r5i2rfHRatgKQO8PdROv/pguy1hYjjeeofhX /Jlq2RIVl1/9/Fwar4Cew2+fpQhpaY62wUuo4KG4PK5WZ92LGDjSpYqvu/I4lI6z BXdge0HzDPRnpZtD+tu4IJ7nQwG+JnO2Ws91rPxwT6DDX4zX9gNZc+7EGevVeXJI XTO+61Tyz+7/6YLR8UvnvyER9KvKjQiUWZkO81LZpuAAwn1PE3rahD48jxkmwwYE m5EvfuA6YR8oyk3Ml+nZNuKwgrNqPcvohyJz4QHpNpqmvOlVrb9I1k90JzIm8D9Q 09M8An1Z3AGw76+RMnvBnWewn1Nx1bKIS32SJk9x3zBoeyXLS17G6FmpEhW81xvN LoToRqD/GNx+4AtfvkHqwv2/Qd1rvONCsjbZv+2P0DAj8ZEOWn0YHk18WlvAIZUg 4ipyZbwMUbXeytKP =GgUu -----END PGP SIGNATURE----- Merge tag 'nfsd-5.16-2' of git://linux-nfs.org/~bfields/linux Pull nfsd fixes from Bruce Fields: "Fix a race on startup and another in the delegation code. The latter has been around for years, but I suspect recent changes may have widened the race window a little, so I'd like to go ahead and get it in" * tag 'nfsd-5.16-2' of git://linux-nfs.org/~bfields/linux: nfsd: fix use-after-free due to delegation race nfsd: Fix nsfd startup race (again) |
||
Linus Torvalds
|
257dcf2923 |
tracing, ftrace and tracefs fixes:
- Have tracefs honor the gid mount option - Have new files in tracefs inherit the parent ownership - Have direct_ops unregister when it has no more functions - Properly clean up the ops when unregistering multi direct ops - Add a sample module to test the multiple direct ops - Fix memory leak in error path of __create_synth_event() -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYbOgPBQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qgOtAP0YD+cRLxnRKA376oQVB8zmuZ3mZ/4x 6M1hqruSDlno3AEA19PyHpxl7flFwnBb6Gnfo9VeefcMS5ENDH9p1aHX4wU= =Tr6t -----END PGP SIGNATURE----- Merge tag 'trace-v5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Tracing, ftrace and tracefs fixes: - Have tracefs honor the gid mount option - Have new files in tracefs inherit the parent ownership - Have direct_ops unregister when it has no more functions - Properly clean up the ops when unregistering multi direct ops - Add a sample module to test the multiple direct ops - Fix memory leak in error path of __create_synth_event()" * tag 'trace-v5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix possible memory leak in __create_synth_event() error path ftrace/samples: Add module to test multi direct modify interface ftrace: Add cleanup to unregister_ftrace_direct_multi ftrace: Use direct_ops hash in unregister_ftrace_direct tracefs: Set all files to the same group ownership as the mount option tracefs: Have new files inherit the ownership of their parent |
||
Linus Torvalds
|
0d21e66847 |
aio poll fixes for 5.16-rc5
Fix three bugs in aio poll, and one issue with POLLFREE more broadly: - aio poll didn't handle POLLFREE, causing a use-after-free. - aio poll could block while the file is ready. - aio poll called eventfd_signal() when it isn't allowed. - POLLFREE didn't handle multiple exclusive waiters correctly. This has been tested with the libaio test suite, as well as with test programs I wrote that reproduce the first two bugs. I am sending this pull request myself as no one seems to be maintaining this code. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCYbObthQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK+3mAQC9W8ApzBleEPI6FXzIIo5AiQT/2jGl 7FbO1MtkdUBU4QEAzf+VWl4Z4BJTgxl44avRdVDpXGAMnbWkd7heY+e3HwA= =mp+r -----END PGP SIGNATURE----- Merge tag 'aio-poll-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull aio poll fixes from Eric Biggers: "Fix three bugs in aio poll, and one issue with POLLFREE more broadly: - aio poll didn't handle POLLFREE, causing a use-after-free. - aio poll could block while the file is ready. - aio poll called eventfd_signal() when it isn't allowed. - POLLFREE didn't handle multiple exclusive waiters correctly. This has been tested with the libaio test suite, as well as with test programs I wrote that reproduce the first two bugs. I am sending this pull request myself as no one seems to be maintaining this code" * tag 'aio-poll-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: aio: Fix incorrect usage of eventfd_signal_allowed() aio: fix use-after-free due to missing POLLFREE handling aio: keep poll requests on waitqueue until completed signalfd: use wake_up_pollfree() binder: use wake_up_pollfree() wait: add wake_up_pollfree() |
||
Jens Axboe
|
71a8538754 |
io-wq: check for wq exit after adding new worker task_work
We check IO_WQ_BIT_EXIT before attempting to create a new worker, and wq exit cancels pending work if we have any. But it's possible to have a race between the two, where creation checks exit finding it not set, but we're in the process of exiting. The exit side will cancel pending creation task_work, but there's a gap where we add task_work after we've canceled existing creations at exit time. Fix this by checking the EXIT bit post adding the creation task_work. If it's set, run the same cancelation that exit does. Reported-and-tested-by: syzbot+b60c982cb0efc5e05a47@syzkaller.appspotmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Jens Axboe
|
78a7806020 |
io_uring: ensure task_work gets run as part of cancelations
If we successfully cancel a work item but that work item needs to be processed through task_work, then we can be sleeping uninterruptibly in io_uring_cancel_generic() and never process it. Hence we don't make forward progress and we end up with an uninterruptible sleep warning. While in there, correct a comment that should be IFF, not IIF. Reported-and-tested-by: syzbot+21e6887c0be14181206d@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
J. Bruce Fields
|
548ec0805c |
nfsd: fix use-after-free due to delegation race
A delegation break could arrive as soon as we've called vfs_setlease. A delegation break runs a callback which immediately (in nfsd4_cb_recall_prepare) adds the delegation to del_recall_lru. If we then exit nfs4_set_delegation without hashing the delegation, it will be freed as soon as the callback is done with it, without ever being removed from del_recall_lru. Symptoms show up later as use-after-free or list corruption warnings, usually in the laundromat thread. I suspect aba2072f4523 "nfsd: grant read delegations to clients holding writes" made this bug easier to hit, but I looked as far back as v3.0 and it looks to me it already had the same problem. So I'm not sure where the bug was introduced; it may have been there from the beginning. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> |
||
Alexander Sverdlin
|
b10252c7ae |
nfsd: Fix nsfd startup race (again)
Commit bd5ae9288d64 ("nfsd: register pernet ops last, unregister first") has re-opened rpc_pipefs_event() race against nfsd_net_id registration (register_pernet_subsys()) which has been fixed by commit bb7ffbf29e76 ("nfsd: fix nsfd startup race triggering BUG_ON"). Restore the order of register_pernet_subsys() vs register_cld_notifier(). Add WARN_ON() to prevent a future regression. Crash info: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000012 CPU: 8 PID: 345 Comm: mount Not tainted 5.4.144-... #1 pc : rpc_pipefs_event+0x54/0x120 [nfsd] lr : rpc_pipefs_event+0x48/0x120 [nfsd] Call trace: rpc_pipefs_event+0x54/0x120 [nfsd] blocking_notifier_call_chain rpc_fill_super get_tree_keyed rpc_fs_get_tree vfs_get_tree do_mount ksys_mount __arm64_sys_mount el0_svc_handler el0_svc Fixes: bd5ae9288d64 ("nfsd: register pernet ops last, unregister first") Cc: stable@vger.kernel.org Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> |
||
Eric Dumazet
|
04a931e58d |
net: add netns refcount tracker to struct seq_net_private
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Xie Yongji
|
4b37498653 |
aio: Fix incorrect usage of eventfd_signal_allowed()
We should defer eventfd_signal() to the workqueue when eventfd_signal_allowed() return false rather than return true. Fixes: b542e383d8c0 ("eventfd: Make signal recursion protection a task bit") Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Link: https://lore.kernel.org/r/20210913111928.98-1-xieyongji@bytedance.com Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com> |
||
Eric Biggers
|
50252e4b5e |
aio: fix use-after-free due to missing POLLFREE handling
signalfd_poll() and binder_poll() are special in that they use a waitqueue whose lifetime is the current task, rather than the struct file as is normally the case. This is okay for blocking polls, since a blocking poll occurs within one task; however, non-blocking polls require another solution. This solution is for the queue to be cleared before it is freed, by sending a POLLFREE notification to all waiters. Unfortunately, only eventpoll handles POLLFREE. A second type of non-blocking poll, aio poll, was added in kernel v4.18, and it doesn't handle POLLFREE. This allows a use-after-free to occur if a signalfd or binder fd is polled with aio poll, and the waitqueue gets freed. Fix this by making aio poll handle POLLFREE. A patch by Ramji Jiyani <ramjiyani@google.com> (https://lore.kernel.org/r/20211027011834.2497484-1-ramjiyani@google.com) tried to do this by making aio_poll_wake() always complete the request inline if POLLFREE is seen. However, that solution had two bugs. First, it introduced a deadlock, as it unconditionally locked the aio context while holding the waitqueue lock, which inverts the normal locking order. Second, it didn't consider that POLLFREE notifications are missed while the request has been temporarily de-queued. The second problem was solved by my previous patch. This patch then properly fixes the use-after-free by handling POLLFREE in a deadlock-free way. It does this by taking advantage of the fact that freeing of the waitqueue is RCU-delayed, similar to what eventpoll does. Fixes: 2c14fa838cbe ("aio: implement IOCB_CMD_POLL") Cc: <stable@vger.kernel.org> # v4.18+ Link: https://lore.kernel.org/r/20211209010455.42744-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> |