linux

iv/linux

Author	SHA1	Message	Date
Ranjan Kumar	0a319f1629	scsi: mpi3mr: Wait for diagnostic save during controller init If a controller reset operation is triggered to recover the controller from a fault state, then wait for the snapdump to be saved in the firmware region before proceeding to reset the controller. Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Link: https://lore.kernel.org/r/20230228140835.4075-4-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-03-06 18:33:12 -05:00
Ranjan Kumar	5b06a7169c	scsi: mpi3mr: Driver unload crashes host when enhanced logging is enabled Prevent driver from trying to dereference a NULL pointer in a debug print while removing a device during driver unload. Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Link: https://lore.kernel.org/r/20230228140835.4075-3-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-03-06 18:33:12 -05:00
Ranjan Kumar	02ca7da291	scsi: mpi3mr: ioctl timeout when disabling/enabling interrupt As part of Task Management handling, the driver will disable and enable the MSIx index zero which belongs to the Admin reply queue. During this transition the driver loses some interrupts and this leads to Admin request and ioctl timeouts. After enabling the interrupts, poll the Admin reply queue to avoid timeouts. Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Link: https://lore.kernel.org/r/20230228140835.4075-2-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-03-06 18:33:12 -05:00
Jakob Koschel	2850b23e9f	scsi: lpfc: Avoid usage of list iterator variable after loop If the &epd_pool->list is empty when executing lpfc_get_io_buf_from_expedite_pool() the function would return an invalid pointer. Even in the case if the list is guaranteed to be populated, the iterator variable should not be used after the loop to be more robust for future changes. Linus proposed to avoid any use of the list iterator variable after the loop, in the attempt to move the list iterator variable declaration into the macro to avoid any potential misuse after the loop [1]. Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Signed-off-by: Jakob Koschel <jkl820.git@gmail.com> Link: https://lore.kernel.org/r/20230301-scsi-lpfc-avoid-list-iterator-after-loop-v1-1-325578ae7561@gmail.com Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-03-06 18:33:12 -05:00
Justin Tee	312320b0e0	scsi: lpfc: Check kzalloc() in lpfc_sli4_cgn_params_read() If kzalloc() fails in lpfc_sli4_cgn_params_read(), then we rely on lpfc_read_object()'s routine to NULL check pdata. Currently, an early return error is thrown from lpfc_read_object() to protect us from NULL ptr dereference, but the errno code is -ENODEV. Change the errno code to a more appropriate -ENOMEM. Reported-by: Kang Chen <void0red@gmail.com> Link: https://lore.kernel.org/all/20230226102338.3362585-1-void0red@gmail.com Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230228044336.5195-1-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-03-06 18:33:12 -05:00
Asutosh Das	c9507eab9f	scsi: ufs: mcq: qcom: Clean the return path of ufs_qcom_mcq_config_resource() Smatch static checker reported: drivers/ufs/host/ufs-qcom.c:1469 ufs_qcom_mcq_config_resource() info: returning a literal zero is cleaner Fix the above warning by returning in place instead of a jump to a label. Also remove the usage of devm_kfree() as it's unnecessary in this function. Fixes: c263b4ef737e ("scsi: ufs: core: mcq: Configure resource regions") Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Asutosh Das <quic_asutoshd@quicinc.com> Link: https://lore.kernel.org/r/3ebd2582af74b81ef7b57149f57c6a3bf0963953.1677721229.git.quic_asutoshd@quicinc.com Reviewed-by: Bjorn Andersson <andersson@kernel.org> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-03-06 18:33:12 -05:00
Asutosh Das	c8be073bd2	scsi: ufs: mcq: qcom: Fix passing zero to PTR_ERR Fix an error case in ufs_qcom_mcq_config_resource(), where the return value is set to 0 before passing it to PTR_ERR. This led to Smatch warning: drivers/ufs/host/ufs-qcom.c:1455 ufs_qcom_mcq_config_resource() warn: passing zero to 'PTR_ERR' Fixes: c263b4ef737e ("scsi: ufs: core: mcq: Configure resource regions") Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Asutosh Das <quic_asutoshd@quicinc.com> Link: https://lore.kernel.org/r/94ca99b327af634799ce5f25d0112c28cd00970d.1677721072.git.quic_asutoshd@quicinc.com Reviewed-by: Bjorn Andersson <andersson@kernel.org> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-03-06 18:33:12 -05:00
Dan Carpenter	fa8d32721a	scsi: ufs: ufs-qcom: Remove impossible check The "dev_req_params" pointer points to inside the middle of a struct so it can't be NULL. Removing this impossible condition is nice because now we don't need to consider the correct error code for that situation. Link: https://lore.kernel.org/r/Y/yA3niWUcGYgBU8@kili Fixes: f06fcc7155dc ("scsi: ufs-qcom: add QUniPro hardware support and power optimizations") Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-03-06 18:33:12 -05:00
Adrien Thierry	2ebe16155d	scsi: ufs: core: Add soft dependency on governor_simpleondemand The ufshcd driver uses simpleondemand governor for devfreq. Add it to the list of ufshcd softdeps to allow userspace initramfs tools like dracut to automatically pull the governor module into the initramfs together with UFS drivers. Link: https://lore.kernel.org/r/20230220140740.14379-1-athierry@redhat.com Signed-off-by: Adrien Thierry <athierry@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-03-06 18:33:12 -05:00
Kang Chen	06d1a90de6	scsi: hisi_sas: Check devm_add_action() return value In case devm_add_action() fails, check it in the caller of interrupt_preinit_v3_hw(). Link: https://lore.kernel.org/r/20230227031030.893324-1-void0red@gmail.com Signed-off-by: Kang Chen <void0red@gmail.com> Acked-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-03-06 18:33:12 -05:00
Daniel Wagner	877b03795f	scsi: qla2xxx: Add option to disable FC2 Target support Commit 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target") added support for FC2 Targets. Unfortunately, there are older setups which break with this new feature enabled. Allow to disable it via module option. Link: https://lore.kernel.org/r/20230208152014.109214-1-dwagner@suse.de Signed-off-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-03-06 17:06:39 -05:00
Maurizio Lombardi	6cc55c969b	scsi: target: iscsi: Fix an error message in iscsi_check_key() The first half of the error message is printed by pr_err(), the second half is printed by pr_debug(). The user will therefore see only the first part of the message and will miss some useful information. Link: https://lore.kernel.org/r/20230214141556.762047-1-mlombard@redhat.com Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-03-06 16:50:42 -05:00
Jakub Kicinski	e539a105f9	net: tls: fix device-offloaded sendpage straddling records Adrien reports that incorrect data is transmitted when a single page straddles multiple records. We would transmit the same data in all iterations of the loop. Reported-by: Adrien Moulin <amoulin@corp.free.fr> Link: https://lore.kernel.org/all/61481278.42813558.1677845235112.JavaMail.zimbra@corp.free.fr Fixes: c1318b39c7d3 ("tls: Add opt-in zerocopy mode of sendfile()") Tested-by: Adrien Moulin <amoulin@corp.free.fr> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Maxim Mikityanskiy <maxtram95@gmail.com> Link: https://lore.kernel.org/r/20230304192610.3818098-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-03-06 13:26:16 -08:00
Daniel Golle	193250ace2	net: ethernet: mtk_eth_soc: fix RX data corruption issue Fix data corruption issue with SerDes connected PHYs operating at 1.25 Gbps speed where we could previously observe about 30% packet loss while the bad packet counter was increasing. As almost all boards with MediaTek MT7622 or MT7986 use either the MT7531 switch IC operating at 3.125Gbps SerDes rate or single-port PHYs using rate-adaptation to 2500Base-X mode, this issue only got exposed now when we started trying to use SFP modules operating with 1.25 Gbps with the BananaPi R3 board. The fix is to set bit 12 which disables the RX FIFO clear function when setting up MAC MCR, MediaTek SDK did the same change stating: "If without this patch, kernel might receive invalid packets that are corrupted by GMAC."[1] [1]: `d8a2975939` Fixes: 42c03844e93d ("net-next: mediatek: add support for MediaTek MT7622 SoC") Tested-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/138da2735f92c8b6f8578ec2e5a794ee515b665f.1677937317.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-03-06 13:25:44 -08:00
Heiner Kallweit	58aac3a2ef	net: phy: smsc: fix link up detection in forced irq mode Currently link up can't be detected in forced mode if polling isn't used. Only link up interrupt source we have is aneg complete which isn't applicable in forced mode. Therefore we have to use energy-on as link up indicator. Fixes: 7365494550f6 ("net: phy: smsc: skip ENERGYON interrupt if disabled") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-03-06 13:23:10 -08:00
Arnaldo Carvalho de Melo	5b201a82cd	perf tools: Add Adrian Hunter to MAINTAINERS as a reviewer Adrian is the main author of the Intel PT codebase and has been reviewing perf tooling patches consistently for a long time, so lets reflect that in the MAINTAINERS file so that contributors add him to the CC list in patch submissions. Acked-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/ZAYosCjlzO9plAYO@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-03-06 17:23:18 -03:00
Arnaldo Carvalho de Melo	06a1574b94	tools headers UAPI: Sync linux/perf_event.h with the kernel sources To pick up the changes in: 09519ec3b19e4144 ("perf: Add perf_event_attr::config3") The patches for the tooling side will come later. This addresses this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h' diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/lkml/ZAZLYmDjWjSItWOq@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-03-06 17:21:23 -03:00
Linus Torvalds	8ca09d5fa3	cpumask: fix incorrect cpumask scanning result checks It turns out that commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask optimizations") exposed a number of cases of drivers not checking the result of "cpumask_next()" and friends correctly. The documented correct check for "no more cpus in the cpumask" is to check for the result being equal or larger than the number of possible CPU ids, exactly _because_ we've always done those constant-sized cpumask scans using a widened type before. So the return value of a cpumask scan should be checked with if (cpu >= nr_cpu_ids) ... because the cpumask scan did not necessarily stop exactly at that maximum CPU id. But a few cases ended up instead using checks like if (cpu == nr_cpumask_bits) ... which used that internal "widened" number of bits. And that used to work pretty much by accident (ok, in this case "by accident" is simply because it matched the historical internal implementation of the cpumask scanning, so it was more of a "intentionally using implementation details rather than an accident"). But the extended constant-sized optimizations then did that internal implementation differently, and now that code that did things wrong but matched the old implementation no longer worked at all. Which then causes subsequent odd problems due to using what ends up being an invalid CPU ID. Most of these cases require either unusual hardware or special uses to hit, but the random.c one triggers quite easily. All you really need is to have a sufficiently small CONFIG_NR_CPUS value for the bit scanning optimization to be triggered, but not enough CPUs to then actually fill that widened cpumask. At that point, the cpumask scanning will return the NR_CPUS constant, which is _not_ the same as nr_cpumask_bits. This just does the mindless fix with sed -i 's/== nr_cpumask_bits/>= nr_cpu_ids/' to fix the incorrect uses. The ones in the SCSI lpfc driver in particular could probably be fixed more cleanly by just removing that repeated pattern entirely, but I am not emptionally invested enough in that driver to care. Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/lkml/481b19b5-83a0-4793-b4fd-194ad7b978c3@roeck-us.net/ Reported-and-tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/lkml/CAMuHMdUKo_Sf7TjKzcNDa8Ve+6QrK+P8nSQrSQ=6LTRmcBKNww@mail.gmail.com/ Reported-by: Vernon Yang <vernon2gm@gmail.com> Link: https://lore.kernel.org/lkml/20230306160651.2016767-1-vernon2gm@gmail.com/ Cc: Yury Norov <yury.norov@gmail.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2023-03-06 12:15:13 -08:00
Martin KaFai Lau	32dfc59e43	Merge branch 'fix resolving VAR after DATASEC' Lorenz Bauer says: ==================== See the first patch for a detailed explanation. v2: - Move RESOLVE_TBD assignment out of the loop (Martin) ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-03-06 11:44:14 -08:00
Lorenz Bauer	dfdd608c3b	selftests/bpf: check that modifier resolves after pointer Add a regression test that ensures that a VAR pointing at a modifier which follows a PTR (or STRUCT or ARRAY) is resolved correctly by the datasec validator. Signed-off-by: Lorenz Bauer <lmb@isovalent.com> Link: https://lore.kernel.org/r/20230306112138.155352-3-lmb@isovalent.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-03-06 11:44:14 -08:00
Lorenz Bauer	9b459804ff	btf: fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR btf_datasec_resolve contains a bug that causes the following BTF to fail loading: [1] DATASEC a size=2 vlen=2 type_id=4 offset=0 size=1 type_id=7 offset=1 size=1 [2] INT (anon) size=1 bits_offset=0 nr_bits=8 encoding=(none) [3] PTR (anon) type_id=2 [4] VAR a type_id=3 linkage=0 [5] INT (anon) size=1 bits_offset=0 nr_bits=8 encoding=(none) [6] TYPEDEF td type_id=5 [7] VAR b type_id=6 linkage=0 This error message is printed during btf_check_all_types: [1] DATASEC a size=2 vlen=2 type_id=7 offset=1 size=1 Invalid type By tracing btf_*_resolve we can pinpoint the problem: btf_datasec_resolve(depth: 1, type_id: 1, mode: RESOLVE_TBD) = 0 btf_var_resolve(depth: 2, type_id: 4, mode: RESOLVE_TBD) = 0 btf_ptr_resolve(depth: 3, type_id: 3, mode: RESOLVE_PTR) = 0 btf_var_resolve(depth: 2, type_id: 4, mode: RESOLVE_PTR) = 0 btf_datasec_resolve(depth: 1, type_id: 1, mode: RESOLVE_PTR) = -22 The last invocation of btf_datasec_resolve should invoke btf_var_resolve by means of env_stack_push, instead it returns EINVAL. The reason is that env_stack_push is never executed for the second VAR. if (!env_type_is_resolve_sink(env, var_type) && !env_type_is_resolved(env, var_type_id)) { env_stack_set_next_member(env, i + 1); return env_stack_push(env, var_type, var_type_id); } env_type_is_resolve_sink() changes its behaviour based on resolve_mode. For RESOLVE_PTR, we can simplify the if condition to the following: (btf_type_is_modifier() \|\| btf_type_is_ptr) && !env_type_is_resolved() Since we're dealing with a VAR the clause evaluates to false. This is not sufficient to trigger the bug however. The log output and EINVAL are only generated if btf_type_id_size() fails. if (!btf_type_id_size(btf, &type_id, &type_size)) { btf_verifier_log_vsi(env, v->t, vsi, "Invalid type"); return -EINVAL; } Most types are sized, so for example a VAR referring to an INT is not a problem. The bug is only triggered if a VAR points at a modifier. Since we skipped btf_var_resolve that modifier was also never resolved, which means that btf_resolved_type_id returns 0 aka VOID for the modifier. This in turn causes btf_type_id_size to return NULL, triggering EINVAL. To summarise, the following conditions are necessary: - VAR pointing at PTR, STRUCT, UNION or ARRAY - Followed by a VAR pointing at TYPEDEF, VOLATILE, CONST, RESTRICT or TYPE_TAG The fix is to reset resolve_mode to RESOLVE_TBD before attempting to resolve a VAR from a DATASEC. Fixes: 1dc92851849c ("bpf: kernel side support for BTF Var and DataSec") Signed-off-by: Lorenz Bauer <lmb@isovalent.com> Link: https://lore.kernel.org/r/20230306112138.155352-2-lmb@isovalent.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-03-06 11:44:13 -08:00
Dave Airlie	66305069eb	A fix for nouveau preventing the system shutdown and one for a build warning, and NULL pointer dereference fix for cirrus. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCY/cl7AAKCRDj7w1vZxhR xft6AP9or5SQDSiSh17EavCVcdwMtLkYSt5Trjrlzwpv/rUAbQEA6uR9l5qOm3uN K7By7lGCb1HHi1t1wbhzfGsSOzSE6Q8= =zu21 -----END PGP SIGNATURE----- Merge tag 'drm-misc-fixes-2023-02-23' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes A fix for nouveau preventing the system shutdown and one for a build warning, and NULL pointer dereference fix for cirrus. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20230223083839.5gtmu6i42bnj7pfh@houat	2023-03-07 05:42:34 +10:00
Alexander Lobakin	294635a816	bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES &xdp_buff and &xdp_frame are bound in a way that xdp_buff->data_hard_start == xdp_frame It's always the case and e.g. xdp_convert_buff_to_frame() relies on this. IOW, the following: for (u32 i = 0; i < 0xdead; i++) { xdpf = xdp_convert_buff_to_frame(&xdp); xdp_convert_frame_to_buff(xdpf, &xdp); } shouldn't ever modify @xdpf's contents or the pointer itself. However, "live packet" code wrongly treats &xdp_frame as part of its context placed before the data_hard_start. With such flow, data_hard_start is sizeof(*xdpf) off to the right and no longer points to the XDP frame. Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several places and praying that there are no more miscalcs left somewhere in the code, unionize ::frm with ::data in a flex array, so that both starts pointing to the actual data_hard_start and the XDP frame actually starts being a part of it, i.e. a part of the headroom, not the context. A nice side effect is that the maximum frame size for this mode gets increased by 40 bytes, as xdp_buff::frame_sz includes everything from data_hard_start (-> includes xdpf already) to the end of XDP/skb shared info. Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it hardcoded for 64 bit && 4k pages, it can be made more flexible later on. Minor: align `&head->data` with how `head->frm` is assigned for consistency. Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for clarity. (was found while testing XDP traffic generator on ice, which calls xdp_convert_frame_to_buff() for each XDP frame) Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN") Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20230224163607.2994755-1-aleksander.lobakin@intel.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-03-06 11:15:54 -08:00
Andy Shevchenko	80c16b2b12	cpumask: Fix typo nr_cpumask_size --> nr_cpumask_bits The never used nr_cpumask_size is just a typo, hence use existing redefinition that's called nr_cpumask_bits. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2023-03-06 10:58:04 -08:00
Filipe Manana	e4cc1483f3	btrfs: fix extent map logging bit not cleared for split maps after dropping range At btrfs_drop_extent_map_range() we are clearing the EXTENT_FLAG_LOGGING bit on a 'flags' variable that was not initialized. This makes static checkers complain about it, so initialize the 'flags' variable before clearing the bit. In practice this has no consequences, because EXTENT_FLAG_LOGGING should not be set when btrfs_drop_extent_map_range() is called, as an fsync locks the inode in exclusive mode, locks the inode's mmap semaphore in exclusive mode too and it always flushes all delalloc. Also add a comment about why we clear EXTENT_FLAG_LOGGING on a copy of the flags of the split extent map. Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/linux-btrfs/Y%2FyipSVozUDEZKow@kili/ Fixes: db21370bffbc ("btrfs: drop extent map range more efficiently") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-03-06 19:28:19 +01:00
Johannes Thumshirn	95cd356ca2	btrfs: fix percent calculation for bg reclaim message We have a report, that the info message for block-group reclaim is crossing the 100% used mark. This is happening as we were truncating the divisor for the division (the block_group->length) to a 32bit value. Fix this by using div64_u64() to not truncate the divisor. In the worst case, it can lead to a div by zero error and should be possible to trigger on 4 disks RAID0, and each device is large enough: $ mkfs.btrfs -f /dev/test/scratch[1234] -m raid1 -d raid0 btrfs-progs v6.1 [...] Filesystem size: 40.00GiB Block group profiles: Data: RAID0 4.00GiB <<< Metadata: RAID1 256.00MiB System: RAID1 8.00MiB Reported-by: Forza <forza@tnonline.net> Link: https://lore.kernel.org/linux-btrfs/e99483.c11a58d.1863591ca52@tnonline.net/ Fixes: 5f93e776c673 ("btrfs: zoned: print unusable percentage when reclaiming block groups") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add Qu's note ] Signed-off-by: David Sterba <dsterba@suse.com>	2023-03-06 19:28:19 +01:00
Naohiro Aota	98e8d36a26	btrfs: fix unnecessary increment of read error stat on write error Current btrfs_log_dev_io_error() increases the read error count even if the erroneous IO is a WRITE request. This is because it forget to use "else if", and all the error WRITE requests counts as READ error as there is (of course) no REQ_RAHEAD bit set. Fixes: c3a62baf21ad ("btrfs: use chained bios when cloning") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-03-06 19:28:19 +01:00
void0red	c06016a02a	btrfs: handle btrfs_del_item errors in __btrfs_update_delayed_inode Even if the slot is already read out, we may still need to re-balance the tree, thus it can cause error in that btrfs_del_item() call and we need to handle it properly. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: void0red <void0red@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-03-06 19:28:19 +01:00
Qu Wenruo	2943868a90	btrfs: ioctl: return device fsid from DEV_INFO ioctl Currently user space utilizes dev info ioctl to grab the info of a certain devid, this includes its device uuid. But the returned info is not enough to determine if a device is a seed. Commit a26d60dedf9a ("btrfs: sysfs: add devinfo/fsid to retrieve actual fsid from the device") exports the same value in sysfs so this is for parity with ioctl. Add a new member, fsid, into btrfs_ioctl_dev_info_args, and populate the member with fsid value. This should not cause any compatibility problem, following the combinations: - Old user space, old kernel - Old user space, new kernel User space tool won't even check the new member. - New user space, old kernel The kernel won't touch the new member, and user space tool should zero out its argument, thus the new member is all zero. User space tool can then know the kernel doesn't support this fsid reporting, and falls back to whatever they can. - New user space, new kernel Go as planned. Would find the fsid member is no longer zero, and trust its value. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-03-06 19:28:19 +01:00
Boris Burkov	12148367d7	btrfs: fix potential dead lock in size class loading logic As reported by Filipe, there's a potential deadlock caused by using btrfs_search_forward on commit_root. The locking there is unconditional, even if ->skip_locking and ->search_commit_root is set. It's not meant to be used for commit roots, so it always needs to do locking. So if another task is COWing a child node of the same root node and then needs to wait for block group caching to complete when trying to allocate a metadata extent, it deadlocks. For example: [539604.239315] sysrq: Show Blocked State [539604.240133] task:kworker/u16:6 state:D stack:0 pid:2119594 ppid:2 flags:0x00004000 [539604.241613] Workqueue: btrfs-cache btrfs_work_helper [btrfs] [539604.242673] Call Trace: [539604.243129] <TASK> [539604.243925] __schedule+0x41d/0xee0 [539604.244797] ? rcu_read_lock_sched_held+0x12/0x70 [539604.245399] ? rwsem_down_read_slowpath+0x185/0x490 [539604.246111] schedule+0x5d/0xf0 [539604.246593] rwsem_down_read_slowpath+0x2da/0x490 [539604.247290] ? rcu_barrier_tasks_trace+0x10/0x20 [539604.248090] __down_read_common+0x3d/0x150 [539604.248702] down_read_nested+0xc3/0x140 [539604.249280] __btrfs_tree_read_lock+0x24/0x100 [btrfs] [539604.250097] btrfs_read_lock_root_node+0x48/0x60 [btrfs] [539604.250915] btrfs_search_forward+0x59/0x460 [btrfs] [539604.251781] ? btrfs_global_root+0x50/0x70 [btrfs] [539604.252476] caching_thread+0x1be/0x920 [btrfs] [539604.253167] btrfs_work_helper+0xf6/0x400 [btrfs] [539604.253848] process_one_work+0x24f/0x5a0 [539604.254476] worker_thread+0x52/0x3b0 [539604.255166] ? __pfx_worker_thread+0x10/0x10 [539604.256047] kthread+0xf0/0x120 [539604.256591] ? __pfx_kthread+0x10/0x10 [539604.257212] ret_from_fork+0x29/0x50 [539604.257822] </TASK> [539604.258233] task:btrfs-transacti state:D stack:0 pid:2236474 ppid:2 flags:0x00004000 [539604.259802] Call Trace: [539604.260243] <TASK> [539604.260615] __schedule+0x41d/0xee0 [539604.261205] ? rcu_read_lock_sched_held+0x12/0x70 [539604.262000] ? rwsem_down_read_slowpath+0x185/0x490 [539604.262822] schedule+0x5d/0xf0 [539604.263374] rwsem_down_read_slowpath+0x2da/0x490 [539604.266228] ? lock_acquire+0x160/0x310 [539604.266917] ? rcu_read_lock_sched_held+0x12/0x70 [539604.267996] ? lock_contended+0x19e/0x500 [539604.268720] __down_read_common+0x3d/0x150 [539604.269400] down_read_nested+0xc3/0x140 [539604.270057] __btrfs_tree_read_lock+0x24/0x100 [btrfs] [539604.271129] btrfs_read_lock_root_node+0x48/0x60 [btrfs] [539604.272372] btrfs_search_slot+0x143/0xf70 [btrfs] [539604.273295] update_block_group_item+0x9e/0x190 [btrfs] [539604.274282] btrfs_start_dirty_block_groups+0x1c4/0x4f0 [btrfs] [539604.275381] ? __mutex_unlock_slowpath+0x45/0x280 [539604.276390] btrfs_commit_transaction+0xee/0xed0 [btrfs] [539604.277391] ? lock_acquire+0x1a4/0x310 [539604.278080] ? start_transaction+0xcb/0x6c0 [btrfs] [539604.279099] transaction_kthread+0x142/0x1c0 [btrfs] [539604.279996] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] [539604.280673] kthread+0xf0/0x120 [539604.281050] ? __pfx_kthread+0x10/0x10 [539604.281496] ret_from_fork+0x29/0x50 [539604.281966] </TASK> [539604.282255] task:fsstress state:D stack:0 pid:2236483 ppid:1 flags:0x00004006 [539604.283897] Call Trace: [539604.284700] <TASK> [539604.285088] __schedule+0x41d/0xee0 [539604.285660] schedule+0x5d/0xf0 [539604.286175] btrfs_wait_block_group_cache_progress+0xf2/0x170 [btrfs] [539604.287342] ? __pfx_autoremove_wake_function+0x10/0x10 [539604.288450] find_free_extent+0xd93/0x1750 [btrfs] [539604.289256] ? _raw_spin_unlock+0x29/0x50 [539604.289911] ? btrfs_get_alloc_profile+0x127/0x2a0 [btrfs] [539604.290843] btrfs_reserve_extent+0x147/0x290 [btrfs] [539604.291943] btrfs_alloc_tree_block+0xcb/0x3e0 [btrfs] [539604.292903] __btrfs_cow_block+0x138/0x580 [btrfs] [539604.293773] btrfs_cow_block+0x10e/0x240 [btrfs] [539604.294595] btrfs_search_slot+0x7f3/0xf70 [btrfs] [539604.295585] btrfs_update_device+0x71/0x1b0 [btrfs] [539604.296459] btrfs_chunk_alloc_add_chunk_item+0xe0/0x340 [btrfs] [539604.297489] btrfs_chunk_alloc+0x1bf/0x490 [btrfs] [539604.298335] find_free_extent+0x6fa/0x1750 [btrfs] [539604.299174] ? _raw_spin_unlock+0x29/0x50 [539604.299950] ? btrfs_get_alloc_profile+0x127/0x2a0 [btrfs] [539604.300918] btrfs_reserve_extent+0x147/0x290 [btrfs] [539604.301797] btrfs_alloc_tree_block+0xcb/0x3e0 [btrfs] [539604.303017] ? lock_release+0x224/0x4a0 [539604.303855] __btrfs_cow_block+0x138/0x580 [btrfs] [539604.304789] btrfs_cow_block+0x10e/0x240 [btrfs] [539604.305611] btrfs_search_slot+0x7f3/0xf70 [btrfs] [539604.306682] ? btrfs_global_root+0x50/0x70 [btrfs] [539604.308198] lookup_inline_extent_backref+0x17b/0x7a0 [btrfs] [539604.309254] lookup_extent_backref+0x43/0xd0 [btrfs] [539604.310122] __btrfs_free_extent+0xf8/0x810 [btrfs] [539604.310874] ? lock_release+0x224/0x4a0 [539604.311724] ? btrfs_merge_delayed_refs+0x17b/0x1d0 [btrfs] [539604.313023] __btrfs_run_delayed_refs+0x2ba/0x1260 [btrfs] [539604.314271] btrfs_run_delayed_refs+0x8f/0x1c0 [btrfs] [539604.315445] ? rcu_read_lock_sched_held+0x12/0x70 [539604.316706] btrfs_commit_transaction+0xa2/0xed0 [btrfs] [539604.317855] ? do_raw_spin_unlock+0x4b/0xa0 [539604.318544] ? _raw_spin_unlock+0x29/0x50 [539604.319240] create_subvol+0x53d/0x6e0 [btrfs] [539604.320283] btrfs_mksubvol+0x4f5/0x590 [btrfs] [539604.321220] __btrfs_ioctl_snap_create+0x11b/0x180 [btrfs] [539604.322307] btrfs_ioctl_snap_create_v2+0xc6/0x150 [btrfs] [539604.323295] btrfs_ioctl+0x9f7/0x33e0 [btrfs] [539604.324331] ? rcu_read_lock_sched_held+0x12/0x70 [539604.325137] ? lock_release+0x224/0x4a0 [539604.325808] ? __x64_sys_ioctl+0x87/0xc0 [539604.326467] __x64_sys_ioctl+0x87/0xc0 [539604.327109] do_syscall_64+0x38/0x90 [539604.327875] entry_SYSCALL_64_after_hwframe+0x72/0xdc [539604.328792] RIP: 0033:0x7f05a7babaeb This needs to use regular btrfs_search_slot() with some skip and stop logic. Since we only consider five samples (five search slots), don't bother with the complexity of looking for commit_root_sem contention. If necessary, it can be added to the load function in between samples. Reported-by: Filipe Manana <fdmanana@kernel.org> Link: https://lore.kernel.org/linux-btrfs/CAL3q7H7eKMD44Z1+=Kb-1RFMMeZpAm2fwyO59yeBwCcSOU80Pg@mail.gmail.com/ Fixes: c7eec3d9aa95 ("btrfs: load block group size class when caching") Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>	2023-03-06 19:28:19 +01:00
Arnaldo Carvalho de Melo	7d0930647c	tools headers x86 cpufeatures: Sync with the kernel sources To pick the changes from: 8415a74852d7c247 ("x86/cpu, kvm: Add support for CPUID_80000021_EAX") This only causes these perf files to be rebuilt: CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o And addresses these perf build warnings: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h' diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h Warning: Kernel ABI header at 'tools/arch/x86/include/asm/required-features.h' differs from latest version at 'arch/x86/include/asm/required-features.h' diff -u tools/arch/x86/include/asm/required-features.h arch/x86/include/asm/required-features.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZAYlS2XTJ5hRtss7@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-03-06 14:42:12 -03:00
Bagas Sanjaya	b7abcd9c65	bpf, doc: Link to submitting-patches.rst for general patch submission info The link for patch submission information in general refers to index page for "Working with the kernel development community" section of kernel docs, whereas the link should have been Documentation/process/submitting-patches.rst instead. Fix it by replacing the index target with the appropriate doc. Fixes: 542228384888f5 ("bpf, doc: convert bpf_devel_QA.rst to use RST formatting") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230228074523.11493-3-bagasdotme@gmail.com	2023-03-06 16:44:39 +01:00
Bagas Sanjaya	32db18d606	bpf, doc: Do not link to docs.kernel.org for kselftest link The question on how to run BPF selftests have a reference link to kernel selftest documentation (Documentation/dev-tools/kselftest.rst). However, it uses external link to the documentation at kernel.org/docs (aka docs.kernel.org) instead, which requires Internet access. Fix this and replace the link with internal linking, by using :doc: directive while keeping the anchor text. Fixes: b7a27c3aafa252 ("bpf, doc: howto use/run the BPF selftests") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230228074523.11493-2-bagasdotme@gmail.com	2023-03-06 16:44:34 +01:00
Jan Kara	63bceed808	udf: Warn if block mapping is done for in-ICB files Now that address space operations are merge dfor in-ICB and normal files, it is more likely some code mistakenly tries to map blocks for in-ICB files. WARN and return error instead of silently returning garbage. Signed-off-by: Jan Kara <jack@suse.cz>	2023-03-06 16:38:25 +01:00
Jan Kara	cecb1f0654	udf: Fix reading of in-ICB files After merging address space operations of normal and in-ICB files, readahead could get called for in-ICB files which resulted in udf_get_block() being called for these files. udf_get_block() is not prepared to be called for in-ICB files and ends up returning garbage results as it interprets file data as extent list. Fix the problem by skipping readahead for in-ICB files. Fixes: 37a8a39f7ad3 ("udf: Switch to single address_space_operations") Signed-off-by: Jan Kara <jack@suse.cz>	2023-03-06 16:38:25 +01:00
Jan Kara	49854d3ccc	udf: Fix lost writes in udf_adinicb_writepage() The patch converting udf_adinicb_writepage() to avoid manually kmapping the page used memcpy_to_page() however that copies in the wrong direction (effectively overwriting file data with the old contents). What we should be using is memcpy_from_page() to copy data from the page into the inode and then mark inode dirty to store the data. Fixes: 5cfc45321a6d ("udf: Convert udf_adinicb_writepage() to memcpy_to_page()") Signed-off-by: Jan Kara <jack@suse.cz>	2023-03-06 16:38:25 +01:00
Michael Schmitz	e36a82bebb	m68k: Only force 030 bus error if PC not in exception table __get_kernel_nofault() does copy data in supervisor mode when forcing a task backtrace log through /proc/sysrq_trigger. This is expected cause a bus error exception on e.g. NULL pointer dereferencing when logging a kernel task has no workqueue associated. This bus error ought to be ignored. Our 030 bus error handler is ill equipped to deal with this: Whenever ssw indicates a kernel mode access on a data fault, we don't even attempt to handle the fault and instead always send a SEGV signal (or panic). As a result, the check for exception handling at the fault PC (buried in send_sig_fault() which gets called from do_page_fault() eventually) is never used. In contrast, both 040 and 060 access error handlers do not care whether a fault happened on supervisor mode access, and will call do_page_fault() on those, ultimately honoring the exception table. Add a check in bus_error030 to call do_page_fault() in case we do have an entry for the fault PC in our exception table. I had attempted a fix for this earlier in 2019 that did rely on testing pagefault_disabled() (see link below) to achieve the same thing, but this patch should be more generic. Tested on 030 Atari Falcon. Reported-by: Eero Tamminen <oak@helsinkinet.fi> Link: https://lore.kernel.org/r/alpine.LNX.2.21.1904091023540.25@nippy.intranet Link: https://lore.kernel.org/r/63130691-1984-c423-c1f2-73bfd8d3dcd3@gmail.com Signed-off-by: Michael Schmitz <schmitzmic@gmail.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20230301021107.26307-1-schmitzmic@gmail.com Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>	2023-03-06 14:09:42 +01:00
Geert Uytterhoeven	d4b97925e8	m68k: mm: Move initrd phys_to_virt handling after paging_init() When booting with an initial ramdisk on platforms where physical memory does not start at address zero (e.g. on Amiga): initrd: 0ef0602c - 0f800000 Zone ranges: DMA [mem 0x0000000008000000-0x000000f7ffffffff] Normal empty Movable zone start for each node Early memory node ranges node 0: [mem 0x0000000008000000-0x000000000f7fffff] Initmem setup node 0 [mem 0x0000000008000000-0x000000000f7fffff] Unable to handle kernel access at virtual address (ptrval) Oops: 00000000 Modules linked in: PC: [<00201d3c>] memcmp+0x28/0x56 As phys_to_virt() relies on m68k_memoffset and module_fixup(), it must not be called before paging_init(). Hence postpone the phys_to_virt handling for the initial ramdisk until after calling paging_init(). While at it, reduce #ifdef clutter by using IS_ENABLED() instead. Fixes: 376e3fdecb0dcae2 ("m68k: Enable memtest functionality") Reported-by: Stephen Walsh <vk3heg@vk3heg.net> Link: https://lists.debian.org/debian-68k/2022/09/msg00007.html Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/4f45f05f377bf3f5baf88dbd5c3c8aeac59d94f0.camel@physik.fu-berlin.de Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Finn Thain <fthain@linux-m68k.org> Link: https://lore.kernel.org/r/dff216da09ab7a60217c3fc2147e671ae07d636f.1677528627.git.geert@linux-m68k.org	2023-03-06 14:09:42 +01:00
Kars de Jong	0d9fad91ab	m68k: mm: Fix systems with memory at end of 32-bit address space The calculation of end addresses of memory chunks overflowed to 0 when a memory chunk is located at the end of 32-bit address space. This is the case for the HP300 architecture. Link: https://lore.kernel.org/linux-m68k/CACz-3rhUo5pgNwdWHaPWmz+30Qo9xCg70wNxdf7o5x-6tXq8QQ@mail.gmail.com/ Signed-off-by: Kars de Jong <jongk@linux-m68k.org> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20230223112349.26675-1-jongk@linux-m68k.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>	2023-03-06 14:09:42 +01:00
Arnaldo Carvalho de Melo	14e998ed42	tools include UAPI: Sync linux/vhost.h with the kernel sources To get the changes in: 3b688d7a086d0438 ("vhost-vdpa: uAPI to resume the device") To pick up these changes and support them: $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > before $ cp ../linux/include/uapi/linux/vhost.h tools/include/uapi/linux/vhost.h $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > after $ diff -u before after --- before 2023-03-06 09:26:14.889251817 -0300 +++ after 2023-03-06 09:26:20.594406270 -0300 @@ -30,6 +30,7 @@ [0x77] = "VDPA_SET_CONFIG_CALL", [0x7C] = "VDPA_SET_GROUP_ASID", [0x7D] = "VDPA_SUSPEND", + [0x7E] = "VDPA_RESUME", }; static const char *vhost_virtio_ioctl_read_cmds[] = { [0x00] = "GET_FEATURES", $ For instance, see how those 'cmd' ioctl arguments get translated, now VDPA_RESUME will be as well: # perf trace -a -e ioctl --max-events=10 0.000 ( 0.011 ms): pipewire/2261 ioctl(fd: 60, cmd: SNDRV_PCM_HWSYNC, arg: 0x1) = 0 21.353 ( 0.014 ms): pipewire/2261 ioctl(fd: 60, cmd: SNDRV_PCM_HWSYNC, arg: 0x1) = 0 25.766 ( 0.014 ms): gnome-shell/2196 ioctl(fd: 14, cmd: DRM_I915_IRQ_WAIT, arg: 0x7ffe4a22c740) = 0 25.845 ( 0.034 ms): gnome-shel:cs0/2212 ioctl(fd: 14, cmd: DRM_I915_IRQ_EMIT, arg: 0x7fd43915dc70) = 0 25.916 ( 0.011 ms): gnome-shell/2196 ioctl(fd: 9, cmd: DRM_MODE_ADDFB2, arg: 0x7ffe4a22c8a0) = 0 25.941 ( 0.025 ms): gnome-shell/2196 ioctl(fd: 9, cmd: DRM_MODE_ATOMIC, arg: 0x7ffe4a22c840) = 0 32.915 ( 0.009 ms): gnome-shell/2196 ioctl(fd: 9, cmd: DRM_MODE_RMFB, arg: 0x7ffe4a22cf9c) = 0 42.522 ( 0.013 ms): gnome-shell/2196 ioctl(fd: 14, cmd: DRM_I915_IRQ_WAIT, arg: 0x7ffe4a22c740) = 0 42.579 ( 0.031 ms): gnome-shel:cs0/2212 ioctl(fd: 14, cmd: DRM_I915_IRQ_EMIT, arg: 0x7fd43915dc70) = 0 42.644 ( 0.010 ms): gnome-shell/2196 ioctl(fd: 9, cmd: DRM_MODE_ADDFB2, arg: 0x7ffe4a22c8a0) = 0 # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Sebastien Boeuf <sebastien.boeuf@intel.com> Link: https://lore.kernel.org/lkml/ZAXdCTecxSNwAoeK@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-03-06 09:31:26 -03:00
Florian Westphal	4a02426787	netfilter: tproxy: fix deadlock due to missing BH disable The xtables packet traverser performs an unconditional local_bh_disable(), but the nf_tables evaluation loop does not. Functions that are called from either xtables or nftables must assume that they can be called in process context. inet_twsk_deschedule_put() assumes that no softirq interrupt can occur. If tproxy is used from nf_tables its possible that we'll deadlock trying to aquire a lock already held in process context. Add a small helper that takes care of this and use it. Link: https://lore.kernel.org/netfilter-devel/401bd6ed-314a-a196-1cdc-e13c720cc8f2@balasys.hu/ Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support") Reported-and-tested-by: Major Dávid <major.david@balasys.hu> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2023-03-06 12:09:48 +01:00
Ivan Delalande	9f7dd42f0d	netfilter: ctnetlink: revert to dumping mark regardless of event type It seems that change was unintentional, we have userspace code that needs the mark while listening for events like REPLY, DESTROY, etc. Also include 0-marks in requested dumps, as they were before that fix. Fixes: 1feeae071507 ("netfilter: ctnetlink: fix compilation warning after data race fixes in ct mark") Signed-off-by: Ivan Delalande <colona@arista.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2023-03-06 12:09:23 +01:00
Selvin Xavier	89b59a84cb	bnxt_en: Fix the double free during device removal Following warning reported by KASAN during driver unload ================================================================== BUG: KASAN: double-free in bnxt_remove_one+0x103/0x200 [bnxt_en] Free of addr ffff88814e8dd4c0 by task rmmod/17469 CPU: 47 PID: 17469 Comm: rmmod Kdump: loaded Tainted: G S 6.2.0-rc7+ #2 Hardware name: Dell Inc. PowerEdge R740/01YM03, BIOS 2.3.10 08/15/2019 Call Trace: <TASK> dump_stack_lvl+0x33/0x46 print_report+0x17b/0x4b3 ? __call_rcu_common.constprop.79+0x27e/0x8c0 ? __pfx_free_object_rcu+0x10/0x10 ? __virt_addr_valid+0xe3/0x160 ? bnxt_remove_one+0x103/0x200 [bnxt_en] kasan_report_invalid_free+0x64/0xd0 ? bnxt_remove_one+0x103/0x200 [bnxt_en] ? bnxt_remove_one+0x103/0x200 [bnxt_en] __kasan_slab_free+0x179/0x1c0 ? bnxt_remove_one+0x103/0x200 [bnxt_en] __kmem_cache_free+0x194/0x350 bnxt_remove_one+0x103/0x200 [bnxt_en] pci_device_remove+0x62/0x110 device_release_driver_internal+0xf6/0x1c0 driver_detach+0x76/0xe0 bus_remove_driver+0x89/0x160 pci_unregister_driver+0x26/0x110 ? strncpy_from_user+0x188/0x1c0 bnxt_exit+0xc/0x24 [bnxt_en] __x64_sys_delete_module+0x21f/0x390 ? __pfx___x64_sys_delete_module+0x10/0x10 ? __pfx_mem_cgroup_handle_over_high+0x10/0x10 ? _raw_spin_lock+0x87/0xe0 ? __pfx__raw_spin_lock+0x10/0x10 ? __audit_syscall_entry+0x185/0x210 ? ktime_get_coarse_real_ts64+0x51/0x80 ? syscall_trace_enter.isra.18+0x126/0x1a0 do_syscall_64+0x37/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7effcb6fd71b Code: 73 01 c3 48 8b 0d 6d 17 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 17 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffeada270b8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 00005623660e0750 RCX: 00007effcb6fd71b RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005623660e07b8 RBP: 0000000000000000 R08: 00007ffeada26031 R09: 0000000000000000 R10: 00007effcb771280 R11: 0000000000000206 R12: 00007ffeada272e0 R13: 00007ffeada28bc4 R14: 00005623660e02a0 R15: 00005623660e0750 </TASK> Auxiliary device structures are freed in bnxt_aux_dev_release. So avoid calling kfree from bnxt_remove_one. Also, set bp->edev to NULL before freeing the auxilary private structure. Fixes: d80d88b0dfff ("bnxt_en: Add auxiliary driver support") Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-03-06 09:36:08 +00:00
Michael Chan	accd7e2369	bnxt_en: Avoid order-5 memory allocation for TPA data The driver needs to keep track of all the possible concurrent TPA (GRO/LRO) completions on the aggregation ring. On P5 chips, the maximum number of concurrent TPA is 256 and the amount of memory we allocate is order-5 on systems using 4K pages. Memory allocation failure has been reported: NetworkManager: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL\|__GFP_COMP\|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0-1 CPU: 15 PID: 2995 Comm: NetworkManager Kdump: loaded Not tainted 5.10.156 #1 Hardware name: Dell Inc. PowerEdge R660/0M1CC5, BIOS 0.2.25 08/12/2022 Call Trace: dump_stack+0x57/0x6e warn_alloc.cold.120+0x7b/0xdd ? _cond_resched+0x15/0x30 ? __alloc_pages_direct_compact+0x15f/0x170 __alloc_pages_slowpath.constprop.108+0xc58/0xc70 __alloc_pages_nodemask+0x2d0/0x300 kmalloc_order+0x24/0xe0 kmalloc_order_trace+0x19/0x80 bnxt_alloc_mem+0x1150/0x15c0 [bnxt_en] ? bnxt_get_func_stat_ctxs+0x13/0x60 [bnxt_en] __bnxt_open_nic+0x12e/0x780 [bnxt_en] bnxt_open+0x10b/0x240 [bnxt_en] __dev_open+0xe9/0x180 __dev_change_flags+0x1af/0x220 dev_change_flags+0x21/0x60 do_setlink+0x35c/0x1100 Instead of allocating this big chunk of memory and dividing it up for the concurrent TPA instances, allocate each small chunk separately for each TPA instance. This will reduce it to order-0 allocations. Fixes: 79632e9ba386 ("bnxt_en: Expand bnxt_tpa_info struct to support 57500 chips.") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-03-06 09:36:08 +00:00
Russell King (Oracle)	f4b47a2e94	net: phylib: get rid of unnecessary locking The locking in phy_probe() and phy_remove() does very little to prevent any races with e.g. phy_attach_direct(), but instead causes lockdep ABBA warnings. Remove it. ====================================================== WARNING: possible circular locking dependency detected 6.2.0-dirty #1108 Tainted: G W E ------------------------------------------------------ ip/415 is trying to acquire lock: ffff5c268f81ef50 (&dev->lock){+.+.}-{3:3}, at: phy_attach_direct+0x17c/0x3a0 [libphy] but task is already holding lock: ffffaef6496cb518 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x154/0x560 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (rtnl_mutex){+.+.}-{3:3}: __lock_acquire+0x35c/0x6c0 lock_acquire.part.0+0xcc/0x220 lock_acquire+0x68/0x84 __mutex_lock+0x8c/0x414 mutex_lock_nested+0x34/0x40 rtnl_lock+0x24/0x30 sfp_bus_add_upstream+0x34/0x150 phy_sfp_probe+0x4c/0x94 [libphy] mv3310_probe+0x148/0x184 [marvell10g] phy_probe+0x8c/0x200 [libphy] call_driver_probe+0xbc/0x15c really_probe+0xc0/0x320 __driver_probe_device+0x84/0x120 driver_probe_device+0x44/0x120 __device_attach_driver+0xc4/0x160 bus_for_each_drv+0x80/0xe0 __device_attach+0xb0/0x1f0 device_initial_probe+0x1c/0x2c bus_probe_device+0xa4/0xb0 device_add+0x360/0x53c phy_device_register+0x60/0xa4 [libphy] fwnode_mdiobus_phy_device_register+0xc0/0x190 [fwnode_mdio] fwnode_mdiobus_register_phy+0x160/0xd80 [fwnode_mdio] of_mdiobus_register+0x140/0x340 [of_mdio] orion_mdio_probe+0x298/0x3c0 [mvmdio] platform_probe+0x70/0xe0 call_driver_probe+0x34/0x15c really_probe+0xc0/0x320 __driver_probe_device+0x84/0x120 driver_probe_device+0x44/0x120 __driver_attach+0x104/0x210 bus_for_each_dev+0x78/0xdc driver_attach+0x2c/0x3c bus_add_driver+0x184/0x240 driver_register+0x80/0x13c __platform_driver_register+0x30/0x3c xt_compat_calc_jump+0x28/0xa4 [x_tables] do_one_initcall+0x50/0x1b0 do_init_module+0x50/0x1fc load_module+0x684/0x744 __do_sys_finit_module+0xc4/0x140 __arm64_sys_finit_module+0x28/0x34 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0x6c/0x1b0 do_el0_svc+0x34/0x44 el0_svc+0x48/0xf0 el0t_64_sync_handler+0xb8/0xc0 el0t_64_sync+0x1a0/0x1a4 -> #0 (&dev->lock){+.+.}-{3:3}: check_prev_add+0xb4/0xc80 validate_chain+0x414/0x47c __lock_acquire+0x35c/0x6c0 lock_acquire.part.0+0xcc/0x220 lock_acquire+0x68/0x84 __mutex_lock+0x8c/0x414 mutex_lock_nested+0x34/0x40 phy_attach_direct+0x17c/0x3a0 [libphy] phylink_fwnode_phy_connect.part.0+0x70/0xe4 [phylink] phylink_fwnode_phy_connect+0x48/0x60 [phylink] mvpp2_open+0xec/0x2e0 [mvpp2] __dev_open+0x104/0x214 __dev_change_flags+0x1d4/0x254 dev_change_flags+0x2c/0x7c do_setlink+0x254/0xa50 __rtnl_newlink+0x430/0x514 rtnl_newlink+0x58/0x8c rtnetlink_rcv_msg+0x17c/0x560 netlink_rcv_skb+0x64/0x150 rtnetlink_rcv+0x20/0x30 netlink_unicast+0x1d4/0x2b4 netlink_sendmsg+0x1a4/0x400 ____sys_sendmsg+0x228/0x290 ___sys_sendmsg+0x88/0xec __sys_sendmsg+0x70/0xd0 __arm64_sys_sendmsg+0x2c/0x40 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0x6c/0x1b0 do_el0_svc+0x34/0x44 el0_svc+0x48/0xf0 el0t_64_sync_handler+0xb8/0xc0 el0t_64_sync+0x1a0/0x1a4 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(rtnl_mutex); lock(&dev->lock); lock(rtnl_mutex); lock(&dev->lock); * DEADLOCK * Fixes: 298e54fa810e ("net: phy: add core phylib sfp support") Reported-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-03-06 09:33:07 +00:00
Rongguang Wei	a9334b702a	net: stmmac: add to set device wake up flag when stmmac init phy When MAC is not support PMT, driver will check PHY's WoL capability and set device wakeup capability in stmmac_init_phy(). We can enable the WoL through ethtool, the driver would enable the device wake up flag. Now the device_may_wakeup() return true. But if there is a way which enable the PHY's WoL capability derectly, like in BIOS. The driver would not know the enable thing and would not set the device wake up flag. The phy_suspend may failed like this: [ 32.409063] PM: dpm_run_callback(): mdio_bus_phy_suspend+0x0/0x50 returns -16 [ 32.409065] PM: Device stmmac-1:00 failed to suspend: error -16 [ 32.409067] PM: Some devices failed to suspend, or early wake event detected Add to set the device wakeup enable flag according to the get_wol function result in PHY can fix the error in this scene. v2: add a Fixes tag. Fixes: 1d8e5b0f3f2c ("net: stmmac: Support WOL with phy") Signed-off-by: Rongguang Wei <weirongguang@kylinos.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-03-06 07:36:20 +00:00
Dave Chinner	8ac5b996bf	xfs: fix off-by-one-block in xfs_discard_folio() The recent writeback corruption fixes changed the code in xfs_discard_folio() to calculate a byte range to for punching delalloc extents. A mistake was made in using round_up(pos) for the end offset, because when pos points at the first byte of a block, it does not get rounded up to point to the end byte of the block. hence the punch range is short, and this leads to unexpected behaviour in certain cases in xfs_bmap_punch_delalloc_range. e.g. pos = 0 means we call xfs_bmap_punch_delalloc_range(0,0), so there is no previous extent and it rounds up the punch to the end of the delalloc extent it found at offset 0, not the end of the range given to xfs_bmap_punch_delalloc_range(). Fix this by handling the zero block offset case correctly. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=217030 Link: https://lore.kernel.org/linux-xfs/Y+vOfaxIWX1c%2Fyy9@bfoster/ Fixes: 7348b322332d ("xfs: xfs_bmap_punch_delalloc_range() should take a byte range") Reported-by: Pengfei Xu <pengfei.xu@intel.com> Found-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2023-03-05 15:13:23 -08:00
Dave Chinner	0c7273e494	xfs: quotacheck failure can race with background inode inactivation The background inode inactivation can attached dquots to inodes, but this can race with a foreground quotacheck failure that leads to disabling quotas and freeing the mp->m_quotainfo structure. The background inode inactivation then tries to allocate a quota, tries to dereference mp->m_quotainfo, and crashes like so: XFS (loop1): Quotacheck: Unsuccessful (Error -5): Disabling quotas. xfs filesystem being mounted at /root/syzkaller.qCVHXV/0/file0 supports timestamps until 2038 (0x7fffffff) BUG: kernel NULL pointer dereference, address: 00000000000002a8 .... CPU: 0 PID: 161 Comm: kworker/0:4 Not tainted 6.2.0-c9c3395d5e3d #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: xfs-inodegc/loop1 xfs_inodegc_worker RIP: 0010:xfs_dquot_alloc+0x95/0x1e0 .... Call Trace: <TASK> xfs_qm_dqread+0x46/0x440 xfs_qm_dqget_inode+0x154/0x500 xfs_qm_dqattach_one+0x142/0x3c0 xfs_qm_dqattach_locked+0x14a/0x170 xfs_qm_dqattach+0x52/0x80 xfs_inactive+0x186/0x340 xfs_inodegc_worker+0xd3/0x430 process_one_work+0x3b1/0x960 worker_thread+0x52/0x660 kthread+0x161/0x1a0 ret_from_fork+0x29/0x50 </TASK> .... Prevent this race by flushing all the queued background inode inactivations pending before purging all the cached dquots when quotacheck fails. Reported-by: Pengfei Xu <pengfei.xu@intel.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2023-03-05 15:13:22 -08:00
Linus Torvalds	fe15c26ee2	Linux 6.3-rc1 v6.3-rc1	2023-03-05 14:52:03 -08:00
Linus Torvalds	596ff4a09b	cpumask: re-introduce constant-sized cpumask optimizations Commit aa47a7c215e7 ("lib/cpumask: deprecate nr_cpumask_bits") resulted in the cpumask operations potentially becoming hugely less efficient, because suddenly the cpumask was always considered to be variable-sized. The optimization was then later added back in a limited form by commit 6f9c07be9d02 ("lib/cpumask: add FORCE_NR_CPUS config option"), but that FORCE_NR_CPUS option is not useful in a generic kernel and more of a special case for embedded situations with fixed hardware. Instead, just re-introduce the optimization, with some changes. Instead of depending on CPUMASK_OFFSTACK being false, and then always using the full constant cpumask width, this introduces three different cpumask "sizes": - the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids. This is used for situations where we should use the exact size. - the "small" size (small_cpumask_bits) is the NR_CPUS constant if it fits in a single word and the bitmap operations thus end up able to trigger the "small_const_nbits()" optimizations. This is used for the operations that have optimized single-word cases that get inlined, notably the bit find and scanning functions. - the "large" size (large_cpumask_bits) is the NR_CPUS constant if it is an sufficiently small constant that makes simple "copy" and "clear" operations more efficient. This is arbitrarily set at four words or less. As a an example of this situation, without this fixed size optimization, cpumask_clear() will generate code like movl nr_cpu_ids(%rip), %edx addq $63, %rdx shrq $3, %rdx andl $-8, %edx callq memset@PLT on x86-64, because it would calculate the "exact" number of longwords that need to be cleared. In contrast, with this patch, using a MAX_CPU of 64 (which is quite a reasonable value to use), the above becomes a single movq $0,cpumask instruction instead, because instead of caring to figure out exactly how many CPU's the system has, it just knows that the cpumask will be a single word and can just clear it all. Note that this does end up tightening the rules a bit from the original version in another way: operations that set bits in the cpumask are now limited to the actual nr_cpu_ids limit, whereas we used to do the nr_cpumask_bits thing almost everywhere in the cpumask code. But if you just clear bits, or scan for bits, we can use the simpler compile-time constants. In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()' which were not useful, and which fundamentally have to be limited to 'nr_cpu_ids'. Better remove them now than have somebody introduce use of them later. Of course, on x86-64 with MAXSMP there is no sane small compile-time constant for the cpumask sizes, and we end up using the actual CPU bits, and will generate the above kind of horrors regardless. Please don't use MAXSMP unless you really expect to have machines with thousands of cores. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2023-03-05 14:30:34 -08:00

... 2 3 4 5 6 ...

1169482 Commits