IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
commit 0967bf837784a11c65d66060623a74e65211af0b upstream.
Include the second VLAN HLEN into account when computing the maximum
MTU size as other drivers do.
Fixes: fabf1bce103a ("ixgbe: Prevent unsupported configurations with XDP")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fda6c89fe3d9aca073495a664e1d5aea28cd4377 upstream.
lianhui reports that when MPLS fails to register the sysctl table
under new location (during device rename) the old pointers won't
get overwritten and may be freed again (double free).
Handle this gracefully. The best option would be unregistering
the MPLS from the device completely on failure, but unfortunately
mpls_ifdown() can fail. So failing fully is also unreliable.
Another option is to register the new table first then only
remove old one if the new one succeeds. That requires more
code, changes order of notifications and two tables may be
visible at the same time.
sysctl point is not used in the rest of the code - set to NULL
on failures and skip unregister if already NULL.
Reported-by: lianhui tang <bluetlh@gmail.com>
Fixes: 0fae3bf018d9 ("mpls: handle device renames for per-device sysctls")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 05d7623a892a9da62da0e714428e38f09e4a64d8 upstream.
When setting 'snps,force_thresh_dma_mode' DT property, the following
warning is always emitted, regardless the status of force_sf_dma_mode:
dwmac-starfive 10020000.ethernet: force_sf_dma_mode is ignored if force_thresh_dma_mode is set.
Do not print the rather misleading message when DMA store and forward
mode is already disabled.
Fixes: e2a240c7d3bc ("driver:net:stmmac: Disable DMA store and forward mode if platform data force_thresh_dma_mode is set.")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Link: https://lore.kernel.org/r/20230210202126.877548-1-cristian.ciocaltea@collabora.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2038cc592811209de20c4e094ca08bfb1e6fbc6c upstream.
In bnxt_reserve_rings(), there is logic to check that the number of TX
rings reserved is enough to cover all the mqprio TCs, but it fails to
account for the TX XDP rings. So the check will always fail if there
are mqprio TCs and TX XDP rings. As a result, the driver always fails
to initialize after the XDP program is attached and the device will be
brought down. A subsequent ifconfig up will also fail because the
number of TX rings is set to an inconsistent number. Fix the check to
properly account for TX XDP rings. If the check fails, set the number
of TX rings back to a consistent number after calling netdev_reset_tc().
Fixes: 674f50a5b026 ("bnxt_en: Implement new method to reserve rings.")
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4562c65ec852067c6196abdcf2d925f08841dcbc upstream.
So far changing the period by just setting new period values while
running did not work.
The order as indicated by the publicly available reference manual of the i.MX8MP [1]
indicates a sequence:
* initiate the programming sequence
* set the values for PPS period and start time
* start the pulse train generation.
This is currently not used in dwmac5_flex_pps_config(), which instead does:
* initiate the programming sequence and immediately start the pulse train generation
* set the values for PPS period and start time
This caused the period values written not to take effect until the FlexPPS output was
disabled and re-enabled again.
This patch fix the order and allows the period to be set immediately.
[1] https://www.nxp.com/webapp/Download?colCode=IMX8MPRM
Fixes: 9a8a02c9d46d ("net: stmmac: Add Flexible PPS support")
Signed-off-by: Johannes Zink <j.zink@pengutronix.de>
Link: https://lore.kernel.org/r/20230210143937.3427483-1-j.zink@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2fa28f5c6fcbfc794340684f36d2581b4f2d20b5 upstream.
old_meter needs to be free after it is detached regardless of whether
the new meter is successfully attached.
Fixes: c7c4c44c9a95 ("net: openvswitch: expand the meters supported number")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c68f345b7c425b38656e1791a0486769a8797016 upstream.
syzbot reported that act_len in kalmia_send_init_packet() is
uninitialized when passing it to the first usb_bulk_msg error path. Jiri
Pirko noted that it's pointless to pass it in the error path, and that
the value that would be printed in the second error path would be the
value of act_len from the first call to usb_bulk_msg.[1]
With this in mind, let's just not pass act_len to the usb_bulk_msg error
paths.
1: https://lore.kernel.org/lkml/Y9pY61y1nwTuzMOa@nanopsycho/
Fixes: d40261236e8e ("net/usb: Add Samsung Kalmia driver for Samsung GT-B3730")
Reported-and-tested-by: syzbot+cd80c5ef5121bfe85b55@syzkaller.appspotmail.com
Signed-off-by: Miko Larsson <mikoxyzzz@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ca43ccf41224b023fc290073d5603a755fd12eed upstream.
Eric Dumazet pointed out [0] that when we call skb_set_owner_r()
for ipv6_pinfo.pktoptions, sk_rmem_schedule() has not been called,
resulting in a negative sk_forward_alloc.
We add a new helper which clones a skb and sets its owner only
when sk_rmem_schedule() succeeds.
Note that we move skb_set_owner_r() forward in (dccp|tcp)_v6_do_rcv()
because tcp_send_synack() can make sk_forward_alloc negative before
ipv6_opt_accepted() in the crossed SYN-ACK or self-connect() cases.
[0]: https://lore.kernel.org/netdev/CANn89iK9oc20Jdi_41jb9URdF210r7d1Y-+uypbMSbOfY6jqrg@mail.gmail.com/
Fixes: 323fbd0edf3f ("net: dccp: Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv()")
Fixes: 3df80d9320bc ("[DCCP]: Introduce DCCPv6")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee059170b1f7e94e55fa6cadee544e176a6e59c2 upstream.
The imperfect hash area can be updated while packets are traversing,
which will cause a use-after-free when 'tcf_exts_exec()' is called
with the destroyed tcf_ext.
CPU 0: CPU 1:
tcindex_set_parms tcindex_classify
tcindex_lookup
tcindex_lookup
tcf_exts_change
tcf_exts_exec [UAF]
Stop operating on the shared area directly, by using a local copy,
and update the filter with 'rcu_replace_pointer()'. Delete the old
filter version only after a rcu grace period elapsed.
Fixes: 9b0d4446b569 ("net: sched: avoid atomic swap in tcf_exts_change")
Reported-by: valis <sec@valis.email>
Suggested-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20230209143739.279867-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a1221703a0f75a9d81748c516457e0fc76951496 upstream.
Use list_is_first() to check whether tsp->asoc matches the first
element of ep->asocs, as the list is not guaranteed to have an entry.
Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20230208-sctp-filter-v2-1-6e1f4017f326@diag.uniroma1.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0ed577e7e8e508c24e22ba07713ecc4903e147c3 upstream.
In TI's AM62x/AM64x SoCs, successful teardown of RX DMA Channel raises an
interrupt. The process of servicing this interrupt involves flushing all
pending RX DMA descriptors and clearing the teardown completion marker
(TDCM). The am65_cpsw_nuss_rx_packets() function invoked from the RX
NAPI callback services the interrupt. Thus, it is necessary to wait for
this handler to run, drain all packets and clear TDCM, before calling
napi_disable() in am65_cpsw_nuss_common_stop() function post channel
teardown. If napi_disable() executes before ensuring that TDCM is
cleared, the TDCM remains set when the interfaces are down, resulting in
an interrupt storm when the interfaces are brought up again.
Since the interrupt raised to indicate the RX DMA Channel teardown is
specific to the AM62x and AM64x SoCs, add a quirk for it.
Fixes: 4f7cce272403 ("net: ethernet: ti: am65-cpsw: add support for am64x cpsw3g")
Co-developed-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Link: https://lore.kernel.org/r/20230209084432.189222-1-s-vadapalli@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d61615c366a489646a1bfe5b33455f916762d5f4 upstream.
Code blocks handling BCMA_CHIP_ID_BCM5357 and BCMA_CHIP_ID_BCM53572 were
incorrectly unified. Chip package values are not unique and cannot be
checked independently. They are meaningful only in a context of a given
chip.
Packages BCM5358 and BCM47188 share the same value but then belong to
different chips. Code unification resulted in treating BCM5358 as
BCM47188 and broke its initialization.
Link: https://github.com/openwrt/openwrt/issues/8278
Fixes: cb1b0f90acfe ("net: ethernet: bgmac: unify code of the same family")
Cc: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230208091637.16291-1-zajec5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ce45ffb815e8e238f05de1630be3969b6bb15e4e upstream.
Include the second VLAN HLEN into account when computing the maximum
MTU size as other drivers do.
Fixes: 0c8493d90b6b ("i40e: add XDP support for pass and drop actions")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f9cd6a4418bac6a046ee78382423b1ae7565fb24 upstream.
Recently I encountered one case where I cannot increase the MTU size
directly from 1500 to a much bigger value with XDP enabled if the
server is equipped with IXGBE card, which happened on thousands of
servers in production environment. After applying the current patch,
we can set the maximum MTU size to 3K.
This patch follows the behavior of changing MTU as i40e/ice does.
References:
[1] commit 23b44513c3e6 ("ice: allow 3k MTU for XDP")
[2] commit 0c8493d90b6b ("i40e: add XDP support for pass and drop actions")
Fixes: fabf1bce103a ("ixgbe: Prevent unsupported configurations with XDP")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9b55d3f0a69af649c62cbc2633e6d695bb3cc583 upstream.
When converting net_device_stats to rtnl_link_stats64 sign extension
is triggered on ILP32 machines as 6c1c509778 changed the previous
"ulong -> u64" conversion to "long -> u64" by accessing the
net_device_stats fields through a (signed) atomic_long_t.
This causes for example the received bytes counter to jump to 16EiB after
having received 2^31 bytes. Casting the atomic value to "unsigned long"
beforehand converting it into u64 avoids this.
Fixes: 6c1c5097781f ("net: add atomic_long_t to net_device_stats fields")
Signed-off-by: Felix Riemann <felix.riemann@sma.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 647037adcad00f2bab8828d3d41cd0553d41f3bd upstream.
This reverts commit 115d9d77bb0f9152c60b6e8646369fa7f6167593.
The pages being freed by memblock_free_late() have already been
initialized, but if they are in the deferred init range,
__free_one_page() might access nearby uninitialized pages when trying to
coalesce buddies. This can, for example, trigger this BUG:
BUG: unable to handle page fault for address: ffffe964c02580c8
RIP: 0010:__list_del_entry_valid+0x3f/0x70
<TASK>
__free_one_page+0x139/0x410
__free_pages_ok+0x21d/0x450
memblock_free_late+0x8c/0xb9
efi_free_boot_services+0x16b/0x25c
efi_enter_virtual_mode+0x403/0x446
start_kernel+0x678/0x714
secondary_startup_64_no_verify+0xd2/0xdb
</TASK>
A proper fix will be more involved so revert this change for the time
being.
Fixes: 115d9d77bb0f ("mm: Always release pages to the buddy allocator in memblock_free_late().")
Signed-off-by: Aaron Thompson <dev@aaront.org>
Link: https://lore.kernel.org/r/20230207082151.1303-1-dev@aaront.org
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 90091c367e74d5b58d9ebe979cc363f7468f58d3 upstream.
Exit with return code 4 if lkdtm is not available like other tests
in order to properly skip the test.
Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210805101236.1140381-1-misono.tomohiro@jp.fujitsu.com
Cc: Andrew Paniakin <apanyaki@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ce4d9a1ea35ac5429e822c4106cb2859d5c71f3e upstream.
Patch series "Fix kmemleak crashes when scanning CMA regions", v2.
When trying to boot a device with an ARM64 kernel with the following
config options enabled:
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
CONFIG_DEBUG_KMEMLEAK=y
a crash is encountered when kmemleak starts to scan the list of gray
or allocated objects that it maintains. Upon closer inspection, it was
observed that these page-faults always occurred when kmemleak attempted
to scan a CMA region.
At the moment, kmemleak is made aware of CMA regions that are specified
through the devicetree to be dynamically allocated within a range of
addresses. However, kmemleak should not need to scan CMA regions or any
reserved memory region, as those regions can be used for DMA transfers
between drivers and peripherals, and thus wouldn't contain anything
useful for kmemleak.
Additionally, since CMA regions are unmapped from the kernel's address
space when they are freed to the buddy allocator at boot when
CONFIG_DEBUG_PAGEALLOC is enabled, kmemleak shouldn't attempt to access
those memory regions, as that will trigger a crash. Thus, kmemleak
should ignore all dynamically allocated reserved memory regions.
This patch (of 1):
Currently, kmemleak ignores dynamically allocated reserved memory regions
that don't have a kernel mapping. However, regions that do retain a
kernel mapping (e.g. CMA regions) do get scanned by kmemleak.
This is not ideal for two reasons:
1 kmemleak works by scanning memory regions for pointers to allocated
objects to determine if those objects have been leaked or not.
However, reserved memory regions can be used between drivers and
peripherals for DMA transfers, and thus, would not contain pointers to
allocated objects, making it unnecessary for kmemleak to scan these
reserved memory regions.
2 When CONFIG_DEBUG_PAGEALLOC is enabled, along with kmemleak, the
CMA reserved memory regions are unmapped from the kernel's address
space when they are freed to buddy at boot. These CMA reserved regions
are still tracked by kmemleak, however, and when kmemleak attempts to
scan them, a crash will happen, as accessing the CMA region will result
in a page-fault, since the regions are unmapped.
Thus, use kmemleak_ignore_phys() for all dynamically allocated reserved
memory regions, instead of those that do not have a kernel mapping
associated with them.
Link: https://lkml.kernel.org/r/20230208232001.2052777-1-isaacmanjarres@google.com
Link: https://lkml.kernel.org/r/20230208232001.2052777-2-isaacmanjarres@google.com
Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Kirill A. Shutemov <kirill.shtuemov@linux.intel.com>
Cc: Nick Kossifidis <mick@ics.forth.gr>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Saravana Kannan <saravanak@google.com>
Cc: <stable@vger.kernel.org> [5.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ec4288fe63966b26d53907212ecd05dfa81dd2cc upstream.
Users can specify the hugetlb page size in the mmap, shmget and
memfd_create system calls. This is done by using 6 bits within the flags
argument to encode the base-2 logarithm of the desired page size. The
routine hstate_sizelog() uses the log2 value to find the corresponding
hugetlb hstate structure. Converting the log2 value (page_size_log) to
potential hugetlb page size is the simple statement:
1UL << page_size_log
Because only 6 bits are used for page_size_log, the left shift can not be
greater than 63. This is fine on 64 bit architectures where a long is 64
bits. However, if a value greater than 31 is passed on a 32 bit
architecture (where long is 32 bits) the shift will result in undefined
behavior. This was generally not an issue as the result of the undefined
shift had to exactly match hugetlb page size to proceed.
Recent improvements in runtime checking have resulted in this undefined
behavior throwing errors such as reported below.
Fix by comparing page_size_log to BITS_PER_LONG before doing shift.
Link: https://lkml.kernel.org/r/20230216013542.138708-1-mike.kravetz@oracle.com
Link: https://lore.kernel.org/lkml/CA+G9fYuei_Tr-vN9GS7SfFyU1y9hNysnf=PB7kT0=yv4MiPgVg@mail.gmail.com/
Fixes: 42d7395feb56 ("mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reviewed-by: Jesper Juhl <jesperjuhl76@gmail.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c2dbe32d5db5c4ead121cf86dabd5ab691fb47fe upstream.
If a non-root cgroup gets removed when there is a thread that registered
trigger and is polling on a pressure file within the cgroup, the polling
waitqueue gets freed in the following path:
do_rmdir
cgroup_rmdir
kernfs_drain_open_files
cgroup_file_release
cgroup_pressure_release
psi_trigger_destroy
However, the polling thread still has a reference to the pressure file and
will access the freed waitqueue when the file is closed or upon exit:
fput
ep_eventpoll_release
ep_free
ep_remove_wait_queue
remove_wait_queue
This results in use-after-free as pasted below.
The fundamental problem here is that cgroup_file_release() (and
consequently waitqueue's lifetime) is not tied to the file's real lifetime.
Using wake_up_pollfree() here might be less than ideal, but it is in line
with the comment at commit 42288cb44c4b ("wait: add wake_up_pollfree()")
since the waitqueue's lifetime is not tied to file's one and can be
considered as another special case. While this would be fixable by somehow
making cgroup_file_release() be tied to the fput(), it would require
sizable refactoring at cgroups or higher layer which might be more
justifiable if we identify more cases like this.
BUG: KASAN: use-after-free in _raw_spin_lock_irqsave+0x60/0xc0
Write of size 4 at addr ffff88810e625328 by task a.out/4404
CPU: 19 PID: 4404 Comm: a.out Not tainted 6.2.0-rc6 #38
Hardware name: Amazon EC2 c5a.8xlarge/, BIOS 1.0 10/16/2017
Call Trace:
<TASK>
dump_stack_lvl+0x73/0xa0
print_report+0x16c/0x4e0
kasan_report+0xc3/0xf0
kasan_check_range+0x2d2/0x310
_raw_spin_lock_irqsave+0x60/0xc0
remove_wait_queue+0x1a/0xa0
ep_free+0x12c/0x170
ep_eventpoll_release+0x26/0x30
__fput+0x202/0x400
task_work_run+0x11d/0x170
do_exit+0x495/0x1130
do_group_exit+0x100/0x100
get_signal+0xd67/0xde0
arch_do_signal_or_restart+0x2a/0x2b0
exit_to_user_mode_prepare+0x94/0x100
syscall_exit_to_user_mode+0x20/0x40
do_syscall_64+0x52/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
</TASK>
Allocated by task 4404:
kasan_set_track+0x3d/0x60
__kasan_kmalloc+0x85/0x90
psi_trigger_create+0x113/0x3e0
pressure_write+0x146/0x2e0
cgroup_file_write+0x11c/0x250
kernfs_fop_write_iter+0x186/0x220
vfs_write+0x3d8/0x5c0
ksys_write+0x90/0x110
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 4407:
kasan_set_track+0x3d/0x60
kasan_save_free_info+0x27/0x40
____kasan_slab_free+0x11d/0x170
slab_free_freelist_hook+0x87/0x150
__kmem_cache_free+0xcb/0x180
psi_trigger_destroy+0x2e8/0x310
cgroup_file_release+0x4f/0xb0
kernfs_drain_open_files+0x165/0x1f0
kernfs_drain+0x162/0x1a0
__kernfs_remove+0x1fb/0x310
kernfs_remove_by_name_ns+0x95/0xe0
cgroup_addrm_files+0x67f/0x700
cgroup_destroy_locked+0x283/0x3c0
cgroup_rmdir+0x29/0x100
kernfs_iop_rmdir+0xd1/0x140
vfs_rmdir+0xfe/0x240
do_rmdir+0x13d/0x280
__x64_sys_rmdir+0x2c/0x30
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: 0e94682b73bf ("psi: introduce psi monitor")
Signed-off-by: Munehisa Kamata <kamatam@amazon.com>
Signed-off-by: Mengchi Cheng <mengcc@amazon.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/20230106224859.4123476-1-kamatam@amazon.com/
Link: https://lore.kernel.org/r/20230214212705.4058045-1-kamatam@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2bdccfd290d421b50df4ec6a68d832dad1310748 upstream.
GPIO2 PIN use for output. Mask Dir and Data need to assign for 0x4. Not 0x3.
This fixed was for Lenovo Desktop(0x17aa1056). GPIO2 use for AMP enable.
Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/8d02bb9ac8134f878cd08607fdf088fd@realtek.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 18d7e16c917a08f08778ecf2b780d63648d5d923 upstream.
The current kernel does not support the SN6180 codec chip.
Add the SN6180 codec configuration item to kernel.
Signed-off-by: Bo Liu <bo.liu@senarytech.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/1675908828-1012-1-git-send-email-bo.liu@senarytech.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cf4c9d2ac1e42c7d18b921bec39486896645b714 upstream.
If mmc_add_host() fails, it doesn't need to call mmc_remove_host(),
or it will cause null-ptr-deref, because of deleting a not added
device in mmc_remove_host().
To fix this, goto label 'fail_glue_init', if mmc_add_host() fails,
and change the label 'fail_add_host' to 'fail_gpiod_request'.
Fixes: 15a0580ced08 ("mmc_spi host driver")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230131013835.3564011-1-yangyingliang@huawei.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 605d9fb9556f8f5fb4566f4df1480f280f308ded upstream.
If sdio_add_func() or sdio_init_func() fails, sdio_remove_func() can
not release the resources, because the sdio function is not presented
in these two cases, it won't call of_node_put() or put_device().
To fix these leaks, make sdio_func_present() only control whether
device_del() needs to be called or not, then always call of_node_put()
and put_device().
In error case in sdio_init_func(), the reference of 'card->dev' is
not get, to avoid redundant put in sdio_free_func_cis(), move the
get_device() to sdio_alloc_func() and put_device() to sdio_release_func(),
it can keep the get/put function be balanced.
Without this patch, while doing fault inject test, it can get the
following leak reports, after this fix, the leak is gone.
unreferenced object 0xffff888112514000 (size 2048):
comm "kworker/3:2", pid 65, jiffies 4294741614 (age 124.774s)
hex dump (first 32 bytes):
00 e0 6f 12 81 88 ff ff 60 58 8d 06 81 88 ff ff ..o.....`X......
10 40 51 12 81 88 ff ff 10 40 51 12 81 88 ff ff .@Q......@Q.....
backtrace:
[<000000009e5931da>] kmalloc_trace+0x21/0x110
[<000000002f839ccb>] mmc_alloc_card+0x38/0xb0 [mmc_core]
[<0000000004adcbf6>] mmc_sdio_init_card+0xde/0x170 [mmc_core]
[<000000007538fea0>] mmc_attach_sdio+0xcb/0x1b0 [mmc_core]
[<00000000d4fdeba7>] mmc_rescan+0x54a/0x640 [mmc_core]
unreferenced object 0xffff888112511000 (size 2048):
comm "kworker/3:2", pid 65, jiffies 4294741623 (age 124.766s)
hex dump (first 32 bytes):
00 40 51 12 81 88 ff ff e0 58 8d 06 81 88 ff ff .@Q......X......
10 10 51 12 81 88 ff ff 10 10 51 12 81 88 ff ff ..Q.......Q.....
backtrace:
[<000000009e5931da>] kmalloc_trace+0x21/0x110
[<00000000fcbe706c>] sdio_alloc_func+0x35/0x100 [mmc_core]
[<00000000c68f4b50>] mmc_attach_sdio.cold.18+0xb1/0x395 [mmc_core]
[<00000000d4fdeba7>] mmc_rescan+0x54a/0x640 [mmc_core]
Fixes: 3d10a1ba0d37 ("sdio: fix reference counting in sdio_remove_func()")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230130125808.3471254-1-yangyingliang@huawei.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3f18c5046e633cc4bbad396b74c05d46d353033d upstream.
On JZ4760 and JZ4760B, SD cards fail to run if the maximum clock
rate is set to 50 MHz, even though the controller officially does
support it.
Until the actual bug is found and fixed, limit the maximum clock rate to
24 MHz.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230131210229.68129-1-paul@crapouillou.net
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we backport dadd0dcaa67d ("net/ulp: prevent ULP without clone op from
entering the LISTEN status"), we have accidentally backported a part of
7a7160edf1bf ("net: Return errno in sk->sk_prot->get_port().") and removed
err = -EADDRINUSE in inet_csk_listen_start().
Thus, listen() no longer returns -EADDRINUSE even if ->get_port() failed
as reported in [0].
We set -EADDRINUSE to err just before ->get_port() to fix the regression.
[0]: https://lore.kernel.org/stable/EF8A45D0-768A-4CD5-9A8A-0FA6E610ABF7@winter.cafe/
Reported-by: Winter <winter@winter.cafe>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 18bbc3213383a82b05383827f4b1b882e3f0a5a5 upstream.
TPROXY is only allowed from prerouting, but nft_tproxy doesn't check this.
This fixes a crash (null dereference) when using tproxy from e.g. output.
Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support")
Reported-by: Shell Chen <xierch@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Qingfang DENG <dqfext@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8e60615e8932167057b363c11a7835da7f007106 upstream.
By default when the system is configured for low power idle in the FADT
the keyboard is set up as a wake source. This matches the behavior that
Windows uses for Modern Standby as well.
It has been reported that a variety of AMD based designs there are
spurious wakeups are happening where two IRQ sources are active.
For example:
```
PM: Triggering wakeup from IRQ 9
PM: Triggering wakeup from IRQ 1
```
In these designs IRQ 9 is the ACPI SCI and IRQ 1 is the keyboard.
One way to trigger this problem is to suspend the laptop and then unplug
the AC adapter. The SOC will be in a hardware sleep state and plugging
in the AC adapter returns control to the kernel's s2idle loop.
Normally if just IRQ 9 was active the s2idle loop would advance any EC
transactions and no other IRQ being active would cause the s2idle loop
to put the SOC back into hardware sleep state.
When this bug occurred IRQ 1 is also active even if no keyboard activity
occurred. This causes the s2idle loop to break and the system to wake.
This is a platform firmware bug triggering IRQ1 without keyboard activity.
This occurs in Windows as well, but Windows will enter "SW DRIPS" and
then with no activity enters back into "HW DRIPS" (hardware sleep state).
This issue affects Renoir, Lucienne, Cezanne, and Barcelo platforms. It
does not happen on newer systems such as Mendocino or Rembrandt.
It's been fixed in newer platform firmware. To avoid triggering the bug
on older systems check the SMU F/W version and adjust the policy at suspend
time for s2idle wakeup from keyboard on these systems. A lot of thought
and experimentation has been given around the timing of disabling IRQ1,
and to make it work the "suspend" PM callback is restored.
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reported-by: Xaver Hugl <xaver.hugl@gmail.com>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2115
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1951
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20230120191519.15926-1-mario.limonciello@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
[ This has been hand modified for missing dependency commits. ]
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1921#note_1770257
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b8fb0d9b47660ddb8a8256412784aad7cee9f21a upstream.
Yellow carp has been outputting versions like `1093.24.0`, but this
is supposed to be 69.24.0. That is the MSB is being interpreted
incorrectly.
The MSB is not part of the major version, but has generally been
treated that way thus far. It's actually the program, and used to
distinguish between two programs from a similar family but different
codebase.
Link: https://patchwork.freedesktop.org/patch/469993/
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20220120174439.12770-1-mario.limonciello@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 40635cd32f0d83573a558dc30e9ba3469e769249 upstream.
The amd_pmc_get_smu_version() and amd_pmc_idlemask_read() functions are
used in the probe / suspend/resume code, so they are also used when
CONFIG_DEBUGFS is disabled, move them outside of the #ifdef CONFIG_DEBUGFS
block.
Note this purely moves the code to above the #ifdef CONFIG_DEBUGFS,
the code is completely unchanged.
Fixes: f6045de1f532 ("platform/x86: amd-pmc: Export Idlemask values based on the APU")
Cc: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Cc: Sanket Goswami <Sanket.Goswami@amd.com>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6045de1f53268131ea75a99b210b869dcc150b2 upstream.
IdleMask is the metric used by the PM firmware to know the status of each
of the Hardware IP blocks monitored by the PM firmware.
Knowing this value is key to get the information of s2idle suspend/resume
status. This value is mapped to PMC scratch registers, retrieve them
accordingly based on the CPU family and the underlying firmware support.
Co-developed-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Sanket Goswami <Sanket.Goswami@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20210916124002.2529-1-Sanket.Goswami@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2a00299e7447395d0898e7c6214817c06a61a8e8 upstream.
[Why]
drm_atomic_normalize_zpos() can return an error code when there's
modeset lock contention. This was being ignored.
[How]
Bail out of atomic check if normalize_zpos() returns an error.
Fixes: b261509952bc ("drm/amd/display: Fix double cursor on non-video RGB MPO")
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 81e9d6f8647650a7bead74c5f926e29970e834d1 upstream.
Commit e4a0d3e720e7 ("aio: Make it possible to remap aio ring") introduced
a null-deref if mremap is called on an old aio mapping after fork as
mm->ioctx_table will be set to NULL.
[jmoyer@redhat.com: fix 80 column issue]
Link: https://lkml.kernel.org/r/x49sffq4nvg.fsf@segfault.boston.devel.redhat.com
Fixes: e4a0d3e720e7 ("aio: Make it possible to remap aio ring")
Signed-off-by: Seth Jenkins <sethjenkins@google.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Jann Horn <jannh@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d4e85922e3e7ef2071f91f65e61629b60f3a9cf4 upstream.
If the peer closes all the existing subflows for a given
mptcp socket and later the application closes it, the current
implementation let it survive until the timewait timeout expires.
While the above is allowed by the protocol specification it
consumes resources for almost no reason and additionally
causes sporadic self-tests failures.
Let's move the mptcp socket to the TCP_CLOSE state when there are
no alive subflows at close time, so that the allocated resources
will be freed immediately.
Fixes: e16163b6e2b7 ("mptcp: refactor shutdown and close")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a54f78def73d847cb060b18c4e4a3d1d26c9ca6d ]
The recent patch to improve btree cycle checking caused a regression
when I rebased the in-memory btree branch atop the 5.19 for-next branch,
because in-memory short-pointer btrees do not have AG numbers. This
produced the following complaint from kmemleak:
unreferenced object 0xffff88803d47dde8 (size 264):
comm "xfs_io", pid 4889, jiffies 4294906764 (age 24.072s)
hex dump (first 32 bytes):
90 4d 0b 0f 80 88 ff ff 00 a0 bd 05 80 88 ff ff .M..............
e0 44 3a a0 ff ff ff ff 00 df 08 06 80 88 ff ff .D:.............
backtrace:
[<ffffffffa0388059>] xfbtree_dup_cursor+0x49/0xc0 [xfs]
[<ffffffffa029887b>] xfs_btree_dup_cursor+0x3b/0x200 [xfs]
[<ffffffffa029af5d>] __xfs_btree_split+0x6ad/0x820 [xfs]
[<ffffffffa029b130>] xfs_btree_split+0x60/0x110 [xfs]
[<ffffffffa029f6da>] xfs_btree_make_block_unfull+0x19a/0x1f0 [xfs]
[<ffffffffa029fada>] xfs_btree_insrec+0x3aa/0x810 [xfs]
[<ffffffffa029fff3>] xfs_btree_insert+0xb3/0x240 [xfs]
[<ffffffffa02cb729>] xfs_rmap_insert+0x99/0x200 [xfs]
[<ffffffffa02cf142>] xfs_rmap_map_shared+0x192/0x5f0 [xfs]
[<ffffffffa02cf60b>] xfs_rmap_map_raw+0x6b/0x90 [xfs]
[<ffffffffa0384a85>] xrep_rmap_stash+0xd5/0x1d0 [xfs]
[<ffffffffa0384dc0>] xrep_rmap_visit_bmbt+0xa0/0xf0 [xfs]
[<ffffffffa0384fb6>] xrep_rmap_scan_iext+0x56/0xa0 [xfs]
[<ffffffffa03850d8>] xrep_rmap_scan_ifork+0xd8/0x160 [xfs]
[<ffffffffa0385195>] xrep_rmap_scan_inode+0x35/0x80 [xfs]
[<ffffffffa03852ee>] xrep_rmap_find_rmaps+0x10e/0x270 [xfs]
I noticed that xfs_btree_insrec has a bunch of debug code that return
out of the function immediately, without freeing the "new" btree cursor
that can be returned when _make_block_unfull calls xfs_btree_split. Fix
the error return in this function to free the btree cursor.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 86d40f1e49e9a909d25c35ba01bea80dbcd758cb ]
xfs/434 and xfs/436 have been reporting occasional memory leaks of
xfs_dquot objects. These tests themselves were the messenger, not the
culprit, since they unload the xfs module, which trips the slub
debugging code while tearing down all the xfs slab caches:
=============================================================================
BUG xfs_dquot (Tainted: G W ): Objects remaining in xfs_dquot on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Slab 0xffffea000606de00 objects=30 used=5 fp=0xffff888181b78a78 flags=0x17ff80000010200(slab|head|node=0|zone=2|lastcpupid=0xfff)
CPU: 0 PID: 3953166 Comm: modprobe Tainted: G W 5.18.0-rc6-djwx #rc6 d5824be9e46a2393677bda868f9b154d917ca6a7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-builder-01.us.oracle.com-4.el7.1 04/01/2014
Since we don't generally rmmod the xfs module between fstests, this
means that xfs/434 is really just the canary in the coal mine --
something leaked a dquot, but we don't know who. After days of pounding
on fstests with kmemleak enabled, I finally got it to spit this out:
unreferenced object 0xffff8880465654c0 (size 536):
comm "u10:4", pid 88, jiffies 4294935810 (age 29.512s)
hex dump (first 32 bytes):
60 4a 56 46 80 88 ff ff 58 ea e4 5c 80 88 ff ff `JVF....X..\....
00 e0 52 49 80 88 ff ff 01 00 01 00 00 00 00 00 ..RI............
backtrace:
[<ffffffffa0740f6c>] xfs_dquot_alloc+0x2c/0x530 [xfs]
[<ffffffffa07443df>] xfs_qm_dqread+0x6f/0x330 [xfs]
[<ffffffffa07462a2>] xfs_qm_dqget+0x132/0x4e0 [xfs]
[<ffffffffa0756bb0>] xfs_qm_quotacheck_dqadjust+0xa0/0x3e0 [xfs]
[<ffffffffa075724d>] xfs_qm_dqusage_adjust+0x35d/0x4f0 [xfs]
[<ffffffffa06c9068>] xfs_iwalk_ag_recs+0x348/0x5d0 [xfs]
[<ffffffffa06c95d3>] xfs_iwalk_run_callbacks+0x273/0x540 [xfs]
[<ffffffffa06c9e8d>] xfs_iwalk_ag+0x5ed/0x890 [xfs]
[<ffffffffa06ca22f>] xfs_iwalk_ag_work+0xff/0x170 [xfs]
[<ffffffffa06d22c9>] xfs_pwork_work+0x79/0x130 [xfs]
[<ffffffff81170bb2>] process_one_work+0x672/0x1040
[<ffffffff81171b1b>] worker_thread+0x59b/0xec0
[<ffffffff8118711e>] kthread+0x29e/0x340
[<ffffffff810032bf>] ret_from_fork+0x1f/0x30
Now we know that quotacheck is at fault, but even this report was
canaryish -- it was triggered by xfs/494, which doesn't actually mount
any filesystems. (kmemleak can be a little slow to notice leaks, even
with fstests repeatedly whacking it to look for them.) Looking at the
*previous* fstest, however, showed that the test run before xfs/494 was
xfs/117. The tipoff to the problem is in this excerpt from dmesg:
XFS (sda4): Quotacheck needed: Please wait.
XFS (sda4): Metadata corruption detected at xfs_dinode_verify.part.0+0xdb/0x7b0 [xfs], inode 0x119 dinode
XFS (sda4): Unmount and run xfs_repair
XFS (sda4): First 128 bytes of corrupted metadata buffer:
00000000: 49 4e 81 a4 03 02 00 00 00 00 00 00 00 00 00 00 IN..............
00000010: 00 00 00 01 00 00 00 00 00 90 57 54 54 1a 4c 68 ..........WTT.Lh
00000020: 81 f9 7d e1 6d ee 16 00 34 bd 7d e1 6d ee 16 00 ..}.m...4.}.m...
00000030: 34 bd 7d e1 6d ee 16 00 00 00 00 00 00 00 00 00 4.}.m...........
00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000050: 00 00 00 02 00 00 00 00 00 00 00 00 96 80 f3 ab ................
00000060: ff ff ff ff da 57 7b 11 00 00 00 00 00 00 00 03 .....W{.........
00000070: 00 00 00 01 00 00 00 10 00 00 00 00 00 00 00 08 ................
XFS (sda4): Quotacheck: Unsuccessful (Error -117): Disabling quotas.
The dinode verifier decided that the inode was corrupt, which causes
iget to return with EFSCORRUPTED. Since this happened during
quotacheck, it is obvious that the kernel aborted the inode walk on
account of the corruption error and disabled quotas. Unfortunately, we
neglect to purge the dquot cache before doing that, which is how the
dquots leaked.
The problems started 10 years ago in commit b84a3a, when the dquot lists
were converted to a radix tree, but the error handling behavior was not
correctly preserved -- in that commit, if the bulkstat failed and
usrquota was enabled, the bulkstat failure code would be overwritten by
the result of flushing all the dquots to disk. As long as that
succeeds, we'd continue the quota mount as if everything were ok, but
instead we're now operating with a corrupt inode and incorrect quota
usage counts. I didn't notice this bug in 2019 when I wrote commit
ebd126a, which changed quotacheck to skip the dqflush when the scan
doesn't complete due to inode walk failures.
Introduced-by: b84a3a96751f ("xfs: remove the per-filesystem list of dquots")
Fixes: ebd126a651f8 ("xfs: convert quotacheck to use the new iwalk functions")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 56486f307100e8fc66efa2ebd8a71941fa10bf6f ]
xfs/538 on a 1kB block filesystem failed with this assert:
XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_ino.allocated == 0 || xfs_is_shutdown(cur->bc_mp), file: fs/xfs/libxfs/xfs_btree.c, line: 448
The problem was that an allocation failed unexpectedly in
xfs_bmbt_alloc_block() after roughly 150,000 minlen allocation error
injections, resulting in an EFSCORRUPTED error being returned to
xfs_bmapi_write(). The error occurred on extent-to-btree format
conversion allocating the new root block:
RIP: 0010:xfs_bmbt_alloc_block+0x177/0x210
Call Trace:
<TASK>
xfs_btree_new_iroot+0xdf/0x520
xfs_btree_make_block_unfull+0x10d/0x1c0
xfs_btree_insrec+0x364/0x790
xfs_btree_insert+0xaa/0x210
xfs_bmap_add_extent_hole_real+0x1fe/0x9a0
xfs_bmapi_allocate+0x34c/0x420
xfs_bmapi_write+0x53c/0x9c0
xfs_alloc_file_space+0xee/0x320
xfs_file_fallocate+0x36b/0x450
vfs_fallocate+0x148/0x340
__x64_sys_fallocate+0x3c/0x70
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa
Why the allocation failed at this point is unknown, but is likely
that we ran the transaction out of reserved space and filesystem out
of space with bmbt blocks because of all the minlen allocations
being done causing worst case fragmentation of a large allocation.
Regardless of the cause, we've then called xfs_bmapi_finish() which
calls xfs_btree_del_cursor(cur, error) to tear down the cursor.
So we have a failed operation, error != 0, cur->bc_ino.allocated > 0
and the filesystem is still up. The assert fails to take into
account that allocation can fail with an error and the transaction
teardown will shut the filesystem down if necessary. i.e. the
assert needs to check "|| error != 0" as well, because at this point
shutdown is pending because the current transaction is dirty....
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5b55cbc2d72632e874e50d2e36bce608e55aaaea ]
Not fatal, the assert is there to catch developer attention. I'm
seeing this occasionally during recoveryloop testing after a
shutdown, and I don't want this to stop an overnight recoveryloop
run as it is currently doing.
Convert the ASSERT to a XFS_IS_CORRUPT() check so it will dump a
corruption report into the log and cause a test failure that way,
but it won't stop the machine dead.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5672225e8f2a872a22b0cecedba7a6644af1fb84 ]
Commit dc04db2aa7c9 has caused a small aim7 regression, showing a
small increase in CPU usage in __xfs_btree_check_sblock() as a
result of the extra checking.
This is likely due to the endian conversion of the sibling poitners
being unconditional instead of relying on the compiler to endian
convert the NULL pointer at compile time and avoiding the runtime
conversion for this common case.
Rework the checks so that endian conversion of the sibling pointers
is only done if they are not null as the original code did.
.... and these need to be "inline" because the compiler completely
fails to inline them automatically like it should be doing.
$ size fs/xfs/libxfs/xfs_btree.o*
text data bss dec hex filename
51874 240 0 52114 cb92 fs/xfs/libxfs/xfs_btree.o.orig
51562 240 0 51802 ca5a fs/xfs/libxfs/xfs_btree.o.inline
Just when you think the tools have advanced sufficiently we don't
have to care about stuff like this anymore, along comes a reminder
that *our tools still suck*.
Fixes: dc04db2aa7c9 ("xfs: detect self referencing btree sibling pointers")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f0f5f658065a5af09126ec892e4c383540a1c77f ]
We don't check that the v4 feature flags taht v5 requires to be set
are actually set anywhere. Do this check when we see that the
filesystem is a v5 filesystem.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dd0d2f9755191690541b09e6385d0f8cd8bc9d8f ]
While xfs_has_nlink() is not used in kernel, it is used in userspace
(e.g. by xfs_db) so we need to set the XFS_FEAT_NLINK flag correctly
in xfs_sb_version_to_features().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dc04db2aa7c9307e740d6d0e173085301c173b1a ]
To catch the obvious graph cycle problem and hence potential endless
looping.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c230a4a85bcdbfc1a7415deec6caf04e8fca1301 ]
Ever since we added shadown format buffers to the log items, log
items need to handle the item being released with shadow buffers
attached. Due to the fact this requirement was added at the same
time we added new rmap/reflink intents, we missed the cleanup of
those items.
In theory, this means shadow buffers can be leaked in a very small
window when a shutdown is initiated. Testing with KASAN shows this
leak does not happen in practice - we haven't identified a single
leak in several years of shutdown testing since ~v4.8 kernels.
However, the intent whiteout cleanup mechanism results in every
cancelled intent in exactly the same state as this tiny race window
creates and so if intents down clean up shadow buffers on final
release we will leak the shadow buffer for just about every intent
we create.
Hence we start with this patch to close this condition off and
ensure that when whiteouts start to be used we don't leak lots of
memory.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cb512c921639613ce03f87e62c5e93ed9fe8c84d ]
When we first allocate or resize an inline inode fork, we round up
the allocation to 4 byte alingment to make journal alignment
constraints. We don't clear the unused bytes, so we can copy up to
three uninitialised bytes into the journal. Zero those bytes so we
only ever copy zeros into the journal.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0c4862b1c1465e473bc961a02765490578bf5c20 ]
Dan Carpenter points out that the return code was not set in commit
60c8b4aebd8e ("nvmem: core: fix cleanup after dev_set_name()"), but
this is not the only issue - we also need to zero wp_gpio to prevent
gpiod_put() being called on an error value.
Fixes: 560181d3ace6 ("nvmem: core: fix cleanup after dev_set_name()")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230127104015.23839-10-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ab3428cfd9aa2f3463ee4b2909b5bb2193bd0c4a ]
The i.MX6 CPU frequency driver sometimes fails to register at boot time
due to nvmem_cell_read_u32() sporadically returning -ENOENT.
This happens because there is a window where __nvmem_device_get() in
of_nvmem_cell_get() is able to return the nvmem device, but as cells
have been setup, nvmem_find_cell_entry_by_node() returns NULL.
The occurs because the nvmem core registration code violates one of the
fundamental principles of kernel programming: do not publish data
structures before their setup is complete.
Fix this by making nvmem core code conform with this principle.
Fixes: eace75cfdcf7 ("nvmem: Add a simple NVMEM framework for nvmem providers")
Cc: stable@vger.kernel.org
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230127104015.23839-7-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 560181d3ace61825f4ca9dd3481d6c0ee6709fa8 ]
If dev_set_name() fails, we leak nvmem->wp_gpio as the cleanup does not
put this. While a minimal fix for this would be to add the gpiod_put()
call, we can do better if we split device_register(), and use the
tested nvmem_release() cleanup code by initialising the device early,
and putting the device.
This results in a slightly larger fix, but results in clear code.
Note: this patch depends on "nvmem: core: initialise nvmem->id early"
and "nvmem: core: remove nvmem_config wp_gpio".
Fixes: 5544e90c8126 ("nvmem: core: add error handling for dev_set_name")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[Srini: Fixed subject line and error code handing with wp_gpio while applying.]
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230127104015.23839-6-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-dep-of: ab3428cfd9aa ("nvmem: core: fix registration vs use race")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5544e90c81261e82e02bbf7c6015a4b9c8c825ef ]
The type of return value of dev_set_name is int, which may return
wrong result, so we add error handling for it to reclaim memory
of nvmem resource, and return early when an error occurs.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20220916122100.170016-4-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-dep-of: ab3428cfd9aa ("nvmem: core: fix registration vs use race")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eecf2acd4a580e9364e5087daf0effca60a240b7 ]
Add a DMI match for the CWI501 version of the Chuwi Vi8 tablet,
pointing to the same chuwi_vi8_data as the existing CWI506 version
DMI match.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230202103413.331459-1-hdegoede@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>