linux

iv/linux

Author	SHA1	Message	Date
Xiu Jianfeng	29ed17389c	cgroup: Make cgroup_debug static Make cgroup_debug static since it's only used in cgroup.c Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-05-18 06:59:20 -10:00
Waiman Long	213adc63df	kseltest/cgroup: Make test_stress.sh work if run interactively Commit `54de76c012` ("kselftest/cgroup: fix test_stress.sh to use OUTPUT dir") changes the test_core command path from . to $OUTPUT. However, variable OUTPUT may not be defined if the command is run interactively. Fix that by using ${OUTPUT:-.} to cover both cases. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-05-13 09:33:21 -10:00
Phil Auld	54de76c012	kselftest/cgroup: fix test_stress.sh to use OUTPUT dir Running cgroup kselftest with O= fails to run the with_stress test due to hardcoded ./test_core. Find test_core binary using the OUTPUT directory. Fixes: `1a99fcc035` ("selftests: cgroup: Run test_core under interfering stress") Signed-off-by: Phil Auld <pauld@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-05-12 07:11:01 -10:00
David Vernet	5c26993c31	cgroup: Add config file to cgroup selftest suite Most of the test suites in tools/testing/selftests contain a config file that specifies which kernel config options need to be present in order for the test suite to be able to run and perform meaningful validation. There is no config file for the tools/testing/selftests/cgroup test suite, so this patch adds one. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-04-25 07:27:31 -10:00
David Vernet	a79906570f	cgroup: Add test_cpucg_max_nested() testcase The cgroup cpu controller selftests have a test_cpucg_max() testcase that validates the behavior of the cpu.max knob. Let's also add a testcase that verifies that the behavior works correctly when set on a nested cgroup. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-04-25 07:27:31 -10:00
David Vernet	889ab8113e	cgroup: Add test_cpucg_max() testcase The cgroup cpu controller test suite has a number of testcases that validate the expected behavior of the cpu.weight knob, but none for cpu.max. This testcase fixes that by adding a testcase for cpu.max as well. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-04-25 07:27:31 -10:00
David Vernet	89ca0efa84	cgroup: Add test_cpucg_nested_weight_underprovisioned() testcase The cgroup cpu controller test suite currently contains a testcase called test_cpucg_nested_weight_underprovisioned() which verifies the expected behavior of cpu.weight when applied to nested cgroups. That first testcase validated the expected behavior when the processes in the leaf cgroups overcommitted the system. This patch adds a complementary test_cpucg_nested_weight_underprovisioned() testcase which validates behavior when those leaf cgroups undercommit the system. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-04-25 07:27:31 -10:00
David Vernet	b76ee4f576	cgroup: Adding test_cpucg_nested_weight_overprovisioned() testcase The cgroup cpu controller tests in tools/testing/selftests/cgroup/test_cpu.c have some testcases that validate the expected behavior of setting cpu.weight on cgroups, and then hogging CPUs. What is still missing from the suite is a testcase that validates nested cgroups. This patch adds test_cpucg_nested_weight_overprovisioned(), which validates that a parent's cpu.weight will override its children if they overcommit a host, and properly protect any sibling groups of that parent. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-04-25 07:27:31 -10:00
David Vernet	4ab93063c8	cgroup: Add test_cpucg_weight_underprovisioned() testcase test_cpu.c includes testcases that validate the cgroup cpu controller. This patch adds a new testcase called test_cpucg_weight_underprovisioned() that verifies that processes with different cpu.weight that are all running on an underprovisioned system, still get roughly the same amount of cpu time. Because test_cpucg_weight_underprovisioned() is very similar to test_cpucg_weight_overprovisioned(), this patch also pulls the common logic into a separate helper function that is invoked from both testcases, and which uses function pointers to invoke the unique portions of the testcases. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-04-22 08:39:32 -10:00
David Vernet	6376b22cd0	cgroup: Add test_cpucg_weight_overprovisioned() testcase test_cpu.c includes testcases that validate the cgroup cpu controller. This patch adds a new testcase called test_cpucg_weight_overprovisioned() that verifies the expected behavior of creating multiple processes with different cpu.weight, on a system that is overprovisioned. So as to avoid code duplication, this patch also updates cpu_hog_func_param to take a new hog_clock_type enum which informs how time is counted in hog_cpus_timed() (either process time or wall clock time). Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-04-22 08:39:32 -10:00
David Vernet	3c879a1bb8	cgroup: Add test_cpucg_stats() testcase to cgroup cpu selftests test_cpu.c includes testcases that validate the cgroup cpu controller. This patch adds a new testcase called test_cpucg_stats() that verifies the expected behavior of the cpu.stat interface. In doing so, we define a new hog_cpus_timed() function which takes a cpu_hog_func_param struct that configures how many CPUs it uses, and how long it runs. Future patches will also spawn threads that hog CPUs, so this function will eventually serve those use-cases as well. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-04-22 08:39:32 -10:00
David Vernet	820a4f88ee	cgroup: Add new test_cpu.c test suite in cgroup selftests The cgroup selftests suite currently contains tests that validate various aspects of cgroup, such as validating the expected behavior for memory controllers, the expected behavior of cgroup.procs, etc. There are no tests that validate the expected behavior of the cgroup cpu controller. This patch therefore adds a new test_cpu.c file that will contain cpu controller testcases. The file currently only contains a single testcase that validates creating nested cgroups with cgroup.subtree_control including cpu. Future patches will add more sophisticated testcases that validate functional aspects of the cpu controller. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-04-22 08:39:32 -10:00
Linus Torvalds	281b9d9a4b	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "13 patches. Subsystems affected by this patch series: mm (memory-failure, memcg, userfaultfd, hugetlbfs, mremap, oom-kill, kasan, hmm), and kcov" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() kcov: don't generate a warning on vm_insert_page()'s failure MAINTAINERS: add Vincenzo Frascino to KASAN reviewers oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup selftest/vm: add skip support to mremap_test selftest/vm: support xfail in mremap_test selftest/vm: verify remap destination address in mremap_test selftest/vm: verify mmap addr in mremap_test mm, hugetlb: allow for "high" userspace addresses userfaultfd: mark uffd_wp regardless of VM_WRITE flag memcg: sync flush only if periodic flush is delayed mm/memory-failure.c: skip huge_zero_page in memory_failure() mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()	2022-04-22 10:10:43 -07:00
Nicholas Piggin	3b8000ae18	mm/vmalloc: huge vmalloc backing pages should be split rather than compound Huge vmalloc higher-order backing pages were allocated with __GFP_COMP in order to allow the sub-pages to be refcounted by callers such as "remap_vmalloc_page [sic]" (remap_vmalloc_range). However a similar problem exists for other struct page fields callers use, for example fb_deferred_io_fault() takes a vmalloc'ed page and not only refcounts it but uses ->lru, ->mapping, ->index. This is not compatible with compound sub-pages, and can cause bad page state issues like BUG: Bad page state in process swapper/0 pfn:00743 page:(____ptrval____) refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x743 flags: 0x7ffff000000000(node=0\|zone=0\|lastcpupid=0x7ffff) raw: 007ffff000000000 c00c00000001d0c8 c00c00000001d0c8 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: corrupted mapping in tail page Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc3-00082-gfc6fff4a7ce1-dirty #2810 Call Trace: dump_stack_lvl+0x74/0xa8 (unreliable) bad_page+0x12c/0x170 free_tail_pages_check+0xe8/0x190 free_pcp_prepare+0x31c/0x4e0 free_unref_page+0x40/0x1b0 __vunmap+0x1d8/0x420 ... The correct approach is to use split high-order pages for the huge vmalloc backing. These allow callers to treat them in exactly the same way as individually-allocated order-0 pages. Link: https://lore.kernel.org/all/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Menzel <pmenzel@molgen.mpg.de> Cc: Song Liu <songliubraving@fb.com> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-22 09:20:16 -07:00
Linus Torvalds	d569e86915	drm-fixes for 5.18-rc4 msm: - revert iommu change that broke some platforms. i915: - Unset enable_psr2_sel_fetch if PSR2 detection fails - Fix to detect when VRR is turned off from panel settings -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmJiAeQACgkQDHTzWXnE hr4XAw/9EqxTOgJxEDowxbeVo1SEyPSAKEE+LyDK0o3pDUZEppbd/K4qFrc374wV nr8z0RZgHk6V8iSALIQUnqWIwECDiQ97gP60aERKtJVQ/tANwHHHn3/CkjD/lFG5 rVjYD3GDa2+gwvuVowssH0bvUWXtiWzXQYy4i56F8OMnRF6gpNnJQ8mVFfTs5uLm SF//pQzOHYi59tPiL2zcHswu5BTPzSMxyjQIjknIJRv1PNuPU8PeZxrNE9meqZcJ 7a+H6HAMdabIb/z9n8U+bSs+IDUq8/7MTLJ8Eo83O93lGz5S/ixYhoq3s7uUDrf+ rT0VTnuAdmx2b95SKFb3PQmkzMGL+W3M9J4NZmUw/BBm5EkasiPttdT5oF5xYu35 IDVAqHPdg9Fdv7jm6YIExxmzrO5yY4wqYmEFjEbn4IpEED2Vge7weyJGTyBjMzjG QnGMj8VK5qwl8FL+1vUtb4v8gs6rmqT/90Wr4v+FIpblsksrNRM7dMaiodyAotFS jww4UmSFtxubAOLRsSErurX+YJyk2VgNrxwXSwSWv/D9KsrbCL7Ghdno2rCWpLxy F5Zt1pvmGuiH4ZjPQgnBjEPDg7tzmHM5VE0tPAAZCw2FrVPIpmlA3wnN1QGQmDdw ZIkCoyghxtPXtFVvWSMPuugk/sB6nznvdAuPTwI/QFksMHFNMoc= =ttKr -----END PGP SIGNATURE----- Merge tag 'drm-fixes-2022-04-22' of git://anongit.freedesktop.org/drm/drm Pull drm fixes from Dave Airlie: "Extra quiet after Easter, only have minor i915 and msm pulls. However I haven't seen a PR from our misc tree in a little while, I've cc'ed all the suspects. Once that unblocks I expect a bit larger bunch of patches to arrive. Otherwise as I said, one msm revert and two i915 fixes. msm: - revert iommu change that broke some platforms. i915: - Unset enable_psr2_sel_fetch if PSR2 detection fails - Fix to detect when VRR is turned off from panel settings" * tag 'drm-fixes-2022-04-22' of git://anongit.freedesktop.org/drm/drm: drm/i915/display/psr: Unset enable_psr2_sel_fetch if other checks in intel_psr2_config_valid() fails drm/msm: Revert "drm/msm: Stop using iommu_present()" drm/i915/display/vrr: Reset VRR capable property on a long hpd	2022-04-21 20:10:43 -07:00
Alistair Popple	319561669a	mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() In some cases it is possible for mmu_interval_notifier_remove() to race with mn_tree_inv_end() allowing it to return while the notifier data structure is still in use. Consider the following sequence: CPU0 - mn_tree_inv_end() CPU1 - mmu_interval_notifier_remove() ----------------------------------- ------------------------------------ spin_lock(subscriptions->lock); seq = subscriptions->invalidate_seq; spin_lock(subscriptions->lock); spin_unlock(subscriptions->lock); subscriptions->invalidate_seq++; wait_event(invalidate_seq != seq); return; interval_tree_remove(interval_sub); kfree(interval_sub); spin_unlock(subscriptions->lock); wake_up_all(); As the wait_event() condition is true it will return immediately. This can lead to use-after-free type errors if the caller frees the data structure containing the interval notifier subscription while it is still on a deferred list. Fix this by taking the appropriate lock when reading invalidate_seq to ensure proper synchronisation. I observed this whilst running stress testing during some development. You do have to be pretty unlucky, but it leads to the usual problems of use-after-free (memory corruption, kernel crash, difficult to diagnose WARN_ON, etc). Link: https://lkml.kernel.org/r/20220420043734.476348-1-apopple@nvidia.com Fixes: `99cb252f5e` ("mm/mmu_notifier: add an interval tree notifier") Signed-off-by: Alistair Popple <apopple@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Christian König <christian.koenig@amd.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-21 20:01:10 -07:00
Aleksandr Nogikh	ecc04463d1	kcov: don't generate a warning on vm_insert_page()'s failure vm_insert_page()'s failure is not an unexpected condition, so don't do WARN_ONCE() in such a case. Instead, print a kernel message and just return an error code. This flaw has been reported under an OOM condition by sysbot [1]. The message is mainly for the benefit of the test log, in this case the fuzzer's log so that humans inspecting the log can figure out what was going on. KCOV is a testing tool, so I think being a little more chatty when KCOV unexpectedly is about to fail will save someone debugging time. We don't want the WARN, because it's not a kernel bug that syzbot should report, and failure can happen if the fuzzer tries hard enough (as above). Link: https://lkml.kernel.org/r/Ylkr2xrVbhQYwNLf@elver.google.com [1] Link: https://lkml.kernel.org/r/20220401182512.249282-1-nogikh@google.com Fixes: `b3d7fe86fb` ("kcov: properly handle subsequent mmap calls"), Signed-off-by: Aleksandr Nogikh <nogikh@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Taras Madan <tarasmadan@google.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-21 20:01:10 -07:00
Vincenzo Frascino	415fccf859	MAINTAINERS: add Vincenzo Frascino to KASAN reviewers Add my email address to KASAN reviewers list to make sure that I am Cc'ed in all the KASAN changes that may affect arm64 MTE. Link: https://lkml.kernel.org/r/20220419170640.21404-1-vincenzo.frascino@arm.com Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-21 20:01:10 -07:00
Nico Pache	e4a38402c3	oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup The pthread struct is allocated on PRIVATE\|ANONYMOUS memory [1] which can be targeted by the oom reaper. This mapping is used to store the futex robust list head; the kernel does not keep a copy of the robust list and instead references a userspace address to maintain the robustness during a process death. A race can occur between exit_mm and the oom reaper that allows the oom reaper to free the memory of the futex robust list before the exit path has handled the futex death: CPU1 CPU2 -------------------------------------------------------------------- page_fault do_exit "signal" wake_oom_reaper oom_reaper oom_reap_task_mm (invalidates mm) exit_mm exit_mm_release futex_exit_release futex_cleanup exit_robust_list get_user (EFAULT- can't access memory) If the get_user EFAULT's, the kernel will be unable to recover the waiters on the robust_list, leaving userspace mutexes hung indefinitely. Delay the OOM reaper, allowing more time for the exit path to perform the futex cleanup. Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer Based on a patch by Michal Hocko. Link: https://elixir.bootlin.com/glibc/glibc-2.35/source/nptl/allocatestack.c#L370 [1] Link: https://lkml.kernel.org/r/20220414144042.677008-1-npache@redhat.com Fixes: `2129258024` ("mm: oom: let oom_reap_task and exit_mmap run concurrently") Signed-off-by: Joel Savitz <jsavitz@redhat.com> Signed-off-by: Nico Pache <npache@redhat.com> Co-developed-by: Joel Savitz <jsavitz@redhat.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Waiman Long <longman@redhat.com> Cc: Herton R. Krzesinski <herton@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Savitz <jsavitz@redhat.com> Cc: Darren Hart <dvhart@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-21 20:01:10 -07:00
Sidhartha Kumar	80df2fb95d	selftest/vm: add skip support to mremap_test Allow the mremap test to be skipped due to errors such as failing to parse the mmap_min_addr sysctl. Link: https://lkml.kernel.org/r/20220420215721.4868-4-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-21 20:01:10 -07:00
Sidhartha Kumar	e5508fc52c	selftest/vm: support xfail in mremap_test Use ksft_test_result_xfail for the tests which are expected to fail. Link: https://lkml.kernel.org/r/20220420215721.4868-3-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-21 20:01:10 -07:00
Sidhartha Kumar	18d609daa5	selftest/vm: verify remap destination address in mremap_test Because mremap does not have a MAP_FIXED_NOREPLACE flag, it can destroy existing mappings. This causes a segfault when regions such as text are remapped and the permissions are changed. Verify the requested mremap destination address does not overlap any existing mappings by using mmap's MAP_FIXED_NOREPLACE flag. Keep incrementing the destination address until a valid mapping is found or fail the current test once the max address is reached. Link: https://lkml.kernel.org/r/20220420215721.4868-2-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-21 20:01:10 -07:00
Sidhartha Kumar	9c85a9bae2	selftest/vm: verify mmap addr in mremap_test Avoid calling mmap with requested addresses that are less than the system's mmap_min_addr. When run as root, mmap returns EACCES when trying to map addresses < mmap_min_addr. This is not one of the error codes for the condition to retry the mmap in the test. Rather than arbitrarily retrying on EACCES, don't attempt an mmap until addr > vm.mmap_min_addr. Add a munmap call after an alignment check as the mappings are retained after the retry and can reach the vm.max_map_count sysctl. Link: https://lkml.kernel.org/r/20220420215721.4868-1-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-21 20:01:09 -07:00
Christophe Leroy	5f24d5a579	mm, hugetlb: allow for "high" userspace addresses This is a fix for commit `f6795053da` ("mm: mmap: Allow for "high" userspace addresses") for hugetlb. This patch adds support for "high" userspace addresses that are optionally supported on the system and have to be requested via a hint mechanism ("high" addr parameter to mmap). Architectures such as powerpc and x86 achieve this by making changes to their architectural versions of hugetlb_get_unmapped_area() function. However, arm64 uses the generic version of that function. So take into account arch_get_mmap_base() and arch_get_mmap_end() in hugetlb_get_unmapped_area(). To allow that, move those two macros out of mm/mmap.c into include/linux/sched/mm.h If these macros are not defined in architectural code then they default to (TASK_SIZE) and (base) so should not introduce any behavioural changes to architectures that do not define them. For the time being, only ARM64 is affected by this change. Catalin (ARM64) said "We should have fixed hugetlb_get_unmapped_area() as well when we added support for 52-bit VA. The reason for commit `f6795053da` was to prevent normal mmap() from returning addresses above 48-bit by default as some user-space had hard assumptions about this. It's a slight ABI change if you do this for hugetlb_get_unmapped_area() but I doubt anyone would notice. It's more likely that the current behaviour would cause issues, so I'd rather have them consistent. Basically when arm64 gained support for 52-bit addresses we did not want user-space calling mmap() to suddenly get such high addresses, otherwise we could have inadvertently broken some programs (similar behaviour to x86 here). Hence we added commit `f6795053da`. But we missed hugetlbfs which could still get such high mmap() addresses. So in theory that's a potential regression that should have bee addressed at the same time as commit `f6795053da` (and before arm64 enabled 52-bit addresses)" Link: https://lkml.kernel.org/r/ab847b6edb197bffdfe189e70fb4ac76bfe79e0d.1650033747.git.christophe.leroy@csgroup.eu Fixes: `f6795053da` ("mm: mmap: Allow for "high" userspace addresses") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> [5.0.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-21 20:01:09 -07:00
Nadav Amit	0e88904cb7	userfaultfd: mark uffd_wp regardless of VM_WRITE flag When a PTE is set by UFFD operations such as UFFDIO_COPY, the PTE is currently only marked as write-protected if the VMA has VM_WRITE flag set. This seems incorrect or at least would be unexpected by the users. Consider the following sequence of operations that are being performed on a certain page: mprotect(PROT_READ) UFFDIO_COPY(UFFDIO_COPY_MODE_WP) mprotect(PROT_READ\|PROT_WRITE) At this point the user would expect to still get UFFD notification when the page is accessed for write, but the user would not get one, since the PTE was not marked as UFFD_WP during UFFDIO_COPY. Fix it by always marking PTEs as UFFD_WP regardless on the write-permission in the VMA flags. Link: https://lkml.kernel.org/r/20220217211602.2769-1-namit@vmware.com Fixes: `292924b260` ("userfaultfd: wp: apply _PAGE_UFFD_WP bit") Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-21 20:01:09 -07:00
Shakeel Butt	9b3016154c	memcg: sync flush only if periodic flush is delayed Daniel Dao has reported [1] a regression on workloads that may trigger a lot of refaults (anon and file). The underlying issue is that flushing rstat is expensive. Although rstat flush are batched with (nr_cpus * MEMCG_BATCH) stat updates, it seems like there are workloads which genuinely do stat updates larger than batch value within short amount of time. Since the rstat flush can happen in the performance critical codepaths like page faults, such workload can suffer greatly. This patch fixes this regression by making the rstat flushing conditional in the performance critical codepaths. More specifically, the kernel relies on the async periodic rstat flusher to flush the stats and only if the periodic flusher is delayed by more than twice the amount of its normal time window then the kernel allows rstat flushing from the performance critical codepaths. Now the question: what are the side-effects of this change? The worst that can happen is the refault codepath will see 4sec old lruvec stats and may cause false (or missed) activations of the refaulted page which may under-or-overestimate the workingset size. Though that is not very concerning as the kernel can already miss or do false activations. There are two more codepaths whose flushing behavior is not changed by this patch and we may need to come to them in future. One is the writeback stats used by dirty throttling and second is the deactivation heuristic in the reclaim. For now keeping an eye on them and if there is report of regression due to these codepaths, we will reevaluate then. Link: https://lore.kernel.org/all/CA+wXwBSyO87ZX5PVwdHm-=dBjZYECGmfnydUicUyrQqndgX2MQ@mail.gmail.com [1] Link: https://lkml.kernel.org/r/20220304184040.1304781-1-shakeelb@google.com Fixes: `1f828223b7` ("memcg: flush lruvec stats in the refault") Signed-off-by: Shakeel Butt <shakeelb@google.com> Reported-by: Daniel Dao <dqminh@cloudflare.com> Tested-by: Ivan Babrou <ivan@cloudflare.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Frank Hofmann <fhofmann@cloudflare.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-21 20:01:09 -07:00
Xu Yu	d173d5417f	mm/memory-failure.c: skip huge_zero_page in memory_failure() Kernel panic when injecting memory_failure for the global huge_zero_page, when CONFIG_DEBUG_VM is enabled, as follows. Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000 page:00000000fb053fc3 refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x109e00 head:00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0 flags: 0x17fffc000010001(locked\|head\|node=0\|zone=2\|lastcpupid=0x1ffff) raw: 017fffc000010001 0000000000000000 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000000 00000002ffffffff 0000000000000000 page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head)) ------------[ cut here ]------------ kernel BUG at mm/huge_memory.c:2499! invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11 Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014 RIP: 0010:split_huge_page_to_list+0x66a/0x880 Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b RSP: 0018:ffffc90000dcbdf8 EFLAGS: 00010246 RAX: 000000000000003c RBX: 0000000000000001 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff823e4c4f RDI: 00000000ffffffff RBP: ffff88843fffdb40 R08: 0000000000000000 R09: 00000000fffeffff R10: ffffc90000dcbc48 R11: ffffffff82d68448 R12: ffffea0004278000 R13: ffffffff823c6203 R14: 0000000000109ff9 R15: ffffea000427fe40 FS: 00007fc375a26740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc3757c9290 CR3: 0000000102174006 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: try_to_split_thp_page+0x3a/0x130 memory_failure+0x128/0x800 madvise_inject_error.cold+0x8b/0xa1 __x64_sys_madvise+0x54/0x60 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fc3754f8bf9 Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8 RSP: 002b:00007ffeda93a1d8 EFLAGS: 00000217 ORIG_RAX: 000000000000001c RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3754f8bf9 RDX: 0000000000000064 RSI: 0000000000003000 RDI: 0000000020ff9000 RBP: 00007ffeda93a200 R08: 0000000000000000 R09: 0000000000000000 R10: 00000000ffffffff R11: 0000000000000217 R12: 0000000000400490 R13: 00007ffeda93a2e0 R14: 0000000000000000 R15: 0000000000000000 This makes huge_zero_page bail out explicitly before split in memory_failure(), thus the panic above won't happen again. Link: https://lkml.kernel.org/r/497d3835612610e370c74e697ea3c721d1d55b9c.1649775850.git.xuyu@linux.alibaba.com Fixes: `6a46079cf5` ("HWPOISON: The high level memory error handler in the VM v7") Signed-off-by: Xu Yu <xuyu@linux.alibaba.com> Reported-by: Abaci <abaci@linux.alibaba.com> Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-21 20:01:09 -07:00
Naoya Horiguchi	405ce05123	mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb() There is a race condition between memory_failure_hugetlb() and hugetlb free/demotion, which causes setting PageHWPoison flag on the wrong page. The one simple result is that wrong processes can be killed, but another (more serious) one is that the actual error is left unhandled, so no one prevents later access to it, and that might lead to more serious results like consuming corrupted data. Think about the below race window: CPU 1 CPU 2 memory_failure_hugetlb struct page *head = compound_head(p); hugetlb page might be freed to buddy, or even changed to another compound page. get_hwpoison_page -- page is not what we want now... The current code first does prechecks roughly and then reconfirms after taking refcount, but it's found that it makes code overly complicated, so move the prechecks in a single hugetlb_lock range. A newly introduced function, try_memory_failure_hugetlb(), always takes hugetlb_lock (even for non-hugetlb pages). That can be improved, but memory_failure() is rare in principle, so should not be a big problem. Link: https://lkml.kernel.org/r/20220408135323.1559401-2-naoya.horiguchi@linux.dev Fixes: `761ad8d7c7` ("mm: hwpoison: introduce memory_failure_hugetlb()") Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reported-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-21 20:01:09 -07:00
Dave Airlie	70da382e1c	Merge tag 'drm-msm-fixes-2022-04-20' of https://gitlab.freedesktop.org/drm/msm into drm-fixes Revert to fix iommu regression. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGtvPo4xD2peAztDMPP2n4utb7d9WQboMFwsba9E8U2rCw@mail.gmail.com	2022-04-22 09:25:47 +10:00
Linus Torvalds	b05a5683eb	dmaengine fixes for v5.17 Bunch of driver fixes for: - idxd device RO checks and device cleanup - dw-edma unaligned access and alignment - qcom: missing minItems in binding - mediatek pm usage fix - imx init script -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmJhUYAACgkQfBQHDyUj g0ct6xAAtLRFniEFmrdk9YzxIQmmfNZL3v7kksXA+xHqqmTYO1bwGnuq/3/GT0YF W6jEUo895T2ip2Z5ORYrxBlE4VpZNtNqkk3wkjo/1nNXJ/O6mTOUDx05iROMhdfs UZfD2z/kdJbKYsbew7cEP3k71wFjXhTyvldxWkK24gzWu8ncXInQNFBAw5C/xqWc 5u/bi0Fa8NuAHKlr7DBxcNjE+aZCIQtUY0FFV0kEolz7BeOUc94baU5lF+uDZe9d wzhTkNEZS+qjBZ/hCmsGWaF3D0trIbrf6Ymq6JJZMRGOF020wNNczukrBN0Ix/O4 B7AN2c/yBUju85i3fF4gCgDP0boP8HrlPfbJjG2kV/jKNULeWniNEZmJPD6W7tFJ XUeJTnplnEJITvMM5RjPr3wH2b3TnSCUu/OigQi3OtZMtE66ALkTC0bYIOONejvh w+CrIIKB8ZcDxm3R89Qay4R8W1/pMfLPC4MF/VxHYqyKLsQQqKOJW/SWldQAt840 2xid8fb3pbD+iVuDfPj3iqIs3rtOvIWZJFmnIQJwF6xjBMr5vD1ECormSJaqVVV8 4vcPTJzxc+d0Lve0qFppIHREK0dzh2fZm16Xg3UoRgAkTzldI0N6/JPLKGVDVj6c o1G15ITUcmRXCYNkasMF1HTpFqO3AcPJnFhVBDa9IRAcv4KwuP8= =4gle -----END PGP SIGNATURE----- Merge tag 'dmaengine-fix-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine fixes from Vinod Koul: "A bunch of driver fixes: - idxd device RO checks and device cleanup - dw-edma unaligned access and alignment - qcom: missing minItems in binding - mediatek pm usage fix - imx init script" * tag 'dmaengine-fix-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: dt-bindings: dmaengine: qcom: gpi: Add minItems for interrupts dmaengine: idxd: skip clearing device context when device is read-only dmaengine: idxd: add RO check for wq max_transfer_size write dmaengine: idxd: add RO check for wq max_batch_size write dmaengine: idxd: fix retry value to be constant for duration of function call dmaengine: idxd: match type for retries var in idxd_enqcmds() dmaengine: dw-edma: Fix inconsistent indenting dmaengine: dw-edma: Fix unaligned 64bit access dmaengine: mediatek:Fix PM usage reference leak of mtk_uart_apdma_alloc_chan_resources dmaengine: imx-sdma: Fix error checking in sdma_event_remap dma: at_xdmac: fix a missing check on list iterator dmaengine: imx-sdma: fix init of uart scripts dmaengine: idxd: fix device cleanup on disable	2022-04-21 16:24:31 -07:00
Dave Airlie	e827d149fd	Merge tag 'drm-intel-fixes-2022-04-20' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Unset enable_psr2_sel_fetch if PSR2 detection fails - Fix to detect when VRR is turned off from panel settings Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/YmAKuHwon7hGyIoC@jlahtine-mobl.ger.corp.intel.com	2022-04-22 06:34:36 +10:00
Linus Torvalds	59f0c2447e	Networking fixes for 5.18-rc4, including fixes from xfrm and can. Current release - regressions: - rxrpc: restore removed timer deletion Current release - new code bugs: - gre: fix device lookup for l3mdev use-case - xfrm: fix egress device lookup for l3mdev use-case Previous releases - regressions: - sched: cls_u32: fix netns refcount changes in u32_change() - smc: fix sock leak when release after smc_shutdown() - xfrm: limit skb_page_frag_refill use to a single page - eth: atlantic: invert deep par in pm functions, preventing null derefs - eth: stmmac: use readl_poll_timeout_atomic() in atomic state Previous releases - always broken: - gre: fix skb_under_panic on xmit - openvswitch: fix OOB access in reserve_sfa_size() - dsa: hellcreek: calculate checksums in tagger - eth: ice: fix crash in switchdev mode - eth: igc: - fix infinite loop in release_swfw_sync - fix scheduling while atomic Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmJhLbcSHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOknxkP/jiAATyBt/HQFykQNiZ7cdrcv1gzJ3Lg BmrV1QmbrNwfffBtmBHRliP7x0vNF6fV8LUKjfyQh1YgJw8I9F/FDH/1fojhBZq/ JJpZrh+TFikBBM4RDMJ0aQi6ssOEo8S9gfN4W48F/49O4S3Q/Gbgv7Lk0jL8utRz 7RgGUVxX+xOSklvh2Tn/xHdYPeebPhLojiKhmH+l6xghyDEUHkemF3AkLwV9QMnq LXmNP4y100xcdCW1bLbyVcq0lbwdLSg4SL+2wC2bvgEDRR0MUezQyNxD6Oqrmusn sASZYgNK92R9ekLBqTX/QwV/XIT+17hclTk4u0eV8GnemnibqOq7DhDqtKyeAzbD mfU6Z5Ku6LuXA1U+2w1jxnd4cJTacA+dCRKcQD91ReguBbCd6zOweB996iBNLucK Kf+r6qWWLxt+JmhSexb/T+oQHsdgvIPSQXNHUH2W8w+2DdTB/EPcSL76DlbZUxrP up4EC3Nr3oxJjHbv7Iq4d9mHuRlwoOOpNJ3mARlfRDL6iuL0zECTweST3qT9YyIH Cz4FGj7kwEDTxGtufoTVia+/JmS39f2lBrMKuhbTVo+qcYhs3zJM4Ki9bAgOKXqI Qf+I73x9yQZ182afq4DsRXLnq/BajmRMyX2/kebY8KsARzRsPAktBhsT17SI6tUG 3MiLiHiIb0qM =thBq -----END PGP SIGNATURE----- Merge tag 'net-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from xfrm and can. Current release - regressions: - rxrpc: restore removed timer deletion Current release - new code bugs: - gre: fix device lookup for l3mdev use-case - xfrm: fix egress device lookup for l3mdev use-case Previous releases - regressions: - sched: cls_u32: fix netns refcount changes in u32_change() - smc: fix sock leak when release after smc_shutdown() - xfrm: limit skb_page_frag_refill use to a single page - eth: atlantic: invert deep par in pm functions, preventing null derefs - eth: stmmac: use readl_poll_timeout_atomic() in atomic state Previous releases - always broken: - gre: fix skb_under_panic on xmit - openvswitch: fix OOB access in reserve_sfa_size() - dsa: hellcreek: calculate checksums in tagger - eth: ice: fix crash in switchdev mode - eth: igc: - fix infinite loop in release_swfw_sync - fix scheduling while atomic" * tag 'net-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits) drivers: net: hippi: Fix deadlock in rr_close() selftests: mlxsw: vxlan_flooding_ipv6: Prevent flooding of unwanted packets selftests: mlxsw: vxlan_flooding: Prevent flooding of unwanted packets nfc: MAINTAINERS: add Bug entry net: stmmac: Use readl_poll_timeout_atomic() in atomic state doc/ip-sysctl: add bc_forwarding netlink: reset network and mac headers in netlink_dump() net: mscc: ocelot: fix broken IP multicast flooding net: dsa: hellcreek: Calculate checksums in tagger net: atlantic: invert deep par in pm functions, preventing null derefs can: isotp: stop timeout monitoring when no first frame was sent bonding: do not discard lowest hash bit for non layer3+4 hashing net: lan966x: Make sure to release ptp interrupt ipv6: make ip6_rt_gc_expire an atomic_t net: Handle l3mdev in ip_tunnel_init_flow l3mdev: l3mdev_master_upper_ifindex_by_index_rcu should be using netdev_master_upper_dev_get_rcu net/sched: cls_u32: fix possible leak in u32_init_knode() net/sched: cls_u32: fix netns refcount changes in u32_change() powerpc: Update MAINTAINERS for ibmvnic and VAS net: restore alpha order to Ethernet devices in config ...	2022-04-21 12:29:08 -07:00
Duoming Zhou	bc6de28784	drivers: net: hippi: Fix deadlock in rr_close() There is a deadlock in rr_close(), which is shown below: (Thread 1) \| (Thread 2) \| rr_open() rr_close() \| add_timer() spin_lock_irqsave() //(1) \| (wait a time) ... \| rr_timer() del_timer_sync() \| spin_lock_irqsave() //(2) (wait timer to stop) \| ... We hold rrpriv->lock in position (1) of thread 1 and use del_timer_sync() to wait timer to stop, but timer handler also need rrpriv->lock in position (2) of thread 2. As a result, rr_close() will block forever. This patch extracts del_timer_sync() from the protection of spin_lock_irqsave(), which could let timer handler to obtain the needed lock. Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Link: https://lore.kernel.org/r/20220417125519.82618-1-duoming@zju.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-21 10:30:45 +02:00
Linus Torvalds	b253435746	Xtensa fixes for v5.18: - fix patching CPU selection in patch_text - fix potential deadlock in ISS platform serial driver - fix potential register clobbering in coprocessor exception handler -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEK2eFS5jlMn3N6xfYUfnMkfg/oEQFAmJbgwQTHGpjbXZia2Jj QGdtYWlsLmNvbQAKCRBR+cyR+D+gRPYwD/9/aVrsMKPkJsxUQDI/A2fX4ZTcM5fj cnBsKvKGc+CpKwtWDK5YGAlLLGskBicRSUXqqiNNYapUUx6R9nwyRzEz2srKUa+P V/4gUnKCCVUyBiTMf1gYa9srO+KdrgY2460uDZWCvPsA1PTLotrQQQcZR4lkIKeT RvHjflEAY5f1l0A96WoH4F3xSNqrDdDInB4QpqwmZ2oWP4qjO9AA1RY25HFsvWVz xGErCrPKlJeHR6N5q1N0IzJYJdJeT2lKIMi5kwepLurNnn60+8KL9eQwFFd546fz EHuzcThg8c1q/jOhaOabkp0HWtWxv5dZiWzKmYnGXnrtjZwDQFk4PFdrHqYyY6OU LKsyV/bQDsDmzxT+oU1ylWC/buUsY3ulI30yfESWGXZzPw3iQDimO9O0jEJJ//W9 FNl6hU+KvVcr5bedYV6UIMrQeAXhGpUzpJL9qebuJmR/66sjiwZJGIYVB5RViF9o eVutlfkDByQsKY4ATJZyaj+8/p6Px2KNaOdMI4rzphrzpsQGAgUg7bNEgDt+dWMr vFPBcj+Ad8u1CQbBn6hmdovLutFdT1mmO/ePWepslcavhRQf4AnN7KMY7A8LIoiF dgegaICNrNA3XRAHp1UkBUQoGgqn++xvVYRZ8biHgjnBF8sVfythooc3tjHT+C6V RSyz+S+8Yiuh0w== =zDe5 -----END PGP SIGNATURE----- Merge tag 'xtensa-20220416' of https://github.com/jcmvbkbc/linux-xtensa Pull xtensa fixes from Max Filippov: - fix patching CPU selection in patch_text - fix potential deadlock in ISS platform serial driver - fix potential register clobbering in coprocessor exception handler * tag 'xtensa-20220416' of https://github.com/jcmvbkbc/linux-xtensa: xtensa: fix a7 clobbering in coprocessor context load/store arch: xtensa: platforms: Fix deadlock in rs_close() xtensa: patch_text: Fixup last cpu should be master	2022-04-20 12:43:27 -07:00
Linus Torvalds	10c5f102e2	Changes since last update: - Fix use-after-free of the on-stack z_erofs_decompressqueue; - Fix sysfs documentation Sphinx warnings. -----BEGIN PGP SIGNATURE----- iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCYl2A0REceGlhbmdAa2Vy bmVsLm9yZwAKCRA5NzHcH7XmBEe7AQCh7aNhYEncBnoHvrB276HCP0xmMwmc0gPq 6UeSWgarpAEArgfgLyo8tFgS+gvXbhB8/P1FqEMwaRZVLWathXID3QU= =GFAS -----END PGP SIGNATURE----- Merge tag 'erofs-for-5.18-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "One patch to fix a use-after-free race related to the on-stack z_erofs_decompressqueue, which happens very rarely but needs to be fixed properly soon. The other patch fixes some sysfs Sphinx warnings" * tag 'erofs-for-5.18-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: Documentation/ABI: sysfs-fs-erofs: Fix Sphinx errors erofs: fix use-after-free of on-stack io[]	2022-04-20 12:35:20 -07:00
Linus Torvalds	906f904097	Revert "fs/pipe: use kvcalloc to allocate a pipe_buffer array" This reverts commit `5a519c8fe4`. It turns out that making the pipe almost arbitrarily large has some rather unexpected downsides. The kernel test robot reports a kernel warning that is due to pipe->max_usage now growing to the point where the iter_file_splice_write() buffer allocation can no longer be satisfied as a slab allocation, and the int nbufs = pipe->max_usage; struct bio_vec *array = kcalloc(nbufs, sizeof(struct bio_vec), GFP_KERNEL); code sequence there will now always fail as a result. That code could be modified to use kvcalloc() too, but I feel very uncomfortable making those kinds of changes for a very niche use case that really should have other options than make these kinds of fundamental changes to pipe behavior. Maybe the CRIU process dumping should be multi-threaded, and use multiple pipes and multiple cores, rather than try to use one larger pipe to minimize splice() calls. Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/all/20220420073717.GD16310@xsang-OptiPlex-9020/ Cc: Andrei Vagin <avagin@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-20 12:07:53 -07:00
Mikulas Patocka	a6823e4e36	x86: __memcpy_flushcache: fix wrong alignment if size > 2^32 The first "if" condition in __memcpy_flushcache is supposed to align the "dest" variable to 8 bytes and copy data up to this alignment. However, this condition may misbehave if "size" is greater than 4GiB. The statement min_t(unsigned, size, ALIGN(dest, 8) - dest); casts both arguments to unsigned int and selects the smaller one. However, the cast truncates high bits in "size" and it results in misbehavior. For example: suppose that size == 0x100000001, dest == 0x200000002 min_t(unsigned, size, ALIGN(dest, 8) - dest) == min_t(0x1, 0xe) == 0x1; ... dest += 0x1; so we copy just one byte "and" dest remains unaligned. This patch fixes the bug by replacing unsigned with size_t. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-20 11:38:49 -07:00
Ido Schimmel	5e6242151d	selftests: mlxsw: vxlan_flooding_ipv6: Prevent flooding of unwanted packets The test verifies that packets are correctly flooded by the bridge and the VXLAN device by matching on the encapsulated packets at the other end. However, if packets other than those generated by the test also ingress the bridge (e.g., MLD packets), they will be flooded as well and interfere with the expected count. Make the test more robust by making sure that only the packets generated by the test can ingress the bridge. Drop all the rest using tc filters on the egress of 'br0' and 'h1'. In the software data path, the problem can be solved by matching on the inner destination MAC or dropping unwanted packets at the egress of the VXLAN device, but this is not currently supported by mlxsw. Fixes: `d01724dd2a` ("selftests: mlxsw: spectrum-2: Add a test for VxLAN flooding with IPv6") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-20 15:04:27 +01:00
Ido Schimmel	044011fdf1	selftests: mlxsw: vxlan_flooding: Prevent flooding of unwanted packets The test verifies that packets are correctly flooded by the bridge and the VXLAN device by matching on the encapsulated packets at the other end. However, if packets other than those generated by the test also ingress the bridge (e.g., MLD packets), they will be flooded as well and interfere with the expected count. Make the test more robust by making sure that only the packets generated by the test can ingress the bridge. Drop all the rest using tc filters on the egress of 'br0' and 'h1'. In the software data path, the problem can be solved by matching on the inner destination MAC or dropping unwanted packets at the egress of the VXLAN device, but this is not currently supported by mlxsw. Fixes: `94d302deae` ("selftests: mlxsw: Add a test for VxLAN flooding") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-20 15:04:27 +01:00
Vinod Koul	7495a5bbf8	dt-bindings: dmaengine: qcom: gpi: Add minItems for interrupts Add the minItems for interrupts property as well. In the absence of this, we get warning if interrupts are less than 13 arch/arm64/boot/dts/qcom/qrb5165-rb5.dtb: dma-controller@800000: interrupts: [[0, 588, 4], [0, 589, 4], [0, 590, 4], [0, 591, 4], [0, 592, 4], [0, 593, 4], [0, 594, 4], [0, 595, 4], [0, 596, 4], [0, 597, 4]] is too short Signed-off-by: Vinod Koul <vkoul@kernel.org> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220414064235.1182195-1-vkoul@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org>	2022-04-20 18:11:20 +05:30
Krzysztof Kozlowski	c5d0fc54be	nfc: MAINTAINERS: add Bug entry Add a Bug section, indicating preferred mailing method for bug reports, to NFC Subsystem entry. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-20 13:24:00 +01:00
Dave Jiang	1cd8e751d9	dmaengine: idxd: skip clearing device context when device is read-only If the device shows up as read-only configuration, skip the clearing of the state as the context must be preserved for device re-enable after being disabled. Fixes: `0dcfe41e9a` ("dmanegine: idxd: cleanup all device related bits after disabling device") Reported-by: Tony Zhu <tony.zhu@intel.com> Tested-by: Tony Zhu <tony.zhu@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/164971479479.2200566.13980022473526292759.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>	2022-04-20 17:24:43 +05:30
Dave Jiang	505a2d1032	dmaengine: idxd: add RO check for wq max_transfer_size write Block wq_max_transfer_size_store() when the device is configured as read-only and not configurable. Fixes: `d7aad5550e` ("dmaengine: idxd: add support for configurable max wq xfer size") Reported-by: Bernice Zhang <bernice.zhang@intel.com> Tested-by: Bernice Zhang <bernice.zhang@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/164971488154.2200913.10706665404118545941.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>	2022-04-20 17:24:42 +05:30
Dave Jiang	66903461ff	dmaengine: idxd: add RO check for wq max_batch_size write Block wq_max_batch_size_store() when the device is configured as read-only and not configurable. Fixes: `e7184b159d` ("dmaengine: idxd: add support for configurable max wq batch size") Reported-by: Bernice Zhang <bernice.zhang@intel.com> Tested-by: Bernice Zhang <bernice.zhang@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/164971493551.2201159.1942042593642155209.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>	2022-04-20 17:24:42 +05:30
Dave Jiang	bc3452cdfc	dmaengine: idxd: fix retry value to be constant for duration of function call When retries is compared to wq->enqcmds_retries each loop of idxd_enqcmds(), wq->enqcmds_retries can potentially changed by user. Assign the value of retries to wq->enqcmds_retries during initialization so it is the original value set when entering the function. Fixes: `7930d85535` ("dmaengine: idxd: add knob for enqcmds retries") Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/165031760154.3658664.1983547716619266558.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>	2022-04-20 17:24:42 +05:30
Dave Jiang	5d9d16e5aa	dmaengine: idxd: match type for retries var in idxd_enqcmds() wq->enqcmds_retries is defined as unsigned int. However, retries on the stack is defined as int. Change retries to unsigned int to compare the same type. Fixes: `7930d85535` ("dmaengine: idxd: add knob for enqcmds retries") Suggested-by: Thiago Macieira <thiago.macieira@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/165031747059.3658198.6035308204505664375.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>	2022-04-20 17:24:42 +05:30
Jiapeng Chong	d4860224e6	dmaengine: dw-edma: Fix inconsistent indenting Eliminate the follow smatch warning: drivers/dma/dw-edma/dw-edma-v0-core.c:419 dw_edma_v0_core_start() warn: inconsistent indenting. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://lore.kernel.org/r/20220413023442.18856-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Vinod Koul <vkoul@kernel.org>	2022-04-20 17:24:42 +05:30
Kevin Hao	234901de2b	net: stmmac: Use readl_poll_timeout_atomic() in atomic state The init_systime() may be invoked in atomic state. We have observed the following call trace when running "phc_ctl /dev/ptp0 set" on a Intel Agilex board. BUG: sleeping function called from invalid context at drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c:74 in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 381, name: phc_ctl preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 Preemption disabled at: [<ffff80000892ef78>] stmmac_set_time+0x34/0x8c CPU: 2 PID: 381 Comm: phc_ctl Not tainted 5.18.0-rc2-next-20220414-yocto-standard+ #567 Hardware name: SoCFPGA Agilex SoCDK (DT) Call trace: dump_backtrace.part.0+0xc4/0xd0 show_stack+0x24/0x40 dump_stack_lvl+0x7c/0xa0 dump_stack+0x18/0x34 __might_resched+0x154/0x1c0 __might_sleep+0x58/0x90 init_systime+0x78/0x120 stmmac_set_time+0x64/0x8c ptp_clock_settime+0x60/0x9c pc_clock_settime+0x6c/0xc0 __arm64_sys_clock_settime+0x88/0xf0 invoke_syscall+0x5c/0x130 el0_svc_common.constprop.0+0x4c/0x100 do_el0_svc+0x7c/0xa0 el0_svc+0x58/0xcc el0t_64_sync_handler+0xa4/0x130 el0t_64_sync+0x18c/0x190 So we should use readl_poll_timeout_atomic() here instead of readl_poll_timeout(). Also adjust the delay time to 10us to fix a "__bad_udelay" build error reported by "kernel test robot <lkp@intel.com>". I have tested this on Intel Agilex and NXP S32G boards, there is no delay needed at all. So the 10us delay should be long enough for most cases. Fixes: `ff8ed73786` ("net: stmmac: use readl_poll_timeout() function in init_systime()") Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-20 11:10:27 +01:00
Nicolas Dichtel	c6a4254c18	doc/ip-sysctl: add bc_forwarding Let's describe this sysctl. Fixes: `5cbf777cfd` ("route: add support for directed broadcast forwarding") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-20 10:31:43 +01:00
José Roberto de Souza	bb02330408	drm/i915/display/psr: Unset enable_psr2_sel_fetch if other checks in intel_psr2_config_valid() fails If any of the PSR2 checks after intel_psr2_sel_fetch_config_valid() fails, enable_psr2_sel_fetch will be kept enabled causing problems in the functions that only checks for it and not for has_psr2. So here moving the check that do not depend on enable_psr2_sel_fetch and for the remaning ones jumping to a section that unset enable_psr2_sel_fetch in case of failure to support PSR2. Fixes: `6e43e276b8` ("drm/i915: Initial implementation of PSR2 selective fetch") Cc: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Jouni Högander <jouni.hogander@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220414151118.21980-1-jose.souza@intel.com (cherry picked from commit `554ae8dce1`) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2022-04-20 07:51:14 +03:00

1 2 3 4 5 ...

1089506 Commits