linux

iv/linux

Author	SHA1	Message	Date
Yu-Tung Chang	0c45d3e24e	rtc: rx8010: select REGMAP_I2C The rtc-rx8010 uses the I2C regmap but doesn't select it in Kconfig so depending on the configuration the build may fail. Fix it. Signed-off-by: Yu-Tung Chang <mtwget@gmail.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20210830052532.40356-1-mtwget@gmail.com	2021-09-09 10:18:40 +02:00
Guenter Roeck	0c5483a577	Input: analog - always use ktime functions m68k, mips, s390, and sparc allmodconfig images fail to build with the following error. drivers/input/joystick/analog.c:160:2: error: #warning Precise timer not defined for this architecture. Remove architecture specific time handling code and always use ktime functions to determine time deltas. Also remove the now useless use_ktime kernel parameter. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Link: https://lore.kernel.org/r/20210907123734.21520-1-linux@roeck-us.net Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>	2021-09-08 23:39:47 -07:00
Steve French	8d014f5fe9	cifs: move SMB FSCTL definitions to common code The FSCTL definitions are in smbfsctl.h which should be shared by client and server. Move the updated version of smbfsctl.h into smbfs_common and have the client code use it (subsequent patch will change the server to use this common version of the header). Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-09-09 00:09:20 -05:00
Steve French	23e91d8b7c	cifs: rename cifs_common to smbfs_common As we move to common code between client and server, we have been asked to make the names less confusing, and refer less to "cifs" and more to words which include "smb" instead to e.g. "smbfs" for the client (we already have "ksmbd" for the kernel server, and "smbd" for the user space Samba daemon). So to be more consistent in the naming of common code between client and server and reduce the risk of merge conflicts as more common code is added - rename "cifs_common" to "smbfs_common" (in future releases we also will rename the fs/cifs directory to fs/smbfs) Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-09-08 23:59:26 -05:00
Steve French	fc111fb9a6	cifs: update FSCTL definitions Add some missing defines used by ksmbd to the client version of smbfsctl.h, and add a missing newer define mentioned in the protocol definitions (MS-FSCC). This will also make it easier to move to common code. Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-09-08 23:36:31 -05:00
Dave Airlie	de04744d65	drm-misc-next-fixes for v5.15: - Fix ttm_bo_move_memcpy() when ttm_resource is subclassed. - Small fixes to panfrost, mgag200, vc4. - Small ttm compilation fixes. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAmEx7NgACgkQ/lWMcqZw E8OFdQ/8DqIpEr1E4ED/UA99KTUV073je8Oo4/hMKOOPFrCP/8N6+DT0VcsSwWee FK4zJhYJsMaLlmBFqJQSmDIop29FyxUzX+bPD0o14Np+0SV/tXWwxQPy5+1h81YY 0dhUXMRbehnhthU7+36CgGQR1DXZSnR4Af9qGkUUzfnfGBY4qYFiSYv9gwQaxNez 0YPlwjXcabCclpo51KkJTAZN390r6FoHI86bEDnzfyEfed0fKanuSFMzahSHWV1o jE25hofuxjHWaGmeDeGyspLHrU0scxwCIWLp8W05gnCHtI6YINbVBNmLmA/YchRc BCqxptdGbFl5raitrcF/mHK1zI6wOzP8I16tYvWZugGkuRrsmfuqb9c9+CfGwSYp mAkKXpkaVoi179gXUk7Qqf1Z/cge6mAgRgSD9lB4wG6P6TBYiFxZPJRSF3NAqLke 9811SELBSyOvLXzqgjpEVdF6JKL7G+sl0DjHwR8JR5++i73o0Gux96Ufg3vNRg09 4whGCCTNZq4wj/apty8ZK/hhGJGDu1B8Yrs7xpeXHEk3XbmwB2jA7qh3/SzFJ4tF bRIl5USiZTH2lWtOLswo2JT4Yk/EJsxsSoz6u7HJYzPz9QsRGcNs762sQobY1fKd KcdT1Vx5i5j226zaGe/VyooQk/9Z046NTaH4QRuBWUgsFSut41M= =2OzY -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-fixes-2021-09-03' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next-fixes for v5.15: - Fix ttm_bo_move_memcpy() when ttm_resource is subclassed. - Small fixes to panfrost, mgag200, vc4. - Small ttm compilation fixes. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/41ff5e54-0837-2226-a182-97ffd11ef01e@linux.intel.com	2021-09-09 13:35:54 +10:00
Dave Airlie	06b224d516	Merge tag 'amd-drm-next-5.15-2021-09-01' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.15-2021-09-01: amdgpu: - Misc cleanups, typo fixes - EEPROM fix - Add some new PCI IDs - Scatter/Gather display support for Yellow Carp - PCIe DPM fix for RKL platforms - RAS fix amdkfd: - SVM fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210901214015.4488-1-alexander.deucher@amd.com	2021-09-09 13:33:49 +10:00
Jens Axboe	3b33e3f4a6	io-wq: fix silly logic error in io_task_work_match() We check for the func with an OR condition, which means it always ends up being false and we never match the task_work we want to cancel. In the unexpected case that we do exit with that pending, we can trigger a hang waiting for a worker to exit, but it was never created. syzbot reports that as such: INFO: task syz-executor687:8514 blocked for more than 143 seconds. Not tainted 5.14.0-syzkaller #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-executor687 state:D stack:27296 pid: 8514 ppid: 8479 flags:0x00024004 Call Trace: context_switch kernel/sched/core.c:4940 [inline] __schedule+0x940/0x26f0 kernel/sched/core.c:6287 schedule+0xd3/0x270 kernel/sched/core.c:6366 schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857 do_wait_for_common kernel/sched/completion.c:85 [inline] __wait_for_common kernel/sched/completion.c:106 [inline] wait_for_common kernel/sched/completion.c:117 [inline] wait_for_completion+0x176/0x280 kernel/sched/completion.c:138 io_wq_exit_workers fs/io-wq.c:1162 [inline] io_wq_put_and_exit+0x40c/0xc70 fs/io-wq.c:1197 io_uring_clean_tctx fs/io_uring.c:9607 [inline] io_uring_cancel_generic+0x5fe/0x740 fs/io_uring.c:9687 io_uring_files_cancel include/linux/io_uring.h:16 [inline] do_exit+0x265/0x2a30 kernel/exit.c:780 do_group_exit+0x125/0x310 kernel/exit.c:922 get_signal+0x47f/0x2160 kernel/signal.c:2868 arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:865 handle_signal_work kernel/entry/common.c:148 [inline] exit_to_user_mode_loop kernel/entry/common.c:172 [inline] exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:209 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:302 do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x445cd9 RSP: 002b:00007fc657f4b308 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: 0000000000000001 RBX: 00000000004cb448 RCX: 0000000000445cd9 RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00000000004cb44c RBP: 00000000004cb440 R08: 000000000000000e R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000049b154 R13: 0000000000000003 R14: 00007fc657f4b400 R15: 0000000000022000 While in there, also decrement accr->nr_workers. This isn't strictly needed as we're exiting, but let's make sure the accounting matches up. Fixes: `3146cba99a` ("io-wq: make worker creation resilient against signals") Reported-by: syzbot+f62d3e0a4ea4f38f5326@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-09-08 19:57:26 -06:00
Linus Torvalds	a3fa7a101d	Merge branches 'akpm' and 'akpm-hotfixes' (patches from Andrew) Merge yet more updates and hotfixes from Andrew Morton: "Post-linux-next material, based upon latest upstream to catch the now-merged dependencies: - 10 patches. Subsystems affected by this patch series: mm (vmstat and migration) and compat. And bunch of hotfixes, mostly cc:stable: - 8 patches. Subsystems affected by this patch series: mm (hmm, hugetlb, vmscan, pagealloc, pagemap, kmemleak, mempolicy, and memblock)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: arch: remove compat_alloc_user_space compat: remove some compat entry points mm: simplify compat numa syscalls mm: simplify compat_sys_move_pages kexec: avoid compat_alloc_user_space kexec: move locking into do_kexec_load mm: migrate: change to use bool type for 'page_was_mapped' mm: migrate: fix the incorrect function name in comments mm: migrate: introduce a local variable to get the number of pages mm/vmstat: protect per cpu variables with preempt disable on RT * emailed hotfixes from Andrew Morton <akpm@linux-foundation.org>: nds32/setup: remove unused memblock_region variable in setup_memory() mm/mempolicy: fix a race between offset_il_node and mpol_rebind_task mm/kmemleak: allow __GFP_NOLOCKDEP passed to kmemleak's gfp mmap_lock: change trace and locking order mm/page_alloc.c: avoid accessing uninitialized pcp page migratetype mm,vmscan: fix divide by zero in get_scan_count mm/hugetlb: initialize hugetlb_usage in mm_init mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled	2021-09-08 18:52:05 -07:00
Mike Rapoport	ddb13122aa	nds32/setup: remove unused memblock_region variable in setup_memory() kernel test robot reports unused variable warning: arch/nds32/kernel/setup.c:247:26: warning: Unused variable: region [unusedVariable] struct memblock_region *region; ^ Remove the unused variable. Link: https://lkml.kernel.org/r/20210712125218.28951-1-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Cc: Greentime Hu <green.hu@gmail.com> Cc: Nick Hu <nickhu@andestech.com> Cc: Vincent Chen <deanbo422@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 18:45:53 -07:00
yanghui	276aeee1c5	mm/mempolicy: fix a race between offset_il_node and mpol_rebind_task Servers happened below panic: Kernel version:5.4.56 BUG: unable to handle page fault for address: 0000000000002c48 RIP: 0010:__next_zones_zonelist+0x1d/0x40 Call Trace: __alloc_pages_nodemask+0x277/0x310 alloc_page_interleave+0x13/0x70 handle_mm_fault+0xf99/0x1390 __do_page_fault+0x288/0x500 do_page_fault+0x30/0x110 page_fault+0x3e/0x50 The reason for the panic is that MAX_NUMNODES is passed in the third parameter in __alloc_pages_nodemask(preferred_nid). So access to zonelist->zoneref->zone_idx in __next_zones_zonelist will cause a panic. In offset_il_node(), first_node() returns nid from pol->v.nodes, after this other threads may chang pol->v.nodes before next_node(). This race condition will let next_node return MAX_NUMNODES. So put pol->nodes in a local variable. The race condition is between offset_il_node and cpuset_change_task_nodemask: CPU0: CPU1: alloc_pages_vma() interleave_nid(pol,) offset_il_node(pol,) first_node(pol->v.nodes) cpuset_change_task_nodemask //nodes==0xc mpol_rebind_task mpol_rebind_policy mpol_rebind_nodemask(pol,nodes) //nodes==0x3 next_node(nid, pol->v.nodes)//return MAX_NUMNODES Link: https://lkml.kernel.org/r/20210906034658.48721-1-yanghui.def@bytedance.com Signed-off-by: yanghui <yanghui.def@bytedance.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 18:45:53 -07:00
Naohiro Aota	79d3705040	mm/kmemleak: allow __GFP_NOLOCKDEP passed to kmemleak's gfp In a memory pressure situation, I'm seeing the lockdep WARNING below. Actually, this is similar to a known false positive which is already addressed by commit `6dcde60efd` ("xfs: more lockdep whackamole with kmem_alloc"). This warning still persists because it's not from kmalloc() itself but from an allocation for kmemleak object. While kmalloc() itself suppress the warning with __GFP_NOLOCKDEP, gfp_kmemleak_mask() is dropping the flag for the kmemleak's allocation. Allow __GFP_NOLOCKDEP to be passed to kmemleak's allocation, so that the warning for it is also suppressed. ====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc7-BTRFS-ZNS+ #37 Not tainted ------------------------------------------------------ kswapd0/288 is trying to acquire lock: ffff88825ab45df0 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_ilock+0x8a/0x250 but task is already holding lock: ffffffff848cc1e0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: fs_reclaim_acquire+0x112/0x160 kmem_cache_alloc+0x48/0x400 create_object.isra.0+0x42/0xb10 kmemleak_alloc+0x48/0x80 __kmalloc+0x228/0x440 kmem_alloc+0xd3/0x2b0 kmem_alloc_large+0x5a/0x1c0 xfs_attr_copy_value+0x112/0x190 xfs_attr_shortform_getvalue+0x1fc/0x300 xfs_attr_get_ilocked+0x125/0x170 xfs_attr_get+0x329/0x450 xfs_get_acl+0x18d/0x430 get_acl.part.0+0xb6/0x1e0 posix_acl_xattr_get+0x13a/0x230 vfs_getxattr+0x21d/0x270 getxattr+0x126/0x310 __x64_sys_fgetxattr+0x1a6/0x2a0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (&xfs_nondir_ilock_class){++++}-{3:3}: __lock_acquire+0x2c0f/0x5a00 lock_acquire+0x1a1/0x4b0 down_read_nested+0x50/0x90 xfs_ilock+0x8a/0x250 xfs_can_free_eofblocks+0x34f/0x570 xfs_inactive+0x411/0x520 xfs_fs_destroy_inode+0x2c8/0x710 destroy_inode+0xc5/0x1a0 evict+0x444/0x620 dispose_list+0xfe/0x1c0 prune_icache_sb+0xdc/0x160 super_cache_scan+0x31e/0x510 do_shrink_slab+0x337/0x8e0 shrink_slab+0x362/0x5c0 shrink_node+0x7a7/0x1a40 balance_pgdat+0x64e/0xfe0 kswapd+0x590/0xa80 kthread+0x38c/0x460 ret_from_fork+0x22/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(&xfs_nondir_ilock_class); lock(fs_reclaim); lock(&xfs_nondir_ilock_class); DEADLOCK * 3 locks held by kswapd0/288: #0: ffffffff848cc1e0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30 #1: ffffffff848a08d8 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x269/0x5c0 #2: ffff8881a7a820e8 (&type->s_umount_key#60){++++}-{3:3}, at: super_cache_scan+0x5a/0x510 Link: https://lkml.kernel.org/r/20210907055659.3182992-1-naohiro.aota@wdc.com Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: "Darrick J . Wong" <djwong@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 18:45:53 -07:00
Liam Howlett	1099431608	mmap_lock: change trace and locking order Print to the trace log before releasing the lock to avoid racing with other trace log printers of the same lock type. Link: https://lkml.kernel.org/r/20210903022041.1843024-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michel Lespinasse <walken.cr@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 18:45:53 -07:00
Miaohe Lin	053cfda102	mm/page_alloc.c: avoid accessing uninitialized pcp page migratetype If it's not prepared to free unref page, the pcp page migratetype is unset. Thus we will get rubbish from get_pcppage_migratetype() and might list_del(&page->lru) again after it's already deleted from the list leading to grumble about data corruption. Link: https://lkml.kernel.org/r/20210902115447.57050-1-linmiaohe@huawei.com Fixes: `df1acc8569` ("mm/page_alloc: avoid conflating IRQs disabled with zone->lock") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 18:45:53 -07:00
Rik van Riel	32d4f4b782	mm,vmscan: fix divide by zero in get_scan_count Commit `f56ce412a5` ("mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim") introduced a divide by zero corner case when oomd is being used in combination with cgroup memory.low protection. When oomd decides to kill a cgroup, it will force the cgroup memory to be reclaimed after killing the tasks, by writing to the memory.max file for that cgroup, forcing the remaining page cache and reclaimable slab to be reclaimed down to zero. Previously, on cgroups with some memory.low protection that would result in the memory being reclaimed down to the memory.low limit, or likely not at all, having the page cache reclaimed asynchronously later. With `f56ce412a5` the oomd write to memory.max tries to reclaim all the way down to zero, which may race with another reclaimer, to the point of ending up with the divide by zero below. This patch implements the obvious fix. Link: https://lkml.kernel.org/r/20210826220149.058089c6@imladris.surriel.com Fixes: `f56ce412a5` ("mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim") Signed-off-by: Rik van Riel <riel@surriel.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Chris Down <chris@chrisdown.name> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 18:45:53 -07:00
Liu Zixian	13db8c5047	mm/hugetlb: initialize hugetlb_usage in mm_init After fork, the child process will get incorrect (2x) hugetlb_usage. If a process uses 5 2MB hugetlb pages in an anonymous mapping, HugetlbPages: 10240 kB and then forks, the child will show, HugetlbPages: 20480 kB The reason for double the amount is because hugetlb_usage will be copied from the parent and then increased when we copy page tables from parent to child. Child will have 2x actual usage. Fix this by adding hugetlb_count_init in mm_init. Link: https://lkml.kernel.org/r/20210826071742.877-1-liuzixian4@huawei.com Fixes: `5d317b2b65` ("mm: hugetlb: proc: add HugetlbPages field to /proc/PID/status") Signed-off-by: Liu Zixian <liuzixian4@huawei.com> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 18:45:53 -07:00
Li Zhijian	4b42fb2136	mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled Previously, we noticed the one rpma example was failed[1] since commit `36f30e486d` ("IB/core: Improve ODP to use hmm_range_fault()"), where it will use ODP feature to do RDMA WRITE between fsdax files. After digging into the code, we found hmm_vma_handle_pte() will still return EFAULT even though all the its requesting flags has been fulfilled. That's because a DAX page will be marked as (_PAGE_SPECIAL \| PAGE_DEVMAP) by pte_mkdevmap(). Link: https://github.com/pmem/rpma/issues/1142 [1] Link: https://lkml.kernel.org/r/20210830094232.203029-1-lizhijian@cn.fujitsu.com Fixes: `4055062749` ("mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling") Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 18:45:52 -07:00
Jens Axboe	009ad9f0c6	io_uring: drop ctx->uring_lock before acquiring sqd->lock The SQPOLL thread dictates the lock order, and we hold the ctx->uring_lock for all the registration opcodes. We also hold a ref to the ctx, and we do drop the lock for other reasons to quiesce, so it's fine to drop the ctx lock temporarily to grab the sqd->lock. This fixes the following lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 5.14.0-syzkaller #0 Not tainted ------------------------------------------------------ syz-executor.5/25433 is trying to acquire lock: ffff888023426870 (&sqd->lock){+.+.}-{3:3}, at: io_register_iowq_max_workers fs/io_uring.c:10551 [inline] ffff888023426870 (&sqd->lock){+.+.}-{3:3}, at: __io_uring_register fs/io_uring.c:10757 [inline] ffff888023426870 (&sqd->lock){+.+.}-{3:3}, at: __do_sys_io_uring_register+0x10aa/0x2e70 fs/io_uring.c:10792 but task is already holding lock: ffff8880885b40a8 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_register+0x2e1/0x2e70 fs/io_uring.c:10791 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&ctx->uring_lock){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:596 [inline] __mutex_lock+0x131/0x12f0 kernel/locking/mutex.c:729 __io_sq_thread fs/io_uring.c:7291 [inline] io_sq_thread+0x65a/0x1370 fs/io_uring.c:7368 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 -> #0 (&sqd->lock){+.+.}-{3:3}: check_prev_add kernel/locking/lockdep.c:3051 [inline] check_prevs_add kernel/locking/lockdep.c:3174 [inline] validate_chain kernel/locking/lockdep.c:3789 [inline] __lock_acquire+0x2a07/0x54a0 kernel/locking/lockdep.c:5015 lock_acquire kernel/locking/lockdep.c:5625 [inline] lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590 __mutex_lock_common kernel/locking/mutex.c:596 [inline] __mutex_lock+0x131/0x12f0 kernel/locking/mutex.c:729 io_register_iowq_max_workers fs/io_uring.c:10551 [inline] __io_uring_register fs/io_uring.c:10757 [inline] __do_sys_io_uring_register+0x10aa/0x2e70 fs/io_uring.c:10792 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ctx->uring_lock); lock(&sqd->lock); lock(&ctx->uring_lock); lock(&sqd->lock); * DEADLOCK * Fixes: `2e480058dd` ("io-wq: provide a way to limit max number of workers") Reported-by: syzbot+97fa56483f69d677969f@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-09-08 19:07:26 -06:00
Linus Torvalds	730bf31b8f	chrome platform changes for 5.15 cros_ec_typec: * Changes the cros_ec_typec driver to use the pre-existing cros_ec_check_features() function sensorhub: * Add trace events for sample misc: * cros_ec_proto - send commands again in the event of a timeout (for the FPMCU) * Fix warnings in cros_ec_trace related to format output -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQCtZK6p/AktxXfkOlzbaomhzOwwgUCYTgIeAAKCRBzbaomhzOw wi8uAP4u8ufBpeJL0xuGYAONV403pLBqsjqY2ICk/Hg0VPwNYwD/VCQJRaULJFNg PBTPndCeTN6+yqHsDjEh7SpmKp6CbgY= =MWB+ -----END PGP SIGNATURE----- Merge tag 'tag-chrome-platform-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux Pull chrome platform updates from Benson Leung: "cros_ec_typec: - make the cros_ec_typec driver to use the pre-existing cros_ec_check_features() function sensorhub: - add trace events for sample misc: - cros_ec_proto - re-send commands in the event of a timeout (for the FPMCU) - fix warnings in cros_ec_trace related to format output" * tag 'tag-chrome-platform-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: platform/chrome: cros_ec_trace: Fix format warnings platform/chrome: cros_ec_typec: Use existing feature check platform/chrome: cros_ec_proto: Send command again when timeout occurs platform/chrome: sensorhub: Add trace events for sample	2021-09-08 16:43:46 -07:00
Linus Torvalds	30f3490978	More power management updates for 5.15-rc1 - Add new cpufreq driver for the MediaTek MT6779 platform called mediatek-hw along with corresponding DT bindings (Hector.Yuan). - Add DCVS interrupt support to the qcom-cpufreq-hw driver (Thara Gopinath). - Make the qcom-cpufreq-hw driver set the dvfs_possible_from_any_cpu policy flag (Taniya Das). - Blocklist more Qualcomm platforms in cpufreq-dt-platdev (Bjorn Andersson). - Make the vexpress cpufreq driver set the CPUFREQ_IS_COOLING_DEV flag (Viresh Kumar). - Add new cpufreq driver callback to allow drivers to register with the Energy Model in a consistent way and make several drivers use it (Viresh Kumar). - Change the remaining users of the .ready() cpufreq driver callback to move the code from it elsewhere and drop it from the cpufreq core (Viresh Kumar). - Revert recent intel_pstate change adding HWP guaranteed performance change notification support to it that led to problems, because the notification in question is triggered prematurely on some systems (Rafael Wysocki). - Convert the OPP DT bindings to DT schema and clean them up while at it (Rob Herring). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmE41VgSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxCN4P+gMjMrrZmuU6gZsbbvpDlaBhCd2Xq3TD xR/DMDi7znkh3TUX3uwL+xnr+k0krIH0jBIeUQUE7NeNIoT6wgbjJ4Ty5rFq76qB AODmmZ4vO7lmnupSyqUQbHfYohDmyICSKiStf8UOEj1o+jSWNmrgUYUv0tDtDUH+ Cn0vByah8gJAnoZX8Y8BM1jmRc3YoNHWpvtTQIhIBPkVZ//+NOKvDZvwUUPZFb+M 1PzMSfX7WsIDiUrUHpdvtZsoBniaMk0WS1EqVBRvEprqUXad1eHF19yuhtLxeUPH 8xh/7o8kYzjqVJvs7blTT8DztxRDScWHeGKSVdwoEJupbCwc5R3qfGaD6PWyhI1x 9R5Swsp64nLptTCwH7ZmgdJbC9IqN3cz1Nadd5v2Q2wr21KvZnj7zI2ijkPKGnZo kqYQHghqnDkGPFVjdls/RKUXGCQIXZFQb+FeuyCvpVlz9Ol8+DnTYtgFjjc6VcU1 kApIqsE8V8GQEyzmm/OIAf6xtsA+mEUUh3Qds16KNCwBCRglC8I6v5IWnBc8PEJz a+wtwjx+tyKSSlAMEvcDNtrWVN+3JNrwCFG+Q+QjMrwLiAgACrtJZ6e6PmLj9sZv FPbZM8rzbzN7Zqd7XVNY37KHjqNs7zzDScAnATTUCKThp8ijDgtmG3ZhExmOPayc 74aQLham4TBO =WSpr -----END PGP SIGNATURE----- Merge tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These are mostly ARM cpufreq driver updates, including one new MediaTek driver that has just passed all of the reviews, with the addition of a revert of a recent intel_pstate commit, some core cpufreq changes and a DT-related update of the operating performance points (OPP) support code. Specifics: - Add new cpufreq driver for the MediaTek MT6779 platform called mediatek-hw along with corresponding DT bindings (Hector.Yuan). - Add DCVS interrupt support to the qcom-cpufreq-hw driver (Thara Gopinath). - Make the qcom-cpufreq-hw driver set the dvfs_possible_from_any_cpu policy flag (Taniya Das). - Blocklist more Qualcomm platforms in cpufreq-dt-platdev (Bjorn Andersson). - Make the vexpress cpufreq driver set the CPUFREQ_IS_COOLING_DEV flag (Viresh Kumar). - Add new cpufreq driver callback to allow drivers to register with the Energy Model in a consistent way and make several drivers use it (Viresh Kumar). - Change the remaining users of the .ready() cpufreq driver callback to move the code from it elsewhere and drop it from the cpufreq core (Viresh Kumar). - Revert recent intel_pstate change adding HWP guaranteed performance change notification support to it that led to problems, because the notification in question is triggered prematurely on some systems (Rafael Wysocki). - Convert the OPP DT bindings to DT schema and clean them up while at it (Rob Herring)" * tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits) Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification" cpufreq: mediatek-hw: Add support for CPUFREQ HW cpufreq: Add of_perf_domain_get_sharing_cpumask dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW cpufreq: Remove ready() callback cpufreq: sh: Remove sh_cpufreq_cpu_ready() cpufreq: acpi: Remove acpi_cpufreq_cpu_ready() cpufreq: qcom-hw: Set dvfs_possible_from_any_cpu cpufreq driver flag cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support cpufreq: scmi: Use .register_em() to register with energy model cpufreq: vexpress: Use .register_em() to register with energy model cpufreq: scpi: Use .register_em() to register with energy model dt-bindings: opp: Convert to DT schema dt-bindings: Clean-up OPP binding node names in examples ARM: dts: omap: Drop references to opp.txt cpufreq: qcom-cpufreq-hw: Use .register_em() to register with energy model cpufreq: omap: Use .register_em() to register with energy model cpufreq: mediatek: Use .register_em() to register with energy model cpufreq: imx6q: Use .register_em() to register with energy model ...	2021-09-08 16:38:25 -07:00
Linus Torvalds	9c566611ac	More ACPI updates for 5.15-rc1 - Add ACPI support to the PCI VMD driver (Rafael Wysocki). - Rearrange suspend-to-idle support code to reflect the platform firmware expectations on some AMD platforms (Mario Limonciello). - Make SSDT overlays documentation follow the code documented by it more closely (Andy Shevchenko). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmE41EgSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxZrgP/iStcM1PdEkzW9KInTbI7MDiQl8Iaem2 4AcbrsQmJxAEfJ+kzUuoArjj+y4T8sf49AA9Akg/q3zwf0oBix8JdtPDEx823oG8 7/0zjPJMigmcmGfGIlnQaSYqE30hatsthqF0iyH9AZjRzM1m9MavAtxrwDOD0Chq m6kMObNorm/C0mjdPy71DAbiPbrcsMTFjw27hXHWnfsQFhZeVAoyhh2aFvk790pG QRxpArI8r3dLb9vORQWo0q4jezPrRU6HzfvULVZEtv5+F8VUAby+qi1oGUSNx6CX OB20Z1MFPSolsJvyRkfE8HEq0x1Es37doBROolhmliaKUQezwKPMZKJGgEYyUaSJ bnWmN2wuE39VB6rIWXIaw6bHX3RwWnUJgoMvTZZIexp4kmmy9nsPB119na2odFVW D06yMPZwx9lCDVWNkIbpcCGHkBWvQSZ+X/tROVOgyutJ2Rgph0PTVxQxVZmTnTWm Pq6Tp8lSeatL16vEY75EX5pXbmKiGDIFrGv28Jxou2Arf31hcagY+rxu6YYORefu NdkC2GD4TlQMGm2Ukfo5D/svFzJ/MQvP75ytVP3Oqi7ZFLkU2RzYOun8oOcZjUk6 76CH5/fmSvV2NbZRCDzhxiDXVhBFPaGm0KFnYYKs1V7AskdhnzOkd03z2bseTC0V XNk0fHL1R38J =8lPe -----END PGP SIGNATURE----- Merge tag 'acpi-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI updates from Rafael Wysocki: "These add ACPI support to the PCI VMD driver, improve suspend-to-idle support for AMD platforms and update documentation. Specifics: - Add ACPI support to the PCI VMD driver (Rafael Wysocki) - Rearrange suspend-to-idle support code to reflect the platform firmware expectations on some AMD platforms (Mario Limonciello) - Make SSDT overlays documentation follow the code documented by it more closely (Andy Shevchenko)" * tag 'acpi-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: PM: s2idle: Run both AMD and Microsoft methods if both are supported Documentation: ACPI: Align the SSDT overlays file with the code PCI: VMD: ACPI: Make ACPI companion lookup work for VMD bus	2021-09-08 16:33:21 -07:00
Linus Torvalds	0f4b9289ba	Another collection of documentation patches, mostly fixes but also includes another set of traditional Chinese translations. -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmE5GNkPHGNvcmJldEBs d24ubmV0AAoJEBdDWhNsDH5Y4mcH/2xrLUmUO8Bys+1AotIH0B2rMs7TVdo8RVa1 qdOjngtrm3FTQLBjfsViSo3snue8wZr2zguOBOBCFVtMHDtjwE61kztjXxD8fhdH bC0S/H1LWqSogNFnKBubvdEH1gNMKYlPMjFTMlwNknWBvYp+6Oq9HA17zbXcz9Nw AdY5yEZ915r38WjClfnYw+lVY4XDEDhakggK65IXnfbd7hqAR1WJ/JpyuXWMN+3Z NJyB9ztRgU043zuHPWICxR0IxbGyESOYckykBuYf0/gUACQIu/LcWSqzY9bcAN3A W5Bw7Y4P70WNrcnVoH9poz7CltEkEARaRo3cLEJoz09a3+NvdDA= =95+F -----END PGP SIGNATURE----- Merge tag 'docs-5.15-2' of git://git.lwn.net/linux Pull more documentation updates from Jonathan Corbet: "Another collection of documentation patches, mostly fixes but also includes another set of traditional Chinese translations" * tag 'docs-5.15-2' of git://git.lwn.net/linux: docs: pdfdocs: Fix typo in CJK-language specific font settings docs: kernel-hacking: Remove inappropriate text docs/zh_TW: add translations for zh_TW/filesystems docs/zh_TW: add translations for zh_TW/cpu-freq docs/zh_TW: add translations for zh_TW/arm64 docs/zh_CN: Modify the translator tag and fix the wrong word Documentation/features/vm: correct huge-vmap APIs Documentation: block: blk-mq: Fix small typo in multi-queue docs Documentation: in_irq() cleanup Documentation: arm: marvell: Add 88F6825 model into list Documentation/process/maintainer-pgp-guide: Replace broken link to PGP path finder Documentation: locking: fix references Documentation: Update details of The Linux Kernel Module Programming Guide docs: x86: Remove obsolete information about x86_64 vmalloc() faulting Documentation/process/applying-patches: Activate linux-next man hyperlink	2021-09-08 16:28:14 -07:00
Nick Desaulniers	b83a908498	compiler_attributes.h: move __compiletime_{error\|warning} Clang 14 will add support for __attribute__((__error__(""))) and __attribute__((__warning__(""))). To make use of these in __compiletime_error and __compiletime_warning (as used by BUILD_BUG and friends) for newer clang and detect/fallback for older versions of clang, move these to compiler_attributes.h and guard them with __has_attribute preprocessor guards. Link: https://reviews.llvm.org/D106030 Link: https://bugs.llvm.org/show_bug.cgi?id=16428 Link: https://github.com/ClangBuiltLinux/linux/issues/1173 Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> [Reworded, landed in Clang 14] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>	2021-09-09 01:14:28 +02:00
Linus Torvalds	6dcaf9fb62	Modules updates for v5.15 Summary of modules changes for the 5.15 merge window: - Add Luis Chamberlain as modules maintainer - Fix for .ctors sections in module linker script Signed-off-by: Jessica Yu <jeyu@kernel.org> -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEVrp26glSWYuDNrCUwEV+OM47wXIFAmE4aKoQHGpleXVAa2Vy bmVsLm9yZwAKCRDARX44zjvBcgKfD/0YfJTT5FP4dHV+XzrEVo110Mjbw65VbRct spNXlkmCJ1+uSKpx4YlOJhXE2/xw+qy2CO0nBVL1wV89rUoSFLG/ER9R3xXl7rJN h6MmB50VvwewsoQqzQbI58TGUmQXDhKUjyyoJFuvukwzJf20hKej2GNC8b88QQjH 0O/WORZQWkJpNRDpBmRNOpoBV7hTiPSzL/6Pfq9gOewiH+oDXIkLFPO6LsF890nd sQfrqWAnhgGW3fL17jLNMtY1j5aPP46t7wRq2xijkHTDfxQazUpJGeRt2H65Q1dK 024BtJ9CgENYFxiwyJyENj+aOTm43HWeOcgXqND53LHxYM5tVd8Lwl5PizHGfhkn ztDgyuCBCXKMbfhwyrmTUSPjjX47ktC2vBbe1GnjhhOQvOdhsF29K+bfS8nWnWMU WiJCA7WVcXNvPtCwb1CK9Pxm94Ju62IVw7r+nEFtzkV4F8g2GrXQIQMktBHAO9FV ZMQDld2XJzczo6/V5qG35bk9tOWu+vdnOmC/XcyKXl+2OLC3cpG1dsxsRK7RQ9iV 5CQ2d0Pgc5SntZVeegaBWtxHLyqEAA6ajhB5ctfLrS4EZkxXUPDlcxYT6hcg1+Mf pydevPUsKiHqIfbPNefvPR4Z5MFA4gAfdZAL+St/qbx3lntxCB05opSe8mNUCyi1 +KQXbQboNA== =FK20 -----END PGP SIGNATURE----- Merge tag 'modules-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull module updates from Jessica Yu: "The only main change I have for this round of updates is the modules MAINTAINERS update. As I find myself with less time to devote to upstream these days, Luis has kindly agreed to help maintain the module loader, to eventually transition to being the primary maintainer. Since Luis is already very involved upstream with experience maintaining various areas of the kernel including the kmod usermode helper, I think he is a great fit for this area of the kernel. Summary: - Add Luis Chamberlain as modules maintainer - Fix for .ctors sections in module linker script" * tag 'modules-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: MAINTAINERS: Add Luis Chamberlain as modules maintainer module: combine constructors in module linker script	2021-09-08 16:06:48 -07:00
Linus Torvalds	1511e5d64a	Microblaze patches for 5.15-rc1 - Kbuild clean up -----BEGIN PGP SIGNATURE----- iF0EABECAB0WIQQbPNTMvXmYlBPRwx7KSWXLKUoMIQUCYThuOAAKCRDKSWXLKUoM IYxVAJ9pjTyiG/PtgECUtCqFmX+ipIiSvQCeOqd3+O0Fg1ZeoNeZULdSOzXzcGo= =mlg+ -----END PGP SIGNATURE----- Merge tag 'microblaze-v5.15' of git://git.monstr.eu/linux-2.6-microblaze Pull microblaze update from Michal Simek: - Kbuild clean up * tag 'microblaze-v5.15' of git://git.monstr.eu/linux-2.6-microblaze: microblaze: move core-y in arch/microblaze/Makefile to arch/microblaze/Kbuild	2021-09-08 16:02:13 -07:00
Dan Williams	3fc3725357	Merge branch 'for-5.15/fsdax-cleanups' into for-5.15/libnvdimm Include Christoph's rework of the dax_supported() helpers in the v5.15 libnvdimm update. This supports the ongoing dax-reflink enabling effort.	2021-09-08 15:58:13 -07:00
Linus Torvalds	14e2bc4e8c	Critical bug fixes: - Restore performance on memory-starved servers -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmE49dQACgkQM2qzM29m f5fnpA//dqRbqDAmWLJSoIc4WEkr5dEEqBYzRyqeefGRSANdqf7jQfjnWh9MKkIw rbBfwWRmiLN/9qsjAuxHJGnGeZG6BLBN3LKEgfvFFj3HNUHekEIsKP3HPhXeo49K 5U6JVZhcDHPTiVSqDVpumfdnZLRTAR7BMbduWYxdOK+dWdCFZ/zUrf1HWEoFJO6o Y+Kb2iGzuWQu+ie3Zh8jp797OUSt5FZXLLWPeOS1giBOWRz2+2z5pEj/tBSZuoOS IbzivQHQVWCt1q5CtsYY5sqxtpDgObCdDQQ7Pxo/qsxYv3D+56vll5lbZ513KHkd fWnk1q97QpjJI52jQY3kIx33FLVB0BWEGK0mrANQ8wQA7stq11Xc439GOY6CI1zZ NHz7VelzoR295s1bSMz1V66ZaP9o9d+CUKgWuT7x99hPbyqp90z8K71l6BrcM05u tP2YUObmAGfGusbG3OJvHLJWAo/22u4APowC0ZWVmF3FrCHXIdbDtQOrrb+h1Yqq 5wmshDQYCuh/sqpxx7VqseFUIIg4XQ0ziVDbVcDNxVkwDElu1Abd/mKf98+K3Q3G RYHrGGAEXz9HG4WzVKYl+k0GUV3vUiGH4pvLtBpJfDAGSP6zvsu64lb7IAoZVczm O/bQKWnJjYzEO/CM6vsCY15LFwRMC1F83c+8OhskDyvla2azzwQ= =BXCy -----END PGP SIGNATURE----- Merge tag 'nfsd-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Restore performance on memory-starved servers * tag 'nfsd-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: improve error response to over-size gss credential SUNRPC: don't pause on incomplete allocation	2021-09-08 15:55:42 -07:00
Linus Torvalds	8a05abd0c9	We have: - a set of patches to address fsync stalls caused by depending on periodic rather than triggered MDS journal flushes in some cases (Xiubo Li) - a fix for mtime effectively not getting updated in case of competing writers (Jeff Layton) - a couple of fixes for inode reference leaks and various WARNs after "umount -f" (Xiubo Li) - a new ceph.auth_mds extended attribute (Jeff Layton) - a smattering of fixups and cleanups from Jeff, Xiubo and Colin. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmE46mYTHGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzi9UEB/sGT4eqMzkQLzJ2XjpKUvxXaJNVdPvS Jmg26KV5wc9Y9v6L7ww/eQjxbTOnda3G2/XG0xiE8dC1vq54Vux/FKiAT+H2/z/9 onShFK+SARoF4DilKnY0JNCwcGxQ3FjWAgPqPKqAyTAX2wjVxDKFHB0C+7yhhJay wyDrRaaHyFc4TwHeiEi8xU7dB55XsvxWGUgnHbcOLyUbbBKddt98FadNZ2t9b76y EVwAxgY0RbUUFxOJ9VVjiaNLUP4532iXUn+fehMjRGmDCmjaLNxCrsq6d0p//LJV nhVRG+Mv8IfTjqZwFbnWV8xbGwX0lY+g+hn0cdi7urUH3GDa97vmJF3u =z6dR -----END PGP SIGNATURE----- Merge tag 'ceph-for-5.15-rc1' of git://github.com/ceph/ceph-client Pull ceph updates from Ilya Dryomov: - a set of patches to address fsync stalls caused by depending on periodic rather than triggered MDS journal flushes in some cases (Xiubo Li) - a fix for mtime effectively not getting updated in case of competing writers (Jeff Layton) - a couple of fixes for inode reference leaks and various WARNs after "umount -f" (Xiubo Li) - a new ceph.auth_mds extended attribute (Jeff Layton) - a smattering of fixups and cleanups from Jeff, Xiubo and Colin. * tag 'ceph-for-5.15-rc1' of git://github.com/ceph/ceph-client: ceph: fix dereference of null pointer cf ceph: drop the mdsc_get_session/put_session dout messages ceph: lockdep annotations for try_nonblocking_invalidate ceph: don't WARN if we're forcibly removing the session caps ceph: don't WARN if we're force umounting ceph: remove the capsnaps when removing caps ceph: request Fw caps before updating the mtime in ceph_write_iter ceph: reconnect to the export targets on new mdsmaps ceph: print more information when we can't find snaprealm ceph: add ceph_change_snap_realm() helper ceph: remove redundant initializations from mdsc and session ceph: cancel delayed work instead of flushing on mdsc teardown ceph: add a new vxattr to return auth mds for an inode ceph: remove some defunct forward declarations ceph: flush the mdlog before waiting on unsafe reqs ceph: flush mdlog before umounting ceph: make iterate_sessions a global symbol ceph: make ceph_create_session_msg a global symbol ceph: fix comment about short copies in ceph_write_end ceph: fix memory leak on decode error in ceph_handle_caps	2021-09-08 15:50:32 -07:00
Linus Torvalds	34c59da473	9p for 5.15-rc1 a couple of harmless fixes, increase max tcp msize (64KB -> 1MB), and increase default msize (8KB -> 128KB) The default increase has been discussed with Christian for the qemu side of things but makes sense for all supported transports -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmE4wt4ACgkQq06b7GqY 5nC22RAAhujCsrvvwzRelIEycB5IOiBe0xcZItdyPNOAleWfL6tZ+U8/HC/8hb8z jQIG7D6DS0y+MDFFuCXorU9WChF+Wv2Rjj9AJvpBj0gugkbUUxRD4uKRjJgKopJ3 rONXnXUnaPvxwRTBFRdzecfIxeQUDw8YJo4WmUKZsB4rCOD8wYVNg+DJHl+CoJ3t E/D0/ztiKdQL5pGKT2fl8+MbFMBmWor7aiB5/ms8UaiN8ZaW0cUBI3JLcMJjPEbO ip0NXVfbR1UCs8sK8If2afJ/tUnwYTje42ll3fRJZqPZM9jPjVMgXqsP8b7sn5yi 5+/SpAa3Uszi8A9RxEnCsaEx4UWhbGe+54RFGnYSEcj109ZpRDeOo8V8VVg8tb2p y4f/xN6BdOUJekCxcF1/7e6RkXPCauCzQkN3yX6CL4Giu6jy6764hqO2plO8tlWZ zrL7RZDc2Rx4oborDdJL5pSpCYYfs9yuQz0b1JH+NoBfohDFWN3KFNFiSNxg51Eu hunPQK5gojEKsDD2SjD0hy4QfLt5pRaJILznwoEcu9GX9oMSj862IC+uCWExqZbE WFroQfi2OJmbtFJB/fFEYE/mIFdIeC6++ZxEGbY5MNun8W/hMQKJpK+Y9TBS1N1j dV5JJbTGMQLVAZkphC24L6n2iCtz9SoB5j5gbUXQZsd6LR3NL9c= =PLhf -----END PGP SIGNATURE----- Merge tag '9p-for-5.15-rc1' of git://github.com/martinetd/linux Pull 9p updates from Dominique Martinet: "A couple of harmless fixes, increase max tcp msize (64KB -> 1MB), and increase default msize (8KB -> 128KB) The default increase has been discussed with Christian for the qemu side of things but makes sense for all supported transports" * tag '9p-for-5.15-rc1' of git://github.com/martinetd/linux: net/9p: increase default msize to 128k net/9p: use macro to define default msize net/9p: increase tcp max msize to 1MB 9p/xen: Fix end of loop tests for list_for_each_entry 9p/trans_virtio: Remove sysfs file on probe failure	2021-09-08 15:40:39 -07:00
Arnd Bergmann	a7a08b275a	arch: remove compat_alloc_user_space All users of compat_alloc_user_space() and copy_in_user() have been removed from the kernel, only a few functions in sparc remain that can be changed to calling arch_copy_in_user() instead. Link: https://lkml.kernel.org/r/20210727144859.4150043-7-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 15:32:35 -07:00
Arnd Bergmann	59ab844eed	compat: remove some compat entry points These are all handled correctly when calling the native system call entry point, so remove the special cases. Link: https://lkml.kernel.org/r/20210727144859.4150043-6-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 15:32:35 -07:00
Arnd Bergmann	e130242dc3	mm: simplify compat numa syscalls The compat implementations for mbind, get_mempolicy, set_mempolicy and migrate_pages are just there to handle the subtly different layout of bitmaps on 32-bit hosts. The compat implementation however lacks some of the checks that are present in the native one, in particular for checking that the extra bits are all zero when user space has a larger mask size than the kernel. Worse, those extra bits do not get cleared when copying in or out of the kernel, which can lead to incorrect data as well. Unify the implementation to handle the compat bitmap layout directly in the get_nodes() and copy_nodes_to_user() helpers. Splitting out the get_bitmap() helper from get_nodes() also helps readability of the native case. On x86, two additional problems are addressed by this: compat tasks can pass a bitmap at the end of a mapping, causing a fault when reading across the page boundary for a 64-bit word. x32 tasks might also run into problems with get_mempolicy corrupting data when an odd number of 32-bit words gets passed. On parisc the migrate_pages() system call apparently had the wrong calling convention, as big-endian architectures expect the words inside of a bitmap to be swapped. This is not a problem though since parisc has no NUMA support. [arnd@arndb.de: fix mempolicy crash] Link: https://lkml.kernel.org/r/20210730143417.3700653-1-arnd@kernel.org Link: https://lore.kernel.org/lkml/YQPLG20V3dmOfq3a@osiris/ Link: https://lkml.kernel.org/r/20210727144859.4150043-5-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 15:32:35 -07:00
Arnd Bergmann	5b1b561ba7	mm: simplify compat_sys_move_pages The compat move_pages() implementation uses compat_alloc_user_space() for converting the pointer array. Moving the compat handling into the function itself is a bit simpler and lets us avoid the compat_alloc_user_space() call. Link: https://lkml.kernel.org/r/20210727144859.4150043-4-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 15:32:34 -07:00
Arnd Bergmann	5d700a0fd7	kexec: avoid compat_alloc_user_space kimage_alloc_init() expects a __user pointer, so compat_sys_kexec_load() uses compat_alloc_user_space() to convert the layout and put it back onto the user space caller stack. Moving the user space access into the syscall handler directly actually makes the code simpler, as the conversion for compat mode can now be done on kernel memory. Link: https://lkml.kernel.org/r/20210727144859.4150043-3-arnd@kernel.org Link: https://lore.kernel.org/lkml/YPbtsU4GX6PL7%2F42@infradead.org/ Link: https://lore.kernel.org/lkml/m1y2cbzmnw.fsf@fess.ebiederm.org/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Co-developed-by: Eric Biederman <ebiederm@xmission.com> Co-developed-by: Christoph Hellwig <hch@infradead.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Feng Tang <feng.tang@intel.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 15:32:34 -07:00
Arnd Bergmann	4b692e8616	kexec: move locking into do_kexec_load Patch series "compat: remove compat_alloc_user_space", v5. Going through compat_alloc_user_space() to convert indirect system call arguments tends to add complexity compared to handling the native and compat logic in the same code. This patch (of 6): The locking is the same between the native and compat version of sys_kexec_load(), so it can be done in the common implementation to reduce duplication. Link: https://lkml.kernel.org/r/20210727144859.4150043-1-arnd@kernel.org Link: https://lkml.kernel.org/r/20210727144859.4150043-2-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Co-developed-by: Eric Biederman <ebiederm@xmission.com> Co-developed-by: Christoph Hellwig <hch@infradead.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Feng Tang <feng.tang@intel.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 15:32:34 -07:00
Baolin Wang	213ecb3157	mm: migrate: change to use bool type for 'page_was_mapped' Change to use bool type for 'page_was_mapped' variable making it more readable. Link: https://lkml.kernel.org/r/ce1279df18d2c163998c403e0b5ec6d3f6f90f7a.1629447552.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 15:32:34 -07:00
Baolin Wang	68a9843f14	mm: migrate: fix the incorrect function name in comments since commit `a98a2f0c8c` ("mm/rmap: split migration into its own function"), the migration ptes establishment has been split into a separate try_to_migrate() function, thus update the related comments. Link: https://lkml.kernel.org/r/5b824bad6183259c916ae6cf42f81d14c6118b06.1629447552.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 15:32:34 -07:00
Baolin Wang	2b9b624f5a	mm: migrate: introduce a local variable to get the number of pages Use thp_nr_pages() instead of compound_nr() to get the number of pages for THP page, meanwhile introducing a local variable 'nr_pages' to avoid getting the number of pages repeatedly. Link: https://lkml.kernel.org/r/a8e331ac04392ee230c79186330fb05e86a2aa77.1629447552.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 15:32:34 -07:00
Ingo Molnar	c68ed79457	mm/vmstat: protect per cpu variables with preempt disable on RT Disable preemption on -RT for the vmstat code. On vanila the code runs in IRQ-off regions while on -RT it may not when stats are updated under a local_lock. "preempt_disable" ensures that the same resources is not updated in parallel due to preemption. This patch differs from the preempt-rt version where __count_vm_event and __count_vm_events are also protected. The counters are explicitly "allowed to be to be racy" so there is no need to protect them from preemption. Only the accurate page stats that are updated by a read-modify-write need protection. This patch also differs in that a preempt_[en\|dis]able_rt helper is not used. As vmstat is the only user of the helper, it was suggested that it be open-coded in vmstat.c instead of risking the helper being used in unnecessary contexts. Link: https://lkml.kernel.org/r/20210805160019.1137-2-mgorman@techsingularity.net Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 15:32:34 -07:00
Namjae Jeon	4cf0ccd033	ksmbd: fix control flow issues in sid_to_id() Addresses-Coverity reported Control flow issues in sid_to_id() /fs/ksmbd/smbacl.c: 277 in sid_to_id() 271 272 if (sidtype == SIDOWNER) { 273 kuid_t uid; 274 uid_t id; 275 276 id = le32_to_cpu(psid->sub_auth[psid->num_subauth - 1]); >>> CID 1506810: Control flow issues (NO_EFFECT) >>> This greater-than-or-equal-to-zero comparison of an unsigned value >>> is always true. "id >= 0U". 277 if (id >= 0) { 278 /* 279 * Translate raw sid into kuid in the server's user 280 * namespace. 281 */ 282 uid = make_kuid(&init_user_ns, id); Addresses-Coverity: ("Control flow issues") Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-09-08 17:16:13 -05:00
Namjae Jeon	4ffd5264e8	ksmbd: fix read of uninitialized variable ret in set_file_basic_info Addresses-Coverity reported Uninitialized variables warninig : /fs/ksmbd/smb2pdu.c: 5525 in set_file_basic_info() 5519 if (!rc) { 5520 inode->i_ctime = ctime; 5521 mark_inode_dirty(inode); 5522 } 5523 inode_unlock(inode); 5524 } >>> CID 1506805: Uninitialized variables (UNINIT) >>> Using uninitialized value "rc". 5525 return rc; 5526 } 5527 5528 static int set_file_allocation_info(struct ksmbd_work work, 5529 struct ksmbd_file fp, char *buf) 5530 { Addresses-Coverity: ("Uninitialized variable") Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-09-08 17:16:09 -05:00
Colin Ian King	36bbeb3365	ksmbd: add missing assignments to ret on ndr_read_int64 read calls Currently there are two ndr_read_int64 calls where ret is being checked for failure but ret is not being assigned a return value from the call. Static analyis is reporting the checks on ret as dead code. Fix this. Addresses-Coverity: ("Logical dead code") Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-09-08 17:15:48 -05:00
Pavel Begunkov	c57a91fb1c	io_uring: fix missing mb() before waitqueue_active In case of !SQPOLL, io_cqring_ev_posted_iopoll() doesn't provide a memory barrier required by waitqueue_active(&ctx->poll_wait). There is a wq_has_sleeper(), which does smb_mb() inside, but it's called only for SQPOLL. Fixes: `5fd4617840` ("io_uring: be smarter about waking multiple CQ ring waiters") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2982e53bcea2274006ed435ee2a77197107d8a29.1631130542.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-09-08 13:57:56 -06:00
Linus Torvalds	2d338201d5	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: "147 patches, based on `7d2a07b769`. Subsystems affected by this patch series: mm (memory-hotplug, rmap, ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan), alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib, checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig, selftests, ipc, and scripts" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits) scripts: check_extable: fix typo in user error message mm/workingset: correct kernel-doc notations ipc: replace costly bailout check in sysvipc_find_ipc() selftests/memfd: remove unused variable Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH configs: remove the obsolete CONFIG_INPUT_POLLDEV prctl: allow to setup brk for et_dyn executables pid: cleanup the stale comment mentioning pidmap_init(). kernel/fork.c: unexport get_{mm,task}_exe_file coredump: fix memleak in dump_vma_snapshot() fs/coredump.c: log if a core dump is aborted due to changed file permissions nilfs2: use refcount_dec_and_lock() to fix potential UAF nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group nilfs2: fix NULL pointer in nilfs_##name##_attr_release nilfs2: fix memory leak in nilfs_sysfs_create_device_group trap: cleanup trap_init() init: move usermodehelper_enable() to populate_rootfs() ...	2021-09-08 12:55:35 -07:00
Masami Hiramatsu	cfd799837d	tracing/boot: Fix to loop on only subkeys Since the commit `e5efaeb8a8` ("bootconfig: Support mixing a value and subkeys under a key") allows to co-exist a value node and key nodes under a node, xbc_node_for_each_child() is not only returning key node but also a value node. In the boot-time tracing using xbc_node_for_each_child() to iterate the events, groups and instances, but those must be key nodes. Thus it must use xbc_node_for_each_subkey(). Link: https://lkml.kernel.org/r/163112988361.74896.2267026262061819145.stgit@devnote2 Fixes: `e5efaeb8a8` ("bootconfig: Support mixing a value and subkeys under a key") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2021-09-08 15:44:32 -04:00
Linus Torvalds	cc09ee80c3	SLUB: reduce irq disabled scope and make it RT compatible This series was initially inspired by Mel's pcplist local_lock rewrite, and also interest to better understand SLUB's locking and the new primitives and RT variants and implications. It makes SLUB compatible with PREEMPT_RT and generally more preemption-friendly, apparently without significant regressions, as the fast paths are not affected. The main changes to SLUB by this series: * irq disabling is now only done for minimum amount of time needed to protect the strict kmem_cache_cpu fields, and as part of spin lock, local lock and bit lock operations to make them irq-safe * SLUB is fully PREEMPT_RT compatible Series is based on 5.14-rc6 and also available as a git branch: https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-local-lock-v5r0 The series should now be sufficiently tested in both RT and !RT configs, mainly thanks to Mike. The RFC/v1 version also got basic performance screening by Mel that didn't show major regressions. Mike's testing with hackbench of v2 on !RT reported negligible differences [6]: virgin(ish) tip 5.13.0.g60ab3ed-tip 7,320.67 msec task-clock # 7.792 CPUs utilized ( +- 0.31% ) 221,215 context-switches # 0.030 M/sec ( +- 3.97% ) 16,234 cpu-migrations # 0.002 M/sec ( +- 4.07% ) 13,233 page-faults # 0.002 M/sec ( +- 0.91% ) 27,592,205,252 cycles # 3.769 GHz ( +- 0.32% ) 8,309,495,040 instructions # 0.30 insn per cycle ( +- 0.37% ) 1,555,210,607 branches # 212.441 M/sec ( +- 0.42% ) 5,484,209 branch-misses # 0.35% of all branches ( +- 2.13% ) 0.93949 +- 0.00423 seconds time elapsed ( +- 0.45% ) 0.94608 +- 0.00384 seconds time elapsed ( +- 0.41% ) (repeat) 0.94422 +- 0.00410 seconds time elapsed ( +- 0.43% ) 5.13.0.g60ab3ed-tip +slub-local-lock-v2r3 7,343.57 msec task-clock # 7.776 CPUs utilized ( +- 0.44% ) 223,044 context-switches # 0.030 M/sec ( +- 3.02% ) 16,057 cpu-migrations # 0.002 M/sec ( +- 4.03% ) 13,164 page-faults # 0.002 M/sec ( +- 0.97% ) 27,684,906,017 cycles # 3.770 GHz ( +- 0.45% ) 8,323,273,871 instructions # 0.30 insn per cycle ( +- 0.28% ) 1,556,106,680 branches # 211.901 M/sec ( +- 0.31% ) 5,463,468 branch-misses # 0.35% of all branches ( +- 1.33% ) 0.94440 +- 0.00352 seconds time elapsed ( +- 0.37% ) 0.94830 +- 0.00228 seconds time elapsed ( +- 0.24% ) (repeat) 0.93813 +- 0.00440 seconds time elapsed ( +- 0.47% ) (repeat) RT configs showed some throughput regressions, but that's expected tradeoff for the preemption improvements through the RT mutex. It didn't prevent the v2 to be incorporated to the 5.13 RT tree [7], leading to testing exposure and bugfixes. Before the series, SLUB is lockless in both allocation and free fast paths, but elsewhere, it's disabling irqs for considerable periods of time - especially in allocation slowpath and the bulk allocation, where IRQs are re-enabled only when a new page from the page allocator is needed, and the context allows blocking. The irq disabled sections can then include deactivate_slab() which walks a full freelist and frees the slab back to page allocator or unfreeze_partials() going through a list of percpu partial slabs. The RT tree currently has some patches mitigating these, but we can do much better in mainline too. Patches 1-6 are straightforward improvements or cleanups that could exist outside of this series too, but are prerequsities. Patches 7-9 are also preparatory code changes without functional changes, but not so useful without the rest of the series. Patch 10 simplifies the fast paths on systems with preemption, based on (hopefully correct) observation that the current loops to verify tid are unnecessary. Patches 11-20 focus on reducing irq disabled scope in the allocation slowpath. Patch 11 moves disabling of irqs into ___slab_alloc() from its callers, which are the allocation slowpath, and bulk allocation. Instead these callers only disable preemption to stabilize the cpu. The following patches then gradually reduce the scope of disabled irqs in ___slab_alloc() and the functions called from there. As of patch 14, the re-enabling of irqs based on gfp flags before calling the page allocator is removed from allocate_slab(). As of patch 17, it's possible to reach the page allocator (in case of existing slabs depleted) without disabling and re-enabling irqs a single time. Pathces 21-26 reduce the scope of disabled irqs in functions related to unfreezing percpu partial slab. Patch 27 is preparatory. Patch 28 is adopted from the RT tree and converts the flushing of percpu slabs on all cpus from using IPI to workqueue, so that the processing isn't happening with irqs disabled in the IPI handler. The flushing is not performance critical so it should be acceptable. Patch 29 also comes from RT tree and makes object_map_lock RT compatible. Patch 30 make slab_lock irq-safe on RT where we cannot rely on having irq disabled from the list_lock spin lock usage. Patch 31 changes kmem_cache_cpu->partial handling in put_cpu_partial() from cmpxchg loop to a short irq disabled section, which is used by all other code modifying the field. This addresses a theoretical race scenario pointed out by Jann, and makes the critical section safe wrt with RT local_lock semantics after the conversion in patch 35. Patch 32 changes preempt disable to migrate disable, so that the nested list_lock spinlock is safe to take on RT. Because migrate_disable() is a function call even on !RT, a small set of private wrappers is introduced to keep using the cheaper preempt_disable() on !PREEMPT_RT configurations. As of this patch, SLUB should be already compatible with RT's lock semantics. Finally, patch 33 changes irq disabled sections that protect kmem_cache_cpu fields in the slow paths, with a local lock. However on PREEMPT_RT it means the lockless fast paths can now preempt slow paths which don't expect that, so the local lock has to be taken also in the fast paths and they are no longer lockless. RT folks seem to not mind this tradeoff. The patch also updates the locking documentation in the file's comment. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEjUuTAak14xi+SF7M4CHKc/GJqRAFAmEzSooACgkQ4CHKc/GJ qRC3Agf+MXJB5NVCOkwgEk9wipbFETrJDsvM2Yf2CrqbK9MzKtPNrL82lZHdgtq2 HJ5gT8QZTFQ7n8nbY3P6LRClDdtqYm8b7aX02qtc2JrM29wIQw8A1gummLkQDNRm s+vd0ndPc4V6mqJQqiTk1WB8F+SJ0u3LfjesbIlqgcWREzZaPgm+hw3UUEtz/tXu RiEkWI30u0S0X5/HimqK8pdmwGPvzX8l1N9Sc2VeoQoFPPL/Cm2D5jZR/xHtKLfW q4ZVVXdh/YtOWXMD0jOr9q/bxwLDWCkvWHEmAES5nT2apFmCuusZ3+XWzWf8bSX/ j3eTiiNHTaktf/mndEymEbztnqmfGQ== =3Jty -----END PGP SIGNATURE----- Merge tag 'mm-slub-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux Pull SLUB updates from Vlastimil Babka: "SLUB: reduce irq disabled scope and make it RT compatible This series was initially inspired by Mel's pcplist local_lock rewrite, and also interest to better understand SLUB's locking and the new primitives and RT variants and implications. It makes SLUB compatible with PREEMPT_RT and generally more preemption-friendly, apparently without significant regressions, as the fast paths are not affected. The main changes to SLUB by this series: - irq disabling is now only done for minimum amount of time needed to protect the strict kmem_cache_cpu fields, and as part of spin lock, local lock and bit lock operations to make them irq-safe - SLUB is fully PREEMPT_RT compatible The series should now be sufficiently tested in both RT and !RT configs, mainly thanks to Mike. The RFC/v1 version also got basic performance screening by Mel that didn't show major regressions. Mike's testing with hackbench of v2 on !RT reported negligible differences [6]: virgin(ish) tip 5.13.0.g60ab3ed-tip 7,320.67 msec task-clock # 7.792 CPUs utilized ( +- 0.31% ) 221,215 context-switches # 0.030 M/sec ( +- 3.97% ) 16,234 cpu-migrations # 0.002 M/sec ( +- 4.07% ) 13,233 page-faults # 0.002 M/sec ( +- 0.91% ) 27,592,205,252 cycles # 3.769 GHz ( +- 0.32% ) 8,309,495,040 instructions # 0.30 insn per cycle ( +- 0.37% ) 1,555,210,607 branches # 212.441 M/sec ( +- 0.42% ) 5,484,209 branch-misses # 0.35% of all branches ( +- 2.13% ) 0.93949 +- 0.00423 seconds time elapsed ( +- 0.45% ) 0.94608 +- 0.00384 seconds time elapsed ( +- 0.41% ) (repeat) 0.94422 +- 0.00410 seconds time elapsed ( +- 0.43% ) 5.13.0.g60ab3ed-tip +slub-local-lock-v2r3 7,343.57 msec task-clock # 7.776 CPUs utilized ( +- 0.44% ) 223,044 context-switches # 0.030 M/sec ( +- 3.02% ) 16,057 cpu-migrations # 0.002 M/sec ( +- 4.03% ) 13,164 page-faults # 0.002 M/sec ( +- 0.97% ) 27,684,906,017 cycles # 3.770 GHz ( +- 0.45% ) 8,323,273,871 instructions # 0.30 insn per cycle ( +- 0.28% ) 1,556,106,680 branches # 211.901 M/sec ( +- 0.31% ) 5,463,468 branch-misses # 0.35% of all branches ( +- 1.33% ) 0.94440 +- 0.00352 seconds time elapsed ( +- 0.37% ) 0.94830 +- 0.00228 seconds time elapsed ( +- 0.24% ) (repeat) 0.93813 +- 0.00440 seconds time elapsed ( +- 0.47% ) (repeat) RT configs showed some throughput regressions, but that's expected tradeoff for the preemption improvements through the RT mutex. It didn't prevent the v2 to be incorporated to the 5.13 RT tree [7], leading to testing exposure and bugfixes. Before the series, SLUB is lockless in both allocation and free fast paths, but elsewhere, it's disabling irqs for considerable periods of time - especially in allocation slowpath and the bulk allocation, where IRQs are re-enabled only when a new page from the page allocator is needed, and the context allows blocking. The irq disabled sections can then include deactivate_slab() which walks a full freelist and frees the slab back to page allocator or unfreeze_partials() going through a list of percpu partial slabs. The RT tree currently has some patches mitigating these, but we can do much better in mainline too. Patches 1-6 are straightforward improvements or cleanups that could exist outside of this series too, but are prerequsities. Patches 7-9 are also preparatory code changes without functional changes, but not so useful without the rest of the series. Patch 10 simplifies the fast paths on systems with preemption, based on (hopefully correct) observation that the current loops to verify tid are unnecessary. Patches 11-20 focus on reducing irq disabled scope in the allocation slowpath: - patch 11 moves disabling of irqs into ___slab_alloc() from its callers, which are the allocation slowpath, and bulk allocation. Instead these callers only disable preemption to stabilize the cpu. - The following patches then gradually reduce the scope of disabled irqs in ___slab_alloc() and the functions called from there. As of patch 14, the re-enabling of irqs based on gfp flags before calling the page allocator is removed from allocate_slab(). As of patch 17, it's possible to reach the page allocator (in case of existing slabs depleted) without disabling and re-enabling irqs a single time. Pathces 21-26 reduce the scope of disabled irqs in functions related to unfreezing percpu partial slab. Patch 27 is preparatory. Patch 28 is adopted from the RT tree and converts the flushing of percpu slabs on all cpus from using IPI to workqueue, so that the processing isn't happening with irqs disabled in the IPI handler. The flushing is not performance critical so it should be acceptable. Patch 29 also comes from RT tree and makes object_map_lock RT compatible. Patch 30 make slab_lock irq-safe on RT where we cannot rely on having irq disabled from the list_lock spin lock usage. Patch 31 changes kmem_cache_cpu->partial handling in put_cpu_partial() from cmpxchg loop to a short irq disabled section, which is used by all other code modifying the field. This addresses a theoretical race scenario pointed out by Jann, and makes the critical section safe wrt with RT local_lock semantics after the conversion in patch 35. Patch 32 changes preempt disable to migrate disable, so that the nested list_lock spinlock is safe to take on RT. Because migrate_disable() is a function call even on !RT, a small set of private wrappers is introduced to keep using the cheaper preempt_disable() on !PREEMPT_RT configurations. As of this patch, SLUB should be already compatible with RT's lock semantics. Finally, patch 33 changes irq disabled sections that protect kmem_cache_cpu fields in the slow paths, with a local lock. However on PREEMPT_RT it means the lockless fast paths can now preempt slow paths which don't expect that, so the local lock has to be taken also in the fast paths and they are no longer lockless. RT folks seem to not mind this tradeoff. The patch also updates the locking documentation in the file's comment" Mike Galbraith and Mel Gorman verified that their earlier testing observations still hold for the final series: Link: https://lore.kernel.org/lkml/89ba4f783114520c167cc915ba949ad2c04d6790.camel@gmx.de/ Link: https://lore.kernel.org/lkml/20210907082010.GB3959@techsingularity.net/ * tag 'mm-slub-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux: (33 commits) mm, slub: convert kmem_cpu_slab protection to local_lock mm, slub: use migrate_disable() on PREEMPT_RT mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg mm, slub: make slab_lock() disable irqs with PREEMPT_RT mm: slub: make object_map_lock a raw_spinlock_t mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context mm, slab: split out the cpu offline variant of flush_slab() mm, slub: don't disable irqs in slub_cpu_dead() mm, slub: only disable irq with spin_lock in __unfreeze_partials() mm, slub: separate detaching of partial list in unfreeze_partials() from unfreezing mm, slub: detach whole partial list at once in unfreeze_partials() mm, slub: discard slabs in unfreeze_partials() without irqs disabled mm, slub: move irq control into unfreeze_partials() mm, slub: call deactivate_slab() without disabling irqs mm, slub: make locking in deactivate_slab() irq-safe mm, slub: move reset of c->page and freelist out of deactivate_slab() mm, slub: stop disabling irqs around get_partial() mm, slub: check new pages with restored irqs mm, slub: validate slab from partial list or page allocator before making it cpu slab mm, slub: restore irqs around calling new_slab() ...	2021-09-08 12:36:00 -07:00
Steven Rostedt (VMware)	04178ea130	selftests/ftrace: Exclude "(fault)" in testing add/remove eprobe events The original test for adding and removing eprobes used synthetic events and retrieved the filename from the open system call at the end of the system call. This would allow it to always be loaded into the page tables when accessed. Masami suggested that the test was too complex for just testing add and remove, so it was changed to test just adding and removing an event probe on top of the start of the open system call event. Now it is possible that the filename will not be loaded into memory at the time the eprobe is triggered, and will result in "(fault)" being displayed in the event. This causes the test to fail. Account for "(fault)" also being one of the values of the filename field of the event probe. Link: https://lkml.kernel.org/r/20210907230429.5783d519@rorschach.local.home Fixes: `079db70794` ("selftests/ftrace: Add test case to test adding and removing of event probe") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2021-09-08 15:29:16 -04:00
Tom Zanussi	c910db943d	tracing: Dynamically allocate the per-elt hist_elt_data array Setting the hist_elt_data.field_var_str[] array unconditionally to a size of SYNTH_FIELD_MAX elements wastes space unnecessarily. The actual number of elements needed can be calculated at run-time instead. In most cases, this will save a lot of space since it's a per-elt array which isn't normally close to being full. It also allows us to increase SYNTH_FIELD_MAX without worrying about even more wastage when we do that. Link: https://lkml.kernel.org/r/d52ae0ad5e1b59af7c4f54faf3fc098461fd82b3.camel@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2021-09-08 15:29:16 -04:00
Artem Bityutskiy	0be083cee4	tracing: synth events: increase max fields count Sometimes it is useful to construct larger synthetic trace events. Increase 'SYNTH_FIELDS_MAX' (maximum number of fields in a synthetic event) from 32 to 64. Link: https://lkml.kernel.org/r/20210901135513.3087062-1-dedekind1@gmail.com Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2021-09-08 15:29:16 -04:00
Masami Hiramatsu	47914d4e59	tools/bootconfig: Show whole test command for each test case Show whole test command instead of only the 3rd argument. This will clear to show what will be actually tested by each test case. Link: https://lkml.kernel.org/r/163077088607.222577.14786016266462495017.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2021-09-08 15:29:16 -04:00

... 2 3 4 5 6 ...

1042693 Commits