linux

Vlastimil Babka 3286222fc6 mm, slub: better heuristic for number of cpus when calculating slab order

When creating a new kmem cache, SLUB determines how large the slab pages will be based on a number of inputs, including the number of CPUs in the system. Larger slab pages mean that more objects can be allocated/freed from per-cpu slabs before accessing shared structures, but also that potentially more memory can be wasted due to low slab usage and fragmentation. The rough idea of using the number of CPUs is that larger systems will be more likely to benefit from reduced contention, and should also have enough memory to spare.

The number of CPUs used to be determined as nr_cpu_ids, which is the number of possible CPUs, but on some systems many of them will never be onlined. Commit `045ab8c948` ("mm/slub: let number of online CPUs determine the slub page order") therefore changed it to num_online_cpus(). However, for kmem caches created early, before CPUs are onlined, this may lead to permanently low slab page sizes.

Vincent reports a regression [1] of hackbench on arm64 systems:

"I'm facing significant performances regression on a large arm64 server system (224 CPUs). Regressions is also present on small arm64 system (8 CPUs) but in a far smaller order of magnitude

On 224 CPUs system : 9 iterations of hackbench -l 16000 -g 16
v5.11-rc4 : 9.135sec (+/- 0.45%)
v5.11-rc4 + revert this patch: 3.173sec (+/- 0.48%)
v5.10: 3.136sec (+/- 0.40%)"

Mel reports a regression [2] of hackbench on x86_64, with lockstat suggesting page allocator contention:

"i.e. the patch incurs a 7% to 32% performance penalty. This bisected cleanly yesterday when I was looking for the regression and then found the thread.

Numerous caches change size. For example, kmalloc-512 goes from order-0 (vanilla) to order-2 with the revert.

So mostly this is down to the number of times SLUB calls into the page allocator which only caches order-0 pages on a per-cpu basis"

Clearly, num_online_cpus() does not work that early in bootup.
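A minimal user-space sketch of why an early, too-low CPU count sticks, assuming a simplified model: the CPU count sets a minimum number of objects per slab, and the order is the smallest one whose slab holds that many objects. The min_objects formula 4 * (fls(nr_cpus) + 1) is, to our understanding, what SLUB's calculate_order() uses when slub_min_objects is not set. The order search here is a simplification that ignores the wasted-space limits the real code applies, and the SKETCH_* constants and sketch_* functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values for this sketch; the real ones depend on the
 * architecture and on the slub_max_order boot parameter. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_MAX_ORDER 3u

/* 1-based index of the most significant set bit, 0 for x == 0
 * (same contract as the kernel's fls()). */
unsigned int sketch_fls(unsigned int x)
{
	unsigned int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Minimum objects per slab, growing logarithmically with the CPU count. */
unsigned int sketch_min_objects(unsigned int nr_cpus)
{
	return 4 * (sketch_fls(nr_cpus) + 1);
}

/*
 * Simplified order choice: the smallest order whose slab fits at least
 * sketch_min_objects(nr_cpus) objects of obj_size bytes, capped at
 * SKETCH_MAX_ORDER. Unlike the real mm/slub.c, this does not bound the
 * fraction of the slab left unused.
 */
unsigned int sketch_slab_order(size_t obj_size, unsigned int nr_cpus)
{
	unsigned int min_objects = sketch_min_objects(nr_cpus);
	unsigned int order;

	assert(obj_size > 0);

	for (order = 0; order < SKETCH_MAX_ORDER; order++) {
		size_t slab_bytes = (size_t)SKETCH_PAGE_SIZE << order;

		if (slab_bytes / obj_size >= min_objects)
			return order;
	}
	return SKETCH_MAX_ORDER;
}
```

With these hypothetical constants, sketch_slab_order(512, 1) returns 0 while sketch_slab_order(512, 224) returns 3: a cache sized while only the boot CPU is counted stays at the smallest order, so it goes to the page allocator more often for the lifetime of the system.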
We could change the order dynamically in a memory hotplug callback, but runtime order changing for existing kmem caches has already been shown to be dangerous, and was removed in `32a6f409b6` ("mm, slub: remove runtime allocation order changes"). It could be resurrected in a safe manner with some effort, but to fix the regression we need something simpler.

We could use num_present_cpus(), which should be the number of physically present CPUs even before they are onlined. That would work for PowerPC [3], which triggered the original commit, but it still does not work on arm64 [4], as explained in [5].

So this patch tries to determine the best available value without specific arch knowledge:

- num_present_cpus() if the number is larger than 1, as that means the arch is likely setting it properly
- nr_cpu_ids otherwise

This should fix the reported regressions while also keeping the effect of `045ab8c948` for PowerPC systems. It is possible that there are configurations where num_present_cpus() is 1 during boot while nr_cpu_ids is at the same time bloated; these (if they exist) would keep the large orders based on nr_cpu_ids, as was the case before `045ab8c948`.
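The fallback rule above can be sketched as a pure function, assuming the two counts are passed in as arguments rather than read from the kernel's num_present_cpus() and nr_cpu_ids. The name sketch_slab_nr_cpus() is hypothetical, and the CPU counts used below are illustrative, not taken from the reports.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the heuristic described above:
 *   present_cpus  plays the role of num_present_cpus()
 *   possible_cpus plays the role of nr_cpu_ids
 */
unsigned int sketch_slab_nr_cpus(unsigned int present_cpus,
				 unsigned int possible_cpus)
{
	assert(possible_cpus >= 1);

	/* More than one present CPU: the arch is likely setting it properly. */
	if (present_cpus > 1)
		return present_cpus;

	/* Otherwise fall back to the possible-CPU count, as before 045ab8c948. */
	return possible_cpus;
}
```

For example, sketch_slab_nr_cpus(1, 224) returns 224 (a boot-time case where only one CPU is reported as present), while sketch_slab_nr_cpus(16, 2048) returns 16 (many possible CPUs that will never be onlined). The trade-off is the one stated above: occasional over-sizing when the present count is 1 and nr_cpu_ids is bloated, in exchange for not under-sizing caches created before CPUs are onlined.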
[1] https://lore.kernel.org/linux-mm/CAKfTPtA_JgMf_+zdFbcb_V9rM7JBWNPjAz9irgwFj7Rou=xzZg@mail.gmail.com/
[2] https://lore.kernel.org/linux-mm/20210128134512.GF3592@techsingularity.net/
[3] https://lore.kernel.org/linux-mm/20210123051607.GC2587010@in.ibm.com/
[4] https://lore.kernel.org/linux-mm/CAKfTPtAjyVmS5VYvU6DBxg4-JEo5bdmWbngf-03YsY18cmWv_g@mail.gmail.com/
[5] https://lore.kernel.org/linux-mm/20210126230305.GD30941@willie-the-truck/

Link: https://lkml.kernel.org/r/20210208134108.22286-1-vbabka@suse.cz
Fixes: `045ab8c948` ("mm/slub: let number of online CPUs determine the slub page order")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jann Horn <jannh@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2021-02-10 11:19:27 -08:00
kasan	kasan: fix stack traces dependency for HW_TAGS	2021-02-09 17:26:44 -08:00
backing-dev.c	mm:backing-dev: use sysfs_emit in macro defining functions	2020-12-15 12:13:47 -08:00
balloon_compaction.c
cleancache.c
cma_debug.c	debugfs: make sure we can remove u32_array files cleanly	2020-07-10 13:54:00 -07:00
cma.c	mm: cma: improve pr_debug log in cma_release()	2020-12-15 12:13:46 -08:00
cma.h	mm: cma: use CMA_MAX_NAME to define the length of cma name array	2020-09-01 09:19:43 +02:00
compaction.c	mm, compaction: move high_pfn to the for loop scope	2021-02-05 11:03:47 -08:00
debug_page_ref.c
debug_vm_pgtable.c	mm/debug_vm_pgtable: avoid doing memory allocation with pgtable_t mapped.	2020-10-16 11:11:14 -07:00
debug.c	mm: memcontrol: Use helpers to read page's memcg data	2020-12-02 18:28:05 -08:00
dmapool.c	mm/dmapool.c: replace hard coded function name with __func__	2020-10-13 18:38:32 -07:00
early_ioremap.c
fadvise.c	mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED	2020-10-13 18:38:29 -07:00
failslab.c
filemap.c	mm/filemap: add missing mem_cgroup_uncharge() to __add_to_page_cache_locked()	2021-02-05 11:03:47 -08:00
frame_vector.c	mmap locking API: convert mmap_sem comments	2020-06-09 09:39:14 -07:00
frontswap.c	mm/frontswap: mark various intentional data races	2020-08-14 19:56:56 -07:00
gup_test.c	mm/gup_test.c: mark gup_test_init as __init function	2020-12-15 12:13:38 -08:00
gup_test.h	selftests/vm: gup_test: introduce the dump_pages() sub-test	2020-12-15 12:13:38 -08:00
gup.c	Merge branch 'akpm' (patches from Andrew)	2020-12-15 12:53:37 -08:00
highmem.c	mm/highmem: prepare for overriding set_pte_at()	2021-01-24 10:34:52 -08:00
hmm.c	mm: do page fault accounting in handle_mm_fault	2020-08-12 10:58:02 -07:00
huge_memory.c	mm: thp: fix MADV_REMOVE deadlock on shmem THP	2021-02-05 11:03:47 -08:00
hugetlb_cgroup.c	hugetlb_cgroup: fix offline of hugetlb cgroup with reservations	2020-12-06 10:19:07 -08:00
hugetlb.c	mm: hugetlb: fix missing put_page in gather_surplus_pages()	2021-02-05 11:03:47 -08:00
hwpoison-inject.c	mm,hwpoison-inject: don't pin for hwpoison_filter	2020-10-16 11:11:16 -07:00
init-mm.c	mm/gup: prevent gup_fast from racing with COW during fork	2020-12-15 12:13:39 -08:00
internal.h	mm, page_alloc: disable pcplists during memory offline	2020-12-15 12:13:43 -08:00
interval_tree.c
ioremap.c	mm: move p?d_alloc_track to separate header file	2020-08-07 11:33:26 -07:00
Kconfig	mm/Kconfig: fix spelling mistake "whats" -> "what's"	2020-12-19 11:25:41 -08:00
Kconfig.debug	mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO	2020-12-15 12:13:46 -08:00
khugepaged.c	mm: fix some spelling mistakes in comments	2020-12-15 22:46:19 -08:00
kmemleak.c	mm/kmemleak: rely on rcu for task stack scanning	2020-10-13 18:38:27 -07:00
ksm.c	mm: cleanup kstrto*() usage	2020-12-15 12:13:47 -08:00
list_lru.c	mm: list_lru: set shrinker map bit when child nr_items is not zero	2020-12-06 10:19:07 -08:00
maccess.c	uaccess: add force_uaccess_{begin,end} helpers	2020-08-12 10:57:59 -07:00
madvise.c	mm,memory_failure: always pin the page in madvise_inject_error	2020-12-15 12:13:44 -08:00
Makefile	mm: mmap_lock: add tracepoints around lock acquisition	2020-12-15 12:13:41 -08:00
mapping_dirty_helpers.c	mm/mapping_dirty_helpers: enhance the kernel-doc markups	2020-12-15 12:13:41 -08:00
memblock.c	memblock: do not start bottom-up allocations with kernel_end	2021-02-05 11:03:47 -08:00
memcontrol.c	Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"	2021-02-09 17:26:44 -08:00
memfd.c
memory_hotplug.c	mm: memmap defer init doesn't work as expected	2020-12-29 15:36:49 -08:00
memory-failure.c	mm: fix page reference leak in soft_offline_page()	2021-01-24 10:34:52 -08:00
memory.c	mm: generalise COW SMC TLB flushing race comment	2020-12-29 15:36:49 -08:00
mempolicy.c	mm: migrate: initialize err in do_migrate_pages	2021-01-12 18:12:54 -08:00
mempool.c	kasan, mm: rename kasan_poison_kfree	2020-12-22 12:55:09 -08:00
memremap.c	mm/mremap_pages: fix static key devmap_managed_key updates	2020-11-02 12:14:18 -08:00
memtest.c
migrate.c	mm: migrate: do not migrate HugeTLB page whose refcount is one	2021-02-05 11:03:47 -08:00
mincore.c	mm: factor find_get_incore_page out of mincore_page	2020-10-13 18:38:29 -07:00
mlock.c	mm/lru: introduce relock_page_lruvec()	2020-12-15 14:48:04 -08:00
mm_init.c	mm: fix fall-through warnings for Clang	2020-12-15 12:13:47 -08:00
mmap_lock.c	mm: mmap_lock: add tracepoints around lock acquisition	2020-12-15 12:13:41 -08:00
mmap.c	UAPI Changes:	2020-12-18 12:38:28 -08:00
mmu_gather.c	mmap locking API: convert mmap_sem comments	2020-06-09 09:39:14 -07:00
mmu_notifier.c	mm: track mmu notifiers in fs_reclaim_acquire/release	2020-12-15 12:13:41 -08:00
mmzone.c	mm/lru: replace pgdat lru_lock with lruvec lock	2020-12-15 14:48:04 -08:00
mprotect.c	mm: Add 'mprotect' hook to struct vm_operations_struct	2020-11-17 14:36:14 +01:00
mremap.c	mm/mremap: fix BUILD_BUG_ON() error in get_extent	2021-02-09 17:26:44 -08:00
msync.c	mmap locking API: use coccinelle to convert mmap_sem rwsem call sites	2020-06-09 09:39:14 -07:00
nommu.c	mm: cleanup: remove unused tsk arg from __access_remote_vm	2020-12-15 12:13:40 -08:00
oom_kill.c	mm/oom_kill: change comment and rename is_dump_unreclaim_slabs()	2020-12-15 12:13:45 -08:00
page_alloc.c	Revert "mm: fix initialization of struct page for holes in memory layout"	2021-01-26 10:39:46 -08:00
page_counter.c	mm/page_counter: use page_counter_read in page_counter_set_max	2020-12-15 12:13:40 -08:00
page_ext.c	mm: fix some spelling mistakes in comments	2020-12-15 22:46:19 -08:00
page_idle.c	mm: page_idle_get_page() does not need lru_lock	2020-12-15 14:48:03 -08:00
page_io.c	mm: memcontrol: Use helpers to read page's memcg data	2020-12-02 18:28:05 -08:00
page_isolation.c	mm/page_isolation: do not isolate the max order page	2020-12-15 12:13:45 -08:00
page_owner.c	mm/page_owner: record timestamp and pid	2020-12-15 12:13:38 -08:00
page_poison.c	kasan, mm: reset tags when accessing metadata	2020-12-22 12:55:08 -08:00
page_reporting.c	mm: rename page_order() to buddy_order()	2020-10-16 11:11:19 -07:00
page_reporting.h	mm: introduce include/linux/pgtable.h	2020-06-09 09:39:13 -07:00
page_vma_mapped.c	mm/page_vma_mapped.c: add colon to fix kernel-doc markups error for check_pte	2020-12-15 12:13:41 -08:00
page-writeback.c	mm: make wait_on_page_writeback() wait for multiple pending writebacks	2021-01-05 11:33:00 -08:00
pagewalk.c	mmap locking API: convert mmap_sem comments	2020-06-09 09:39:14 -07:00
percpu-internal.h	mm: memcg/percpu: account percpu memory to memory cgroups	2020-08-12 10:57:55 -07:00
percpu-km.c	mm: memcg/percpu: account percpu memory to memory cgroups	2020-08-12 10:57:55 -07:00
percpu-stats.c	mm: memcg/percpu: account percpu memory to memory cgroups	2020-08-12 10:57:55 -07:00
percpu-vm.c	mm: memcg/percpu: account percpu memory to memory cgroups	2020-08-12 10:57:55 -07:00
percpu.c	percpu: convert flexible array initializers to use struct_size()	2020-10-30 23:02:28 +00:00
pgalloc-track.h	mm: move p?d_alloc_track to separate header file	2020-08-07 11:33:26 -07:00
pgtable-generic.c	mm: introduce include/linux/pgtable.h	2020-06-09 09:39:13 -07:00
process_vm_access.c	mm/process_vm_access.c: include compat.h	2021-01-12 18:12:54 -08:00
ptdump.c	kasan, arm64: expand CONFIG_KASAN checks	2020-12-22 12:55:08 -08:00
readahead.c	mm: use limited read-ahead to satisfy read	2020-10-17 13:49:08 -06:00
rmap.c	mm/lru: revise the comments of lru_lock	2020-12-15 14:48:04 -08:00
rodata_test.c	mm/rodata_test.c: fix missing function declaration	2020-08-21 09:52:53 -07:00
shmem.c	mm: shmem: convert shmem_enabled_show to use sysfs_emit_at	2020-12-15 12:13:47 -08:00
shuffle.c	mm: rename page_order() to buddy_order()	2020-10-16 11:11:19 -07:00
shuffle.h	mm/shuffle: remove dynamic reconfiguration	2020-08-07 11:33:29 -07:00
slab_common.c	kasan, mm: allow cache merging with no metadata	2020-12-22 12:55:09 -08:00
slab.c	mm: introduce debug_pagealloc_{map,unmap}_pages() helpers	2020-12-15 12:13:43 -08:00
slab.h	Networking updates for 5.11	2020-12-15 13:22:29 -08:00
slob.c	mm: extract might_alloc() debug check	2020-12-15 12:13:41 -08:00
slub.c	mm, slub: better heuristic for number of cpus when calculating slab order	2021-02-10 11:19:27 -08:00
sparse-vmemmap.c	mm/sparse: only sub-section aligned range would be populated	2020-08-07 11:33:27 -07:00
sparse.c	mm/memory_hotplug: guard more declarations by CONFIG_MEMORY_HOTPLUG	2020-10-16 11:11:18 -07:00
swap_cgroup.c	mm: memcontrol: make swap tracking an integral part of memory control	2020-06-03 20:09:48 -07:00
swap_slots.c	mm/swap_slots.c: remove always zero and unused return value of enable_swap_slots_cache()	2020-10-13 18:38:30 -07:00
swap_state.c	mm: use sysfs_emit for struct kobject * uses	2020-12-15 12:13:47 -08:00
swap.c	mm/lru: introduce relock_page_lruvec()	2020-12-15 14:48:04 -08:00
swapfile.c	mm: fix a race on nr_swap_pages	2020-12-15 22:46:15 -08:00
truncate.c	mm: fix kernel-doc markups	2020-12-15 12:13:47 -08:00
usercopy.c	mm/usercopy.c: delete duplicated word	2020-08-12 10:57:58 -07:00
userfaultfd.c	mm/vmscan: protect the workingset on anonymous LRU	2020-08-12 10:57:55 -07:00
util.c	mm: introduce vma_set_file function v5	2020-11-19 10:36:36 +01:00
vmacache.c	kernel: better document the use_mm/unuse_mm API contract	2020-06-10 19:14:18 -07:00
vmalloc.c	mm/vmalloc.c: fix potential memory leak	2021-01-12 18:12:54 -08:00
vmpressure.c
vmscan.c	mm: don't put pinned pages into the swap cache	2021-01-17 12:08:04 -08:00
vmstat.c	arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL	2020-12-15 12:13:42 -08:00
workingset.c	Merge branch 'akpm' (patches from Andrew)	2020-12-15 14:55:10 -08:00
z3fold.c	z3fold: remove preempt disabled sections for RT	2020-12-15 12:13:45 -08:00
zbud.c	mm/zbud: remove redundant initialization	2020-10-13 18:38:34 -07:00
zpool.c	mm/zpool.c: delete duplicated word and fix grammar	2020-08-12 10:57:58 -07:00
zsmalloc.c	mm/zsmalloc.c: rework the list_add code in insert_zspage()	2020-12-15 12:13:46 -08:00
zswap.c	mm/zswap: move to use crypto_acomp API for hardware acceleration	2020-12-15 12:13:46 -08:00