iv/linux

Vlastimil Babka 3450a0e5a6 mm/slub: optimize alloc fastpath code layout

With allocation fastpaths no longer divided between two .c files, we
have better inlining, however checking the disassembly of
kmem_cache_alloc() reveals we can do better to make the fastpaths
smaller and move the less common situations out of line or to separate
functions, to reduce instruction cache pressure.

- split memcg pre/post alloc hooks to inlined checks that use likely()
  to assume there will be no objcg handling necessary, and non-inline
  functions doing the actual handling

- add some more likely/unlikely() to pre/post alloc hooks to indicate
  which scenarios should be out of line

- change gfp_allowed_mask handling in slab_post_alloc_hook() so the
  code can be optimized away when kasan/kmsan/kmemleak is configured out
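
The first two items follow a common pattern: a small inlined check, annotated with likely(), guards a call to a non-inline function that does the rare work. The following is a minimal userspace sketch of that pattern, not the actual slub.c code. The names demo_cache, demo_post_alloc_hook() and demo_post_alloc_hook_slow() are hypothetical, and likely() is redefined here on top of the GCC/Clang builtin because the kernel header is not available in userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's likely() hint (GCC/Clang builtin). */
#define likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical cache descriptor, for illustration only. */
struct demo_cache {
	bool accounted;        /* does this cache need per-object accounting? */
	size_t charged_bytes;  /* bytes charged so far by the slow path */
};

/*
 * Rare case: do the real accounting work. noinline keeps this body out
 * of every caller, so the common path stays small.
 */
static __attribute__((noinline))
bool demo_post_alloc_hook_slow(struct demo_cache *c, size_t size)
{
	c->charged_bytes += size;
	return true;
}

/*
 * Common case: a cheap inlined test. The likely() hint tells the
 * compiler that no accounting is needed most of the time, so the call
 * to the slow function can be placed away from the hot instructions.
 */
static inline bool demo_post_alloc_hook(struct demo_cache *c, size_t size)
{
	if (likely(!c->accounted))
		return true;

	return demo_post_alloc_hook_slow(c, size);
}

/* Exercise both paths; returns the bytes charged to the accounted cache. */
static size_t demo_run(void)
{
	struct demo_cache plain = { .accounted = false, .charged_bytes = 0 };
	struct demo_cache acct = { .accounted = true, .charged_bytes = 0 };

	assert(demo_post_alloc_hook(&plain, 64));
	assert(demo_post_alloc_hook(&acct, 64));
	assert(demo_post_alloc_hook(&acct, 32));
	assert(plain.charged_bytes == 0);

	return acct.charged_bytes;
}
```

Calling demo_run() from a main() exercises both branches. Whether the compiler really moves the slow call out of line depends on the compiler version and optimization level, so checking the disassembly, as the commit message does, remains the way to confirm the effect.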

bloat-o-meter shows:
add/remove: 4/2 grow/shrink: 1/8 up/down: 521/-2924 (-2403)
Function                                     old     new   delta
__memcg_slab_post_alloc_hook                   -     461    +461
kmem_cache_alloc_bulk                        775     791     +16
__pfx_should_failslab.constprop                -      16     +16
__pfx___memcg_slab_post_alloc_hook             -      16     +16
should_failslab.constprop                      -      12     +12
__pfx_memcg_slab_post_alloc_hook              16       -     -16
kmem_cache_alloc_lru                        1295    1023    -272
kmem_cache_alloc_node                       1118     817    -301
kmem_cache_alloc                            1076     772    -304
kmalloc_node_trace                          1149     838    -311
kmalloc_trace                               1102     789    -313
__kmalloc_node_track_caller                 1393    1080    -313
__kmalloc_node                              1397    1082    -315
__kmalloc                                   1374    1059    -315
memcg_slab_post_alloc_hook                   464       -    -464
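
The summary line can be cross-checked against the per-function rows: positive deltas should add up to the "up" figure, negative deltas to the "down" figure, and their sum to the net change. A small sketch of that check, with the deltas copied from the table above (the struct and function names are made up for this example):

```c
#include <assert.h>
#include <stddef.h>

/* Per-function deltas, in the order they appear in the table above. */
static const int deltas[] = {
	461, 16, 16, 16, 12,                 /* grown or newly added */
	-16, -272, -301, -304, -311,         /* shrunk or removed */
	-313, -313, -315, -315, -464,
};

struct bloat_totals {
	int up;    /* sum of positive deltas */
	int down;  /* sum of negative deltas */
	int net;   /* up + down */
};

/* Sum the positive and negative deltas separately, then combine them. */
static struct bloat_totals sum_deltas(const int *d, size_t n)
{
	struct bloat_totals t = { 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (d[i] > 0)
			t.up += d[i];
		else
			t.down += d[i];
	}
	t.net = t.up + t.down;
	return t;
}

/* Returns 1 if the table rows agree with "up/down: 521/-2924 (-2403)". */
static int bloat_totals_match(void)
{
	struct bloat_totals t = sum_deltas(deltas, sizeof(deltas) / sizeof(deltas[0]));

	return t.up == 521 && t.down == -2924 && t.net == -2403;
}
```

The rows are consistent with the summary: 521 bytes added, 2924 bytes removed, a net reduction of 2403 bytes.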

Note that gcc still decided to inline __memcg_pre_alloc_hook(), but the
code is out of line. Forcing noinline did not improve the results. As a
result the fastpaths are shorter and overall code size is reduced.

Acked-by: David Rientjes <rientjes@google.com>
Tested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

2023-12-06 11:57:22 +01:00

damon

As usual, lots of singleton and doubleton patches all over the tree and

2023-11-02 20:53:31 -10:00

kasan

mm/slab: move pre/post-alloc hooks from slab.h to slub.c

2023-12-06 11:57:21 +01:00

kfence

KFENCE: cleanup kfence_guarded_alloc() after CONFIG_SLAB removal

2023-12-05 11:17:58 +01:00

kmsan

mm: kmsan: panic on failure to allocate early boot metadata

2023-10-25 16:47:10 -07:00

backing-dev.c

writeback: remove redundant checks for root memcg

2023-08-21 13:37:48 -07:00

balloon_compaction.c

…

bootmem_info.c

bootmem: use kmemleak_free_part_phys in put_page_bootmem

2023-10-25 16:47:13 -07:00

cma_debug.c

…

cma_sysfs.c

mm: cma: make kobj_type structure constant

2023-03-28 16:20:06 -07:00

cma.c

mm/cma: use nth_page() in place of direct struct page manipulation

2023-10-04 10:32:29 -07:00

cma.h

…

compaction.c

mm/compaction: factor out code to test if we should run compaction for target order

2023-10-04 10:32:19 -07:00

debug_page_alloc.c

mm: page_alloc: split out DEBUG_PAGEALLOC

2023-06-09 16:25:23 -07:00

debug_page_ref.c

…

debug_vm_pgtable.c

mm: fix multiple typos in multiple files

2023-10-25 16:47:14 -07:00

debug.c

mm: update validate_mm() to use vma iterator

2023-06-09 16:25:31 -07:00

dmapool_test.c

dmapool: add alloc/free performance test

2023-04-05 19:42:38 -07:00

dmapool.c

mm/mempool/dmapool: remove CONFIG_DEBUG_SLAB ifdefs

2023-12-05 11:17:58 +01:00

early_ioremap.c

mm/early_ioremap.c: improve the execution efficiency of early_ioremap_setup()

2023-06-09 16:25:56 -07:00

fadvise.c

mm: remove unnecessary pagevec includes

2023-06-23 16:59:31 -07:00

fail_page_alloc.c

mm: page_alloc: split out FAIL_PAGE_ALLOC

2023-06-09 16:25:23 -07:00

failslab.c

mm: fix unexpected changes to {failslab|fail_page_alloc}.attr

2022-11-22 18:50:44 -08:00

filemap.c

mm: drop the assumption that VM_SHARED always implies writable

2023-10-18 14:34:19 -07:00

folio-compat.c

filemap: Add fgf_t typedef

2023-07-24 18:04:30 -04:00

gup_test.c

Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes.

2023-06-23 16:58:19 -07:00

gup_test.h

mm/gup_test: start/stop/read functionality for PIN LONGTERM test

2022-11-08 17:37:15 -08:00

gup.c

mm/gup: make failure to pin an error if FOLL_NOWAIT not specified

2023-10-18 14:34:15 -07:00

highmem.c

mm: ptep_get() conversion

2023-06-19 16:19:25 -07:00

hmm.c

mm: enable page walking API to lock vmas during the walk

2023-08-21 13:07:20 -07:00

huge_memory.c

mm: huge_memory: use folio_xchg_last_cpupid() in __split_huge_page_tail()

2023-10-25 16:47:12 -07:00

hugetlb_cgroup.c

mm, hugetlb: remove HUGETLB_CGROUP_MIN_ORDER

2023-10-18 14:34:17 -07:00

hugetlb_vmemmap.c

hugetlb_vmemmap: use folio argument for hugetlb_vmemmap_* functions

2023-10-25 16:47:08 -07:00

hugetlb_vmemmap.h

mm: hugetlb_vmemmap: fix reference to nonexistent file

2023-10-25 16:47:14 -07:00

hugetlb.c

mempolicy: mmap_lock is not needed while migrating folios

2023-10-25 16:47:16 -07:00

hwpoison-inject.c

…

init-mm.c

mm: move dummy_vm_ops out of a header

2023-08-21 13:37:46 -07:00

internal.h

mm: add page_rmappable_folio() wrapper

2023-10-25 16:47:16 -07:00

interval_tree.c

…

io-mapping.c

…

ioremap.c

mm: ioremap: remove unneeded ioremap_allowed and iounmap_allowed

2023-08-18 10:12:36 -07:00

Kconfig

mm/slab: remove CONFIG_SLAB from all Kconfig and Makefile

2023-12-05 11:14:40 +01:00

Kconfig.debug

mm/slab: remove CONFIG_SLAB from all Kconfig and Makefile

2023-12-05 11:14:40 +01:00

khugepaged.c

As usual, lots of singleton and doubleton patches all over the tree and

2023-11-02 20:53:31 -10:00

kmemleak.c

mm/kmemleak: move the initialisation of object to __link_object

2023-10-25 16:47:13 -07:00

ksm.c

mm/ksm: add pages_skipped metric

2023-10-16 15:44:39 -07:00

list_lru.c

…

maccess.c

mm: Fix copy_from_user_nofault().

2023-04-12 17:36:23 -07:00

madvise.c

mm: drop the assumption that VM_SHARED always implies writable

2023-10-18 14:34:19 -07:00

Makefile

mm/slab: remove CONFIG_SLAB from all Kconfig and Makefile

2023-12-05 11:14:40 +01:00

mapping_dirty_helpers.c

mm: fix clean_record_shared_mapping_range kernel-doc

2023-08-24 16:20:30 -07:00

memblock.c

memblock: report failures when memblock_can_resize is not set

2023-11-08 09:40:13 -08:00

memcontrol.c

mm/slab: move pre/post-alloc hooks from slab.h to slub.c

2023-12-06 11:57:21 +01:00

memfd.c

memfd: drop warning for missing exec-related flags

2023-10-04 10:32:22 -07:00

memory_hotplug.c

mm: memory_hotplug: drop memoryless node from fallback lists

2023-10-25 16:47:14 -07:00

memory-failure.c

mm: convert DAX lock/unlock page to lock/unlock folio

2023-10-04 10:32:20 -07:00

memory-tiers.c

dax, kmem: calculate abstract distance with general interface

2023-10-16 15:44:39 -07:00

memory.c

mm: use folio_xchg_last_cpupid() in wp_page_reuse()

2023-10-25 16:47:13 -07:00

mempolicy.c

Many singleton patches against the MM code. The patch series which are

2023-11-02 19:38:47 -10:00

mempool.c

mm/mempool/dmapool: remove CONFIG_DEBUG_SLAB ifdefs

2023-12-05 11:17:58 +01:00

memremap.c

mm/memremap.c: fix outdated comment in devm_memremap_pages

2023-02-09 16:51:46 -08:00

memtest.c

mm: memtest: convert to memtest_report_meminfo()

2023-08-21 13:37:47 -07:00

migrate_device.c

Add x86 shadow stack support

2023-08-31 12:20:12 -07:00

migrate.c

mm: migrate: record the mlocked page status to remove unnecessary lru drain

2023-10-25 16:47:14 -07:00

mincore.c

mm: enable page walking API to lock vmas during the walk

2023-08-21 13:07:20 -07:00

mlock.c

mm: mlock: avoid folio_within_range() on KSM pages

2023-10-25 16:47:14 -07:00

mm_init.c

mm: hugetlb: skip initialization of gigantic tail struct pages if freed by HVO

2023-10-04 10:32:30 -07:00

mm_slot.h

…

mmap_lock.c

…

mmap.c

Many singleton patches against the MM code. The patch series which are

2023-11-02 19:38:47 -10:00

mmu_gather.c

mm: fix kernel-doc warning from tlb_flush_rmaps()

2023-08-24 16:20:30 -07:00

mmu_notifier.c

mmu_notifiers: rename invalidate_range notifier

2023-08-18 10:12:41 -07:00

mmzone.c

mm: remove page_cpupid_xchg_last()

2023-10-25 16:47:13 -07:00

mprotect.c

mm: mprotect: use a folio in change_pte_range()

2023-10-25 16:47:12 -07:00

mremap.c

mm: abstract VMA merge and extend into vma_merge_extend() helper

2023-10-18 14:34:18 -07:00

msync.c

…

nommu.c

Many singleton patches against the MM code. The patch series which are

2023-11-02 19:38:47 -10:00

oom_kill.c

mm/oom_killer: simplify OOM killer info dump helper

2023-10-25 16:47:10 -07:00

page_alloc.c

mm: add page_rmappable_folio() wrapper

2023-10-25 16:47:16 -07:00

page_counter.c

…

page_ext.c

mm/page_ext: move functions around for minor cleanups to page_ext

2023-08-18 10:12:31 -07:00

page_idle.c

mm: page_idle: convert page idle to use a folio

2023-01-18 17:12:52 -08:00

page_io.c

mm: memcg: add THP swap out info for anonymous reclaim

2023-10-04 10:32:27 -07:00

page_isolation.c

mm/hugetlb: get rid of page_hstate()

2023-08-18 10:12:39 -07:00

page_owner.c

mm/page_owner: remove free_ts from page_owner output

2023-10-18 14:34:19 -07:00

page_poison.c

mm/page_poison: remove unused page_ext.h from page_poison

2023-08-21 13:37:30 -07:00

page_reporting.c

mm, treewide: redefine MAX_ORDER sanely

2023-04-05 19:42:46 -07:00

page_reporting.h

…

page_table_check.c

mm: convert page_table_check_pte_set() to page_table_check_ptes_set()

2023-08-24 16:20:18 -07:00

page_vma_mapped.c

mm: correct stale comment of function check_pte

2023-08-18 10:12:13 -07:00

page-writeback.c

mm: use folio_xor_flags_has_waiters() in folio_end_writeback()

2023-10-18 14:34:17 -07:00

pagewalk.c

mm/pagewalk: fix bootstopping regression from extra pte_unmap()

2023-09-02 08:39:21 -07:00

percpu-internal.h

percpu-internal/pcpu_chunk: re-layout pcpu_chunk structure to reduce false sharing

2023-06-19 16:19:29 -07:00

percpu-km.c

…

percpu-stats.c

…

percpu-vm.c

…

percpu.c

Many singleton patches against the MM code. The patch series which are

2023-11-02 19:38:47 -10:00

pgalloc-track.h

…

pgtable-generic.c

mm/pgtable: notes on pte_offset_map[_lock]()

2023-08-18 10:12:25 -07:00

process_vm_access.c

mm/gup: remove unused vmas parameter from pin_user_pages_remote()

2023-06-09 16:25:25 -07:00

ptdump.c

mm: ptdump should use ptep_get_lockless()

2023-06-19 16:19:24 -07:00

readahead.c

vfs: fix readahead(2) on block devices

2023-10-19 11:02:49 +02:00

rmap.c

mm/rmap: convert page_move_anon_rmap() to folio_move_anon_rmap()

2023-10-18 14:34:14 -07:00

rodata_test.c

mm/rodata_test: use PAGE_ALIGNED() helper

2022-10-03 14:03:05 -07:00

secretmem.c

mm/secretmem: use a folio in secretmem_fault()

2023-08-21 13:38:02 -07:00

shmem_quota.c

shmem: Add default quota limit mount options

2023-08-09 09:15:40 +02:00

shmem.c

As usual, lots of singleton and doubleton patches all over the tree and

2023-11-02 20:53:31 -10:00

show_mem.c

mm: refactor si_mem_available()

2023-10-04 10:32:19 -07:00

shrinker_debug.c

mm: shrinker: convert shrinker_rwsem to mutex

2023-10-04 10:32:26 -07:00

shrinker.c

mm: shrinker: convert shrinker_rwsem to mutex

2023-10-04 10:32:26 -07:00

shuffle.c

mm/shuffle: convert module_param_call to module_param_cb

2022-10-03 14:03:07 -07:00

shuffle.h

mm, treewide: redefine MAX_ORDER sanely

2023-04-05 19:42:46 -07:00

slab_common.c

mm/slab: move kmalloc() functions from slab_common.c to slub.c

2023-12-06 11:57:21 +01:00

slab.h

mm/slab: move kmalloc() functions from slab_common.c to slub.c

2023-12-06 11:57:21 +01:00

slub.c

mm/slub: optimize alloc fastpath code layout

2023-12-06 11:57:22 +01:00

sparse-vmemmap.c

mm/vmemmap: allow architectures to override how vmemmap optimization works

2023-08-18 10:12:53 -07:00

sparse.c

mm/sparse: remove redundant judgments from macro for_each_present_section_nr

2023-08-18 10:12:14 -07:00

swap_cgroup.c

mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled

2022-10-03 14:03:36 -07:00

swap_slots.c

…

swap_state.c

mempolicy: alloc_pages_mpol() for NUMA policy without vma

2023-10-25 16:47:16 -07:00

swap.c

mm: remove references to pagevec

2023-06-23 16:59:30 -07:00

swap.h

mempolicy: alloc_pages_mpol() for NUMA policy without vma

2023-10-25 16:47:16 -07:00

swapfile.c

mm/swap: Convert to use bdev_open_by_dev()

2023-10-28 13:29:19 +02:00

truncate.c

- Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list")

2023-08-29 14:25:26 -07:00

usercopy.c

mm: Fix copy_from_user_nofault().

2023-04-12 17:36:23 -07:00

userfaultfd.c

Add x86 shadow stack support

2023-08-31 12:20:12 -07:00

util.c

Many singleton patches against the MM code. The patch series which are

2023-11-02 19:38:47 -10:00

vmalloc.c

mm/vmalloc: fix the unchecked dereference warning in vread_iter()

2023-11-01 12:38:35 -07:00

vmpressure.c

net-memcg: Fix scope of sockmem pressure indicators

2023-08-16 12:21:32 +01:00

vmscan.c

mm: multi-gen LRU: reuse some legacy trace events

2023-10-18 14:34:14 -07:00

vmstat.c

mm: tune PCP high automatically

2023-10-25 16:47:10 -07:00

workingset.c

mm: workingset: dynamically allocate the mm-shadow shrinker

2023-10-04 10:32:24 -07:00

z3fold.c

mm/z3fold: remove obsolete comment for struct z3fold_pool

2023-08-21 13:37:51 -07:00

zbud.c

mm: zswap: remove shrink from zpool interface

2023-06-19 16:19:27 -07:00

zpool.c

mm: zswap: remove shrink from zpool interface

2023-06-19 16:19:27 -07:00

zsmalloc.c

zsmalloc: use copy_page for full page copy

2023-10-18 14:34:16 -07:00

zswap.c

zswap: export compression failure stats

2023-11-01 12:38:35 -07:00