iv/linux

Michal Hocko 6a792697a5 memcg: do not drain charge pcp caches on remote isolated cpus	2023-04-18 16:29:43 -07:00

Leonardo Bras has noticed that pcp charge cache draining might be disruptive on workloads relying on 'isolated cpus', a feature commonly used on workloads that are sensitive to interruption and context switching such as vRAN and Industrial Control Systems.

There are essentially two ways to approach the issue. We can either allow the pcp cache to be drained on a different rather than a local cpu, or avoid remote flushing on isolated cpus.

The current pcp charge cache is really optimized for high performance and it always relies on sticking with its cpu. That means it only requires local_lock (preempt_disable on !RT) and draining is handed over to the pcp WQ to drain locally again.

The former solution (remote draining) would require additional locking to prevent local charges from racing with the draining. This adds an atomic operation to the otherwise simple arithmetic fast path in the try_charge path. Another concern is that remote draining can cause lock contention for the isolated workloads and therefore interfere with them indirectly via user space interfaces.

Another option is to avoid scheduling the draining on isolated cpus altogether. That means that those remote cpus would keep their charges even after drain_all_stock returns. This is certainly not optimal either but it shouldn't really cause any major problems. In the worst case (many isolated cpus with charges, each of them with MEMCG_CHARGE_BATCH, i.e. 64 pages) the memory consumption of a memcg would be artificially higher than what can be immediately used from other cpus.

Theoretically a memcg OOM killer could be triggered prematurely. Currently it is not really clear whether this is a practical problem though. A tight memcg limit would be really counterproductive to cpu isolated workloads pretty much by definition, because any memory reclaim induced by the memcg limit could break user space timing expectations, as those workloads usually expect to execute in userspace most of the time.

Also, charges could be left behind on memcg removal. Any future charge on those isolated cpus will drain that pcp cache, so this won't be a permanent leak.

Considering the pros and cons of both approaches, this patch implements the second option and simply does not schedule remote draining if the target cpu is isolated. This solution is much simpler. It doesn't add any new locking and it is more predictable from the user space POV. Should the premature memcg OOM become a real life problem, we can revisit this decision.

[akpm@linux-foundation.org: memcontrol.c needs sched/isolation.h]
Link: https://lore.kernel.org/oe-kbuild-all/202303180617.7E3aIlHf-lkp@intel.com/
Link: https://lkml.kernel.org/r/20230317134448.11082-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Reported-by: Leonardo Bras <leobras@redhat.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
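
The option chosen in the commit message above (drain locally, schedule draining only on non-isolated remote cpus) can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the code in memcontrol.c: `struct cpu_stock`, `drain_all_stock_sketch()` and `worst_case_stranded_pages()` are hypothetical names, the `isolated` flag stands in for whatever isolation check the kernel performs, and "draining" is modelled as zeroing a counter rather than queueing work on the pcp WQ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Batch size in pages, as stated in the commit message. */
#define MEMCG_CHARGE_BATCH 64U

/* Hypothetical stand-in for one cpu's cached charge (not the kernel struct). */
struct cpu_stock {
	unsigned int nr_pages; /* pages precharged and cached on this cpu */
	bool isolated;         /* assumption: true if the cpu is isolated */
};

/*
 * Model of the chosen policy: the local cpu is always drained, remote
 * cpus are drained only when they are not isolated. Returns the number
 * of pages that remain cached on isolated remote cpus afterwards.
 */
static unsigned int drain_all_stock_sketch(struct cpu_stock *stocks,
					   size_t nr_cpus, size_t local_cpu)
{
	unsigned int left_behind = 0;

	assert(stocks != NULL && local_cpu < nr_cpus);

	for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
		if (cpu == local_cpu || !stocks[cpu].isolated)
			stocks[cpu].nr_pages = 0; /* drained */
		else
			left_behind += stocks[cpu].nr_pages; /* skipped */
	}
	return left_behind;
}

/* Worst case from the commit message: every isolated cpu holds a full batch. */
static unsigned int worst_case_stranded_pages(unsigned int nr_isolated_cpus)
{
	return nr_isolated_cpus * MEMCG_CHARGE_BATCH;
}
```

As a rough feel for the worst case: with 16 isolated cpus each holding a full batch, `worst_case_stranded_pages(16)` yields 1024 pages, which is 4 MiB if a 4 KiB page size is assumed. The charges stay accounted to the memcg until a later charge on those cpus drains the cache, which matches the "not a permanent leak" argument in the message.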
damon	mm/damon/sysfs: make more kobj_type structures constant	2023-04-05 19:42:59 -07:00
kasan	kasan: suppress recursive reports for HW_TAGS	2023-04-05 19:42:43 -07:00
kfence	mm: kfence: fix handling discontiguous page	2023-03-28 15:24:32 -07:00
kmsan	sync mm-stable with mm-hotfixes-stable to pick up depended-upon upstream changes	2023-04-18 14:53:49 -07:00
backing-dev.c	writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs	2023-04-16 10:41:26 -07:00
balloon_compaction.c
bootmem_info.c
cma_debug.c
cma_sysfs.c	mm: cma: make kobj_type structure constant	2023-03-28 16:20:06 -07:00
cma.c	mm: move most of core MM initialization to mm/mm_init.c	2023-04-05 19:42:52 -07:00
cma.h
compaction.c	mm: compaction: fix the possible deadlock when isolating hugetlb pages	2023-04-05 19:42:50 -07:00
debug_page_ref.c
debug_vm_pgtable.c	mm, treewide: redefine MAX_ORDER sanely	2023-04-05 19:42:46 -07:00
debug.c	mm/debug: use %pGt to display page_type in dump_page()	2023-03-28 16:20:09 -07:00
dmapool_test.c	dmapool: add alloc/free performance test	2023-04-05 19:42:38 -07:00
dmapool.c	dmapool: create/destroy cleanup	2023-04-05 19:42:41 -07:00
early_ioremap.c
fadvise.c	mm: support POSIX_FADV_NOREUSE	2023-01-18 17:12:57 -08:00
failslab.c	mm: fix unexpected changes to {failslab\|fail_page_alloc}.attr	2022-11-22 18:50:44 -08:00
filemap.c	mm: return an ERR_PTR from __filemap_get_folio	2023-04-05 19:42:42 -07:00
folio-compat.c	mm: return an ERR_PTR from __filemap_get_folio	2023-04-05 19:42:42 -07:00
frontswap.c
gup_test.c	mm/gup_test: free memory allocated via kvcalloc() using kvfree()	2022-12-15 16:37:48 -08:00
gup_test.h	mm/gup_test: start/stop/read functionality for PIN LONGTERM test	2022-11-08 17:37:15 -08:00
gup.c	mm/gup.c: fix typo in comments	2023-03-28 16:20:14 -07:00
highmem.c	highmem: fix kmap_to_page() for kmap_local_page() addresses	2022-10-12 18:51:51 -07:00
hmm.c	mm/hugetlb: make walk_hugetlb_range() safe to pmd unshare	2023-01-18 17:12:39 -08:00
huge_memory.c	sync mm-stable with mm-hotfixes-stable to pick up depended-upon upstream changes	2023-04-16 12:31:58 -07:00
hugetlb_cgroup.c	mm/hugetlb: increase use of folios in alloc_huge_page()	2023-02-13 15:54:27 -08:00
hugetlb_vmemmap.c	mm: prefer xxx_page() alloc/free functions for order-0 pages	2023-03-28 16:20:16 -07:00
hugetlb_vmemmap.h
hugetlb.c	sync mm-stable with mm-hotfixes-stable to pick up depended-upon upstream changes	2023-04-16 12:31:58 -07:00
hwpoison-inject.c	mm/hwpoison: add __init/__exit annotations to module init/exit funcs	2022-10-03 14:03:05 -07:00
init-mm.c	mm: add per-VMA lock and helper functions to control it	2023-04-05 20:02:57 -07:00
internal.h	mm: conditionally write-lock VMA in free_pgtables	2023-04-05 20:02:59 -07:00
interval_tree.c
io-mapping.c
ioremap.c
Kconfig	mm: introduce CONFIG_PER_VMA_LOCK	2023-04-05 20:02:56 -07:00
Kconfig.debug	mm: introduce per-VMA lock statistics	2023-04-05 20:03:01 -07:00
khugepaged.c	mm: khugepaged: fix kernel BUG in hpage_collapse_scan_file()	2023-04-18 16:29:43 -07:00
kmemleak.c	lib/stackdepot, mm: rename stack_depot_want_early_init	2023-02-16 20:43:49 -08:00
ksm.c	mm: add tracepoints to ksm	2023-03-28 16:20:08 -07:00
list_lru.c
maccess.c	maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault()	2022-11-11 11:44:46 -08:00
madvise.c	- Daniel Verkamp has contributed a memfd series ("mm/memfd: add	2023-02-23 17:09:35 -08:00
Makefile	dmapool: add alloc/free performance test	2023-04-05 19:42:38 -07:00
mapping_dirty_helpers.c	mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export	2023-02-02 22:32:54 -08:00
memblock.c	mm: avoid passing 0 to __ffs()	2023-04-18 16:29:42 -07:00
memcontrol.c	memcg: do not drain charge pcp caches on remote isolated cpus	2023-04-18 16:29:43 -07:00
memfd.c	mm/memfd: add write seals when apply SEAL_EXEC to executable memfd	2023-01-18 17:12:37 -08:00
memory_hotplug.c	mm: avoid passing 0 to __ffs()	2023-04-18 16:29:42 -07:00
memory-failure.c	mm: memory-failure: directly use IS_ENABLED(CONFIG_HWPOISON_INJECT)	2023-03-28 16:20:17 -07:00
memory-tiers.c	memory tier: release the new_memtier in find_create_memory_tier()	2023-02-09 16:51:40 -08:00
memory.c	sync mm-stable with mm-hotfixes-stable to pick up depended-upon upstream changes	2023-04-16 12:31:58 -07:00
mempolicy.c	mm/mempolicy: fix use-after-free of VMA iterator	2023-04-16 10:41:25 -07:00
mempool.c	mempool: do not use ksize() for poisoning	2022-11-30 15:58:41 -08:00
memremap.c	mm/memremap.c: fix outdated comment in devm_memremap_pages	2023-02-09 16:51:46 -08:00
memtest.c	mm/memtest: add results of early memtest to /proc/meminfo	2023-04-05 19:42:55 -07:00
migrate_device.c	mm: change to return bool for isolate_lru_page()	2023-02-20 12:46:17 -08:00
migrate.c	mm/migrate: drop pte_mkhuge() in remove_migration_pte()	2023-03-28 16:20:11 -07:00
mincore.c	mm: return an ERR_PTR from __filemap_get_folio	2023-04-05 19:42:42 -07:00
mlock.c	mm: introduce vm_flags_reset_once to replace WRITE_ONCE vm_flags updates	2023-02-09 16:51:41 -08:00
mm_init.c	mm: make arch_has_descending_max_zone_pfns() static	2023-04-18 16:29:42 -07:00
mm_slot.h	mm: introduce common struct mm_slot	2022-10-03 14:02:43 -07:00
mmap_lock.c
mmap.c	sync mm-stable with mm-hotfixes-stable to pick up depended-upon upstream changes	2023-04-18 14:53:49 -07:00
mmu_gather.c	mm: prefer xxx_page() alloc/free functions for order-0 pages	2023-03-28 16:20:16 -07:00
mmu_notifier.c	mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export	2023-02-02 22:32:54 -08:00
mmzone.c	mm: multi-gen LRU: groundwork	2022-09-26 19:46:09 -07:00
mprotect.c	sync mm-stable with mm-hotfixes-stable to pick up depended-upon upstream changes	2023-04-16 12:31:58 -07:00
mremap.c	mm/mremap: write-lock VMA while remapping it to a new address range	2023-04-05 20:02:58 -07:00
msync.c	mm/msync: use vma_find() instead of vma linked list	2022-09-26 19:46:25 -07:00
nommu.c	mm: vmalloc: convert vread() to vread_iter()	2023-04-05 19:42:57 -07:00
oom_kill.c	mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export	2023-02-02 22:32:54 -08:00
page_alloc.c	sync mm-stable with mm-hotfixes-stable to pick up depended-upon upstream changes	2023-04-18 14:53:49 -07:00
page_counter.c
page_ext.c	mm/page_ext: init page_ext early if there are no deferred struct pages	2023-02-02 22:33:22 -08:00
page_idle.c	mm: page_idle: convert page idle to use a folio	2023-01-18 17:12:52 -08:00
page_io.c	- Daniel Verkamp has contributed a memfd series ("mm/memfd: add	2023-02-23 17:09:35 -08:00
page_isolation.c	mm, treewide: redefine MAX_ORDER sanely	2023-04-05 19:42:46 -07:00
page_owner.c	mm, treewide: redefine MAX_ORDER sanely	2023-04-05 19:42:46 -07:00
page_poison.c
page_reporting.c	mm, treewide: redefine MAX_ORDER sanely	2023-04-05 19:42:46 -07:00
page_reporting.h
page_table_check.c	mm/page_ext: do not allocate space for page_ext->flags if not needed	2023-02-02 22:33:11 -08:00
page_vma_mapped.c	mm/hugetlb: introduce hugetlb_walk()	2023-01-18 17:12:39 -08:00
page-writeback.c	mm,jfs: move write_one_page/folio_write_one to jfs	2023-03-28 16:20:14 -07:00
pagewalk.c	mm/hugetlb: introduce hugetlb_walk()	2023-01-18 17:12:39 -08:00
percpu-internal.h	mm: percpu: fix incorrect size in pcpu_obj_full_size()	2023-02-16 20:43:55 -08:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c	mm: memcontrol: rename memcg_kmem_enabled()	2023-02-16 20:43:56 -08:00
pgalloc-track.h
pgtable-generic.c	mm: add PTE pointer parameter to flush_tlb_fix_spurious_fault()	2023-03-28 16:20:12 -07:00
process_vm_access.c	use less confusing names for iov_iter direction initializers	2022-11-25 13:01:55 -05:00
ptdump.c
readahead.c	readahead: convert readahead_expand() to use a folio	2023-02-02 22:33:21 -08:00
rmap.c	mm/khugepaged: write-lock VMA while collapsing a huge page	2023-04-05 20:02:58 -07:00
rodata_test.c	mm/rodata_test: use PAGE_ALIGNED() helper	2022-10-03 14:03:05 -07:00
secretmem.c	- Daniel Verkamp has contributed a memfd series ("mm/memfd: add	2023-02-23 17:09:35 -08:00
shmem.c	mm: userfaultfd: combine 'mode' and 'wp_copy' arguments	2023-04-05 19:42:48 -07:00
shrinker_debug.c	mm: shrinkers: convert shrinker_rwsem to mutex	2023-03-28 16:20:17 -07:00
shuffle.c	mm/shuffle: convert module_param_call to module_param_cb	2022-10-03 14:03:07 -07:00
shuffle.h	mm, treewide: redefine MAX_ORDER sanely	2023-04-05 19:42:46 -07:00
slab_common.c	mm/kasan: simplify and refine kasan_cache code	2023-01-18 17:12:55 -08:00
slab.c	mm, treewide: redefine MAX_ORDER sanely	2023-04-05 19:42:46 -07:00
slab.h	mm: move kmem_cache_init() declaration to mm/slab.h	2023-04-05 19:42:54 -07:00
slob.c	Merge branch 'slab/for-6.1/kmalloc_size_roundup' into slab/for-next	2022-09-29 11:30:55 +02:00
slub.c	mm, treewide: redefine MAX_ORDER sanely	2023-04-05 19:42:46 -07:00
sparse-vmemmap.c	mm/sparse-vmemmap: generalise vmemmap_populate_hugepages()	2022-12-11 18:12:12 -08:00
sparse.c	mm/sparse: fix "unused function 'pgdat_to_phys'" warning	2023-02-02 22:33:29 -08:00
swap_cgroup.c	mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled	2022-10-03 14:03:36 -07:00
swap_slots.c	mm/swap: convert put_swap_page() to put_swap_folio()	2022-10-03 14:02:46 -07:00
swap_state.c	mm: return an ERR_PTR from __filemap_get_folio	2023-04-05 19:42:42 -07:00
swap.c	mm: swap: fix performance regression on sparsetruncate-tiny	2023-04-16 10:41:24 -07:00
swap.h	mm: remove the __swap_writepage return value	2023-02-02 22:33:33 -08:00
swapfile.c	sync mm-stable with mm-hotfixes-stable to pick up depended-upon upstream changes	2023-04-16 12:31:58 -07:00
truncate.c	mm: return an ERR_PTR from __filemap_get_folio	2023-04-05 19:42:42 -07:00
usercopy.c	mm: use kstrtobool() instead of strtobool()	2022-11-30 15:58:45 -08:00
userfaultfd.c	mm: userfaultfd: add UFFDIO_CONTINUE_MODE_WP to install WP PTEs	2023-04-05 19:42:48 -07:00
util.c	mm: fix typo in __vm_enough_memory warning	2023-02-13 15:54:33 -08:00
vmalloc.c	sync mm-stable with mm-hotfixes-stable to pick up depended-upon upstream changes	2023-04-18 14:53:49 -07:00
vmpressure.c
vmscan.c	mm, treewide: redefine MAX_ORDER sanely	2023-04-05 19:42:46 -07:00
vmstat.c	mm: introduce per-VMA lock statistics	2023-04-05 20:03:01 -07:00
workingset.c	swap_state: update shadow_nodes for anonymous page	2023-02-02 22:33:24 -08:00
z3fold.c	mm: remove PageMovable export	2023-01-18 17:12:57 -08:00
zbud.c	zpool: clean out dead code	2022-12-11 18:12:10 -08:00
zpool.c	zpool: clean out dead code	2022-12-11 18:12:10 -08:00
zsmalloc.c	zsmalloc: reset compaction source zspage pointer after putback_zspage()	2023-04-18 16:29:42 -07:00
zswap.c	mm/zswap: try to avoid worst-case scenario on same element pages	2023-03-28 16:20:07 -07:00