iv/linux

Domenico Cerasuolo 04fc781608 mm: fix zswap writeback race condition	2023-05-17 15:24:33 -07:00

The zswap writeback mechanism can cause a race condition resulting in memory corruption, where a swapped out page gets swapped in with data that was written to a different page.

The race unfolds like this:

1. a page with data A and swap offset X is stored in zswap
2. page A is removed off the LRU by zpool driver for writeback in zswap-shrink work, data for A is mapped by zpool driver
3. user space program faults and invalidates page entry A, offset X is considered free
4. kswapd stores page B at offset X in zswap (zswap could also be full, if so, page B would then be IOed to X, then skip step 5.)
5. entry A is replaced by B in tree->rbroot, this doesn't affect the local reference held by zswap-shrink work
6. zswap-shrink work writes back A at X, and frees zswap entry A
7. swapin of slot X brings A in memory instead of B

The fix:

Once the swap page cache has been allocated (case ZSWAP_SWAPCACHE_NEW), zswap-shrink work just checks that the local zswap_entry reference is still the same as the one in the tree. If it's not the same it means that it's either been invalidated or replaced, in both cases the writeback is aborted because the local entry contains stale data.

Reproducer:

I originally found this by running `stress` overnight to validate my work on the zswap writeback mechanism, it manifested after hours on my test machine. The key to make it happen is having zswap writebacks, so whatever setup pumps /sys/kernel/debug/zswap/written_back_pages should do the trick.

In order to reproduce this faster on a vm, I setup a system with ~100M of available memory and a 500M swap file, then running `stress --vm 1 --vm-bytes 300000000 --vm-stride 4000` makes it happen in matter of tens of minutes. One can speed things up even more by swinging /sys/module/zswap/parameters/max_pool_percent up and down between, say, 20 and 1; this makes it reproduce in tens of seconds.

It's crucial to set `--vm-stride` to something other than 4096 otherwise `stress` won't realize that memory has been corrupted because all pages would have the same data.

Link: https://lkml.kernel.org/r/20230503151200.19707-1-cerasuolodomenico@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Chris Li (Google) <chrisl@kernel.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
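
The stale-entry check described in the commit message can be illustrated with a small user-space sketch. This is not the code from mm/zswap.c: `struct entry`, `struct tree`, `tree_lookup()`, `tree_store()`, `tree_invalidate()` and `writeback()` are hypothetical stand-ins invented for illustration, the tree is modeled as a flat array indexed by swap offset rather than an rbtree, and the sketch is single-threaded with no locking or reference counting. It only shows the idea that writeback compares its local reference against what the tree currently holds and aborts on a mismatch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 8

/* Hypothetical stand-in for a zswap entry: only what the sketch needs. */
struct entry {
    unsigned long offset;   /* swap offset (X in the commit message) */
    char data;              /* page contents, e.g. 'A' or 'B' */
};

/* Hypothetical stand-in for tree->rbroot: one entry pointer per offset. */
struct tree {
    struct entry *slot[NSLOTS];
};

static struct entry *tree_lookup(const struct tree *t, unsigned long offset)
{
    return offset < NSLOTS ? t->slot[offset] : NULL;
}

/* Storing at an occupied offset replaces the old entry (step 5). */
static void tree_store(struct tree *t, struct entry *e)
{
    assert(e->offset < NSLOTS);
    t->slot[e->offset] = e;
}

/* Invalidation drops the entry, making the offset free again (step 3). */
static void tree_invalidate(struct tree *t, unsigned long offset)
{
    if (offset < NSLOTS)
        t->slot[offset] = NULL;
}

/*
 * Write the local entry back to the backing store, but only if the tree
 * still holds that very same entry. A mismatch means the entry has been
 * invalidated or replaced, so the local copy is stale and the writeback
 * is aborted. Returns true if data was written.
 */
static bool writeback(const struct tree *t, const struct entry *local,
                      char *backing)
{
    if (tree_lookup(t, local->offset) != local)
        return false;       /* stale: abort, leave backing untouched */
    backing[local->offset] = local->data;
    return true;
}
```

Replaying the race with this model: store A at an offset, keep a local pointer to A, invalidate the offset, store B at the same offset, then call `writeback()` with the local pointer. The call returns false and the backing slot is left untouched, so a later swapin cannot observe A's data in place of B's.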
..
damon	mm/damon/paddr: fix missing folio_sz update in damon_pa_young()	2023-05-02 17:21:49 -07:00
kasan	kasan: hw_tags: avoid invalid virt_to_page()	2023-05-02 17:23:27 -07:00
kfence	mm: kfence: fix false positives on big endian	2023-05-17 15:24:33 -07:00
kmsan	printk: export console trace point for kcsan/kasan/kfence/kmsan	2023-04-18 16:30:11 -07:00
backing-dev.c	- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of	2023-04-27 19:42:02 -07:00
balloon_compaction.c
bootmem_info.c
cma_debug.c
cma_sysfs.c	mm: cma: make kobj_type structure constant	2023-03-28 16:20:06 -07:00
cma.c	mm: move most of core MM initialization to mm/mm_init.c	2023-04-05 19:42:52 -07:00
cma.h
compaction.c	- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of	2023-04-27 19:42:02 -07:00
debug_page_ref.c
debug_vm_pgtable.c	mm, treewide: redefine MAX_ORDER sanely	2023-04-05 19:42:46 -07:00
debug.c	mm/debug: use %pGt to display page_type in dump_page()	2023-03-28 16:20:09 -07:00
dmapool_test.c	dmapool: add alloc/free performance test	2023-04-05 19:42:38 -07:00
dmapool.c	dmapool: link blocks across pages	2023-05-06 10:33:38 -07:00
early_ioremap.c
fadvise.c	mm: support POSIX_FADV_NOREUSE	2023-01-18 17:12:57 -08:00
failslab.c	mm: fix unexpected changes to {failslab\|fail_page_alloc}.attr	2022-11-22 18:50:44 -08:00
filemap.c	filemap: Handle error return from __filemap_get_folio()	2023-05-06 10:08:59 -07:00
folio-compat.c	- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of	2023-04-27 19:42:02 -07:00
frontswap.c
gup_test.c	mm/gup_test: free memory allocated via kvcalloc() using kvfree()	2022-12-15 16:37:48 -08:00
gup_test.h	mm/gup_test: start/stop/read functionality for PIN LONGTERM test	2022-11-08 17:37:15 -08:00
gup.c	x86-64: make access_ok() independent of LAM	2023-05-03 10:37:22 -07:00
highmem.c	highmem: fix kmap_to_page() for kmap_local_page() addresses	2022-10-12 18:51:51 -07:00
hmm.c	mm/hugetlb: make walk_hugetlb_range() safe to pmd unshare	2023-01-18 17:12:39 -08:00
huge_memory.c	mm: don't check VMA write permissions if the PTE/PMD indicates write permissions	2023-04-21 14:52:03 -07:00
hugetlb_cgroup.c	mm/hugetlb: increase use of folios in alloc_huge_page()	2023-02-13 15:54:27 -08:00
hugetlb_vmemmap.c	mm, page_alloc: use check_pages_enabled static key to check tail pages	2023-04-18 16:29:54 -07:00
hugetlb_vmemmap.h
hugetlb.c	- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of	2023-04-27 19:42:02 -07:00
hwpoison-inject.c
init-mm.c	IOMMU Updates for Linux 6.4	2023-04-30 13:00:38 -07:00
internal.h	mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()	2023-04-21 14:52:02 -07:00
interval_tree.c
io-mapping.c
ioremap.c
Kconfig	- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of	2023-04-27 19:42:02 -07:00
Kconfig.debug	mm: change per-VMA lock statistics to be disabled by default	2023-05-02 17:23:28 -07:00
khugepaged.c	mm/khugepaged: fix conflicting mods to collapse_file()	2023-04-27 13:42:16 -07:00
kmemleak.c	lib/stackdepot, mm: rename stack_depot_want_early_init	2023-02-16 20:43:49 -08:00
ksm.c	mm/ksm: move disabling KSM from s390/gmap code to KSM code	2023-05-02 17:21:50 -07:00
list_lru.c
maccess.c	mm: Fix copy_from_user_nofault().	2023-04-12 17:36:23 -07:00
madvise.c	Add support for new Linear Address Masking CPU feature. This is similar	2023-04-28 09:43:49 -07:00
Makefile	- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of	2023-04-27 19:42:02 -07:00
mapping_dirty_helpers.c	mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export	2023-02-02 22:32:54 -08:00
memblock.c	mm: avoid passing 0 to __ffs()	2023-04-18 16:29:42 -07:00
memcontrol.c	memcg: page_cgroup_ino() get memcg from the page's folio	2023-04-18 16:30:09 -07:00
memfd.c	memfd: pass argument of memfd_fcntl as int	2023-04-18 16:30:11 -07:00
memory_hotplug.c	mm: avoid passing 0 to __ffs()	2023-04-18 16:29:42 -07:00
memory-failure.c	- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of	2023-04-27 19:42:02 -07:00
memory-tiers.c	memory tier: release the new_memtier in find_create_memory_tier()	2023-02-09 16:51:40 -08:00
memory.c	mm: do not increment pgfault stats when page fault handler retries	2023-04-21 14:52:04 -07:00
mempolicy.c	mm/mempolicy: correctly update prev when policy is equal on mbind	2023-05-02 17:23:27 -07:00
mempool.c	mempool: do not use ksize() for poisoning	2022-11-30 15:58:41 -08:00
memremap.c	mm/memremap.c: fix outdated comment in devm_memremap_pages	2023-02-09 16:51:46 -08:00
memtest.c	mm/memtest: add results of early memtest to /proc/meminfo	2023-04-05 19:42:55 -07:00
migrate_device.c	mm: change to return bool for isolate_lru_page()	2023-02-20 12:46:17 -08:00
migrate.c	Add support for new Linear Address Masking CPU feature. This is similar	2023-04-28 09:43:49 -07:00
mincore.c	mm: return an ERR_PTR from __filemap_get_folio	2023-04-05 19:42:42 -07:00
mlock.c	mm: mlock: use folios_put() in mlock_folio_batch()	2023-04-18 16:29:53 -07:00
mm_init.c	mm/vmemmap/devdax: fix kernel crash when probing devdax devices	2023-04-18 16:30:09 -07:00
mm_slot.h
mmap_lock.c
mmap.c	mm/mmap/vma_merge: always check invariants	2023-05-06 10:10:07 -07:00
mmu_gather.c	mm: prefer xxx_page() alloc/free functions for order-0 pages	2023-03-28 16:20:16 -07:00
mmu_notifier.c	mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export	2023-02-02 22:32:54 -08:00
mmzone.c
mprotect.c	mm/userfaultfd: don't consider uffd-wp bit of writable migration entries	2023-04-18 16:29:53 -07:00
mremap.c	mm/mremap: write-lock VMA while remapping it to a new address range	2023-04-05 20:02:58 -07:00
msync.c
nommu.c	mm: vmalloc: convert vread() to vread_iter()	2023-04-05 19:42:57 -07:00
oom_kill.c	mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export	2023-02-02 22:32:54 -08:00
page_alloc.c	- Some DAMON cleanups from Kefeng Wang	2023-05-04 13:09:43 -07:00
page_counter.c
page_ext.c	mm/page_ext: init page_ext early if there are no deferred struct pages	2023-02-02 22:33:22 -08:00
page_idle.c	mm: page_idle: convert page idle to use a folio	2023-01-18 17:12:52 -08:00
page_io.c	- Daniel Verkamp has contributed a memfd series ("mm/memfd: add	2023-02-23 17:09:35 -08:00
page_isolation.c	mm, treewide: redefine MAX_ORDER sanely	2023-04-05 19:42:46 -07:00
page_owner.c	mm, treewide: redefine MAX_ORDER sanely	2023-04-05 19:42:46 -07:00
page_poison.c
page_reporting.c	mm, treewide: redefine MAX_ORDER sanely	2023-04-05 19:42:46 -07:00
page_reporting.h
page_table_check.c	mm/page_ext: do not allocate space for page_ext->flags if not needed	2023-02-02 22:33:11 -08:00
page_vma_mapped.c	mm/hugetlb: introduce hugetlb_walk()	2023-01-18 17:12:39 -08:00
page-writeback.c	mm,jfs: move write_one_page/folio_write_one to jfs	2023-03-28 16:20:14 -07:00
pagewalk.c	mm/hugetlb: introduce hugetlb_walk()	2023-01-18 17:12:39 -08:00
percpu-internal.h	mm: percpu: fix incorrect size in pcpu_obj_full_size()	2023-02-16 20:43:55 -08:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c	mm: memcontrol: rename memcg_kmem_enabled()	2023-02-16 20:43:56 -08:00
pgalloc-track.h
pgtable-generic.c	mm: add PTE pointer parameter to flush_tlb_fix_spurious_fault()	2023-03-28 16:20:12 -07:00
process_vm_access.c	use less confusing names for iov_iter direction initializers	2022-11-25 13:01:55 -05:00
ptdump.c
readahead.c	readahead: convert readahead_expand() to use a folio	2023-02-02 22:33:21 -08:00
rmap.c	mm,unmap: avoid flushing TLB in batch if PTE is inaccessible	2023-04-27 13:42:16 -07:00
rodata_test.c	mm/rodata_test: use PAGE_ALIGNED() helper	2022-10-03 14:03:05 -07:00
secretmem.c	- Daniel Verkamp has contributed a memfd series ("mm/memfd: add	2023-02-23 17:09:35 -08:00
shmem.c	- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of	2023-04-27 19:42:02 -07:00
shrinker_debug.c	mm: shrinkers: fix race condition on debugfs cleanup	2023-05-17 15:24:33 -07:00
shuffle.c	mm/shuffle: convert module_param_call to module_param_cb	2022-10-03 14:03:07 -07:00
shuffle.h	mm, treewide: redefine MAX_ORDER sanely	2023-04-05 19:42:46 -07:00
slab_common.c	mm/slab: document kfree() as allowed for kmem_cache_alloc() objects	2023-03-29 10:35:41 +02:00
slab.c	mm: vmscan: refactor updating current->reclaim_state	2023-04-18 16:30:10 -07:00
slab.h	- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of	2023-04-27 19:42:02 -07:00
slub.c	- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of	2023-04-27 19:42:02 -07:00
sparse-vmemmap.c	mm/vmemmap/devdax: fix kernel crash when probing devdax devices	2023-04-18 16:30:09 -07:00
sparse.c	sparse: remove unnecessary 0 values from rc	2023-04-21 14:52:05 -07:00
swap_cgroup.c	mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled	2022-10-03 14:03:36 -07:00
swap_slots.c
swap_state.c	mm: return an ERR_PTR from __filemap_get_folio	2023-04-05 19:42:42 -07:00
swap.c	mm: swap: fix performance regression on sparsetruncate-tiny	2023-04-16 10:41:24 -07:00
swap.h	mm: remove the __swap_writepage return value	2023-02-02 22:33:33 -08:00
swapfile.c	sync mm-stable with mm-hotfixes-stable to pick up depended-upon upstream changes	2023-04-16 12:31:58 -07:00
truncate.c	mm: return an ERR_PTR from __filemap_get_folio	2023-04-05 19:42:42 -07:00
usercopy.c	mm: Fix copy_from_user_nofault().	2023-04-12 17:36:23 -07:00
userfaultfd.c	userfaultfd: use helper function range_in_vma()	2023-04-21 14:52:02 -07:00
util.c	mm: uninline kstrdup()	2023-04-08 13:45:37 -07:00
vmalloc.c	mm: vmalloc: rename addr_to_vb_xarray() function	2023-04-18 16:29:48 -07:00
vmpressure.c
vmscan.c	mm: shrinkers: fix race condition on debugfs cleanup	2023-05-17 15:24:33 -07:00
vmstat.c	mm: introduce per-VMA lock statistics	2023-04-05 20:03:01 -07:00
workingset.c	mm: workingset: update description of the source file	2023-04-18 16:30:11 -07:00
z3fold.c	mm: remove PageMovable export	2023-01-18 17:12:57 -08:00
zbud.c	zpool: clean out dead code	2022-12-11 18:12:10 -08:00
zpool.c	zpool: remove MODULE_LICENSE in non-modules	2023-04-13 13:13:54 -07:00
zsmalloc.c	zsmalloc: move LRU update from zs_map_object() to zs_malloc()	2023-05-17 15:24:33 -07:00
zswap.c	mm: fix zswap writeback race condition	2023-05-17 15:24:33 -07:00