linux

iv/linux

Hugh Dickins 073861ed77 mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)

Twice now, when exercising ext4 looped on shmem huge pages, I have crashed
on the PF_ONLY_HEAD check inside PageWaiters(): ext4_finish_bio() calling
end_page_writeback() calling wake_up_page() on tail of a shmem huge page,
no longer an ext4 page at all.

The problem is that PageWriteback is not accompanied by a page reference
(as the NOTE at the end of test_clear_page_writeback() acknowledges): as
soon as TestClearPageWriteback has been done, that page could be removed
from page cache, freed, and reused for something else by the time that
wake_up_page() is reached.

https://lore.kernel.org/linux-mm/20200827122019.GC14765@casper.infradead.org/
Matthew Wilcox suggested avoiding or weakening the PageWaiters() tail
check; but I'm paranoid about even looking at an unreferenced struct page,
lest its memory might itself have already been reused or hotremoved (and
wake_up_page_bit() may modify that memory with its ClearPageWaiters()).

Then on crashing a second time, I realized there's a stronger reason against
that approach.  If my testing just occasionally crashes on that check,
when the page is reused for part of a compound page, wouldn't it be much
more common for the page to get reused as an order-0 page before reaching
wake_up_page()?  And on rare occasions, might that reused page already be
marked PageWriteback by its new user, and already be waited upon?  What
would that look like?

It would look like BUG_ON(PageWriteback) after wait_on_page_writeback()
in write_cache_pages() (though I have never seen that crash myself).

Matthew Wilcox explaining this to himself:
 "page is allocated, added to page cache, dirtied, writeback starts,

  --- thread A ---
  filesystem calls end_page_writeback()
        test_clear_page_writeback()
  --- context switch to thread B ---
  truncate_inode_pages_range() finds the page, it doesn't have writeback set,
  we delete it from the page cache.  Page gets reallocated, dirtied, writeback
  starts again.  Then we call write_cache_pages(), see
  PageWriteback() set, call wait_on_page_writeback()
  --- context switch back to thread A ---
  wake_up_page(page, PG_writeback);
  ... thread B is woken, but because the wakeup was for the old use of
  the page, PageWriteback is still set.

  Devious"

And prior to 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
this would have been much less likely: before that, wake_page_function()'s
non-exclusive case would stop walking and not wake if it found Writeback
already set again; whereas now the non-exclusive case proceeds to wake.

I have not thought of a fix that does not add a little overhead: the
simplest fix is for end_page_writeback() to get_page() before calling
test_clear_page_writeback(), then put_page() after wake_up_page().
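
The ordering described above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not the kernel code:
struct model_page and every model_* function are hypothetical stand-ins
invented for illustration, and the racing truncation is simulated by a
direct call at the worst possible point. It shows that taking a reference
before clearing the writeback flag keeps the page alive until the wakeup
has been delivered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct page (not the kernel layout). */
struct model_page {
	int refcount;      /* models the page reference count */
	bool writeback;    /* models the PG_writeback flag */
	bool freed;        /* set once the last reference is dropped */
	int wakeups;       /* wakeups delivered while the page was still live */
	int stale_wakeups; /* wakeups delivered after the page was freed */
};

static void model_get_page(struct model_page *p)
{
	p->refcount++;
}

static void model_put_page(struct model_page *p)
{
	if (--p->refcount == 0)
		p->freed = true; /* page could now be reused by someone else */
}

/* Clears the writeback flag and returns its previous value. */
static bool model_test_clear_writeback(struct model_page *p)
{
	bool old = p->writeback;

	p->writeback = false;
	return old;
}

static void model_wake_up_page(struct model_page *p)
{
	if (p->freed)
		p->stale_wakeups++; /* the hazard: touching a freed page */
	else
		p->wakeups++;
}

/* Models "thread B": drops the page cache reference once writeback is clear. */
static void model_truncate(struct model_page *p)
{
	if (!p->writeback)
		model_put_page(p);
}

/* pin == true follows the get_page()/put_page() ordering from the text. */
static void model_end_page_writeback(struct model_page *p, bool pin)
{
	bool was_set;

	if (pin)
		model_get_page(p);
	was_set = model_test_clear_writeback(p);
	assert(was_set);
	(void)was_set;
	model_truncate(p); /* simulated worst-case interleaving */
	model_wake_up_page(p);
	if (pin)
		model_put_page(p);
}

/* Runs one scenario on a page holding only the page cache reference. */
static struct model_page model_run(bool pin)
{
	struct model_page p = { .refcount = 1, .writeback = true };

	model_end_page_writeback(&p, pin);
	return p;
}
```

Calling model_run(false) leaves stale_wakeups at 1, because the wakeup
reaches a page that has already been freed; model_run(true) delivers the
wakeup while the page is still referenced and frees it only afterwards.
The extra get/put pair is the "little overhead" the text mentions.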

Was there a chance of missed wakeups before, since a page freed before
reaching wake_up_page() would have PageWaiters cleared?  I think not,
because each waiter does hold a reference on the page.  This bug comes
when the old use of the page, the one we do TestClearPageWriteback on,
had *no* waiters, so no additional page reference beyond the page cache
(and whoever racily freed it).  The reuse of the page has a waiter
holding a reference, and its own PageWriteback set; but the belated
wake_up_page() has woken the reuse to hit that BUG_ON(PageWriteback).

Reported-by: syzbot+3622cea378100f45d59f@syzkaller.appspotmail.com
Reported-by: Qian Cai <cai@lca.pw>
Fixes: 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2020-11-24 15:23:19 -08:00

kasan

mm: kasan: do not panic if both panic_on_warn and kasan_multishot set

2020-10-13 18:38:32 -07:00

backing-dev.c

bdi: replace BDI_CAP_NO_{WRITEBACK,ACCT_DIRTY} with a single flag

2020-09-24 13:43:39 -06:00

balloon_compaction.c

mm/balloon_compaction: suppress allocation warnings

2019-09-04 07:42:01 -04:00

cleancache.c

Driver Core and debugfs changes for 5.3-rc1

2019-07-12 12:24:03 -07:00

cma_debug.c

debugfs: make sure we can remove u32_array files cleanly

2020-07-10 13:54:00 -07:00

cma.c

cma: don't quit at first error when activating reserved areas

2020-08-12 10:57:57 -07:00

cma.h

mm: cma: use CMA_MAX_NAME to define the length of cma name array

2020-09-01 09:19:43 +02:00

compaction.c

mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate

2020-11-14 11:26:03 -08:00

debug_page_ref.c

…

debug_vm_pgtable.c

mm/debug_vm_pgtable: avoid doing memory allocation with pgtable_t mapped.

2020-10-16 11:11:14 -07:00

debug.c

mm, dump_page: rename head_mapcount() --> head_compound_mapcount()

2020-10-13 18:38:29 -07:00

dmapool.c

mm/dmapool.c: replace hard coded function name with __func__

2020-10-13 18:38:32 -07:00

early_ioremap.c

mm/early_ioremap.c: use %pa to print resource_size_t variables

2020-01-31 10:30:38 -08:00

fadvise.c

mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED

2020-10-13 18:38:29 -07:00

failslab.c

…

filemap.c

mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)

2020-11-24 15:23:19 -08:00

frame_vector.c

mmap locking API: convert mmap_sem comments

2020-06-09 09:39:14 -07:00

frontswap.c

mm/frontswap: mark various intentional data races

2020-08-14 19:56:56 -07:00

gup_benchmark.c

mm/gup_benchmark: take the mmap lock around GUP

2020-10-18 09:27:09 -07:00

gup.c

mm/gup: use unpin_user_pages() in __gup_longterm_locked()

2020-11-14 11:26:03 -08:00

highmem.c

mm/highmem.c: clean up endif comments

2020-10-16 11:11:18 -07:00

hmm.c

mm: do page fault accounting in handle_mm_fault

2020-08-12 10:58:02 -07:00

huge_memory.c

mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()

2020-11-22 10:48:22 -08:00

hugetlb_cgroup.c

hugetlb_cgroup: convert comma to semicolon

2020-08-21 09:52:52 -07:00

hugetlb.c

hugetlbfs: fix anon huge page migration race

2020-11-14 11:26:04 -08:00

hwpoison-inject.c

mm,hwpoison-inject: don't pin for hwpoison_filter

2020-10-16 11:11:16 -07:00

init-mm.c

mmap locking API: add MMAP_LOCK_INITIALIZER

2020-06-09 09:39:14 -07:00

internal.h

mm: rename page_order() to buddy_order()

2020-10-16 11:11:19 -07:00

interval_tree.c

…

ioremap.c

mm: move p?d_alloc_track to separate header file

2020-08-07 11:33:26 -07:00

Kconfig

mm: add a vmap_pfn function

2020-10-18 09:27:10 -07:00

Kconfig.debug

treewide: replace '---help---' in Kconfig files with 'help'

2020-06-14 01:57:21 +09:00

khugepaged.c

mm: remove the now-unnecessary mmget_still_valid() hack

2020-10-16 11:11:22 -07:00

kmemleak.c

mm/kmemleak: rely on rcu for task stack scanning

2020-10-13 18:38:27 -07:00

ksm.c

docs: get rid of :c:type explicit declarations for structs

2020-10-15 07:49:40 +02:00

list_lru.c

mm/list_lru: fix a data race in list_lru_count_one

2020-08-14 19:56:57 -07:00

maccess.c

uaccess: add force_uaccess_{begin,end} helpers

2020-08-12 10:57:59 -07:00

madvise.c

mm: fix madvise WILLNEED performance problem

2020-11-22 10:48:22 -08:00

Makefile

mm,kmemleak-test.c: move kmemleak-test.c to samples dir

2020-10-13 18:38:27 -07:00

mapping_dirty_helpers.c

mm/mapping_dirty_helpers: update huge page-table entry callbacks

2020-04-02 09:35:29 -07:00

memblock.c

memblock: get rid of a :c:type leftover

2020-10-15 07:49:46 +02:00

memcontrol.c

mm: memcg/slab: fix root memcg vmstats

2020-11-22 10:48:22 -08:00

memfd.c

mm: page cache: store only head pages in i_pages

2019-09-24 15:54:08 -07:00

memory_hotplug.c

mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports

2020-11-22 10:48:22 -08:00

memory-failure.c

hugetlbfs: fix anon huge page migration race

2020-11-14 11:26:04 -08:00

memory.c

mm: allow a NULL fn callback in apply_to_page_range

2020-10-18 09:27:10 -07:00

mempolicy.c

mm: mempolicy: fix potential pte_unmap_unlock pte error

2020-11-02 12:14:19 -08:00

mempool.c

mm/mempool: add 'else' to split mutually exclusive case

2020-10-13 18:38:34 -07:00

memremap.c

mm/mremap_pages: fix static key devmap_managed_key updates

2020-11-02 12:14:18 -08:00

memtest.c

…

migrate.c

hugetlbfs: fix anon huge page migration race

2020-11-14 11:26:04 -08:00

mincore.c

mm: factor find_get_incore_page out of mincore_page

2020-10-13 18:38:29 -07:00

mlock.c

mlock: fix unevictable_pgs event counts on THP

2020-09-19 13:13:38 -07:00

mm_init.c

mm: adjust vm_committed_as_batch according to vm overcommit policy

2020-08-07 11:33:26 -07:00

mmap.c

mm/mmap: add inline munmap_vma_range() for code readability

2020-10-18 09:27:09 -07:00

mmu_gather.c

mmap locking API: convert mmap_sem comments

2020-06-09 09:39:14 -07:00

mmu_notifier.c

mm/mmu_notifier: fix mmget() assert in __mmu_interval_notifier_insert

2020-10-16 11:11:17 -07:00

mmzone.c

…

mprotect.c

mm: Introduce arch_validate_flags()

2020-09-04 12:46:07 +01:00

mremap.c

mm/mremap: start addresses are properly aligned

2020-08-07 11:33:27 -07:00

msync.c

mmap locking API: use coccinelle to convert mmap_sem rwsem call sites

2020-06-09 09:39:14 -07:00

nommu.c

mm: remove alloc_vm_area

2020-10-18 09:27:10 -07:00

oom_kill.c

mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary

2020-10-13 18:38:35 -07:00

page_alloc.c

page_frag: Recover from memory pressure

2020-11-18 15:21:56 -08:00

page_counter.c

mm/page_counter: correct the obsolete func name in the comment of page_counter_try_charge()

2020-10-13 18:38:30 -07:00

page_ext.c

mm/page_ext.c: drop pfn_present() check when onlining

2020-04-07 10:43:40 -07:00

page_idle.c

mm/page_idle.c: skip offline pages

2020-06-08 11:05:55 -07:00

page_io.c

mm/page_io.c: remove useless out label in __swap_writepage()

2020-10-13 18:38:30 -07:00

page_isolation.c

mm: rename page_order() to buddy_order()

2020-10-16 11:11:19 -07:00

page_owner.c

mm: rename page_order() to buddy_order()

2020-10-16 11:11:19 -07:00

page_poison.c

mm/page_poison.c: replace bool variable with static key

2020-10-16 11:11:17 -07:00

page_reporting.c

mm: rename page_order() to buddy_order()

2020-10-16 11:11:19 -07:00

page_reporting.h

mm: introduce include/linux/pgtable.h

2020-06-09 09:39:13 -07:00

page_vma_mapped.c

mm: replace hpage_nr_pages with thp_nr_pages

2020-08-14 19:56:56 -07:00

page-writeback.c

mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)

2020-11-24 15:23:19 -08:00

pagewalk.c

mmap locking API: convert mmap_sem comments

2020-06-09 09:39:14 -07:00

percpu-internal.h

mm: memcg/percpu: account percpu memory to memory cgroups

2020-08-12 10:57:55 -07:00

percpu-km.c

mm: memcg/percpu: account percpu memory to memory cgroups

2020-08-12 10:57:55 -07:00

percpu-stats.c

mm: memcg/percpu: account percpu memory to memory cgroups

2020-08-12 10:57:55 -07:00

percpu-vm.c

mm: memcg/percpu: account percpu memory to memory cgroups

2020-08-12 10:57:55 -07:00

percpu.c

percpu: convert flexible array initializers to use struct_size()

2020-10-30 23:02:28 +00:00

pgalloc-track.h

mm: move p?d_alloc_track to separate header file

2020-08-07 11:33:26 -07:00

pgtable-generic.c

mm: introduce include/linux/pgtable.h

2020-06-09 09:39:13 -07:00

process_vm_access.c

mm/process_vm_access: Add missing #include <linux/compat.h>

2020-10-27 12:41:29 -07:00

ptdump.c

mmap locking API: use coccinelle to convert mmap_sem rwsem call sites

2020-06-09 09:39:14 -07:00

readahead.c

mm: use limited read-ahead to satisfy read

2020-10-17 13:49:08 -06:00

rmap.c

hugetlbfs: fix anon huge page migration race

2020-11-14 11:26:04 -08:00

rodata_test.c

mm/rodata_test.c: fix missing function declaration

2020-08-21 09:52:53 -07:00

shmem.c

fs: add a filesystem flag for THPs

2020-10-16 11:11:15 -07:00

shuffle.c

mm: rename page_order() to buddy_order()

2020-10-16 11:11:19 -07:00

shuffle.h

mm/shuffle: remove dynamic reconfiguration

2020-08-07 11:33:29 -07:00

slab_common.c

mm/slab_common.c: delete duplicated word

2020-08-12 10:57:58 -07:00

slab.c

mm: fix some comments formatting

2020-10-16 11:11:19 -07:00

slab.h

mm: kmem: move memcg_kmem_bypass() calls to get_mem/obj_cgroup_from_current()

2020-10-18 09:27:09 -07:00

slob.c

mm: memcg: convert vmstat slab counters to bytes

2020-08-07 11:33:24 -07:00

slub.c

mm/slub: fix panic in slab_alloc_node()

2020-11-14 11:26:03 -08:00

sparse-vmemmap.c

mm/sparse: only sub-section aligned range would be populated

2020-08-07 11:33:27 -07:00

sparse.c

mm/memory_hotplug: guard more declarations by CONFIG_MEMORY_HOTPLUG

2020-10-16 11:11:18 -07:00

swap_cgroup.c

mm: memcontrol: make swap tracking an integral part of memory control

2020-06-03 20:09:48 -07:00

swap_slots.c

mm/swap_slots.c: remove always zero and unused return value of enable_swap_slots_cache()

2020-10-13 18:38:30 -07:00

swap_state.c

mm: fix some broken comments

2020-10-16 11:11:19 -07:00

swap.c

mm: move call to compound_head() in release_pages()

2020-10-13 18:38:33 -07:00

swapfile.c

mm/swapfile.c: fix potential memory leak in sys_swapon

2020-10-13 18:38:30 -07:00

truncate.c

mm/truncate.c: make __invalidate_mapping_pages() static

2020-11-02 12:14:19 -08:00

usercopy.c

mm/usercopy.c: delete duplicated word

2020-08-12 10:57:58 -07:00

userfaultfd.c

mm/vmscan: protect the workingset on anonymous LRU

2020-08-12 10:57:55 -07:00

util.c

mm/util.c: update the kerneldoc for kstrdup_const()

2020-10-16 11:11:17 -07:00

vmacache.c

kernel: better document the use_mm/unuse_mm API contract

2020-06-10 19:14:18 -07:00

vmalloc.c

mm: remove the filename in the top of file comment in vmalloc.c

2020-10-18 09:27:10 -07:00

vmpressure.c

mm: vmpressure: use mem_cgroup_is_root API

2020-04-02 09:35:31 -07:00

vmscan.c

mm/vmscan: fix NR_ISOLATED_FILE corruption on 64-bit

2020-11-14 11:26:03 -08:00

vmstat.c

mm/vmstat.c: use helper macro abs()

2020-10-16 11:11:17 -07:00

workingset.c

XArray updates for 5.9

2020-10-20 14:39:37 -07:00

z3fold.c

mm/z3fold.c: use xx_zalloc instead xx_alloc and memset

2020-10-13 18:38:34 -07:00

zbud.c

mm/zbud: remove redundant initialization

2020-10-13 18:38:34 -07:00

zpool.c

mm/zpool.c: delete duplicated word and fix grammar

2020-08-12 10:57:58 -07:00

zsmalloc.c

zsmalloc: switch from alloc_vm_area to get_vm_area

2020-10-18 09:27:10 -07:00

zswap.c

mm/zswap: allow setting default status, compressor and allocator in Kconfig

2020-04-07 10:43:41 -07:00