Miaohe Lin
00b0752c7f
mm/memory-failure: fix handling of dissolved but not taken off from buddy pages
...
commit 8cf360b9d6a840700e06864236a01a883b34bbad upstream.
When I ran memory failure tests recently, the following panic occurred:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
raw: 06fffe0000000000 dead000000000100 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000009 00000000ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(!PageBuddy(page))
------------[ cut here ]------------
kernel BUG at include/linux/page-flags.h:1009!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:__del_page_from_free_list+0x151/0x180
RSP: 0018:ffffa49c90437998 EFLAGS: 00000046
RAX: 0000000000000035 RBX: 0000000000000009 RCX: ffff8dd8dfd1c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff8dd8dfd1c9c0
RBP: ffffd901233b8000 R08: ffffffffab5511f8 R09: 0000000000008c69
R10: 0000000000003c15 R11: ffffffffab5511f8 R12: ffff8dd8fffc0c80
R13: 0000000000000001 R14: ffff8dd8fffc0c80 R15: 0000000000000009
FS: 00007ff916304740(0000) GS:ffff8dd8dfd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055eae50124c8 CR3: 00000008479e0000 CR4: 00000000000006f0
Call Trace:
<TASK>
__rmqueue_pcplist+0x23b/0x520
get_page_from_freelist+0x26b/0xe40
__alloc_pages_noprof+0x113/0x1120
__folio_alloc_noprof+0x11/0xb0
alloc_buddy_hugetlb_folio.isra.0+0x5a/0x130
__alloc_fresh_hugetlb_folio+0xe7/0x140
alloc_pool_huge_folio+0x68/0x100
set_max_huge_pages+0x13d/0x340
hugetlb_sysctl_handler_common+0xe8/0x110
proc_sys_call_handler+0x194/0x280
vfs_write+0x387/0x550
ksys_write+0x64/0xe0
do_syscall_64+0xc2/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff916114887
RSP: 002b:00007ffec8a2fd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000055eae500e350 RCX: 00007ff916114887
RDX: 0000000000000004 RSI: 000055eae500e390 RDI: 0000000000000003
RBP: 000055eae50104c0 R08: 0000000000000000 R09: 000055eae50104c0
R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000004
R13: 0000000000000004 R14: 00007ff916216b80 R15: 00007ff916216a00
</TASK>
Modules linked in: mce_inject hwpoison_inject
---[ end trace 0000000000000000 ]---
Before the panic, there was a warning about a bad page state:
BUG: Bad page state in process page-types pfn:8cee00
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
page_type: 0xffffff7f(buddy)
raw: 06fffe0000000000 ffffd901241c0008 ffffd901240f8008 0000000000000000
raw: 0000000000000000 0000000000000009 00000000ffffff7f 0000000000000000
page dumped because: nonzero mapcount
Modules linked in: mce_inject hwpoison_inject
CPU: 8 PID: 154211 Comm: page-types Not tainted 6.9.0-rc4-00499-g5544ec3178e2-dirty #22
Call Trace:
<TASK>
dump_stack_lvl+0x83/0xa0
bad_page+0x63/0xf0
free_unref_page+0x36e/0x5c0
unpoison_memory+0x50b/0x630
simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110
debugfs_attr_write+0x42/0x60
full_proxy_write+0x5b/0x80
vfs_write+0xcd/0x550
ksys_write+0x64/0xe0
do_syscall_64+0xc2/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f189a514887
RSP: 002b:00007ffdcd899718 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f189a514887
RDX: 0000000000000009 RSI: 00007ffdcd899730 RDI: 0000000000000003
RBP: 00007ffdcd8997a0 R08: 0000000000000000 R09: 00007ffdcd8994b2
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcda199a8
R13: 0000000000404af1 R14: 000000000040ad78 R15: 00007f189a7a5040
</TASK>
The root cause appears to be the following race:
 memory_failure
  try_memory_failure_hugetlb
   me_huge_page
    __page_handle_poison
     dissolve_free_hugetlb_folio
     drain_all_pages -- Buddy page can be isolated e.g. for compaction.
     take_page_off_buddy -- Failed as page is not in the buddy list.
         -- Page can be putback into buddy after compaction.
    page_ref_inc -- Leads to buddy page with refcnt = 1.
Then unpoison_memory() can unpoison the page and send the buddy page back
into the buddy list again, leading to the above bad page state warning.
bad_page() then calls page_mapcount_reset(), which removes PageBuddy from
the buddy page and leads to the later VM_BUG_ON_PAGE(!PageBuddy(page)) when
trying to allocate this page.
Fix this issue by only treating __page_handle_poison() as successful when
it returns 1.
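
The effect of the stricter check can be illustrated with a small userspace
model. This is only a sketch, not the kernel code: every model_* name and
both predicate helpers are hypothetical, and it assumes that the helper
returns a negative value when dissolving fails, 0 when the folio is
dissolved but the page could not be taken off the buddy list, and 1 when
both steps succeed. It further assumes that the earlier check accepted any
non-negative return value.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the return convention assumed above:
 *   < 0  dissolving the free hugetlb folio failed
 *     0  dissolved, but the page was not taken off the buddy list
 *     1  dissolved and taken off the buddy list
 */
static int model_page_handle_poison(int dissolve_ret, bool taken_off_buddy)
{
	if (dissolve_ret != 0)
		return dissolve_ret;	/* propagate the dissolve error */
	return taken_off_buddy ? 1 : 0;
}

/* Assumed pre-fix behaviour: any non-negative return counts as success. */
static bool model_success_before_fix(int ret)
{
	return ret >= 0;
}

/* Behaviour described by the fix: only a return value of 1 is success. */
static bool model_success_after_fix(int ret)
{
	return ret > 0;
}

/* Walk through the racy case from the commit message. */
static void model_check_racy_case(void)
{
	/* Dissolve succeeded, but the page was isolated by compaction. */
	int ret = model_page_handle_poison(0, false);

	assert(model_success_before_fix(ret));	/* refcount would be raised */
	assert(!model_success_after_fix(ret));	/* page is left untouched */
}
```

In this model, the racy case (dissolve succeeds, take_page_off_buddy fails)
is the only input on which the two predicates disagree, which is the case
that left a buddy page with refcnt = 1.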
Link: https://lkml.kernel.org/r/20240523071217.1696196-1-linmiaohe@huawei.com
Fixes: ceaf8fbea79a ("mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-21 14:35:59 +02:00