69ad4ef868
A page fault was encountered in mpt3sas on a LUN reset error path:
[ 145.763216] mpt3sas_cm1: Task abort tm failed: handle(0x0002),timeout(30) tr_method(0x0) smid(3) msix_index(0)
[ 145.778932] scsi 1:0:0:0: task abort: FAILED scmd(0x0000000024ba29a2)
[ 145.817307] scsi 1:0:0:0: attempting device reset! scmd(0x0000000024ba29a2)
[ 145.827253] scsi 1:0:0:0: [sg1] tag#2 CDB: Receive Diagnostic 1c 01 01 ff fc 00
[ 145.837617] scsi target1:0:0: handle(0x0002), sas_address(0x500605b0000272b9), phy(0)
[ 145.848598] scsi target1:0:0: enclosure logical id(0x500605b0000272b8), slot(0)
[ 149.858378] mpt3sas_cm1: Poll ReplyDescriptor queues for completion of smid(0), task_type(0x05), handle(0x0002)
[ 149.875202] BUG: unable to handle page fault for address: 00000007fffc445d
[ 149.885617] #PF: supervisor read access in kernel mode
[ 149.894346] #PF: error_code(0x0000) - not-present page
[ 149.903123] PGD 0 P4D 0
[ 149.909387] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 149.917417] CPU: 24 PID: 3512 Comm: scsi_eh_1 Kdump: loaded Tainted: G S O 5.10.89-altav-1 #1
[ 149.934327] Hardware name: DDN 200NVX2 /200NVX2-MB , BIOS ATHG2.2.02.01 09/10/2021
[ 149.951871] RIP: 0010:_base_process_reply_queue+0x4b/0x900 [mpt3sas]
[ 149.961889] Code: 0f 84 22 02 00 00 8d 48 01 49 89 fd 48 8d 57 38 f0 0f b1 4f 38 0f 85 d8 01 00 00 49 8b 45 10 45 31 e4 41 8b 55 0c 48 8d 1c d0 <0f> b6 03 83 e0 0f 3c 0f 0f 85 a2 00 00 00 e9 e6 01 00 00 0f b7 ee
[ 149.991952] RSP: 0018:ffffc9000f1ebcb8 EFLAGS: 00010246
[ 150.000937] RAX: 0000000000000055 RBX: 00000007fffc445d RCX: 000000002548f071
[ 150.011841] RDX: 00000000ffff8881 RSI: 0000000000000001 RDI: ffff888125ed50d8
[ 150.022670] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000ffff7fff
[ 150.033445] R10: ffffc9000f1ebb68 R11: ffffc9000f1ebb60 R12: 0000000000000000
[ 150.044204] R13: ffff888125ed50d8 R14: 0000000000000080 R15: 34cdc00034cdea80
[ 150.054963] FS: 0000000000000000(0000) GS:ffff88dfaf200000(0000) knlGS:0000000000000000
[ 150.066715] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 150.076078] CR2: 00000007fffc445d CR3: 000000012448a006 CR4: 0000000000770ee0
[ 150.086887] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 150.097670] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 150.108323] PKRU: 55555554
[ 150.114690] Call Trace:
[ 150.120497] ? printk+0x48/0x4a
[ 150.127049] mpt3sas_scsih_issue_tm.cold.114+0x2e/0x2b3 [mpt3sas]
[ 150.136453] mpt3sas_scsih_issue_locked_tm+0x86/0xb0 [mpt3sas]
[ 150.145759] scsih_dev_reset+0xea/0x300 [mpt3sas]
[ 150.153891] scsi_eh_ready_devs+0x541/0x9e0 [scsi_mod]
[ 150.162206] ? __scsi_host_match+0x20/0x20 [scsi_mod]
[ 150.170406] ? scsi_try_target_reset+0x90/0x90 [scsi_mod]
[ 150.178925] ? blk_mq_tagset_busy_iter+0x45/0x60
[ 150.186638] ? scsi_try_target_reset+0x90/0x90 [scsi_mod]
[ 150.195087] scsi_error_handler+0x3a5/0x4a0 [scsi_mod]
[ 150.203206] ? __schedule+0x1e9/0x610
[ 150.209783] ? scsi_eh_get_sense+0x210/0x210 [scsi_mod]
[ 150.217924] kthread+0x12e/0x150
[ 150.224041] ? kthread_worker_fn+0x130/0x130
[ 150.231206] ret_from_fork+0x1f/0x30
This is caused by mpt3sas_base_sync_reply_irqs() using an invalid reply_q
pointer outside of the list_for_each_entry() loop. At the end of the full
list traversal the pointer is invalid.
Move the _base_process_reply_queue() call inside of the loop.
Link: https://lore.kernel.org/r/d625deae-a958-0ace-2ba3-0888dd0a415b@ddn.com
Fixes:
|
||
---|---|---|
.. | ||
mpi | ||
Kconfig | ||
Makefile | ||
mpt3sas_base.c | ||
mpt3sas_base.h | ||
mpt3sas_config.c | ||
mpt3sas_ctl.c | ||
mpt3sas_ctl.h | ||
mpt3sas_debug.h | ||
mpt3sas_debugfs.c | ||
mpt3sas_scsih.c | ||
mpt3sas_transport.c | ||
mpt3sas_trigger_diag.c | ||
mpt3sas_trigger_diag.h | ||
mpt3sas_trigger_pages.h | ||
mpt3sas_warpdrive.c |