Darrick J. Wong 2b3f004d3d xfs: drop xfarray sortinfo folio on error
Chandan Babu reports the following livelock in xfs/708:

 run fstests xfs/708 at 2024-05-04 15:35:29
 XFS (loop16): EXPERIMENTAL online scrub feature in use. Use at your own risk!
 XFS (loop5): Mounting V5 Filesystem e96086f0-a2f9-4424-a1d5-c75d53d823be
 XFS (loop5): Ending clean mount
 XFS (loop5): Quotacheck needed: Please wait.
 XFS (loop5): Quotacheck: Done.
 XFS (loop5): EXPERIMENTAL online scrub feature in use. Use at your own risk!
 INFO: task xfs_io:143725 blocked for more than 122 seconds.
       Not tainted 6.9.0-rc4+ #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:xfs_io          state:D stack:0     pid:143725 tgid:143725 ppid:117661 flags:0x00004006
 Call Trace:
  <TASK>
  __schedule+0x69c/0x17a0
  schedule+0x74/0x1b0
  io_schedule+0xc4/0x140
  folio_wait_bit_common+0x254/0x650
  shmem_undo_range+0x9d5/0xb40
  shmem_evict_inode+0x322/0x8f0
  evict+0x24e/0x560
  __dentry_kill+0x17d/0x4d0
  dput+0x263/0x430
  __fput+0x2fc/0xaa0
  task_work_run+0x132/0x210
  get_signal+0x1a8/0x1910
  arch_do_signal_or_restart+0x7b/0x2f0
  syscall_exit_to_user_mode+0x1c2/0x200
  do_syscall_64+0x72/0x170
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

The shmem code is trying to drop all the folios attached to a shmem
file and gets stuck on a locked folio after a bnobt repair.  It looks
like the process has a signal pending, so I started looking for places
where we lock an xfile folio and then deal with a fatal signal.

I found a bug in xfarray_sort_scan via code inspection.  This function
is called to set up the scanning phase of a quicksort operation, which
may involve grabbing a locked xfile folio.  If we exit the function with
an error code, the caller does not call xfarray_sort_scan_done to put
the xfile folio.  If _sort_scan returns an error code while si->folio is
set, we leak the reference and never unlock the folio.
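The failure mode described above can be reproduced with a small userspace toy model. This is a hedged sketch, not kernel code: `struct toy_folio`, `struct toy_sortinfo`, `toy_sort_scan`, `toy_sort_scan_done` and `toy_sort_buggy` are hypothetical stand-ins. A "folio" is reduced to a reference count and a lock flag. The scan step caches a locked folio across calls and can bail out on a (simulated) fatal signal while that folio is still held. The caller only reaches the cleanup on success, so the folio stays locked. That matches the state shmem_undo_range was waiting on in the trace.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio: just a refcount and a lock bit. */
struct toy_folio {
	int refcount;
	bool locked;
};

/* Hypothetical stand-in for the sort state that caches one folio. */
struct toy_sortinfo {
	struct toy_folio *folio;	/* locked + referenced folio, or NULL */
};

/*
 * One scan step: bail out on a fatal signal (possibly while a folio from an
 * earlier step is still cached), else grab and lock the folio if not cached.
 */
static int toy_sort_scan(struct toy_sortinfo *si, struct toy_folio *f,
			 bool fatal_signal)
{
	if (fatal_signal)
		return -EINTR;
	if (!si->folio) {
		f->refcount++;
		f->locked = true;
		si->folio = f;
	}
	return 0;
}

/* Unlock and drop the cached folio; a NULL folio is ignored. */
static void toy_sort_scan_done(struct toy_sortinfo *si)
{
	if (!si->folio)
		return;
	assert(si->folio->locked);
	si->folio->locked = false;
	si->folio->refcount--;
	si->folio = NULL;
}

/* Buggy caller: cleanup only runs if every scan step succeeds. */
static int toy_sort_buggy(struct toy_sortinfo *si, struct toy_folio *f,
			  int nsteps, int signal_at)
{
	int error;

	for (int i = 0; i < nsteps; i++) {
		error = toy_sort_scan(si, f, i == signal_at);
		if (error)
			return error;	/* BUG: folio stays referenced and locked */
	}
	toy_sort_scan_done(si);
	return 0;
}
```

Calling `toy_sort_buggy` with a signal at step 2 of 4 returns `-EINTR` and leaves the toy folio with one reference and its lock bit set. Anything that later waits for that lock, as shmem eviction does for real folios, would wait forever.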

Therefore, change xfarray_sort to call _scan_done on exit.  This is safe
to call multiple times because it sets si->folio to NULL and ignores a
NULL si->folio.  Also change _sort_scan to use an intermediate variable
so that we never pollute si->folio with an errptr.
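Continuing the same hedged toy model, both parts of the fix can be sketched in userspace. Every name here is hypothetical. `toy_err_ptr`, `toy_is_err` and `toy_ptr_err` are simplified stand-ins for the kernel's ERR_PTR/IS_ERR/PTR_ERR helpers. The folio lookup, `toy_folio_grab`, simulates a failed lookup by returning an error pointer when handed NULL. The sort routine funnels every exit through one label that calls `toy_sort_scan_done`. This is safe because the done helper clears `si->folio` and ignores NULL. The scan step keeps the lookup result in a local variable and only stores it in `si->folio` once it is known not to be an error pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for ERR_PTR/IS_ERR/PTR_ERR. */
#define TOY_MAX_ERRNO 4095
static void *toy_err_ptr(long error) { return (void *)(intptr_t)error; }
static bool toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}
static long toy_ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }

struct toy_folio {
	int refcount;
	bool locked;
};

struct toy_sortinfo {
	struct toy_folio *folio;	/* never holds an error pointer */
};

/* Hypothetical lookup: locked + referenced folio, or an error pointer. */
static struct toy_folio *toy_folio_grab(struct toy_folio *f)
{
	if (!f)
		return toy_err_ptr(-ENOMEM);
	f->refcount++;
	f->locked = true;
	return f;
}

/* Idempotent cleanup: ignores NULL and resets si->folio to NULL. */
static void toy_sort_scan_done(struct toy_sortinfo *si)
{
	if (!si->folio)
		return;
	assert(si->folio->locked);
	si->folio->locked = false;
	si->folio->refcount--;
	si->folio = NULL;
}

static int toy_sort_scan(struct toy_sortinfo *si, struct toy_folio *f,
			 bool fatal_signal)
{
	if (fatal_signal)
		return -EINTR;
	if (!si->folio) {
		/* Intermediate variable: si->folio never sees an errptr. */
		struct toy_folio *folio = toy_folio_grab(f);

		if (toy_is_err(folio))
			return (int)toy_ptr_err(folio);
		si->folio = folio;
	}
	return 0;
}

/* Fixed caller: every exit path goes through the cleanup. */
static int toy_sort(struct toy_sortinfo *si, struct toy_folio *f,
		    int nsteps, int signal_at)
{
	int error = 0;

	for (int i = 0; i < nsteps; i++) {
		error = toy_sort_scan(si, f, i == signal_at);
		if (error)
			goto out;
	}
out:
	toy_sort_scan_done(si);
	return error;
}
```

With a signal at step 2 of 4, `toy_sort` still returns `-EINTR`, but the toy folio ends up unlocked with no extra reference. A failed lookup leaves `si->folio` NULL rather than holding an error pointer that the cleanup would try to unlock.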

Fixes: 232ea052775f9 ("xfs: enable sorting of xfile-backed arrays")
Reported-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-05-27 15:55:52 +05:30