linux/include
Huang Ying eb085574a7 mm, swap: fix race between swapoff and some swap operations
When swapin is performed, after getting the swap entry information from
the page table, system will swap in the swap entry, without any lock held
to prevent the swap device from being swapoff.  This may cause the race
like below,

CPU 1				CPU 2
-----				-----
				do_swap_page
				  swapin_readahead
				    __read_swap_cache_async
swapoff				      swapcache_prepare
  p->swap_map = NULL		        __swap_duplicate
					  p->swap_map[?] /* !!! NULL pointer access */

Because swapoff is usually done when system shutdown only, the race may
not hit many people in practice.  But it is still a race need to be fixed.

To fix the race, get_swap_device() is added to check whether the specified
swap entry is valid in its swap device.  If so, it will keep the swap
entry valid via preventing the swap device from being swapoff, until
put_swap_device() is called.

Because swapoff() is very rare code path, to make the normal path runs as
fast as possible, rcu_read_lock/unlock() and synchronize_rcu() instead of
reference count is used to implement get/put_swap_device().  >From
get_swap_device() to put_swap_device(), RCU reader side is locked, so
synchronize_rcu() in swapoff() will wait until put_swap_device() is
called.

In addition to swap_map, cluster_info, etc.  data structure in the struct
swap_info_struct, the swap cache radix tree will be freed after swapoff,
so this patch fixes the race between swap cache looking up and swapoff
too.

Races between some other swap cache usages and swapoff are fixed too via
calling synchronize_rcu() between clearing PageSwapCache() and freeing
swap cache data structure.

Another possible method to fix this is to use preempt_off() +
stop_machine() to prevent the swap device from being swapoff when its data
structure is being accessed.  The overhead in hot-path of both methods is
similar.  The advantages of RCU based method are,

1. stop_machine() may disturb the normal execution code path on other
   CPUs.

2. File cache uses RCU to protect its radix tree.  If the similar
   mechanism is used for swap cache too, it is easier to share code
   between them.

3. RCU is used to protect swap cache in total_swapcache_pages() and
   exit_swap_address_space() already.  The two mechanisms can be
   merged to simplify the logic.

Link: http://lkml.kernel.org/r/20190522015423.14418-1-ying.huang@intel.com
Fixes: 235b621767 ("mm/swap: add cluster lock")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Not-nacked-by: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12 11:05:43 -07:00
..
acpi It's been a relatively busy cycle for docs: 2019-07-09 12:34:26 -07:00
asm-generic asm-generic, x86: add bitops instrumentation for KASAN 2019-07-12 11:05:42 -07:00
clocksource clocksource/drivers: Continue making Hyper-V clocksource ISA agnostic 2019-07-03 11:00:59 +02:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2019-07-08 20:57:08 -07:00
drm
dt-bindings Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2019-07-11 10:55:49 -07:00
keys request_key improvements 2019-07-08 19:19:37 -07:00
kvm
linux mm, swap: fix race between swapoff and some swap operations 2019-07-12 11:05:43 -07:00
math-emu
media media updates for v5.3-rc1 2019-07-09 09:47:22 -07:00
memory
misc
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2019-07-11 10:55:49 -07:00
pcmcia It's been a relatively busy cycle for docs: 2019-07-09 12:34:26 -07:00
ras
rdma
scsi
soc
sound ASoC: Updates for v5.3 2019-07-08 14:45:34 +02:00
target
trace Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2019-07-11 10:55:49 -07:00
uapi nilfs2: do not use unexported cpu_to_le32()/le32_to_cpu() in uapi header 2019-07-12 11:05:40 -07:00
vdso
video fbdev changes for v5.3: 2019-07-09 09:55:45 -07:00
xen