linux/mm
Yang Shi 8fd2e0b505 mm: swap: check if swap backing device is congested or not
Swap readahead would read in a few pages regardless if the underlying
device is busy or not.  It may incur long waiting time if the device is
congested, and it may also exacerbate the congestion.

Use inode_read_congested() to check if the underlying device is busy or
not like what file page readahead does.  Get inode from
swap_info_struct.

Although we can add inode information in swap_address_space
(address_space->host), it may lead some unexpected side effect, i.e.  it
may break mapping_cap_account_dirty().  Using inode from
swap_info_struct seems simple and good enough.

Just does the check in vma_cluster_readahead() since
swap_vma_readahead() is just used for non-rotational device which much
less likely has congestion than traditional HDD.

Although swap slots may be consecutive on swap partition, it still may
be fragmented on swap file.  This check would help to reduce excessive
stall for such case.

The test with page_fault1 of will-it-scale (sometimes tracing may just
show runtest.py that is the wrapper script of page_fault1), which
basically launches NR_CPU threads to generate 128MB anonymous pages for
each thread, on my virtual machine with congested HDD shows long tail
latency is reduced significantly.

Without the patch
 page_fault1_thr-1490  [023]   129.311706: funcgraph_entry:      #57377.796 us |  do_swap_page();
 page_fault1_thr-1490  [023]   129.369103: funcgraph_entry:        5.642us   |  do_swap_page();
 page_fault1_thr-1490  [023]   129.369119: funcgraph_entry:      #1289.592 us |  do_swap_page();
 page_fault1_thr-1490  [023]   129.370411: funcgraph_entry:        4.957us   |  do_swap_page();
 page_fault1_thr-1490  [023]   129.370419: funcgraph_entry:        1.940us   |  do_swap_page();
 page_fault1_thr-1490  [023]   129.378847: funcgraph_entry:      #1411.385 us |  do_swap_page();
 page_fault1_thr-1490  [023]   129.380262: funcgraph_entry:        3.916us   |  do_swap_page();
 page_fault1_thr-1490  [023]   129.380275: funcgraph_entry:      #4287.751 us |  do_swap_page();

With the patch
      runtest.py-1417  [020]   301.925911: funcgraph_entry:      #9870.146 us |  do_swap_page();
      runtest.py-1417  [020]   301.935785: funcgraph_entry:        9.802us   |  do_swap_page();
      runtest.py-1417  [020]   301.935799: funcgraph_entry:        3.551us   |  do_swap_page();
      runtest.py-1417  [020]   301.935806: funcgraph_entry:        2.142us   |  do_swap_page();
      runtest.py-1417  [020]   301.935853: funcgraph_entry:        6.938us   |  do_swap_page();
      runtest.py-1417  [020]   301.935864: funcgraph_entry:        3.765us   |  do_swap_page();
      runtest.py-1417  [020]   301.935871: funcgraph_entry:        3.600us   |  do_swap_page();
      runtest.py-1417  [020]   301.935878: funcgraph_entry:        7.202us   |  do_swap_page();

[akpm@linux-foundation.org: code cleanup]
[yang.shi@linux.alibaba.com: add comment]
  Link: http://lkml.kernel.org/r/bbc7bda7-62d0-df1a-23ef-d369e865bdca@linux.alibaba.com
Link: http://lkml.kernel.org/r/1546543673-108536-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Acked-by: Tim Chen <tim.c.chen@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Hugh Dickins <hughd@google.com
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:16 -08:00
..
kasan kasan: fix coccinelle warnings in kasan_p*_table 2019-03-05 21:07:13 -08:00
backing-dev.c writeback: synchronize sync(2) against cgroup writeback membership switches 2019-01-22 14:39:38 -07:00
balloon_compaction.c
cleancache.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
cma_debug.c mm/cma: remove unsupported gfp_mask parameter from cma_alloc() 2018-08-17 16:20:32 -07:00
cma.c kasan, mm, arm64: tag non slab memory allocated via pagealloc 2018-12-28 12:11:44 -08:00
cma.h
compaction.c mm: remove sysctl_extfrag_handler() 2019-03-05 21:07:15 -08:00
debug_page_ref.c
debug.c mm/debug.c: fix __dump_page() for poisoned pages 2019-02-21 09:01:00 -08:00
dmapool.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
early_ioremap.c
fadvise.c vfs: implement readahead(2) using POSIX_FADV_WILLNEED 2018-08-30 20:01:32 +02:00
failslab.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
filemap.c mm/filemap.c: remove redundant test from find_get_pages_contig 2019-03-05 21:07:16 -08:00
frame_vector.c
frontswap.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
gup_benchmark.c mm/gup_benchmark.c: prevent integer overflow in ioctl 2018-10-31 08:54:12 -07:00
gup.c mm/gup: fix gup_pmd_range() for dax 2019-02-12 16:33:18 -08:00
highmem.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
hmm.c mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm 2018-12-28 12:11:52 -08:00
huge_memory.c mm: replace all open encodings for NUMA_NO_NODE 2019-03-05 21:07:14 -08:00
hugetlb_cgroup.c
hugetlb.c mm/hugetlb: distinguish between migratability and movability 2019-03-05 21:07:15 -08:00
hwpoison-inject.c
init-mm.c mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids 2018-07-17 09:35:30 +02:00
internal.h mm/page_alloc.c: memory hotplug: free pages as higher order 2019-03-05 21:07:14 -08:00
interval_tree.c
Kconfig ksm: replace jhash2 with xxhash 2018-12-28 12:11:46 -08:00
Kconfig.debug mm: clarify CONFIG_PAGE_POISONING and usage 2018-08-22 10:52:44 -07:00
khugepaged.c mm/mmu_notifier: use structure for invalidate_range_start/end calls v2 2018-12-28 12:11:50 -08:00
kmemleak-test.c
kmemleak.c kmemleak: account for tagged pointers when calculating pointer range 2019-02-21 09:01:00 -08:00
ksm.c mm: reuse only-pte-mapped KSM page in do_wp_page() 2019-03-05 21:07:15 -08:00
list_lru.c mm/list_lru: introduce list_lru_shrink_walk_irq() 2018-08-17 16:20:32 -07:00
maccess.c Revert "x86/fault: BUG() when uaccess helpers fault on kernel addresses" 2019-02-25 09:10:51 -08:00
madvise.c mm/mmu_notifier: use structure for invalidate_range_start/end calls v2 2018-12-28 12:11:50 -08:00
Makefile mm: remove nobootmem 2018-10-31 08:54:16 -07:00
memblock.c arm64, mm, efi: Account for GICv3 LPI tables in static memblock reserve table 2019-02-16 15:02:03 +01:00
memcontrol.c mm/memcontrol.c: use struct_size() in kmalloc() 2019-03-05 21:07:15 -08:00
memfd.c memfd: Convert memfd_tag_pins to XArray 2018-10-21 10:46:41 -04:00
memory_hotplug.c mm: remove extra drain pages on pcp list 2019-03-05 21:07:15 -08:00
memory-failure.c mm: hwpoison: fix thp split handing in soft_offline_in_use_page() 2019-03-05 21:07:13 -08:00
memory.c mm: reuse only-pte-mapped KSM page in do_wp_page() 2019-03-05 21:07:15 -08:00
mempolicy.c mm: replace all open encodings for NUMA_NO_NODE 2019-03-05 21:07:14 -08:00
mempool.c mm/mempool.c: add missing parameter description 2018-08-22 10:52:44 -07:00
memtest.c
migrate.c mm/hugetlb: distinguish between migratability and movability 2019-03-05 21:07:15 -08:00
mincore.c Revert "Change mincore() to count "mapped" pages rather than "cached" pages" 2019-01-24 09:04:37 +13:00
mlock.c dax: remove VM_MIXEDMAP for fsdax and device dax 2018-08-17 16:20:27 -07:00
mm_init.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
mmap.c mm: enforce min addr even if capable() in expand_downwards() 2019-02-27 17:27:02 -08:00
mmu_context.c
mmu_gather.c mm: Replace call_rcu_sched() with call_rcu() 2018-11-27 09:21:46 -08:00
mmu_notifier.c mm/mmu_notifier: use structure for invalidate_range_start/end calls v2 2018-12-28 12:11:50 -08:00
mmzone.c
mprotect.c mm/mmu_notifier: use structure for invalidate_range_start/end calls v2 2018-12-28 12:11:50 -08:00
mremap.c mm: speed up mremap by 20x on large regions 2019-01-04 13:13:48 -08:00
msync.c
nommu.c mm/gup: cache dev_pagemap while pinning pages 2018-10-26 16:38:15 -07:00
oom_kill.c mm, oom: fix use-after-free in oom_kill_process 2019-02-01 15:46:23 -08:00
page_alloc.c mm: remove extra drain pages on pcp list 2019-03-05 21:07:15 -08:00
page_counter.c
page_ext.c mm: replace all open encodings for NUMA_NO_NODE 2019-03-05 21:07:14 -08:00
page_idle.c mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
page_io.c mm/page_io.c: fix polled swap page in 2019-01-04 13:13:48 -08:00
page_isolation.c mm: only report isolation failures when offlining memory 2018-12-28 12:11:46 -08:00
page_owner.c mm/page_owner: clamp read count to PAGE_SIZE 2018-12-28 12:11:46 -08:00
page_poison.c page_poison: play nicely with KASAN 2019-03-05 21:07:13 -08:00
page_vma_mapped.c mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly 2018-10-31 08:54:11 -07:00
page-writeback.c mm/page-writeback.c: don't break integrity writeback on ->writepage() error 2018-12-28 12:11:49 -08:00
pagewalk.c
percpu-internal.h
percpu-km.c percpu: convert spin_lock_irq to spin_lock_irqsave. 2018-12-18 09:04:08 -08:00
percpu-stats.c treewide: Use array_size() in vmalloc() 2018-06-12 16:19:22 -07:00
percpu-vm.c
percpu.c Merge branch 'for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu 2018-11-01 09:27:57 -07:00
pgtable-generic.c x86/mm: Page size aware flush_tlb_mm_range() 2018-10-09 16:51:11 +02:00
process_vm_access.c
quicklist.c
readahead.c mm/readahead.c: simplify get_next_ra_size() 2018-12-28 12:11:46 -08:00
rmap.c mm/mmu_notifier: mm/rmap.c: Fix a mmu_notifier range bug in try_to_unmap_one 2019-01-10 02:58:21 -08:00
rodata_test.c
shmem.c tmpfs: fix uninitialized return value in shmem_link 2019-02-25 11:49:22 -08:00
slab_common.c kmemleak: account for tagged pointers when calculating pointer range 2019-02-21 09:01:00 -08:00
slab.c mm/slab.c: kmemleak no scan alien caches 2019-03-05 21:07:14 -08:00
slab.h memcg: localize memcg_kmem_enabled() check 2019-03-05 21:07:15 -08:00
slob.c
slub.c mm, slub: make the comment of put_cpu_partial() complete 2019-03-05 21:07:15 -08:00
sparse-vmemmap.c mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
sparse.c mm, sparse: pass nid instead of pgdat to sparse_add_one_section() 2018-12-28 12:11:49 -08:00
swap_cgroup.c
swap_slots.c mm, swap, get_swap_pages: use entry_size instead of cluster in parameter 2018-08-22 10:52:44 -07:00
swap_state.c mm: swap: check if swap backing device is congested or not 2019-03-05 21:07:16 -08:00
swap.c mm: handle lru_add_drain_all for UP properly 2019-02-21 09:01:00 -08:00
swapfile.c mm, swap: fix swapoff with KSM pages 2018-12-28 12:11:52 -08:00
truncate.c mm: cleancache: fix corruption on missed inode invalidation 2018-11-30 14:56:14 -08:00
usercopy.c mm/usercopy.c: no check page span for stack objects 2019-01-08 17:15:11 -08:00
userfaultfd.c hugetlbfs: revert "use i_mmap_rwsem for more pmd sharing synchronization" 2019-01-08 17:15:11 -08:00
util.c mm: don't let userspace spam allocations warnings 2019-02-21 09:01:01 -08:00
vmacache.c mm: get rid of vmacache_flush_all() entirely 2018-09-13 15:18:04 -10:00
vmalloc.c vmalloc: export __vmalloc_node_range for CONFIG_TEST_VMALLOC_MODULE 2019-03-05 21:07:15 -08:00
vmpressure.c
vmscan.c Revert "mm: slowly shrink slabs with a relatively small number of objects" 2019-02-12 16:33:18 -08:00
vmstat.c mm: convert zone->managed_pages to atomic variable 2018-12-28 12:11:47 -08:00
workingset.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
z3fold.c z3fold: fix possible reclaim races 2018-11-18 10:15:09 -08:00
zbud.c
zpool.c
zsmalloc.c mm/zsmalloc.c: fix fall-through annotation 2018-10-26 16:26:35 -07:00
zswap.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00