Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:
 "More mm/ work, plenty more to come

  Subsystems affected by this patch series: slub, memcg, gup, kasan,
  pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs,
  thp, mmap, kconfig"

* akpm: (131 commits)
  arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
  x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
  riscv: support DEBUG_WX
  mm: add DEBUG_WX support
  drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
  mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
  powerpc/mm: drop platform defined pmd_mknotpresent()
  mm: thp: don't need to drain lru cache when splitting and mlocking THP
  hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
  sparc32: register memory occupied by kernel as memblock.memory
  include/linux/memblock.h: fix minor typo and unclear comment
  mm, mempolicy: fix up gup usage in lookup_node
  tools/vm/page_owner_sort.c: filter out unneeded line
  mm: swap: memcg: fix memcg stats for huge pages
  mm: swap: fix vmstats for huge pages
  mm: vmscan: limit the range of LRU type balancing
  mm: vmscan: reclaim writepage is IO cost
  mm: vmscan: determine anon/file pressure balance at the reclaim root
  mm: balance LRU lists based on relative thrashing
  mm: only count actual rotations as LRU reclaim cost
  ...
This commit is contained in:
Linus Torvalds
2020-06-03 20:24:15 -07:00
147 changed files with 3456 additions and 2683 deletions

View File

@ -199,11 +199,11 @@ An RSS page is unaccounted when it's fully unmapped. A PageCache page is
unaccounted when it's removed from radix-tree. Even if RSS pages are fully
unmapped (by kswapd), they may exist as SwapCache in the system until they
are really freed. Such SwapCaches are also accounted.
A swapped-in page is not accounted until it's mapped.
A swapped-in page is accounted after adding into swapcache.
Note: The kernel does swapin-readahead and reads multiple swaps at once.
This means swapped-in pages may contain pages for other tasks than a task
causing page fault. So, we avoid accounting at swap-in I/O.
Since page's memcg recorded into swap whatever memsw enabled, the page will
be accounted after swapin.
At page migration, accounting information is kept.
@ -222,18 +222,13 @@ the cgroup that brought it in -- this will happen on memory pressure).
But see section 8.2: when moving a task to another cgroup, its pages may
be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.
Exception: If CONFIG_MEMCG_SWAP is not used.
When you do swapoff and make swapped-out pages of shmem(tmpfs) to
be backed into memory in force, charges for pages are accounted against the
caller of swapoff rather than the users of shmem.
2.4 Swap Extension (CONFIG_MEMCG_SWAP)
2.4 Swap Extension
--------------------------------------
Swap Extension allows you to record charge for swap. A swapped-in page is
charged back to original page allocator if possible.
Swap usage is always recorded for each of cgroup. Swap Extension allows you to
read and limit it.
When swap is accounted, following files are added.
When CONFIG_SWAP is enabled, following files are added.
- memory.memsw.usage_in_bytes.
- memory.memsw.limit_in_bytes.

View File

@ -834,12 +834,15 @@
See also Documentation/networking/decnet.rst.
default_hugepagesz=
[same as hugepagesz=] The size of the default
HugeTLB page size. This is the size represented by
the legacy /proc/ hugepages APIs, used for SHM, and
default size when mounting hugetlbfs filesystems.
Defaults to the default architecture's huge page size
if not specified.
[HW] The size of the default HugeTLB page. This is
the size represented by the legacy /proc/ hugepages
APIs. In addition, this is the default hugetlb size
used for shmget(), mmap() and mounting hugetlbfs
filesystems. If not specified, defaults to the
architecture's default huge page size. Huge page
sizes are architecture dependent. See also
Documentation/admin-guide/mm/hugetlbpage.rst.
Format: size[KMG]
deferred_probe_timeout=
[KNL] Debugging option to set a timeout in seconds for
@ -1484,13 +1487,24 @@
hugepages using the cma allocator. If enabled, the
boot-time allocation of gigantic hugepages is skipped.
hugepages= [HW,X86-32,IA-64] HugeTLB pages to allocate at boot.
hugepagesz= [HW,IA-64,PPC,X86-64] The size of the HugeTLB pages.
On x86-64 and powerpc, this option can be specified
multiple times interleaved with hugepages= to reserve
huge pages of different sizes. Valid pages sizes on
x86-64 are 2M (when the CPU supports "pse") and 1G
(when the CPU supports the "pdpe1gb" cpuinfo flag).
hugepages= [HW] Number of HugeTLB pages to allocate at boot.
If this follows hugepagesz (below), it specifies
the number of pages of hugepagesz to be allocated.
If this is the first HugeTLB parameter on the command
line, it specifies the number of pages to allocate for
the default huge page size. See also
Documentation/admin-guide/mm/hugetlbpage.rst.
Format: <integer>
hugepagesz=
[HW] The size of the HugeTLB pages. This is used in
conjunction with hugepages (above) to allocate huge
pages of a specific size at boot. The pair
hugepagesz=X hugepages=Y can be specified once for
each supported huge page size. Huge page sizes are
architecture dependent. See also
Documentation/admin-guide/mm/hugetlbpage.rst.
Format: size[KMG]
hung_task_panic=
[KNL] Should the hung task detector generate panics.

View File

@ -100,6 +100,41 @@ with a huge page size selection parameter "hugepagesz=<size>". <size> must
be specified in bytes with optional scale suffix [kKmMgG]. The default huge
page size may be selected with the "default_hugepagesz=<size>" boot parameter.
Hugetlb boot command line parameter semantics
hugepagesz - Specify a huge page size. Used in conjunction with hugepages
parameter to preallocate a number of huge pages of the specified
size. Hence, hugepagesz and hugepages are typically specified in
pairs such as:
hugepagesz=2M hugepages=512
hugepagesz can only be specified once on the command line for a
specific huge page size. Valid huge page sizes are architecture
dependent.
hugepages - Specify the number of huge pages to preallocate. This typically
follows a valid hugepagesz or default_hugepagesz parameter. However,
if hugepages is the first or only hugetlb command line parameter it
implicitly specifies the number of huge pages of default size to
allocate. If the number of huge pages of default size is implicitly
specified, it can not be overwritten by a hugepagesz,hugepages
parameter pair for the default size.
For example, on an architecture with 2M default huge page size:
hugepages=256 hugepagesz=2M hugepages=512
will result in 256 2M huge pages being allocated and a warning message
indicating that the hugepages=512 parameter is ignored. If a hugepages
parameter is preceded by an invalid hugepagesz parameter, it will
be ignored.
default_hugepagesz - Specify the default huge page size. This parameter can
only be specified once on the command line. default_hugepagesz can
optionally be followed by the hugepages parameter to preallocate a
specific number of huge pages of default size. The number of default
sized huge pages to preallocate can also be implicitly specified as
mentioned in the hugepages section above. Therefore, on an
architecture with 2M default huge page size:
hugepages=256
default_hugepagesz=2M hugepages=256
hugepages=256 default_hugepagesz=2M
will all result in 256 2M huge pages being allocated. Valid default
huge page size is architecture dependent.
When multiple huge page sizes are supported, ``/proc/sys/vm/nr_hugepages``
indicates the current number of pre-allocated huge pages of the default size.
Thus, one can use the following command to dynamically allocate/deallocate

View File

@ -220,6 +220,13 @@ memory. A lower value can prevent THPs from being
collapsed, resulting fewer pages being collapsed into
THPs, and lower memory access performance.
``max_ptes_shared`` specifies how many pages can be shared across multiple
processes. Exceeding the number would block the collapse::
/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared
A higher value may increase memory footprint for some workloads.
Boot parameter
==============

View File

@ -831,14 +831,27 @@ tooling to work, you can do::
swappiness
==========
This control is used to define how aggressive the kernel will swap
memory pages. Higher values will increase aggressiveness, lower values
decrease the amount of swap. A value of 0 instructs the kernel not to
initiate swap until the amount of free and file-backed pages is less
than the high water mark in a zone.
This control is used to define the rough relative IO cost of swapping
and filesystem paging, as a value between 0 and 200. At 100, the VM
assumes equal IO cost and will thus apply memory pressure to the page
cache and swap-backed pages equally; lower values signify more
expensive swap IO, higher values indicates cheaper.
Keep in mind that filesystem IO patterns under memory pressure tend to
be more efficient than swap's random IO. An optimal value will require
experimentation and will also be workload-dependent.
The default value is 60.
For in-memory swap, like zram or zswap, as well as hybrid setups that
have swap on faster devices than the filesystem, values beyond 100 can
be considered. For example, if the random IO against the swap device
is on average 2x faster than IO from the filesystem, swappiness should
be 133 (x + 2x = 200, 2x = 133.33).
At 0, the kernel will not initiate swap until the amount of free and
file-backed pages is less than the high watermark in a zone.
unprivileged_userfaultfd
========================