Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton: "More mm/ work, plenty more to come Subsystems affected by this patch series: slub, memcg, gup, kasan, pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs, thp, mmap, kconfig" * akpm: (131 commits) arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined riscv: support DEBUG_WX mm: add DEBUG_WX support drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid() powerpc/mm: drop platform defined pmd_mknotpresent() mm: thp: don't need to drain lru cache when splitting and mlocking THP hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs sparc32: register memory occupied by kernel as memblock.memory include/linux/memblock.h: fix minor typo and unclear comment mm, mempolicy: fix up gup usage in lookup_node tools/vm/page_owner_sort.c: filter out unneeded line mm: swap: memcg: fix memcg stats for huge pages mm: swap: fix vmstats for huge pages mm: vmscan: limit the range of LRU type balancing mm: vmscan: reclaim writepage is IO cost mm: vmscan: determine anon/file pressure balance at the reclaim root mm: balance LRU lists based on relative thrashing mm: only count actual rotations as LRU reclaim cost ...
This commit is contained in:
@ -199,11 +199,11 @@ An RSS page is unaccounted when it's fully unmapped. A PageCache page is
|
||||
unaccounted when it's removed from radix-tree. Even if RSS pages are fully
|
||||
unmapped (by kswapd), they may exist as SwapCache in the system until they
|
||||
are really freed. Such SwapCaches are also accounted.
|
||||
A swapped-in page is not accounted until it's mapped.
|
||||
A swapped-in page is accounted after adding into swapcache.
|
||||
|
||||
Note: The kernel does swapin-readahead and reads multiple swaps at once.
|
||||
This means swapped-in pages may contain pages for other tasks than a task
|
||||
causing page fault. So, we avoid accounting at swap-in I/O.
|
||||
Since page's memcg recorded into swap whatever memsw enabled, the page will
|
||||
be accounted after swapin.
|
||||
|
||||
At page migration, accounting information is kept.
|
||||
|
||||
@ -222,18 +222,13 @@ the cgroup that brought it in -- this will happen on memory pressure).
|
||||
But see section 8.2: when moving a task to another cgroup, its pages may
|
||||
be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.
|
||||
|
||||
Exception: If CONFIG_MEMCG_SWAP is not used.
|
||||
When you do swapoff and make swapped-out pages of shmem(tmpfs) to
|
||||
be backed into memory in force, charges for pages are accounted against the
|
||||
caller of swapoff rather than the users of shmem.
|
||||
|
||||
2.4 Swap Extension (CONFIG_MEMCG_SWAP)
|
||||
2.4 Swap Extension
|
||||
--------------------------------------
|
||||
|
||||
Swap Extension allows you to record charge for swap. A swapped-in page is
|
||||
charged back to original page allocator if possible.
|
||||
Swap usage is always recorded for each of cgroup. Swap Extension allows you to
|
||||
read and limit it.
|
||||
|
||||
When swap is accounted, following files are added.
|
||||
When CONFIG_SWAP is enabled, following files are added.
|
||||
|
||||
- memory.memsw.usage_in_bytes.
|
||||
- memory.memsw.limit_in_bytes.
|
||||
|
@ -834,12 +834,15 @@
|
||||
See also Documentation/networking/decnet.rst.
|
||||
|
||||
default_hugepagesz=
|
||||
[same as hugepagesz=] The size of the default
|
||||
HugeTLB page size. This is the size represented by
|
||||
the legacy /proc/ hugepages APIs, used for SHM, and
|
||||
default size when mounting hugetlbfs filesystems.
|
||||
Defaults to the default architecture's huge page size
|
||||
if not specified.
|
||||
[HW] The size of the default HugeTLB page. This is
|
||||
the size represented by the legacy /proc/ hugepages
|
||||
APIs. In addition, this is the default hugetlb size
|
||||
used for shmget(), mmap() and mounting hugetlbfs
|
||||
filesystems. If not specified, defaults to the
|
||||
architecture's default huge page size. Huge page
|
||||
sizes are architecture dependent. See also
|
||||
Documentation/admin-guide/mm/hugetlbpage.rst.
|
||||
Format: size[KMG]
|
||||
|
||||
deferred_probe_timeout=
|
||||
[KNL] Debugging option to set a timeout in seconds for
|
||||
@ -1484,13 +1487,24 @@
|
||||
hugepages using the cma allocator. If enabled, the
|
||||
boot-time allocation of gigantic hugepages is skipped.
|
||||
|
||||
hugepages= [HW,X86-32,IA-64] HugeTLB pages to allocate at boot.
|
||||
hugepagesz= [HW,IA-64,PPC,X86-64] The size of the HugeTLB pages.
|
||||
On x86-64 and powerpc, this option can be specified
|
||||
multiple times interleaved with hugepages= to reserve
|
||||
huge pages of different sizes. Valid pages sizes on
|
||||
x86-64 are 2M (when the CPU supports "pse") and 1G
|
||||
(when the CPU supports the "pdpe1gb" cpuinfo flag).
|
||||
hugepages= [HW] Number of HugeTLB pages to allocate at boot.
|
||||
If this follows hugepagesz (below), it specifies
|
||||
the number of pages of hugepagesz to be allocated.
|
||||
If this is the first HugeTLB parameter on the command
|
||||
line, it specifies the number of pages to allocate for
|
||||
the default huge page size. See also
|
||||
Documentation/admin-guide/mm/hugetlbpage.rst.
|
||||
Format: <integer>
|
||||
|
||||
hugepagesz=
|
||||
[HW] The size of the HugeTLB pages. This is used in
|
||||
conjunction with hugepages (above) to allocate huge
|
||||
pages of a specific size at boot. The pair
|
||||
hugepagesz=X hugepages=Y can be specified once for
|
||||
each supported huge page size. Huge page sizes are
|
||||
architecture dependent. See also
|
||||
Documentation/admin-guide/mm/hugetlbpage.rst.
|
||||
Format: size[KMG]
|
||||
|
||||
hung_task_panic=
|
||||
[KNL] Should the hung task detector generate panics.
|
||||
|
@ -100,6 +100,41 @@ with a huge page size selection parameter "hugepagesz=<size>". <size> must
|
||||
be specified in bytes with optional scale suffix [kKmMgG]. The default huge
|
||||
page size may be selected with the "default_hugepagesz=<size>" boot parameter.
|
||||
|
||||
Hugetlb boot command line parameter semantics
|
||||
hugepagesz - Specify a huge page size. Used in conjunction with hugepages
|
||||
parameter to preallocate a number of huge pages of the specified
|
||||
size. Hence, hugepagesz and hugepages are typically specified in
|
||||
pairs such as:
|
||||
hugepagesz=2M hugepages=512
|
||||
hugepagesz can only be specified once on the command line for a
|
||||
specific huge page size. Valid huge page sizes are architecture
|
||||
dependent.
|
||||
hugepages - Specify the number of huge pages to preallocate. This typically
|
||||
follows a valid hugepagesz or default_hugepagesz parameter. However,
|
||||
if hugepages is the first or only hugetlb command line parameter it
|
||||
implicitly specifies the number of huge pages of default size to
|
||||
allocate. If the number of huge pages of default size is implicitly
|
||||
specified, it can not be overwritten by a hugepagesz,hugepages
|
||||
parameter pair for the default size.
|
||||
For example, on an architecture with 2M default huge page size:
|
||||
hugepages=256 hugepagesz=2M hugepages=512
|
||||
will result in 256 2M huge pages being allocated and a warning message
|
||||
indicating that the hugepages=512 parameter is ignored. If a hugepages
|
||||
parameter is preceded by an invalid hugepagesz parameter, it will
|
||||
be ignored.
|
||||
default_hugepagesz - Specify the default huge page size. This parameter can
|
||||
only be specified once on the command line. default_hugepagesz can
|
||||
optionally be followed by the hugepages parameter to preallocate a
|
||||
specific number of huge pages of default size. The number of default
|
||||
sized huge pages to preallocate can also be implicitly specified as
|
||||
mentioned in the hugepages section above. Therefore, on an
|
||||
architecture with 2M default huge page size:
|
||||
hugepages=256
|
||||
default_hugepagesz=2M hugepages=256
|
||||
hugepages=256 default_hugepagesz=2M
|
||||
will all result in 256 2M huge pages being allocated. Valid default
|
||||
huge page size is architecture dependent.
|
||||
|
||||
When multiple huge page sizes are supported, ``/proc/sys/vm/nr_hugepages``
|
||||
indicates the current number of pre-allocated huge pages of the default size.
|
||||
Thus, one can use the following command to dynamically allocate/deallocate
|
||||
|
@ -220,6 +220,13 @@ memory. A lower value can prevent THPs from being
|
||||
collapsed, resulting fewer pages being collapsed into
|
||||
THPs, and lower memory access performance.
|
||||
|
||||
``max_ptes_shared`` specifies how many pages can be shared across multiple
|
||||
processes. Exceeding the number would block the collapse::
|
||||
|
||||
/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared
|
||||
|
||||
A higher value may increase memory footprint for some workloads.
|
||||
|
||||
Boot parameter
|
||||
==============
|
||||
|
||||
|
@ -831,14 +831,27 @@ tooling to work, you can do::
|
||||
swappiness
|
||||
==========
|
||||
|
||||
This control is used to define how aggressive the kernel will swap
|
||||
memory pages. Higher values will increase aggressiveness, lower values
|
||||
decrease the amount of swap. A value of 0 instructs the kernel not to
|
||||
initiate swap until the amount of free and file-backed pages is less
|
||||
than the high water mark in a zone.
|
||||
This control is used to define the rough relative IO cost of swapping
|
||||
and filesystem paging, as a value between 0 and 200. At 100, the VM
|
||||
assumes equal IO cost and will thus apply memory pressure to the page
|
||||
cache and swap-backed pages equally; lower values signify more
|
||||
expensive swap IO, higher values indicates cheaper.
|
||||
|
||||
Keep in mind that filesystem IO patterns under memory pressure tend to
|
||||
be more efficient than swap's random IO. An optimal value will require
|
||||
experimentation and will also be workload-dependent.
|
||||
|
||||
The default value is 60.
|
||||
|
||||
For in-memory swap, like zram or zswap, as well as hybrid setups that
|
||||
have swap on faster devices than the filesystem, values beyond 100 can
|
||||
be considered. For example, if the random IO against the swap device
|
||||
is on average 2x faster than IO from the filesystem, swappiness should
|
||||
be 133 (x + 2x = 200, 2x = 133.33).
|
||||
|
||||
At 0, the kernel will not initiate swap until the amount of free and
|
||||
file-backed pages is less than the high watermark in a zone.
|
||||
|
||||
|
||||
unprivileged_userfaultfd
|
||||
========================
|
||||
|
Reference in New Issue
Block a user