Merge tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:

 - Cleanups for SCHED_DEADLINE

 - Tracing updates/fixes

 - CPU Accounting fixes

 - First wave of changes to optimize the overhead of the scheduler
   build, from the fast-headers tree - including placeholder *_api.h
   headers for later header split-ups.

 - Preempt-dynamic using static_branch() for ARM64

 - Isolation housekeeping mask rework; preparatory for further changes

 - NUMA-balancing: deal with CPU-less nodes

 - NUMA-balancing: tune systems that have multiple LLC cache domains
   per node (eg. AMD)

 - Updates to RSEQ UAPI in preparation for glibc usage

 - Lots of RSEQ/selftests, for same

 - Add Suren as PSI co-maintainer

* tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits)
  sched/headers: ARM needs asm/paravirt_api_clock.h too
  sched/numa: Fix boot crash on arm64 systems
  headers/prep: Fix header to build standalone: <linux/psi.h>
  sched/headers: Only include <linux/entry-common.h> when CONFIG_GENERIC_ENTRY=y
  cgroup: Fix suspicious rcu_dereference_check() usage warning
  sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers
  sched/topology: Remove redundant variable and fix incorrect type in build_sched_domains
  sched/deadline,rt: Remove unused parameter from pick_next_[rt|dl]_entity()
  sched/deadline,rt: Remove unused functions for !CONFIG_SMP
  sched/deadline: Use __node_2_[pdl|dle]() and rb_first_cached() consistently
  sched/deadline: Merge dl_task_can_attach() and dl_cpu_busy()
  sched/deadline: Move bandwidth mgmt and reclaim functions into sched class source file
  sched/deadline: Remove unused def_dl_bandwidth
  sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE
  sched/tracing: Don't re-read p->state when emitting sched_switch event
  sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
  sched/cpuacct: Remove redundant RCU read lock
  sched/cpuacct: Optimize away RCU read lock
  sched/cpuacct: Fix charge percpu cpuusage
  sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies
  ...
@@ -609,51 +609,7 @@ be migrated to a local memory node.

The unmapping of pages and trapping faults incur additional overhead that
ideally is offset by improved memory locality, but there is no universal
guarantee. If the target workload is already bound to NUMA nodes then this
feature should be disabled. Otherwise, if the system overhead from the
feature is too high, the rate at which the kernel samples for NUMA hinting
faults may be controlled by the `numa_balancing_scan_period_min_ms,
numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
numa_balancing_scan_size_mb`_, and numa_balancing_settle_count sysctls.
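
These knobs are plain sysctls, so they can be driven from userspace. Below
is a minimal sketch, assuming the standard ``/proc/sys/kernel/<name>`` paths
(writing requires root); it reads the minimum scan period and doubles it,
which halves the maximum scan rate. It is illustrative, not kernel code::

    /* Read numa_balancing_scan_period_min_ms and double it, halving
     * the maximum scan rate. Illustrative sketch with minimal error
     * handling; the write needs root privileges. */
    #include <stdio.h>

    static const char *path =
            "/proc/sys/kernel/numa_balancing_scan_period_min_ms";

    int main(void)
    {
            long val = -1;
            FILE *f = fopen(path, "r");

            if (!f || fscanf(f, "%ld", &val) != 1) {
                    perror("read sysctl");
                    return 1;
            }
            fclose(f);
            printf("scan_period_min_ms = %ld\n", val);

            f = fopen(path, "w");   /* requires root to succeed */
            if (!f) {
                    perror("open for write");
                    return 1;
            }
            fprintf(f, "%ld\n", val * 2);
            return fclose(f) ? 1 : 0;
    }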

numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
===============================================================================================================================

Automatic NUMA balancing scans a task's address space and unmaps pages to
detect whether pages are properly placed or the data should be migrated to a
memory node local to where the task is running. Every "scan delay" the task
scans the next "scan size" number of pages in its address space. When the
end of the address space is reached the scanner restarts from the beginning.

In combination, the "scan delay" and "scan size" determine the scan rate.
When "scan delay" decreases, the scan rate increases. The scan delay, and
hence the scan rate, of every task is adaptive and depends on historical
behaviour. If pages are properly placed then the scan delay increases;
otherwise the scan delay decreases. The "scan size" is not adaptive, but
the higher the "scan size", the higher the scan rate.
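
The adaptive behaviour can be pictured with a small model. The sketch below
uses illustrative doubling/halving factors and a made-up local/remote fault
split as the placement signal; it mirrors the behaviour described above,
not the kernel's actual algorithm::

    #include <stdio.h>

    /* Illustrative model of the adaptive "scan delay": back off when
     * placement looks good, speed up when it does not, and clamp the
     * result to the min/max period sysctls. */
    static long next_scan_period(long period_ms, long faults_local,
                                 long faults_remote,
                                 long min_ms, long max_ms)
    {
            if (faults_local >= faults_remote)
                    period_ms *= 2;   /* well placed: scan less often */
            else
                    period_ms /= 2;   /* poorly placed: scan sooner */

            if (period_ms < min_ms)
                    period_ms = min_ms;
            if (period_ms > max_ms)
                    period_ms = max_ms;
            return period_ms;
    }

    int main(void)
    {
            long p = 1000;  /* start from an initial fork delay */

            p = next_scan_period(p, 10, 90, 1000, 60000);  /* clamps to 1000 */
            p = next_scan_period(p, 90, 10, 1000, 60000);  /* grows to 2000  */
            printf("scan period now %ld ms\n", p);
            return 0;
    }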

Higher scan rates incur higher system overhead as page faults must be
trapped and data potentially migrated. However, the higher the scan rate,
the more quickly a task's memory is migrated to a local node if the
workload pattern changes, which minimises the performance impact of remote
memory accesses. These sysctls control the thresholds for scan delays and
the number of pages scanned.

``numa_balancing_scan_period_min_ms`` is the minimum time in milliseconds to
scan a task's virtual memory. It effectively controls the maximum scanning
rate for each task.

``numa_balancing_scan_delay_ms`` is the starting "scan delay" used for a task
when it initially forks.

``numa_balancing_scan_period_max_ms`` is the maximum time in milliseconds to
scan a task's virtual memory. It effectively controls the minimum scanning
rate for each task.

``numa_balancing_scan_size_mb`` is how many megabytes worth of pages are
scanned for a given scan.
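
Taken together, the scan size and the period bounds imply per-task
scan-rate bounds of roughly scan_size / scan_period. The sketch below works
this out for example values (the numbers are illustrative, not values read
from a live system)::

    #include <stdio.h>

    /* Derive the per-task scan-rate bounds implied by the sysctls:
     * rate = scan_size / scan_period. Example values only. */
    int main(void)
    {
            long size_mb = 256;    /* numa_balancing_scan_size_mb       */
            long min_ms  = 1000;   /* numa_balancing_scan_period_min_ms */
            long max_ms  = 60000;  /* numa_balancing_scan_period_max_ms */

            printf("max scan rate: %ld MB/s\n", size_mb * 1000 / min_ms);
            printf("min scan rate: %ld MB/s\n", size_mb * 1000 / max_ms);
            return 0;
    }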

oops_all_cpu_backtrace
======================