Merge branch 'master' into mm-hotfixes-stable

This commit is contained in:
Andrew Morton 2023-06-30 08:41:39 -07:00
commit 44f10dbefd
10103 changed files with 356967 additions and 108980 deletions

View File

@ -254,6 +254,7 @@ ForEachMacros:
- 'for_each_free_mem_range'
- 'for_each_free_mem_range_reverse'
- 'for_each_func_rsrc'
- 'for_each_group_device'
- 'for_each_group_evsel'
- 'for_each_group_member'
- 'for_each_hstate'

1
.gitattributes vendored
View File

@ -2,3 +2,4 @@
*.[ch] diff=cpp
*.dts diff=dts
*.dts[io] diff=dts
*.rs diff=rust

View File

@ -183,6 +183,8 @@ Henrik Rydberg <rydberg@bitmath.org>
Herbert Xu <herbert@gondor.apana.org.au>
Huacai Chen <chenhuacai@kernel.org> <chenhc@lemote.com>
Huacai Chen <chenhuacai@kernel.org> <chenhuacai@loongson.cn>
J. Bruce Fields <bfields@fieldses.org> <bfields@redhat.com>
J. Bruce Fields <bfields@fieldses.org> <bfields@citi.umich.edu>
Jacob Shin <Jacob.Shin@amd.com>
Jaegeuk Kim <jaegeuk@kernel.org> <jaegeuk@google.com>
Jaegeuk Kim <jaegeuk@kernel.org> <jaegeuk.kim@samsung.com>
@ -330,6 +332,7 @@ Mauro Carvalho Chehab <mchehab@kernel.org> <m.chehab@samsung.com>
Mauro Carvalho Chehab <mchehab@kernel.org> <mchehab@s-opensource.com>
Maxim Mikityanskiy <maxtram95@gmail.com> <maximmi@mellanox.com>
Maxim Mikityanskiy <maxtram95@gmail.com> <maximmi@nvidia.com>
Maxime Ripard <mripard@kernel.org> <maxime@cerno.tech>
Maxime Ripard <mripard@kernel.org> <maxime.ripard@bootlin.com>
Maxime Ripard <mripard@kernel.org> <maxime.ripard@free-electrons.com>
Mayuresh Janorkar <mayur@ti.com>

View File

@ -383,6 +383,12 @@ E: tomas@nocrew.org
W: http://tomas.nocrew.org/
D: dsp56k device driver
N: Srivatsa S. Bhat
E: srivatsa@csail.mit.edu
D: Maintainer of Generic Paravirt-Ops subsystem
D: Maintainer of VMware hypervisor interface
D: Maintainer of VMware virtual PTP clock driver (ptp_vmw)
N: Ross Biro
E: ross.biro@gmail.com
D: Original author of the Linux networking code

View File

@ -13,6 +13,11 @@ Description:
Specifies the duration of the LED blink in milliseconds.
Defaults to 50 ms.
With hw_control ON, the interval value MUST be set to the
default value and cannot be changed.
Trying to set any value in this specific mode will return
an EINVAL error.
What: /sys/class/leds/<led>/link
Date: Dec 2017
KernelVersion: 4.16
@ -39,6 +44,9 @@ Description:
If set to 1, the LED will blink for the milliseconds specified
in interval to signal transmission.
With hw_control ON, the blink interval is controlled by hardware
and won't reflect the value set in interval.
What: /sys/class/leds/<led>/rx
Date: Dec 2017
KernelVersion: 4.16
@ -50,3 +58,84 @@ Description:
If set to 1, the LED will blink for the milliseconds specified
in interval to signal reception.
With hw_control ON, the blink interval is controlled by hardware
and won't reflect the value set in interval.
What: /sys/class/leds/<led>/hw_control
Date: Jun 2023
KernelVersion: 6.5
Contact: linux-leds@vger.kernel.org
Description:
Communicate whether the LED trigger modes are driven by hardware
or software fallback is used.
If 0, the LED is using software fallback to blink.
If 1, the LED is using hardware control to blink and signal the
requested modes.
What: /sys/class/leds/<led>/link_10
Date: Jun 2023
KernelVersion: 6.5
Contact: linux-leds@vger.kernel.org
Description:
Signal the link speed state of 10Mbps of the named network device.
If set to 0 (default), the LED's normal state is off.
If set to 1, the LED's normal state reflects the link state
speed of 10MBps of the named network device.
Setting this value also immediately changes the LED state.
What: /sys/class/leds/<led>/link_100
Date: Jun 2023
KernelVersion: 6.5
Contact: linux-leds@vger.kernel.org
Description:
Signal the link speed state of 100Mbps of the named network device.
If set to 0 (default), the LED's normal state is off.
If set to 1, the LED's normal state reflects the link state
speed of 100Mbps of the named network device.
Setting this value also immediately changes the LED state.
What: /sys/class/leds/<led>/link_1000
Date: Jun 2023
KernelVersion: 6.5
Contact: linux-leds@vger.kernel.org
Description:
Signal the link speed state of 1000Mbps of the named network device.
If set to 0 (default), the LED's normal state is off.
If set to 1, the LED's normal state reflects the link state
speed of 1000Mbps of the named network device.
Setting this value also immediately changes the LED state.
What: /sys/class/leds/<led>/half_duplex
Date: Jun 2023
KernelVersion: 6.5
Contact: linux-leds@vger.kernel.org
Description:
Signal the link half duplex state of the named network device.
If set to 0 (default), the LED's normal state is off.
If set to 1, the LED's normal state reflects the link half
duplex state of the named network device.
Setting this value also immediately changes the LED state.
What: /sys/class/leds/<led>/full_duplex
Date: Jun 2023
KernelVersion: 6.5
Contact: linux-leds@vger.kernel.org
Description:
Signal the link full duplex state of the named network device.
If set to 0 (default), the LED's normal state is off.
If set to 1, the LED's normal state reflects the link full
duplex state of the named network device.
Setting this value also immediately changes the LED state.

View File

@ -670,7 +670,7 @@ Description: Preferred MTE tag checking mode
"async" Prefer asynchronous mode
================ ==============================================
See also: Documentation/arm64/memory-tagging-extension.rst
See also: Documentation/arch/arm64/memory-tagging-extension.rst
What: /sys/devices/system/cpu/nohz_full
Date: Apr 2015

View File

@ -2071,41 +2071,7 @@ call.
Because RCU avoids interrupting idle CPUs, it is illegal to execute an
RCU read-side critical section on an idle CPU. (Kernels built with
``CONFIG_PROVE_RCU=y`` will splat if you try it.) The RCU_NONIDLE()
macro and ``_rcuidle`` event tracing is provided to work around this
restriction. In addition, rcu_is_watching() may be used to test
whether or not it is currently legal to run RCU read-side critical
sections on this CPU. I learned of the need for diagnostics on the one
hand and RCU_NONIDLE() on the other while inspecting idle-loop code.
Steven Rostedt supplied ``_rcuidle`` event tracing, which is used quite
heavily in the idle loop. However, there are some restrictions on the
code placed within RCU_NONIDLE():
#. Blocking is prohibited. In practice, this is not a serious
restriction given that idle tasks are prohibited from blocking to
begin with.
#. Although nesting RCU_NONIDLE() is permitted, they cannot nest
indefinitely deeply. However, given that they can be nested on the
order of a million deep, even on 32-bit systems, this should not be a
serious restriction. This nesting limit would probably be reached
long after the compiler OOMed or the stack overflowed.
#. Any code path that enters RCU_NONIDLE() must sequence out of that
same RCU_NONIDLE(). For example, the following is grossly
illegal:
::
1 RCU_NONIDLE({
2 do_something();
3 goto bad_idea; /* BUG!!! */
4 do_something_else();});
5 bad_idea:
It is just as illegal to transfer control into the middle of
RCU_NONIDLE()'s argument. Yes, in theory, you could transfer in
as long as you also transferred out, but in practice you could also
expect to get sharply worded review comments.
``CONFIG_PROVE_RCU=y`` will splat if you try it.)
It is similarly socially unacceptable to interrupt an ``nohz_full`` CPU
running in userspace. RCU must therefore track ``nohz_full`` userspace

View File

@ -1117,7 +1117,6 @@ All: lockdep-checked RCU utility APIs::
RCU_LOCKDEP_WARN
rcu_sleep_check
RCU_NONIDLE
All: Unchecked RCU-protected pointer access::

View File

@ -508,9 +508,6 @@ cache_miss_collisions
cache miss, but raced with a write and data was already present (usually 0
since the synchronization for cache misses was rewritten)
cache_readaheads
Count of times readahead occurred.
Sysfs - cache set
~~~~~~~~~~~~~~~~~

View File

@ -297,7 +297,7 @@ Lock order is as follows::
Page lock (PG_locked bit of page->flags)
mm->page_table_lock or split pte_lock
lock_page_memcg (memcg->move_lock)
folio_memcg_lock (memcg->move_lock)
mapping->i_pages lock
lruvec->lru_lock.

View File

@ -1580,6 +1580,13 @@ PAGE_SIZE multiple when read back.
Healthy workloads are not expected to reach this limit.
memory.swap.peak
A read-only single value file which exists on non-root
cgroups.
The max swap usage recorded for the cgroup and its
descendants since the creation of the cgroup.
memory.swap.max
A read-write single value file which exists on non-root
cgroups. The default is "max".
@ -2022,31 +2029,33 @@ that attribute:
no-change
Do not modify the I/O priority class.
none-to-rt
For requests that do not have an I/O priority class (NONE),
change the I/O priority class into RT. Do not modify
the I/O priority class of other requests.
promote-to-rt
For requests that have a non-RT I/O priority class, change it into RT.
Also change the priority level of these requests to 4. Do not modify
the I/O priority of requests that have priority class RT.
restrict-to-be
For requests that do not have an I/O priority class or that have I/O
priority class RT, change it into BE. Do not modify the I/O priority
class of requests that have priority class IDLE.
priority class RT, change it into BE. Also change the priority level
of these requests to 0. Do not modify the I/O priority class of
requests that have priority class IDLE.
idle
Change the I/O priority class of all requests into IDLE, the lowest
I/O priority class.
none-to-rt
Deprecated. Just an alias for promote-to-rt.
The following numerical values are associated with the I/O priority policies:
+-------------+---+
| no-change | 0 |
+-------------+---+
| none-to-rt | 1 |
+-------------+---+
| rt-to-be | 2 |
+-------------+---+
| all-to-idle | 3 |
+-------------+---+
+----------------+---+
| no-change | 0 |
+----------------+---+
| rt-to-be | 2 |
+----------------+---+
| all-to-idle | 3 |
+----------------+---+
The numerical value that corresponds to each I/O priority class is as follows:
@ -2062,9 +2071,13 @@ The numerical value that corresponds to each I/O priority class is as follows:
The algorithm to set the I/O priority class for a request is as follows:
- Translate the I/O priority class policy into a number.
- Change the request I/O priority class into the maximum of the I/O priority
class policy number and the numerical I/O priority class.
- If I/O priority class policy is promote-to-rt, change the request I/O
priority class to IOPRIO_CLASS_RT and change the request I/O priority
level to 4.
- If I/O priorityt class is not promote-to-rt, translate the I/O priority
class policy into a number, then change the request I/O priority class
into the maximum of the I/O priority class policy number and the numerical
I/O priority class.
PID
---
@ -2437,7 +2450,7 @@ Miscellaneous controller provides 3 interface files. If two misc resources (res_
res_b 10
misc.current
A read-only flat-keyed file shown in the non-root cgroups. It shows
A read-only flat-keyed file shown in the all cgroups. It shows
the current usage of the resources in the cgroup and its children.::
$ cat misc.current

View File

@ -304,7 +304,7 @@
EL0 is indicated by /sys/devices/system/cpu/aarch32_el0
and hot-unplug operations may be restricted.
See Documentation/arm64/asymmetric-32bit.rst for more
See Documentation/arch/arm64/asymmetric-32bit.rst for more
information.
amd_iommu= [HW,X86-64]
@ -323,6 +323,7 @@
option with care.
pgtbl_v1 - Use v1 page table for DMA-API (Default).
pgtbl_v2 - Use v2 page table for DMA-API.
irtcachedis - Disable Interrupt Remapping Table (IRT) caching.
amd_iommu_dump= [HW,X86-64]
Enable AMD IOMMU driver option to dump the ACPI table
@ -429,6 +430,9 @@
arm64.nosme [ARM64] Unconditionally disable Scalable Matrix
Extension support
arm64.nomops [ARM64] Unconditionally disable Memory Copy and Memory
Set instructions support
ataflop= [HW,M68k]
atarimouse= [HW,MOUSE] Atari Mouse
@ -818,20 +822,6 @@
Format:
<first_slot>,<last_slot>,<port>,<enum_bit>[,<debug>]
cpu0_hotplug [X86] Turn on CPU0 hotplug feature when
CONFIG_BOOTPARAM_HOTPLUG_CPU0 is off.
Some features depend on CPU0. Known dependencies are:
1. Resume from suspend/hibernate depends on CPU0.
Suspend/hibernate will fail if CPU0 is offline and you
need to online CPU0 before suspend/hibernate.
2. PIC interrupts also depend on CPU0. CPU0 can't be
removed if a PIC interrupt is detected.
It's said poweroff/reboot may depend on CPU0 on some
machines although I haven't seen such issues so far
after CPU0 is offline on a few tested machines.
If the dependencies are under your control, you can
turn on cpu0_hotplug.
cpuidle.off=1 [CPU_IDLE]
disable the cpuidle sub-system
@ -852,6 +842,12 @@
on every CPU online, such as boot, and resume from suspend.
Default: 10000
cpuhp.parallel=
[SMP] Enable/disable parallel bringup of secondary CPUs
Format: <bool>
Default is enabled if CONFIG_HOTPLUG_PARALLEL=y. Otherwise
the parameter has no effect.
crash_kexec_post_notifiers
Run kdump after running panic-notifiers and dumping
kmsg. This only for the users who doubt kdump always
@ -2117,6 +2113,16 @@
disable
Do not enable intel_pstate as the default
scaling driver for the supported processors
active
Use intel_pstate driver to bypass the scaling
governors layer of cpufreq and provides it own
algorithms for p-state selection. There are two
P-state selection algorithms provided by
intel_pstate in the active mode: powersave and
performance. The way they both operate depends
on whether or not the hardware managed P-states
(HWP) feature has been enabled in the processor
and possibly on the processor model.
passive
Use intel_pstate as a scaling driver, but configure it
to work with generic cpufreq governors (instead of
@ -2551,12 +2557,13 @@
If the value is 0 (the default), KVM will pick a period based
on the ratio, such that a page is zapped after 1 hour on average.
kvm-amd.nested= [KVM,AMD] Allow nested virtualization in KVM/SVM.
Default is 1 (enabled)
kvm-amd.nested= [KVM,AMD] Control nested virtualization feature in
KVM/SVM. Default is 1 (enabled).
kvm-amd.npt= [KVM,AMD] Disable nested paging (virtualized MMU)
for all guests.
Default is 1 (enabled) if in 64-bit or 32-bit PAE mode.
kvm-amd.npt= [KVM,AMD] Control KVM's use of Nested Page Tables,
a.k.a. Two-Dimensional Page Tables. Default is 1
(enabled). Disable by KVM if hardware lacks support
for NPT.
kvm-arm.mode=
[KVM,ARM] Select one of KVM/arm64's modes of operation.
@ -2602,30 +2609,33 @@
Format: <integer>
Default: 5
kvm-intel.ept= [KVM,Intel] Disable extended page tables
(virtualized MMU) support on capable Intel chips.
Default is 1 (enabled)
kvm-intel.ept= [KVM,Intel] Control KVM's use of Extended Page Tables,
a.k.a. Two-Dimensional Page Tables. Default is 1
(enabled). Disable by KVM if hardware lacks support
for EPT.
kvm-intel.emulate_invalid_guest_state=
[KVM,Intel] Disable emulation of invalid guest state.
Ignored if kvm-intel.enable_unrestricted_guest=1, as
guest state is never invalid for unrestricted guests.
This param doesn't apply to nested guests (L2), as KVM
never emulates invalid L2 guest state.
Default is 1 (enabled)
[KVM,Intel] Control whether to emulate invalid guest
state. Ignored if kvm-intel.enable_unrestricted_guest=1,
as guest state is never invalid for unrestricted
guests. This param doesn't apply to nested guests (L2),
as KVM never emulates invalid L2 guest state.
Default is 1 (enabled).
kvm-intel.flexpriority=
[KVM,Intel] Disable FlexPriority feature (TPR shadow).
Default is 1 (enabled)
[KVM,Intel] Control KVM's use of FlexPriority feature
(TPR shadow). Default is 1 (enabled). Disalbe by KVM if
hardware lacks support for it.
kvm-intel.nested=
[KVM,Intel] Enable VMX nesting (nVMX).
Default is 0 (disabled)
[KVM,Intel] Control nested virtualization feature in
KVM/VMX. Default is 1 (enabled).
kvm-intel.unrestricted_guest=
[KVM,Intel] Disable unrestricted guest feature
(virtualized real and unpaged mode) on capable
Intel chips. Default is 1 (enabled)
[KVM,Intel] Control KVM's use of unrestricted guest
feature (virtualized real and unpaged mode). Default
is 1 (enabled). Disable by KVM if EPT is disabled or
hardware lacks support for it.
kvm-intel.vmentry_l1d_flush=[KVM,Intel] Mitigation for L1 Terminal Fault
CVE-2018-3620.
@ -2639,9 +2649,10 @@
Default is cond (do L1 cache flush in specific instances)
kvm-intel.vpid= [KVM,Intel] Disable Virtual Processor Identification
feature (tagged TLBs) on capable Intel chips.
Default is 1 (enabled)
kvm-intel.vpid= [KVM,Intel] Control KVM's use of Virtual Processor
Identification feature (tagged TLBs). Default is 1
(enabled). Disable by KVM if hardware lacks support
for it.
l1d_flush= [X86,INTEL]
Control mitigation for L1D based snooping vulnerability.
@ -3423,6 +3434,10 @@
[HW] Make the MicroTouch USB driver use raw coordinates
('y', default) or cooked coordinates ('n')
mtrr=debug [X86]
Enable printing debug information related to MTRR
registers at boot time.
mtrr_chunk_size=nn[KMG] [X86]
used for mtrr cleanup. It is largest continuous chunk
that could hold holes aka. UC entries.
@ -3702,8 +3717,8 @@
nohibernate [HIBERNATION] Disable hibernation and resume.
nohlt [ARM,ARM64,MICROBLAZE,SH] Forces the kernel to busy wait
in do_idle() and not use the arch_cpu_idle()
nohlt [ARM,ARM64,MICROBLAZE,MIPS,SH] Forces the kernel to
busy wait in do_idle() and not use the arch_cpu_idle()
implementation; requires CONFIG_GENERIC_IDLE_POLL_SETUP
to be effective. This is useful on platforms where the
sleep(SH) or wfi(ARM,ARM64) instructions do not work
@ -3838,7 +3853,7 @@
nosmp [SMP] Tells an SMP kernel to act as a UP kernel,
and disable the IO APIC. legacy for "maxcpus=0".
nosmt [KNL,S390] Disable symmetric multithreading (SMT).
nosmt [KNL,MIPS,S390] Disable symmetric multithreading (SMT).
Equivalent to smt=1.
[KNL,X86] Disable symmetric multithreading (SMT).
@ -4736,43 +4751,6 @@
the propagation of recent CPU-hotplug changes up
the rcu_node combining tree.
rcutree.use_softirq= [KNL]
If set to zero, move all RCU_SOFTIRQ processing to
per-CPU rcuc kthreads. Defaults to a non-zero
value, meaning that RCU_SOFTIRQ is used by default.
Specify rcutree.use_softirq=0 to use rcuc kthreads.
But note that CONFIG_PREEMPT_RT=y kernels disable
this kernel boot parameter, forcibly setting it
to zero.
rcutree.rcu_fanout_exact= [KNL]
Disable autobalancing of the rcu_node combining
tree. This is used by rcutorture, and might
possibly be useful for architectures having high
cache-to-cache transfer latencies.
rcutree.rcu_fanout_leaf= [KNL]
Change the number of CPUs assigned to each
leaf rcu_node structure. Useful for very
large systems, which will choose the value 64,
and for NUMA systems with large remote-access
latencies, which will choose a value aligned
with the appropriate hardware boundaries.
rcutree.rcu_min_cached_objs= [KNL]
Minimum number of objects which are cached and
maintained per one CPU. Object size is equal
to PAGE_SIZE. The cache allows to reduce the
pressure to page allocator, also it makes the
whole algorithm to behave better in low memory
condition.
rcutree.rcu_delay_page_cache_fill_msec= [KNL]
Set the page-cache refill delay (in milliseconds)
in response to low-memory conditions. The range
of permitted values is in the range 0:100000.
rcutree.jiffies_till_first_fqs= [KNL]
Set delay from grace-period initialization to
first attempt to force quiescent states.
@ -4811,21 +4789,6 @@
When RCU_NOCB_CPU is set, also adjust the
priority of NOCB callback kthreads.
rcutree.rcu_divisor= [KNL]
Set the shift-right count to use to compute
the callback-invocation batch limit bl from
the number of callbacks queued on this CPU.
The result will be bounded below by the value of
the rcutree.blimit kernel parameter. Every bl
callbacks, the softirq handler will exit in
order to allow the CPU to do other work.
Please note that this callback-invocation batch
limit applies only to non-offloaded callback
invocation. Offloaded callbacks are instead
invoked in the context of an rcuoc kthread, which
scheduler will preempt as it does any other task.
rcutree.nocb_nobypass_lim_per_jiffy= [KNL]
On callback-offloaded (rcu_nocbs) CPUs,
RCU reduces the lock contention that would
@ -4839,14 +4802,6 @@
the ->nocb_bypass queue. The definition of "too
many" is supplied by this kernel boot parameter.
rcutree.rcu_nocb_gp_stride= [KNL]
Set the number of NOCB callback kthreads in
each group, which defaults to the square root
of the number of CPUs. Larger numbers reduce
the wakeup overhead on the global grace-period
kthread, but increases that same overhead on
each group's NOCB grace-period kthread.
rcutree.qhimark= [KNL]
Set threshold of queued RCU callbacks beyond which
batch limiting is disabled.
@ -4864,6 +4819,56 @@
on rcutree.qhimark at boot time and to zero to
disable more aggressive help enlistment.
rcutree.rcu_delay_page_cache_fill_msec= [KNL]
Set the page-cache refill delay (in milliseconds)
in response to low-memory conditions. The range
of permitted values is in the range 0:100000.
rcutree.rcu_divisor= [KNL]
Set the shift-right count to use to compute
the callback-invocation batch limit bl from
the number of callbacks queued on this CPU.
The result will be bounded below by the value of
the rcutree.blimit kernel parameter. Every bl
callbacks, the softirq handler will exit in
order to allow the CPU to do other work.
Please note that this callback-invocation batch
limit applies only to non-offloaded callback
invocation. Offloaded callbacks are instead
invoked in the context of an rcuoc kthread, which
scheduler will preempt as it does any other task.
rcutree.rcu_fanout_exact= [KNL]
Disable autobalancing of the rcu_node combining
tree. This is used by rcutorture, and might
possibly be useful for architectures having high
cache-to-cache transfer latencies.
rcutree.rcu_fanout_leaf= [KNL]
Change the number of CPUs assigned to each
leaf rcu_node structure. Useful for very
large systems, which will choose the value 64,
and for NUMA systems with large remote-access
latencies, which will choose a value aligned
with the appropriate hardware boundaries.
rcutree.rcu_min_cached_objs= [KNL]
Minimum number of objects which are cached and
maintained per one CPU. Object size is equal
to PAGE_SIZE. The cache allows to reduce the
pressure to page allocator, also it makes the
whole algorithm to behave better in low memory
condition.
rcutree.rcu_nocb_gp_stride= [KNL]
Set the number of NOCB callback kthreads in
each group, which defaults to the square root
of the number of CPUs. Larger numbers reduce
the wakeup overhead on the global grace-period
kthread, but increases that same overhead on
each group's NOCB grace-period kthread.
rcutree.rcu_kick_kthreads= [KNL]
Cause the grace-period kthread to get an extra
wake_up() if it sleeps three times longer than
@ -4871,6 +4876,13 @@
This wake_up() will be accompanied by a
WARN_ONCE() splat and an ftrace_dump().
rcutree.rcu_resched_ns= [KNL]
Limit the time spend invoking a batch of RCU
callbacks to the specified number of nanoseconds.
By default, this limit is checked only once
every 32 callbacks in order to limit the pain
inflicted by local_clock() overhead.
rcutree.rcu_unlock_delay= [KNL]
In CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels,
this specifies an rcu_read_unlock()-time delay
@ -4885,6 +4897,16 @@
rcu_node tree with an eye towards determining
why a new grace period has not yet started.
rcutree.use_softirq= [KNL]
If set to zero, move all RCU_SOFTIRQ processing to
per-CPU rcuc kthreads. Defaults to a non-zero
value, meaning that RCU_SOFTIRQ is used by default.
Specify rcutree.use_softirq=0 to use rcuc kthreads.
But note that CONFIG_PREEMPT_RT=y kernels disable
this kernel boot parameter, forcibly setting it
to zero.
rcuscale.gp_async= [KNL]
Measure performance of asynchronous
grace-period primitives such as call_rcu().
@ -5087,8 +5109,17 @@
rcutorture.stall_cpu_block= [KNL]
Sleep while stalling if set. This will result
in warnings from preemptible RCU in addition
to any other stall-related activity.
in warnings from preemptible RCU in addition to
any other stall-related activity. Note that
in kernels built with CONFIG_PREEMPTION=n and
CONFIG_PREEMPT_COUNT=y, this parameter will
cause the CPU to pass through a quiescent state.
Given CONFIG_PREEMPTION=n, this will suppress
RCU CPU stall warnings, but will instead result
in scheduling-while-atomic splats.
Use of this module parameter results in splats.
rcutorture.stall_cpu_holdoff= [KNL]
Time to wait (s) after boot before inducing stall.
@ -5452,7 +5483,12 @@
port and the regular usb controller gets disabled.
root= [KNL] Root filesystem
See name_to_dev_t comment in init/do_mounts.c.
Usually this a a block device specifier of some kind,
see the early_lookup_bdev comment in
block/early-lookup.c for details.
Alternatively this can be "ram" for the legacy initial
ramdisk, "nfs" and "cifs" for root on a network file
system, or "mtd" and "ubi" for mounting from raw flash.
rootdelay= [KNL] Delay (in seconds) to pause before attempting to
mount the root filesystem
@ -5735,7 +5771,7 @@
1: Fast pin select (default)
2: ATC IRMode
smt= [KNL,S390] Set the maximum number of threads (logical
smt= [KNL,MIPS,S390] Set the maximum number of threads (logical
CPUs) to use per physical CPU on systems capable of
symmetric multithreading (SMT). Will be capped to the
actual hardware limit.
@ -6563,6 +6599,12 @@
unknown_nmi_panic
[X86] Cause panic on unknown NMI.
unwind_debug [X86-64]
Enable unwinder debug output. This can be
useful for debugging certain unwinder error
conditions, including corrupt stacks and
bad/missing unwinder metadata.
usbcore.authorized_default=
[USB] Default USB device authorization:
(default -1 = authorized except for wireless USB,
@ -6931,6 +6973,18 @@
it can be updated at runtime by writing to the
corresponding sysfs file.
workqueue.cpu_intensive_thresh_us=
Per-cpu work items which run for longer than this
threshold are automatically considered CPU intensive
and excluded from concurrency management to prevent
them from noticeably delaying other per-cpu work
items. Default is 10000 (10ms).
If CONFIG_WQ_CPU_INTENSIVE_REPORT is set, the kernel
will report the work functions which violate this
threshold repeatedly. They are likely good
candidates for using WQ_UNBOUND workqueues instead.
workqueue.disable_numa
By default, all work items queued to unbound
workqueues are affine to the NUMA nodes they're

View File

@ -119,9 +119,9 @@ set size has chronologically changed.::
Data Access Pattern Aware Memory Management
===========================================
Below three commands make every memory region of size >=4K that doesn't
accessed for >=60 seconds in your workload to be swapped out. ::
Below command makes every memory region of size >=4K that has not accessed for
>=60 seconds in your workload to be swapped out. ::
$ echo "#min-size max-size min-acc max-acc min-age max-age action" > test_scheme
$ echo "4K max 0 0 60s max pageout" >> test_scheme
$ damo schemes -c test_scheme <pid of your workload>
$ sudo damo schemes --damos_access_rate 0 0 --damos_sz_region 4K max \
--damos_age 60s max --damos_action pageout \
<pid of your workload>

View File

@ -10,9 +10,8 @@ DAMON provides below interfaces for different users.
`This <https://github.com/awslabs/damo>`_ is for privileged people such as
system administrators who want a just-working human-friendly interface.
Using this, users can use the DAMONs major features in a human-friendly way.
It may not be highly tuned for special cases, though. It supports both
virtual and physical address spaces monitoring. For more detail, please
refer to its `usage document
It may not be highly tuned for special cases, though. For more detail,
please refer to its `usage document
<https://github.com/awslabs/damo/blob/next/USAGE.md>`_.
- *sysfs interface.*
:ref:`This <sysfs_interface>` is for privileged user space programmers who
@ -20,11 +19,7 @@ DAMON provides below interfaces for different users.
features by reading from and writing to special sysfs files. Therefore,
you can write and use your personalized DAMON sysfs wrapper programs that
reads/writes the sysfs files instead of you. The `DAMON user space tool
<https://github.com/awslabs/damo>`_ is one example of such programs. It
supports both virtual and physical address spaces monitoring. Note that this
interface provides only simple :ref:`statistics <damos_stats>` for the
monitoring results. For detailed monitoring results, DAMON provides a
:ref:`tracepoint <tracepoint>`.
<https://github.com/awslabs/damo>`_ is one example of such programs.
- *debugfs interface. (DEPRECATED!)*
:ref:`This <debugfs_interface>` is almost identical to :ref:`sysfs interface
<sysfs_interface>`. This is deprecated, so users should move to the
@ -139,7 +134,7 @@ scheme of the kdamond. Writing ``clear_schemes_tried_regions`` to ``state``
file clears the DAMON-based operating scheme action tried regions directory for
each DAMON-based operation scheme of the kdamond. For details of the
DAMON-based operation scheme action tried regions directory, please refer to
:ref:tried_regions section <sysfs_schemes_tried_regions>`.
:ref:`tried_regions section <sysfs_schemes_tried_regions>`.
If the state is ``on``, reading ``pid`` shows the pid of the kdamond thread.
@ -259,12 +254,9 @@ be equal or smaller than ``start`` of directory ``N+1``.
contexts/<N>/schemes/
---------------------
For usual DAMON-based data access aware memory management optimizations, users
would normally want the system to apply a memory management action to a memory
region of a specific access pattern. DAMON receives such formalized operation
schemes from the user and applies those to the target memory regions. Users
can get and set the schemes by reading from and writing to files under this
directory.
The directory for DAMON-based Operation Schemes (:ref:`DAMOS
<damon_design_damos>`). Users can get and set the schemes by reading from and
writing to files under this directory.
In the beginning, this directory has only one file, ``nr_schemes``. Writing a
number (``N``) to the file creates the number of child directories named ``0``
@ -277,12 +269,12 @@ In each scheme directory, five directories (``access_pattern``, ``quotas``,
``watermarks``, ``filters``, ``stats``, and ``tried_regions``) and one file
(``action``) exist.
The ``action`` file is for setting and getting what action you want to apply to
memory regions having specific access pattern of the interest. The keywords
that can be written to and read from the file and their meaning are as below.
The ``action`` file is for setting and getting the scheme's :ref:`action
<damon_design_damos_action>`. The keywords that can be written to and read
from the file and their meaning are as below.
Note that support of each action depends on the running DAMON operations set
`implementation <sysfs_contexts>`.
:ref:`implementation <sysfs_contexts>`.
- ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``.
Supported by ``vaddr`` and ``fvaddr`` operations set.
@ -304,32 +296,21 @@ Note that support of each action depends on the running DAMON operations set
schemes/<N>/access_pattern/
---------------------------
The target access pattern of each DAMON-based operation scheme is constructed
with three ranges including the size of the region in bytes, number of
monitored accesses per aggregate interval, and number of aggregated intervals
for the age of the region.
The directory for the target access :ref:`pattern
<damon_design_damos_access_pattern>` of the given DAMON-based operation scheme.
Under the ``access_pattern`` directory, three directories (``sz``,
``nr_accesses``, and ``age``) each having two files (``min`` and ``max``)
exist. You can set and get the access pattern for the given scheme by writing
to and reading from the ``min`` and ``max`` files under ``sz``,
``nr_accesses``, and ``age`` directories, respectively.
``nr_accesses``, and ``age`` directories, respectively. Note that the ``min``
and the ``max`` form a closed interval.
schemes/<N>/quotas/
-------------------
Optimal ``target access pattern`` for each ``action`` is workload dependent, so
not easy to find. Worse yet, setting a scheme of some action too aggressive
can cause severe overhead. To avoid such overhead, users can limit time and
size quota for each scheme. In detail, users can ask DAMON to try to use only
up to specific time (``time quota``) for applying the action, and to apply the
action to only up to specific amount (``size quota``) of memory regions having
the target access pattern within a given time interval (``reset interval``).
When the quota limit is expected to be exceeded, DAMON prioritizes found memory
regions of the ``target access pattern`` based on their size, access frequency,
and age. For personalized prioritization, users can set the weights for the
three properties.
The directory for the :ref:`quotas <damon_design_damos_quotas>` of the given
DAMON-based operation scheme.
Under ``quotas`` directory, three files (``ms``, ``bytes``,
``reset_interval_ms``) and one directory (``weights``) having three files
@ -337,23 +318,26 @@ Under ``quotas`` directory, three files (``ms``, ``bytes``,
You can set the ``time quota`` in milliseconds, ``size quota`` in bytes, and
``reset interval`` in milliseconds by writing the values to the three files,
respectively. You can also set the prioritization weights for size, access
frequency, and age in per-thousand unit by writing the values to the three
files under the ``weights`` directory.
respectively. Then, DAMON tries to use only up to ``time quota`` milliseconds
for applying the ``action`` to memory regions of the ``access_pattern``, and to
apply the action to only up to ``bytes`` bytes of memory regions within the
``reset_interval_ms``. Setting both ``ms`` and ``bytes`` zero disables the
quota limits.
You can also set the :ref:`prioritization weights
<damon_design_damos_quotas_prioritization>` for size, access frequency, and age
in per-thousand unit by writing the values to the three files under the
``weights`` directory.
schemes/<N>/watermarks/
-----------------------
To allow easy activation and deactivation of each scheme based on system
status, DAMON provides a feature called watermarks. The feature receives five
values called ``metric``, ``interval``, ``high``, ``mid``, and ``low``. The
``metric`` is the system metric such as free memory ratio that can be measured.
If the metric value of the system is higher than the value in ``high`` or lower
than ``low`` at the memoent, the scheme is deactivated. If the value is lower
than ``mid``, the scheme is activated.
The directory for the :ref:`watermarks <damon_design_damos_watermarks>` of the
given DAMON-based operation scheme.
Under the watermarks directory, five files (``metric``, ``interval_us``,
``high``, ``mid``, and ``low``) for setting each value exist. You can set and
``high``, ``mid``, and ``low``) for setting the metric, the time interval
between check of the metric, and the three watermarks exist. You can set and
get the five values by writing to the files, respectively.
Keywords and meanings of those that can be written to the ``metric`` file are
@ -367,12 +351,8 @@ The ``interval`` should written in microseconds unit.
schemes/<N>/filters/
--------------------
Users could know something more than the kernel for specific types of memory.
In the case, users could do their own management for the memory and hence
doesn't want DAMOS bothers that. Users could limit DAMOS by setting the access
pattern of the scheme and/or the monitoring regions for the purpose, but that
can be inefficient in some cases. In such cases, users could set non-access
pattern driven filters using files in this directory.
The directory for the :ref:`filters <damon_design_damos_filters>` of the given
DAMON-based operation scheme.
In the beginning, this directory has only one file, ``nr_filters``. Writing a
number (``N``) to the file creates the number of child directories named ``0``
@ -432,13 +412,17 @@ starting from ``0`` under this directory. Each directory contains files
exposing detailed information about each of the memory region that the
corresponding scheme's ``action`` has tried to be applied under this directory,
during next :ref:`aggregation interval <sysfs_monitoring_attrs>`. The
information includes address range, ``nr_accesses``, , and ``age`` of the
region.
information includes address range, ``nr_accesses``, and ``age`` of the region.
The directories will be removed when another special keyword,
``clear_schemes_tried_regions``, is written to the relevant
``kdamonds/<N>/state`` file.
The expected usage of this directory is investigations of schemes' behaviors,
and query-like efficient data access monitoring results retrievals. For the
latter use case, in particular, users can set the ``action`` as ``stat`` and
set the ``access pattern`` as their interested pattern that they want to query.
tried_regions/<N>/
------------------
@ -600,15 +584,10 @@ update.
Schemes
-------
For usual DAMON-based data access aware memory management optimizations, users
would simply want the system to apply a memory management action to a memory
region of a specific access pattern. DAMON receives such formalized operation
schemes from the user and applies those to the target processes.
Users can get and set the schemes by reading from and writing to ``schemes``
debugfs file. Reading the file also shows the statistics of each scheme. To
the file, each of the schemes should be represented in each line in below
form::
Users can get and set the DAMON-based operation :ref:`schemes
<damon_design_damos>` by reading from and writing to ``schemes`` debugfs file.
Reading the file also shows the statistics of each scheme. To the file, each
of the schemes should be represented in each line in below form::
<target access pattern> <action> <quota> <watermarks>
@ -617,8 +596,9 @@ You can disable schemes by simply writing an empty string to the file.
Target Access Pattern
~~~~~~~~~~~~~~~~~~~~~
The ``<target access pattern>`` is constructed with three ranges in below
form::
The target access :ref:`pattern <damon_design_damos_access_pattern>` of the
scheme. The ``<target access pattern>`` is constructed with three ranges in
below form::
min-size max-size min-acc max-acc min-age max-age
@ -631,9 +611,9 @@ closed interval.
Action
~~~~~~
The ``<action>`` is a predefined integer for memory management actions, which
DAMON will apply to the regions having the target access pattern. The
supported numbers and their meanings are as below.
The ``<action>`` is a predefined integer for memory management :ref:`actions
<damon_design_damos_action>`. The supported numbers and their meanings are as
below.
- 0: Call ``madvise()`` for the region with ``MADV_WILLNEED``. Ignored if
``target`` is ``paddr``.
@ -649,10 +629,8 @@ supported numbers and their meanings are as below.
Quota
~~~~~
Optimal ``target access pattern`` for each ``action`` is workload dependent, so
not easy to find. Worse yet, setting a scheme of some action too aggressive
can cause severe overhead. To avoid such overhead, users can limit time and
size quota for the scheme via the ``<quota>`` in below form::
Users can set the :ref:`quotas <damon_design_damos_quotas>` of the given scheme
via the ``<quota>`` in below form::
<ms> <sz> <reset interval> <priority weights>
@ -662,19 +640,17 @@ the action to memory regions of the ``target access pattern`` within the
``<sz>`` bytes of memory regions within the ``<reset interval>``. Setting both
``<ms>`` and ``<sz>`` zero disables the quota limits.
When the quota limit is expected to be exceeded, DAMON prioritizes found memory
regions of the ``target access pattern`` based on their size, access frequency,
and age. For personalized prioritization, users can set the weights for the
three properties in ``<priority weights>`` in below form::
For the :ref:`prioritization <damon_design_damos_quotas_prioritization>`, users
can set the weights for the three properties in ``<priority weights>`` in below
form::
<size weight> <access frequency weight> <age weight>
Watermarks
~~~~~~~~~~
Some schemes would need to run based on current value of the system's specific
metrics like free memory ratio. For such cases, users can specify watermarks
for the condition.::
Users can specify :ref:`watermarks <damon_design_damos_watermarks>` of the
given scheme via ``<watermarks>`` in below form::
<metric> <check interval> <high mark> <middle mark> <low mark>
@ -797,10 +773,12 @@ root directory only.
Tracepoint for Monitoring Results
=================================
DAMON provides the monitoring results via a tracepoint,
``damon:damon_aggregated``. While the monitoring is turned on, you could
record the tracepoint events and show results using tracepoint supporting tools
like ``perf``. For example::
Users can get the monitoring results via the :ref:`tried_regions
<sysfs_schemes_tried_regions>` or a tracepoint, ``damon:damon_aggregated``.
While the tried regions directory is useful for getting a snapshot, the
tracepoint is useful for getting a full record of the results. While the
monitoring is turned on, you could record the tracepoint events and show
results using tracepoint supporting tools like ``perf``. For example::
# echo on > monitor_on
# perf record -e damon:damon_aggregated &

View File

@ -56,14 +56,14 @@ Example usage of perf::
For HiSilicon uncore PMU v2 whose identifier is 0x30, the topology is the same
as PMU v1, but some new functions are added to the hardware.
(a) L3C PMU supports filtering by core/thread within the cluster which can be
1. L3C PMU supports filtering by core/thread within the cluster which can be
specified as a bitmap::
$# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_core=0x3/ sleep 5
This will only count the operations from core/thread 0 and 1 in this cluster.
(b) Tracetag allow the user to chose to count only read, write or atomic
2. Tracetag allow the user to chose to count only read, write or atomic
operations via the tt_req parameeter in perf. The default value counts all
operations. tt_req is 3bits, 3'b100 represents read operations, 3'b101
represents write operations, 3'b110 represents atomic store operations and
@ -73,14 +73,16 @@ represents write operations, 3'b110 represents atomic store operations and
This will only count the read operations in this cluster.
(c) Datasrc allows the user to check where the data comes from. It is 5 bits.
3. Datasrc allows the user to check where the data comes from. It is 5 bits.
Some important codes are as follows:
5'b00001: comes from L3C in this die;
5'b01000: comes from L3C in the cross-die;
5'b01001: comes from L3C which is in another socket;
5'b01110: comes from the local DDR;
5'b01111: comes from the cross-die DDR;
5'b10000: comes from cross-socket DDR;
- 5'b00001: comes from L3C in this die;
- 5'b01000: comes from L3C in the cross-die;
- 5'b01001: comes from L3C which is in another socket;
- 5'b01110: comes from the local DDR;
- 5'b01111: comes from the cross-die DDR;
- 5'b10000: comes from cross-socket DDR;
etc, it is mainly helpful to find that the data source is nearest from the CPU
cores. If datasrc_cfg is used in the multi-chips, the datasrc_skt shall be
configured in perf command::
@ -88,15 +90,25 @@ configured in perf command::
$# perf stat -a -e hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xE/,
hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xF/ sleep 5
(d)Some HiSilicon SoCs encapsulate multiple CPU and IO dies. Each CPU die
4. Some HiSilicon SoCs encapsulate multiple CPU and IO dies. Each CPU die
contains several Compute Clusters (CCLs). The I/O dies are called Super I/O
clusters (SICL) containing multiple I/O clusters (ICLs). Each CCL/ICL in the
SoC has a unique ID. Each ID is 11bits, include a 6-bit SCCL-ID and 5-bit
CCL/ICL-ID. For I/O die, the ICL-ID is followed by:
5'b00000: I/O_MGMT_ICL;
5'b00001: Network_ICL;
5'b00011: HAC_ICL;
5'b10000: PCIe_ICL;
- 5'b00000: I/O_MGMT_ICL;
- 5'b00001: Network_ICL;
- 5'b00011: HAC_ICL;
- 5'b10000: PCIe_ICL;
5. uring_channel: UC PMU events 0x47~0x59 supports filtering by tx request
uring channel. It is 2 bits. Some important codes are as follows:
- 2'b11: count the events which sent to the uring_ext (MATA) channel;
- 2'b01: is the same as 2'b11;
- 2'b10: count the events which sent to the uring (non-MATA) channel;
- 2'b00: default value, count the events which sent to the both uring and
uring_ext channel;
Users could configure IDs to count data come from specific CCL/ICL, by setting
srcid_cmd & srcid_msk, and data desitined for specific CCL/ICL by setting

View File

@ -949,7 +949,7 @@ user space can read performance monitor counter registers directly.
The default value is 0 (access disabled).
See Documentation/arm64/perf.rst for more information.
See Documentation/arch/arm64/perf.rst for more information.
pid_max

View File

@ -386,8 +386,8 @@ Default : 0 (for compatibility reasons)
txrehash
--------
Controls default hash rethink behaviour on listening socket when SO_TXREHASH
option is set to SOCK_TXREHASH_DEFAULT (i. e. not overridden by setsockopt).
Controls default hash rethink behaviour on socket when SO_TXREHASH option is set
to SOCK_TXREHASH_DEFAULT (i. e. not overridden by setsockopt).
If set to 1 (default), hash rethink is performed on listening socket.
If set to 0, hash rethink is not performed.

View File

@ -17,16 +17,37 @@ For ACPI on arm64, tables also fall into the following categories:
- Recommended: BERT, EINJ, ERST, HEST, PCCT, SSDT
- Optional: BGRT, CPEP, CSRT, DBG2, DRTM, ECDT, FACS, FPDT, IBFT,
IORT, MCHI, MPST, MSCT, NFIT, PMTT, RASF, SBST, SLIT, SPMI, SRAT,
STAO, TCPA, TPM2, UEFI, XENV
- Optional: AGDI, BGRT, CEDT, CPEP, CSRT, DBG2, DRTM, ECDT, FACS, FPDT,
HMAT, IBFT, IORT, MCHI, MPAM, MPST, MSCT, NFIT, PMTT, PPTT, RASF, SBST,
SDEI, SLIT, SPMI, SRAT, STAO, TCPA, TPM2, UEFI, XENV
- Not supported: BOOT, DBGP, DMAR, ETDT, HPET, IVRS, LPIT, MSDM, OEMx,
PSDT, RSDT, SLIC, WAET, WDAT, WDRT, WPBT
- Not supported: AEST, APMT, BOOT, DBGP, DMAR, ETDT, HPET, IVRS, LPIT,
MSDM, OEMx, PDTT, PSDT, RAS2, RSDT, SLIC, WAET, WDAT, WDRT, WPBT
====== ========================================================================
Table Usage for ARMv8 Linux
====== ========================================================================
AEST Signature Reserved (signature == "AEST")
**Arm Error Source Table**
This table informs the OS of any error nodes in the system that are
compliant with the Arm RAS architecture.
AGDI Signature Reserved (signature == "AGDI")
**Arm Generic diagnostic Dump and Reset Device Interface Table**
This table describes a non-maskable event, that is used by the platform
firmware, to request the OS to generate a diagnostic dump and reset the device.
APMT Signature Reserved (signature == "APMT")
**Arm Performance Monitoring Table**
This table describes the properties of PMU support implmented by
components in the system.
BERT Section 18.3 (signature == "BERT")
**Boot Error Record Table**
@ -47,6 +68,13 @@ BGRT Section 5.2.22 (signature == "BGRT")
Optional, not currently supported, with no real use-case for an
ARM server.
CEDT Signature Reserved (signature == "CEDT")
**CXL Early Discovery Table**
This table allows the OS to discover any CXL Host Bridges and the Host
Bridge registers.
CPEP Section 5.2.18 (signature == "CPEP")
**Corrected Platform Error Polling table**
@ -184,6 +212,15 @@ HEST Section 18.3.2 (signature == "HEST")
Must be supplied if RAS support is provided by the platform. It
is recommended this table be supplied.
HMAT Section 5.2.28 (signature == "HMAT")
**Heterogeneous Memory Attribute Table**
This table describes the memory attributes, such as memory side cache
attributes and bandwidth and latency details, related to Memory Proximity
Domains. The OS uses this information to optimize the system memory
configuration.
HPET Signature Reserved (signature == "HPET")
**High Precision Event timer Table**
@ -241,6 +278,13 @@ MCHI Signature Reserved (signature == "MCHI")
Optional, not currently supported.
MPAM Signature Reserved (signature == "MPAM")
**Memory Partitioning And Monitoring table**
This table allows the OS to discover the MPAM controls implemented by
the subsystems.
MPST Section 5.2.21 (signature == "MPST")
**Memory Power State Table**
@ -281,18 +325,39 @@ PCCT Section 14.1 (signature == "PCCT)
Recommend for use on arm64; use of PCC is recommended when using CPPC
to control performance and power for platform processors.
PDTT Section 5.2.29 (signature == "PDTT")
**Platform Debug Trigger Table**
This table describes PCC channels used to gather debug logs of
non-architectural features.
PMTT Section 5.2.21.12 (signature == "PMTT")
**Platform Memory Topology Table**
Optional, not currently supported.
PPTT Section 5.2.30 (signature == "PPTT")
**Processor Properties Topology Table**
This table provides the processor and cache topology.
PSDT Section 5.2.11.3 (signature == "PSDT")
**Persistent System Description Table**
Obsolete table, will not be supported.
RAS2 Section 5.2.21 (signature == "RAS2")
**RAS Features 2 table**
This table provides interfaces for the RAS capabilities implemented in
the platform.
RASF Section 5.2.20 (signature == "RASF")
**RAS Feature table**
@ -318,6 +383,12 @@ SBST Section 5.2.14 (signature == "SBST")
Optional, not currently supported.
SDEI Signature Reserved (signature == "SDEI")
**Software Delegated Exception Interface table**
This table advertises the presence of the SDEI interface.
SLIC Signature Reserved (signature == "SLIC")
**Software LIcensing table**

View File

@ -1,40 +1,41 @@
=====================
ACPI on ARMv8 Servers
=====================
===================
ACPI on Arm systems
===================
ACPI can be used for ARMv8 general purpose servers designed to follow
the ARM SBSA (Server Base System Architecture) [0] and SBBR (Server
Base Boot Requirements) [1] specifications. Please note that the SBBR
can be retrieved simply by visiting [1], but the SBSA is currently only
available to those with an ARM login due to ARM IP licensing concerns.
ACPI can be used for Armv8 and Armv9 systems designed to follow
the BSA (Arm Base System Architecture) [0] and BBR (Arm
Base Boot Requirements) [1] specifications. Both BSA and BBR are publicly
accessible documents.
Arm Servers, in addition to being BSA compliant, comply with a set
of rules defined in SBSA (Server Base System Architecture) [2].
The ARMv8 kernel implements the reduced hardware model of ACPI version
The Arm kernel implements the reduced hardware model of ACPI version
5.1 or later. Links to the specification and all external documents
it refers to are managed by the UEFI Forum. The specification is
available at http://www.uefi.org/specifications and documents referenced
by the specification can be found via http://www.uefi.org/acpi.
If an ARMv8 system does not meet the requirements of the SBSA and SBBR,
If an Arm system does not meet the requirements of the BSA and BBR,
or cannot be described using the mechanisms defined in the required ACPI
specifications, then ACPI may not be a good fit for the hardware.
While the documents mentioned above set out the requirements for building
industry-standard ARMv8 servers, they also apply to more than one operating
industry-standard Arm systems, they also apply to more than one operating
system. The purpose of this document is to describe the interaction between
ACPI and Linux only, on an ARMv8 system -- that is, what Linux expects of
ACPI and Linux only, on an Arm system -- that is, what Linux expects of
ACPI and what ACPI can expect of Linux.
Why ACPI on ARM?
Why ACPI on Arm?
----------------
Before examining the details of the interface between ACPI and Linux, it is
useful to understand why ACPI is being used. Several technologies already
exist in Linux for describing non-enumerable hardware, after all. In this
section we summarize a blog post [2] from Grant Likely that outlines the
reasoning behind ACPI on ARMv8 servers. Actually, we snitch a good portion
section we summarize a blog post [3] from Grant Likely that outlines the
reasoning behind ACPI on Arm systems. Actually, we snitch a good portion
of the summary text almost directly, to be honest.
The short form of the rationale for ACPI on ARM is:
The short form of the rationale for ACPI on Arm is:
- ACPIs byte code (AML) allows the platform to encode hardware behavior,
while DT explicitly does not support this. For hardware vendors, being
@ -47,7 +48,7 @@ The short form of the rationale for ACPI on ARM is:
- In the enterprise server environment, ACPI has established bindings (such
as for RAS) which are currently used in production systems. DT does not.
Such bindings could be defined in DT at some point, but doing so means ARM
Such bindings could be defined in DT at some point, but doing so means Arm
and x86 would end up using completely different code paths in both firmware
and the kernel.
@ -108,7 +109,7 @@ recent version of the kernel.
Relationship with Device Tree
-----------------------------
ACPI support in drivers and subsystems for ARMv8 should never be mutually
ACPI support in drivers and subsystems for Arm should never be mutually
exclusive with DT support at compile time.
At boot time the kernel will only use one description method depending on
@ -121,11 +122,11 @@ time).
Booting using ACPI tables
-------------------------
The only defined method for passing ACPI tables to the kernel on ARMv8
The only defined method for passing ACPI tables to the kernel on Arm
is via the UEFI system configuration table. Just so it is explicit, this
means that ACPI is only supported on platforms that boot via UEFI.
When an ARMv8 system boots, it can either have DT information, ACPI tables,
When an Arm system boots, it can either have DT information, ACPI tables,
or in some very unusual cases, both. If no command line parameters are used,
the kernel will try to use DT for device enumeration; if there is no DT
present, the kernel will try to use ACPI tables, but only if they are present.
@ -169,7 +170,7 @@ hardware reduced mode must be set to zero.
For the ACPI core to operate properly, and in turn provide the information
the kernel needs to configure devices, it expects to find the following
tables (all section numbers refer to the ACPI 6.1 specification):
tables (all section numbers refer to the ACPI 6.5 specification):
- RSDP (Root System Description Pointer), section 5.2.5
@ -184,20 +185,76 @@ tables (all section numbers refer to the ACPI 6.1 specification):
- GTDT (Generic Timer Description Table), section 5.2.24
- PPTT (Processor Properties Topology Table), section 5.2.30
- DBG2 (DeBuG port table 2), section 5.2.6, specifically Table 5-6.
- APMT (Arm Performance Monitoring unit Table), section 5.2.6, specifically Table 5-6.
- AGDI (Arm Generic diagnostic Dump and Reset Device Interface Table), section 5.2.6, specifically Table 5-6.
- If PCI is supported, the MCFG (Memory mapped ConFiGuration
Table), section 5.2.6, specifically Table 5-31.
Table), section 5.2.6, specifically Table 5-6.
- If booting without a console=<device> kernel parameter is
supported, the SPCR (Serial Port Console Redirection table),
section 5.2.6, specifically Table 5-31.
section 5.2.6, specifically Table 5-6.
- If necessary to describe the I/O topology, SMMUs and GIC ITSs,
the IORT (Input Output Remapping Table, section 5.2.6, specifically
Table 5-31).
Table 5-6).
- If NUMA is supported, the following tables are required:
- SRAT (System Resource Affinity Table), section 5.2.16
- SLIT (System Locality distance Information Table), section 5.2.17
- If NUMA is supported, and the system contains heterogeneous memory,
the HMAT (Heterogeneous Memory Attribute Table), section 5.2.28.
- If the ACPI Platform Error Interfaces are required, the following
tables are conditionally required:
- BERT (Boot Error Record Table, section 18.3.1)
- EINJ (Error INJection table, section 18.6.1)
- ERST (Error Record Serialization Table, section 18.5)
- HEST (Hardware Error Source Table, section 18.3.2)
- SDEI (Software Delegated Exception Interface table, section 5.2.6,
specifically Table 5-6)
- AEST (Arm Error Source Table, section 5.2.6,
specifically Table 5-6)
- RAS2 (ACPI RAS2 feature table, section 5.2.21)
- If the system contains controllers using PCC channel, the
PCCT (Platform Communications Channel Table), section 14.1
- If the system contains a controller to capture board-level system state,
and communicates with the host via PCC, the PDTT (Platform Debug Trigger
Table), section 5.2.29.
- If NVDIMM is supported, the NFIT (NVDIMM Firmware Interface Table), section 5.2.26
- If video framebuffer is present, the BGRT (Boot Graphics Resource Table), section 5.2.23
- If IPMI is implemented, the SPMI (Server Platform Management Interface),
section 5.2.6, specifically Table 5-6.
- If the system contains a CXL Host Bridge, the CEDT (CXL Early Discovery
Table), section 5.2.6, specifically Table 5-6.
- If the system supports MPAM, the MPAM (Memory Partitioning And Monitoring table), section 5.2.6,
specifically Table 5-6.
- If the system lacks persistent storage, the IBFT (ISCSI Boot Firmware
Table), section 5.2.6, specifically Table 5-6.
- If NUMA is supported, the SRAT (System Resource Affinity Table)
and SLIT (System Locality distance Information Table), sections
5.2.16 and 5.2.17, respectively.
If the above tables are not all present, the kernel may or may not be
able to boot properly since it may not be able to configure all of the
@ -269,16 +326,14 @@ Drivers should look for device properties in the _DSD object ONLY; the _DSD
object is described in the ACPI specification section 6.2.5, but this only
describes how to define the structure of an object returned via _DSD, and
how specific data structures are defined by specific UUIDs. Linux should
only use the _DSD Device Properties UUID [5]:
only use the _DSD Device Properties UUID [4]:
- UUID: daffd814-6eba-4d8c-8a91-bc9bbf4aa301
- https://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf
The UEFI Forum provides a mechanism for registering device properties [4]
so that they may be used across all operating systems supporting ACPI.
Device properties that have not been registered with the UEFI Forum should
not be used.
Common device properties can be registered by creating a pull request to [4] so
that they may be used across all operating systems supporting ACPI.
Device properties that have not been registered with the UEFI Forum can be used
but not as "uefi-" common properties.
Before creating new device properties, check to be sure that they have not
been defined before and either registered in the Linux kernel documentation
@ -306,7 +361,7 @@ process.
Once registration and review have been completed, the kernel provides an
interface for looking up device properties in a manner independent of
whether DT or ACPI is being used. This API should be used [6]; it can
whether DT or ACPI is being used. This API should be used [5]; it can
eliminate some duplication of code paths in driver probing functions and
discourage divergence between DT bindings and ACPI device properties.
@ -448,15 +503,15 @@ ASWG
----
The ACPI specification changes regularly. During the year 2014, for instance,
version 5.1 was released and version 6.0 substantially completed, with most of
the changes being driven by ARM-specific requirements. Proposed changes are
the changes being driven by Arm-specific requirements. Proposed changes are
presented and discussed in the ASWG (ACPI Specification Working Group) which
is a part of the UEFI Forum. The current version of the ACPI specification
is 6.1 release in January 2016.
is 6.5 release in August 2022.
Participation in this group is open to all UEFI members. Please see
http://www.uefi.org/workinggroup for details on group membership.
It is the intent of the ARMv8 ACPI kernel code to follow the ACPI specification
It is the intent of the Arm ACPI kernel code to follow the ACPI specification
as closely as possible, and to only implement functionality that complies with
the released standards from UEFI ASWG. As a practical matter, there will be
vendors that provide bad ACPI tables or violate the standards in some way.
@ -470,12 +525,12 @@ likely be willing to assist in submitting ECRs.
Linux Code
----------
Individual items specific to Linux on ARM, contained in the Linux
Individual items specific to Linux on Arm, contained in the Linux
source code, are in the list that follows:
ACPI_OS_NAME
This macro defines the string to be returned when
an ACPI method invokes the _OS method. On ARM64
an ACPI method invokes the _OS method. On Arm
systems, this macro will be "Linux" by default.
The command line parameter acpi_os=<string>
can be used to set it to some other value. The
@ -485,36 +540,28 @@ ACPI_OS_NAME
ACPI Objects
------------
Detailed expectations for ACPI tables and object are listed in the file
Documentation/arm64/acpi_object_usage.rst.
Documentation/arch/arm64/acpi_object_usage.rst.
References
----------
[0] http://silver.arm.com
document ARM-DEN-0029, or newer:
"Server Base System Architecture", version 2.3, dated 27 Mar 2014
[0] https://developer.arm.com/documentation/den0094/latest
document Arm-DEN-0094: "Arm Base System Architecture", version 1.0C, dated 6 Oct 2022
[1] http://infocenter.arm.com/help/topic/com.arm.doc.den0044a/Server_Base_Boot_Requirements.pdf
Document ARM-DEN-0044A, or newer: "Server Base Boot Requirements, System
Software on ARM Platforms", dated 16 Aug 2014
[1] https://developer.arm.com/documentation/den0044/latest
Document Arm-DEN-0044: "Arm Base Boot Requirements", version 2.0G, dated 15 Apr 2022
[2] http://www.secretlab.ca/archives/151,
[2] https://developer.arm.com/documentation/den0029/latest
Document Arm-DEN-0029: "Arm Server Base System Architecture", version 7.1, dated 06 Oct 2022
[3] http://www.secretlab.ca/archives/151,
10 Jan 2015, Copyright (c) 2015,
Linaro Ltd., written by Grant Likely.
[3] AMD ACPI for Seattle platform documentation
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Seattle_ACPI_Guide.pdf
[4] _DSD (Device Specific Data) Implementation Guide
https://github.com/UEFI/DSD-Guide/blob/main/dsd-guide.pdf
[4] http://www.uefi.org/acpi
please see the link for the "ACPI _DSD Device
Property Registry Instructions"
[5] http://www.uefi.org/acpi
please see the link for the "_DSD (Device
Specific Data) Implementation Guide"
[6] Kernel code for the unified device
[5] Kernel code for the unified device
property interface can be found in
include/linux/property.h and drivers/base/property.c.

View File

@ -379,6 +379,38 @@ Before jumping into the kernel, the following conditions must be met:
- SMCR_EL2.EZT0 (bit 30) must be initialised to 0b1.
For CPUs with Memory Copy and Memory Set instructions (FEAT_MOPS):
- If the kernel is entered at EL1 and EL2 is present:
- HCRX_EL2.MSCEn (bit 11) must be initialised to 0b1.
For CPUs with the Extended Translation Control Register feature (FEAT_TCR2):
- If EL3 is present:
- SCR_EL3.TCR2En (bit 43) must be initialised to 0b1.
- If the kernel is entered at EL1 and EL2 is present:
- HCRX_EL2.TCR2En (bit 14) must be initialised to 0b1.
For CPUs with the Stage 1 Permission Indirection Extension feature (FEAT_S1PIE):
- If EL3 is present:
- SCR_EL3.PIEn (bit 45) must be initialised to 0b1.
- If the kernel is entered at EL1 and EL2 is present:
- HFGRTR_EL2.nPIR_EL1 (bit 58) must be initialised to 0b1.
- HFGWTR_EL2.nPIR_EL1 (bit 58) must be initialised to 0b1.
- HFGRTR_EL2.nPIRE0_EL1 (bit 57) must be initialised to 0b1.
- HFGRWR_EL2.nPIRE0_EL1 (bit 57) must be initialised to 0b1.
The requirements described above for CPU mode, caches, MMUs, architected
timers, coherency and system registers apply to all CPUs. All CPUs must
enter the kernel in the same exception level. Where the values documented

View File

@ -288,6 +288,8 @@ infrastructure:
+------------------------------+---------+---------+
| Name | bits | visible |
+------------------------------+---------+---------+
| MOPS | [19-16] | y |
+------------------------------+---------+---------+
| RPRES | [7-4] | y |
+------------------------------+---------+---------+
| WFXT | [3-0] | y |

View File

@ -102,7 +102,7 @@ HWCAP_ASIMDHP
HWCAP_CPUID
EL0 access to certain ID registers is available, to the extent
described by Documentation/arm64/cpu-feature-registers.rst.
described by Documentation/arch/arm64/cpu-feature-registers.rst.
These ID registers may imply the availability of features.
@ -163,12 +163,12 @@ HWCAP_SB
HWCAP_PACA
Functionality implied by ID_AA64ISAR1_EL1.APA == 0b0001 or
ID_AA64ISAR1_EL1.API == 0b0001, as described by
Documentation/arm64/pointer-authentication.rst.
Documentation/arch/arm64/pointer-authentication.rst.
HWCAP_PACG
Functionality implied by ID_AA64ISAR1_EL1.GPA == 0b0001 or
ID_AA64ISAR1_EL1.GPI == 0b0001, as described by
Documentation/arm64/pointer-authentication.rst.
Documentation/arch/arm64/pointer-authentication.rst.
HWCAP2_DCPODP
Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0010.
@ -226,7 +226,7 @@ HWCAP2_BTI
HWCAP2_MTE
Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0010, as described
by Documentation/arm64/memory-tagging-extension.rst.
by Documentation/arch/arm64/memory-tagging-extension.rst.
HWCAP2_ECV
Functionality implied by ID_AA64MMFR0_EL1.ECV == 0b0001.
@ -239,11 +239,11 @@ HWCAP2_RPRES
HWCAP2_MTE3
Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0011, as described
by Documentation/arm64/memory-tagging-extension.rst.
by Documentation/arch/arm64/memory-tagging-extension.rst.
HWCAP2_SME
Functionality implied by ID_AA64PFR1_EL1.SME == 0b0001, as described
by Documentation/arm64/sme.rst.
by Documentation/arch/arm64/sme.rst.
HWCAP2_SME_I16I64
Functionality implied by ID_AA64SMFR0_EL1.I16I64 == 0b1111.
@ -302,6 +302,9 @@ HWCAP2_SMEB16B16
HWCAP2_SMEF16F16
Functionality implied by ID_AA64SMFR0_EL1.F16F16 == 0b1
HWCAP2_MOPS
Functionality implied by ID_AA64ISAR2_EL1.MOPS == 0b0001.
4. Unused AT_HWCAP bits
-----------------------

View File

@ -15,11 +15,13 @@ ARM64 Architecture
cpu-feature-registers
elf_hwcaps
hugetlbpage
kdump
legacy_instructions
memory
memory-tagging-extension
perf
pointer-authentication
ptdump
silicon-errata
sme
sve

View File

@ -0,0 +1,92 @@
=======================================
crashkernel memory reservation on arm64
=======================================
Author: Baoquan He <bhe@redhat.com>
Kdump mechanism is used to capture a corrupted kernel vmcore so that
it can be subsequently analyzed. In order to do this, a preliminarily
reserved memory is needed to pre-load the kdump kernel and boot such
kernel if corruption happens.
That reserved memory for kdump is adapted to be able to minimally
accommodate the kdump kernel and the user space programs needed for the
vmcore collection.
Kernel parameter
================
Through the kernel parameters below, memory can be reserved accordingly
during the early stage of the first kernel booting so that a continuous
large chunk of memomy can be found. The low memory reservation needs to
be considered if the crashkernel is reserved from the high memory area.
- crashkernel=size@offset
- crashkernel=size
- crashkernel=size,high crashkernel=size,low
Low memory and high memory
==========================
For kdump reservations, low memory is the memory area under a specific
limit, usually decided by the accessible address bits of the DMA-capable
devices needed by the kdump kernel to run. Those devices not related to
vmcore dumping can be ignored. On arm64, the low memory upper bound is
not fixed: it is 1G on the RPi4 platform but 4G on most other systems.
On special kernels built with CONFIG_ZONE_(DMA|DMA32) disabled, the
whole system RAM is low memory. Outside of the low memory described
above, the rest of system RAM is considered high memory.
Implementation
==============
1) crashkernel=size@offset
--------------------------
The crashkernel memory must be reserved at the user-specified region or
fail if already occupied.
2) crashkernel=size
-------------------
The crashkernel memory region will be reserved in any available position
according to the search order:
Firstly, the kernel searches the low memory area for an available region
with the specified size.
If searching for low memory fails, the kernel falls back to searching
the high memory area for an available region of the specified size. If
the reservation in high memory succeeds, a default size reservation in
the low memory will be done. Currently the default size is 128M,
sufficient for the low memory needs of the kdump kernel.
Note: crashkernel=size is the recommended option for crashkernel kernel
reservations. The user would not need to know the system memory layout
for a specific platform.
3) crashkernel=size,high crashkernel=size,low
---------------------------------------------
crashkernel=size,(high|low) are an important supplement to
crashkernel=size. They allows the user to specify how much memory needs
to be allocated from the high memory and low memory respectively. On
many systems the low memory is precious and crashkernel reservations
from this area should be kept to a minimum.
To reserve memory for crashkernel=size,high, searching is first
attempted from the high memory region. If the reservation succeeds, the
low memory reservation will be done subsequently.
If reservation from the high memory failed, the kernel falls back to
searching the low memory with the specified size in crashkernel=,high.
If it succeeds, no further reservation for low memory is needed.
Notes:
- If crashkernel=,low is not specified, the default low memory
reservation will be done automatically.
- if crashkernel=0,low is specified, it means that the low memory
reservation is omitted intentionally.

View File

@ -221,7 +221,7 @@ programs should not retry in case of a non-zero system call return.
``NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the tagged
address ABI control and MTE configuration of a process as per the
``prctl()`` options described in
Documentation/arm64/tagged-address-abi.rst and above. The corresponding
Documentation/arch/arm64/tagged-address-abi.rst and above. The corresponding
``regset`` is 1 element of 8 bytes (``sizeof(long))``).
Core dump support

View File

@ -33,8 +33,8 @@ AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit)::
0000000000000000 0000ffffffffffff 256TB user
ffff000000000000 ffff7fffffffffff 128TB kernel logical memory map
[ffff600000000000 ffff7fffffffffff] 32TB [kasan shadow region]
ffff800000000000 ffff800007ffffff 128MB modules
ffff800008000000 fffffbffefffffff 124TB vmalloc
ffff800000000000 ffff80007fffffff 2GB modules
ffff800080000000 fffffbffefffffff 124TB vmalloc
fffffbfff0000000 fffffbfffdffffff 224MB fixed mappings (top down)
fffffbfffe000000 fffffbfffe7fffff 8MB [guard region]
fffffbfffe800000 fffffbffff7fffff 16MB PCI I/O space
@ -50,8 +50,8 @@ AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support):
0000000000000000 000fffffffffffff 4PB user
fff0000000000000 ffff7fffffffffff ~4PB kernel logical memory map
[fffd800000000000 ffff7fffffffffff] 512TB [kasan shadow region]
ffff800000000000 ffff800007ffffff 128MB modules
ffff800008000000 fffffbffefffffff 124TB vmalloc
ffff800000000000 ffff80007fffffff 2GB modules
ffff800080000000 fffffbffefffffff 124TB vmalloc
fffffbfff0000000 fffffbfffdffffff 224MB fixed mappings (top down)
fffffbfffe000000 fffffbfffe7fffff 8MB [guard region]
fffffbfffe800000 fffffbffff7fffff 16MB PCI I/O space

View File

@ -0,0 +1,96 @@
======================
Kernel page table dump
======================
ptdump is a debugfs interface that provides a detailed dump of the
kernel page tables. It offers a comprehensive overview of the kernel
virtual memory layout as well as the attributes associated with the
various regions in a human-readable format. It is useful to dump the
kernel page tables to verify permissions and memory types. Examining the
page table entries and permissions helps identify potential security
vulnerabilities such as mappings with overly permissive access rights or
improper memory protections.
Memory hotplug allows dynamic expansion or contraction of available
memory without requiring a system reboot. To maintain the consistency
and integrity of the memory management data structures, arm64 makes use
of the ``mem_hotplug_lock`` semaphore in write mode. Additionally, in
read mode, ``mem_hotplug_lock`` supports an efficient implementation of
``get_online_mems()`` and ``put_online_mems()``. These protect the
offlining of memory being accessed by the ptdump code.
In order to dump the kernel page tables, enable the following
configurations and mount debugfs::
CONFIG_GENERIC_PTDUMP=y
CONFIG_PTDUMP_CORE=y
CONFIG_PTDUMP_DEBUGFS=y
mount -t debugfs nodev /sys/kernel/debug
cat /sys/kernel/debug/kernel_page_tables
On analysing the output of ``cat /sys/kernel/debug/kernel_page_tables``
one can derive information about the virtual address range of the entry,
followed by size of the memory region covered by this entry, the
hierarchical structure of the page tables and finally the attributes
associated with each page. The page attributes provide information about
access permissions, execution capability, type of mapping such as leaf
level PTE or block level PGD, PMD and PUD, and access status of a page
within the kernel memory. Assessing these attributes can assist in
understanding the memory layout, access patterns and security
characteristics of the kernel pages.
Kernel virtual memory layout example::
start address end address size attributes
+---------------------------------------------------------------------------------------+
| ---[ Linear Mapping start ]---------------------------------------------------------- |
| .................. |
| 0xfff0000000000000-0xfff0000000210000 2112K PTE RW NX SHD AF UXN MEM/NORMAL-TAGGED |
| 0xfff0000000210000-0xfff0000001c00000 26560K PTE ro NX SHD AF UXN MEM/NORMAL |
| .................. |
| ---[ Linear Mapping end ]------------------------------------------------------------ |
+---------------------------------------------------------------------------------------+
| ---[ Modules start ]----------------------------------------------------------------- |
| .................. |
| 0xffff800000000000-0xffff800008000000 128M PTE |
| .................. |
| ---[ Modules end ]------------------------------------------------------------------- |
+---------------------------------------------------------------------------------------+
| ---[ vmalloc() area ]---------------------------------------------------------------- |
| .................. |
| 0xffff800008010000-0xffff800008200000 1984K PTE ro x SHD AF UXN MEM/NORMAL |
| 0xffff800008200000-0xffff800008e00000 12M PTE ro x SHD AF CON UXN MEM/NORMAL |
| .................. |
| ---[ vmalloc() end ]----------------------------------------------------------------- |
+---------------------------------------------------------------------------------------+
| ---[ Fixmap start ]------------------------------------------------------------------ |
| .................. |
| 0xfffffbfffdb80000-0xfffffbfffdb90000 64K PTE ro x SHD AF UXN MEM/NORMAL |
| 0xfffffbfffdb90000-0xfffffbfffdba0000 64K PTE ro NX SHD AF UXN MEM/NORMAL |
| .................. |
| ---[ Fixmap end ]-------------------------------------------------------------------- |
+---------------------------------------------------------------------------------------+
| ---[ PCI I/O start ]----------------------------------------------------------------- |
| .................. |
| 0xfffffbfffe800000-0xfffffbffff800000 16M PTE |
| .................. |
| ---[ PCI I/O end ]------------------------------------------------------------------- |
+---------------------------------------------------------------------------------------+
| ---[ vmemmap start ]----------------------------------------------------------------- |
| .................. |
| 0xfffffc0002000000-0xfffffc0002200000 2M PTE RW NX SHD AF UXN MEM/NORMAL |
| 0xfffffc0002200000-0xfffffc0020000000 478M PTE |
| .................. |
| ---[ vmemmap end ]------------------------------------------------------------------- |
+---------------------------------------------------------------------------------------+
``cat /sys/kernel/debug/kernel_page_tables`` output::
0xfff0000001c00000-0xfff0000080000000 2020M PTE RW NX SHD AF UXN MEM/NORMAL-TAGGED
0xfff0000080000000-0xfff0000800000000 30G PMD
0xfff0000800000000-0xfff0000800700000 7M PTE RW NX SHD AF UXN MEM/NORMAL-TAGGED
0xfff0000800700000-0xfff0000800710000 64K PTE ro NX SHD AF UXN MEM/NORMAL-TAGGED
0xfff0000800710000-0xfff0000880000000 2089920K PTE RW NX SHD AF UXN MEM/NORMAL-TAGGED
0xfff0000880000000-0xfff0040000000000 4062G PMD
0xfff0040000000000-0xffff800000000000 3964T PGD

View File

@ -140,6 +140,10 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | MMU-500 | #841119,826419 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | MMU-600 | #1076982,1209401| N/A |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | MMU-700 | #2268618,2812531| N/A |
+----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_845719 |
+----------------+-----------------+-----------------+-----------------------------+
@ -214,3 +218,7 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| Fujitsu | A64FX | E#010001 | FUJITSU_ERRATUM_010001 |
+----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
| ASR | ASR8601 | #8601001 | N/A |
+----------------+-----------------+-----------------+-----------------------------+

View File

@ -465,4 +465,4 @@ References
[2] arch/arm64/include/uapi/asm/ptrace.h
AArch64 Linux ptrace ABI definitions
[3] Documentation/arm64/cpu-feature-registers.rst
[3] Documentation/arch/arm64/cpu-feature-registers.rst

View File

@ -606,7 +606,7 @@ References
[2] arch/arm64/include/uapi/asm/ptrace.h
AArch64 Linux ptrace ABI definitions
[3] Documentation/arm64/cpu-feature-registers.rst
[3] Documentation/arch/arm64/cpu-feature-registers.rst
[4] ARM IHI0055C
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf

View File

@ -107,7 +107,7 @@ following behaviours are guaranteed:
A definition of the meaning of tagged pointers on AArch64 can be found
in Documentation/arm64/tagged-pointers.rst.
in Documentation/arch/arm64/tagged-pointers.rst.
3. AArch64 Tagged Address ABI Exceptions
-----------------------------------------

Some files were not shown because too many files have changed in this diff Show More