Merge remote-tracking branch 'wireless/main' into wireless-next

Pull in wireless/main content since some new code would
otherwise conflict with it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg 2022-10-10 11:03:31 +02:00
commit dfd2d876b3
1238 changed files with 50305 additions and 13754 deletions

.gitignore

@ -37,6 +37,8 @@
*.o
*.o.*
*.patch
*.rmeta
*.rsi
*.s
*.so
*.so.dbg
@ -97,6 +99,7 @@ modules.order
!.gitattributes
!.gitignore
!.mailmap
!.rustfmt.toml
#
# Generated include files
@ -162,3 +165,6 @@ x509.genkey
# Documentation toolchain
sphinx_*/
# Rust analyzer configuration
/rust-project.json

.rustfmt.toml

@ -0,0 +1,12 @@
edition = "2021"
newline_style = "Unix"
# Unstable options that help catching some mistakes in formatting and that we may want to enable
# when they become stable.
#
# They are kept here since they are useful to run from time to time.
#format_code_in_doc_comments = true
#reorder_impl_items = true
#comment_width = 100
#wrap_comments = true
#normalize_comments = true

@ -3,7 +3,7 @@ Date: May 2011
KernelVersion: 3.0
Contact: Rafał Miłecki <zajec5@gmail.com>
Description:
Each BCMA core has it's manufacturer id. See
Each BCMA core has its manufacturer id. See
include/linux/bcma/bcma.h for possible values.
What: /sys/bus/bcma/devices/.../id

@ -31,7 +31,7 @@ Description: 'FCoE Controller' instances on the fcoe bus.
1) Write interface name to ctlr_create 2) Configure the FCoE
Controller (ctlr_X) 3) Enable the FCoE Controller to begin
discovery and login. The FCoE Controller is destroyed by
writing it's name, i.e. ctlr_X to the ctlr_delete file.
writing its name, i.e. ctlr_X to the ctlr_delete file.
Attributes:

@ -18,7 +18,7 @@ Description:
on the signal from which time of flight measurements are
taken.
The appropriate value to take depends on both the
sensor and it's operating environment:
sensor and its operating environment:
* as3935 (0-31 range)
18 = indoors (default)
14 = outdoors

@ -296,7 +296,7 @@ Description: Processor frequency boosting control
This switch controls the boost setting for the whole system.
Boosting allows the CPU and the firmware to run at a frequency
beyond it's nominal limit.
beyond its nominal limit.
More details can be found in
Documentation/admin-guide/pm/cpufreq.rst

@ -2,8 +2,8 @@ What: /sys/bus/platform/devices/ci_hdrc.0/role
Date: Mar 2017
Contact: Peter Chen <peter.chen@nxp.com>
Description:
It returns string "gadget" or "host" when read it, it indicates
current controller role.
When read, it returns string "gadget" or "host", indicating
the current controller role.
It will do role switch when write "gadget" or "host" to it.
It will do role switch when "gadget" or "host" is written to it.
Only controllers in a dual-role configuration support writing.

@ -152,7 +152,7 @@ Description:
case further investigation is required to determine which
device is causing the problem. Note that genuine RTC clock
values (such as when pm_trace has not been used), can still
match a device and output it's name here.
match a device and output its name here.
What: /sys/power/pm_async
Date: January 2009

@ -66,8 +66,13 @@ over a rather long period of time, but improvements are always welcome!
As a rough rule of thumb, any dereference of an RCU-protected
pointer must be covered by rcu_read_lock(), rcu_read_lock_bh(),
rcu_read_lock_sched(), or by the appropriate update-side lock.
Disabling of preemption can serve as rcu_read_lock_sched(), but
is less readable and prevents lockdep from detecting locking issues.
Explicit disabling of preemption (preempt_disable(), for example)
can serve as rcu_read_lock_sched(), but is less readable and
prevents lockdep from detecting locking issues.
Please note that you *cannot* rely on code known to be built
only in non-preemptible kernels. Such code can and will break,
especially in kernels built with CONFIG_PREEMPT_COUNT=y.
Letting RCU-protected pointers "leak" out of an RCU read-side
critical section is every bit as bad as letting them leak out
@ -185,6 +190,9 @@ over a rather long period of time, but improvements are always welcome!
5. If call_rcu() or call_srcu() is used, the callback function will
be called from softirq context. In particular, it cannot block.
If you need the callback to block, run that code in a workqueue
handler scheduled from the callback. The queue_rcu_work()
function does this for you in the case of call_rcu().
6. Since synchronize_rcu() can block, it cannot be called
from any sort of irq context. The same rule applies
@ -297,7 +305,8 @@ over a rather long period of time, but improvements are always welcome!
the machine.
d. Periodically invoke synchronize_rcu(), permitting a limited
number of updates per grace period.
number of updates per grace period. Better yet, periodically
invoke rcu_barrier() to wait for all outstanding callbacks.
The same cautions apply to call_srcu() and kfree_rcu().
@ -477,6 +486,6 @@ over a rather long period of time, but improvements are always welcome!
So if you need to wait for both an RCU grace period and for
all pre-existing call_rcu() callbacks, you will need to execute
both rcu_barrier() and synchronize_rcu(), if necessary, using
something like workqueues to to execute them concurrently.
something like workqueues to execute them concurrently.
See rcubarrier.rst for more information.

@ -61,7 +61,7 @@ checking of rcu_dereference() primitives:
rcu_access_pointer(p):
Return the value of the pointer and omit all barriers,
but retain the compiler constraints that prevent duplicating
or coalescsing. This is useful when when testing the
or coalescing. This is useful when testing the
value of the pointer itself, for example, against NULL.
The rcu_dereference_check() check expression can be any boolean

@ -128,10 +128,16 @@ Follow these rules to keep your RCU code working properly:
This sort of comparison occurs frequently when scanning
RCU-protected circular linked lists.
Note that if checks for being within an RCU read-side
critical section are not required and the pointer is never
dereferenced, rcu_access_pointer() should be used in place
of rcu_dereference().
Note that if the pointer comparison is done outside
of an RCU read-side critical section, and the pointer
is never dereferenced, rcu_access_pointer() should be
used in place of rcu_dereference(). In most cases,
it is best to avoid accidental dereferences by testing
the rcu_access_pointer() return value directly, without
assigning it to a variable.
Within an RCU read-side critical section, there is little
reason to use rcu_access_pointer().
- The comparison is against a pointer that references memory
that was initialized "a long time ago." The reason

@ -6,13 +6,15 @@ What is RCU? -- "Read, Copy, Update"
Please note that the "What is RCU?" LWN series is an excellent place
to start learning about RCU:
| 1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/
| 2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/
| 3. RCU part 3: the RCU API http://lwn.net/Articles/264090/
| 4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/
| 2010 Big API Table http://lwn.net/Articles/419086/
| 5. The RCU API, 2014 Edition http://lwn.net/Articles/609904/
| 2014 Big API Table http://lwn.net/Articles/609973/
| 1. What is RCU, Fundamentally? https://lwn.net/Articles/262464/
| 2. What is RCU? Part 2: Usage https://lwn.net/Articles/263130/
| 3. RCU part 3: the RCU API https://lwn.net/Articles/264090/
| 4. The RCU API, 2010 Edition https://lwn.net/Articles/418853/
| 2010 Big API Table https://lwn.net/Articles/419086/
| 5. The RCU API, 2014 Edition https://lwn.net/Articles/609904/
| 2014 Big API Table https://lwn.net/Articles/609973/
| 6. The RCU API, 2019 Edition https://lwn.net/Articles/777036/
| 2019 Big API Table https://lwn.net/Articles/777165/
What is RCU?
@ -915,13 +917,18 @@ which an RCU reference is held include:
The understanding that RCU provides a reference that only prevents a
change of type is particularly visible with objects allocated from a
slab cache marked ``SLAB_TYPESAFE_BY_RCU``. RCU operations may yield a
reference to an object from such a cache that has been concurrently
freed and the memory reallocated to a completely different object,
though of the same type. In this case RCU doesn't even protect the
identity of the object from changing, only its type. So the object
found may not be the one expected, but it will be one where it is safe
to take a reference or spinlock and then confirm that the identity
matches the expectations.
reference to an object from such a cache that has been concurrently freed
and the memory reallocated to a completely different object, though of
the same type. In this case RCU doesn't even protect the identity of the
object from changing, only its type. So the object found may not be the
one expected, but it will be one where it is safe to take a reference
(and then potentially acquire a spinlock), allowing subsequent code
to check whether the identity matches expectations. It is tempting
to simply acquire the spinlock without first taking the reference, but
unfortunately any spinlock in a ``SLAB_TYPESAFE_BY_RCU`` object must be
initialized after each and every call to kmem_cache_alloc(), which renders
reference-free spinlock acquisition completely unsafe. Therefore, when
using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter.
With traditional reference counting -- such as that implemented by the
kref library in Linux -- there is typically code that runs when the last
@ -1057,14 +1064,20 @@ SRCU: Initialization/cleanup::
init_srcu_struct
cleanup_srcu_struct
All: lockdep-checked RCU-protected pointer access::
All: lockdep-checked RCU utility APIs::
rcu_access_pointer
rcu_dereference_raw
RCU_LOCKDEP_WARN
rcu_sleep_check
RCU_NONIDLE
All: Unchecked RCU-protected pointer access::
rcu_dereference_raw
All: Unchecked RCU-protected pointer access with dereferencing prohibited::
rcu_access_pointer
See the comment headers in the source code (or the docbook generated
from them) for more information.

@ -262,8 +262,6 @@ Compiling the kernel
- Make sure you have at least gcc 5.1 available.
For more information, refer to :ref:`Documentation/process/changes.rst <changes>`.
Please note that you can still run a.out user programs with this kernel.
- Do a ``make`` to create a compressed kernel image. It is also
possible to do ``make install`` if you have lilo installed to suit the
kernel makefiles, but you may want to check your particular lilo setup first.
@ -332,85 +330,10 @@ Compiling the kernel
If something goes wrong
-----------------------
- If you have problems that seem to be due to kernel bugs, please check
the file MAINTAINERS to see if there is a particular person associated
with the part of the kernel that you are having trouble with. If there
isn't anyone listed there, then the second best thing is to mail
them to me (torvalds@linux-foundation.org), and possibly to any other
relevant mailing-list or to the newsgroup.
If you have problems that seem to be due to kernel bugs, please follow the
instructions at 'Documentation/admin-guide/reporting-issues.rst'.
- In all bug-reports, *please* tell what kernel you are talking about,
how to duplicate the problem, and what your setup is (use your common
sense). If the problem is new, tell me so, and if the problem is
old, please try to tell me when you first noticed it.
- If the bug results in a message like::
unable to handle kernel paging request at address C0000010
Oops: 0002
EIP: 0010:XXXXXXXX
eax: xxxxxxxx ebx: xxxxxxxx ecx: xxxxxxxx edx: xxxxxxxx
esi: xxxxxxxx edi: xxxxxxxx ebp: xxxxxxxx
ds: xxxx es: xxxx fs: xxxx gs: xxxx
Pid: xx, process nr: xx
xx xx xx xx xx xx xx xx xx xx
or similar kernel debugging information on your screen or in your
system log, please duplicate it *exactly*. The dump may look
incomprehensible to you, but it does contain information that may
help debugging the problem. The text above the dump is also
important: it tells something about why the kernel dumped code (in
the above example, it's due to a bad kernel pointer). More information
on making sense of the dump is in Documentation/admin-guide/bug-hunting.rst
- If you compiled the kernel with CONFIG_KALLSYMS you can send the dump
as is, otherwise you will have to use the ``ksymoops`` program to make
sense of the dump (but compiling with CONFIG_KALLSYMS is usually preferred).
This utility can be downloaded from
https://www.kernel.org/pub/linux/utils/kernel/ksymoops/ .
Alternatively, you can do the dump lookup by hand:
- In debugging dumps like the above, it helps enormously if you can
look up what the EIP value means. The hex value as such doesn't help
me or anybody else very much: it will depend on your particular
kernel setup. What you should do is take the hex value from the EIP
line (ignore the ``0010:``), and look it up in the kernel namelist to
see which kernel function contains the offending address.
To find out the kernel function name, you'll need to find the system
binary associated with the kernel that exhibited the symptom. This is
the file 'linux/vmlinux'. To extract the namelist and match it against
the EIP from the kernel crash, do::
nm vmlinux | sort | less
This will give you a list of kernel addresses sorted in ascending
order, from which it is simple to find the function that contains the
offending address. Note that the address given by the kernel
debugging messages will not necessarily match exactly with the
function addresses (in fact, that is very unlikely), so you can't
just 'grep' the list: the list will, however, give you the starting
point of each kernel function, so by looking for the function that
has a starting address lower than the one you are searching for but
is followed by a function with a higher address you will find the one
you want. In fact, it may be a good idea to include a bit of
"context" in your problem report, giving a few lines around the
interesting one.
If you for some reason cannot do the above (you have a pre-compiled
kernel image or similar), telling me as much about your setup as
possible will help. Please read
'Documentation/admin-guide/reporting-issues.rst' for details.
- Alternatively, you can use gdb on a running kernel. (read-only; i.e. you
cannot change values or set break points.) To do this, first compile the
kernel with -g; edit arch/x86/Makefile appropriately, then do a ``make
clean``. You'll also need to enable CONFIG_PROC_FS (via ``make config``).
After you've rebooted with the new kernel, do ``gdb vmlinux /proc/kcore``.
You can now use all the usual gdb commands. The command to look up the
point where your system crashed is ``l *0xXXXXXXXX``. (Replace the XXXes
with the EIP value.)
gdb'ing a non-running kernel currently fails because ``gdb`` (wrongly)
disregards the starting offset for which the kernel is compiled.
Hints on understanding kernel bug reports are in
'Documentation/admin-guide/bug-hunting.rst'. More on debugging the kernel
with gdb is in 'Documentation/dev-tools/gdb-kernel-debugging.rst' and
'Documentation/dev-tools/kgdb.rst'.

@ -1,13 +0,0 @@
.. SPDX-License-Identifier: GPL-2.0
===============
Overriding DSDT
===============
Linux supports a method of overriding the BIOS DSDT:
CONFIG_ACPI_CUSTOM_DSDT - builds the image into the kernel.
When to use this method is described in detail on the
Linux/ACPI home page:
https://01.org/linux-acpi/documentation/overriding-dsdt

@ -613,6 +613,7 @@ kernel command line.
eibrs enhanced IBRS
eibrs,retpoline enhanced IBRS + Retpolines
eibrs,lfence enhanced IBRS + LFENCE
ibrs use IBRS to protect kernel
Not specifying this option is equivalent to
spectre_v2=auto.

@ -200,7 +200,7 @@ prb
A pointer to the printk ringbuffer (struct printk_ringbuffer). This
may be pointing to the static boot ringbuffer or the dynamically
allocated ringbuffer, depending on when the the core dump occurred.
allocated ringbuffer, depending on when the core dump occurred.
Used by user-space tools to read the active kernel log buffer.
printk_rb_static

@ -3801,6 +3801,10 @@
nox2apic [X86-64,APIC] Do not enable x2APIC mode.
NOTE: this parameter will be ignored on systems with the
LEGACY_XAPIC_DISABLED bit set in the
IA32_XAPIC_DISABLE_STATUS MSR.
nps_mtm_hs_ctr= [KNL,ARC]
This parameter sets the maximum duration, in
cycles, each HW thread of the CTOP can run

@ -65,7 +65,7 @@ HugePages_Surp
may be temporarily larger than the maximum number of surplus huge
pages when the system is under memory pressure.
Hugepagesize
is the default hugepage size (in Kb).
is the default hugepage size (in kB).
Hugetlb
is the total amount of memory (in kB), consumed by huge
pages of all sizes.

@ -102,6 +102,9 @@ Values:
- 1 - enable JIT hardening for unprivileged users only
- 2 - enable JIT hardening for all users
where "privileged user" in this context means a process having
CAP_BPF or CAP_SYS_ADMIN in the root user name space.
bpf_jit_kallsyms
----------------

@ -134,6 +134,12 @@ More detailed explanation for tainting
scsi/snic on something else than x86_64, scsi/ips on non
x86/x86_64/itanium, have broken firmware settings for the
irqchip/irq-gic on arm64 ...).
- x86/x86_64: Microcode late loading is dangerous and will result in
tainting the kernel. It requires that all CPUs rendezvous to make sure
the update happens when the system is as quiescent as possible. However,
a higher priority MCE/SMI/NMI can move control flow away from that
rendezvous and interrupt the update, which can be detrimental to the
machine.
3) ``R`` if a module was force unloaded by ``rmmod -f``, ``' '`` if all
modules were unloaded normally.

@ -0,0 +1,30 @@
.. contents::
.. sectnum::
==========================
Clang implementation notes
==========================
This document provides more details specific to the Clang/LLVM implementation of the eBPF instruction set.
Versions
========
Clang defines "CPU" versions, where a CPU version of 3 corresponds to the current eBPF ISA.
Clang selects the eBPF ISA version with ``-mcpu``; for example, ``-mcpu=v3`` selects version 3.
Arithmetic instructions
=======================
For CPU versions prior to 3, Clang v7.0 and later can enable ``BPF_ALU`` support with
``-Xclang -target-feature -Xclang +alu32``. In CPU version 3, support is automatically included.
Atomic operations
=================
Clang can generate atomic instructions by default when ``-mcpu=v3`` is
enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction
Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable
the atomics features, while keeping a lower ``-mcpu`` version, you can use
``-Xclang -target-feature -Xclang +alu32``.

@ -26,6 +26,8 @@ that goes into great technical depth about the BPF Architecture.
classic_vs_extended.rst
bpf_licensing
test_debug
clang-notes
linux-notes
other
.. only:: subproject and html

@ -1,7 +1,12 @@
.. contents::
.. sectnum::
========================================
eBPF Instruction Set Specification, v1.0
========================================
This document specifies version 1.0 of the eBPF instruction set.
====================
eBPF Instruction Set
====================
Registers and calling convention
================================
@ -11,10 +16,10 @@ all of which are 64-bits wide.
The eBPF calling convention is defined as:
* R0: return value from function calls, and exit value for eBPF programs
* R1 - R5: arguments for function calls
* R6 - R9: callee saved registers that function calls will preserve
* R10: read-only frame pointer to access stack
* R0: return value from function calls, and exit value for eBPF programs
* R1 - R5: arguments for function calls
* R6 - R9: callee saved registers that function calls will preserve
* R10: read-only frame pointer to access stack
R0 - R5 are scratch registers and eBPF programs need to spill/fill them if
necessary across calls.
@ -24,17 +29,17 @@ Instruction encoding
eBPF has two instruction encodings:
* the basic instruction encoding, which uses 64 bits to encode an instruction
* the wide instruction encoding, which appends a second 64-bit immediate value
(imm64) after the basic instruction for a total of 128 bits.
* the basic instruction encoding, which uses 64 bits to encode an instruction
* the wide instruction encoding, which appends a second 64-bit immediate value
(imm64) after the basic instruction for a total of 128 bits.
The basic instruction encoding looks as follows:
============= ======= =============== ==================== ============
32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB)
============= ======= =============== ==================== ============
immediate offset source register destination register opcode
============= ======= =============== ==================== ============
============= ======= =============== ==================== ============
32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB)
============= ======= =============== ==================== ============
immediate offset source register destination register opcode
============= ======= =============== ==================== ============
Note that most instructions do not use all of the fields.
Unused fields shall be cleared to zero.
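The basic encoding above can be sketched in a few lines of Python. This is only an illustration, not kernel code: the helper name ``encode_basic`` is hypothetical, and the byte layout assumes a little-endian host, where the opcode is the first byte in memory, followed by the register byte (destination in the low nibble, source in the high nibble), the 16-bit offset, and the 32-bit immediate.

```python
import struct


def encode_basic(opcode, dst=0, src=0, off=0, imm=0):
    """Pack one 64-bit basic instruction (hypothetical helper).

    Assumes a little-endian host layout: opcode byte first, then the
    src/dst register nibbles, a signed 16-bit offset, and a signed
    32-bit immediate. Unused fields default to zero.
    """
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in 8 bits")
    if not (0 <= dst <= 0xF and 0 <= src <= 0xF):
        raise ValueError("register numbers must fit in 4 bits")
    regs = (src << 4) | dst  # source register in the upper nibble
    return struct.pack("<BBhi", opcode, regs, off, imm)


# BPF_ALU64 | BPF_K | BPF_MOV = 0x07 | 0x00 | 0xb0 = 0xb7: "r0 = 1"
mov_r0_1 = encode_basic(0xB7, dst=0, imm=1)
# BPF_JMP | BPF_EXIT = 0x05 | 0x90 = 0x95
exit_insn = encode_basic(0x95)
```

Every field that an instruction does not use stays at its default of zero, which matches the rule that unused fields shall be cleared.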
@ -44,30 +49,30 @@ Instruction classes
The three LSB bits of the 'opcode' field store the instruction class:
========= ===== ===============================
class value description
========= ===== ===============================
BPF_LD 0x00 non-standard load operations
BPF_LDX 0x01 load into register operations
BPF_ST 0x02 store from immediate operations
BPF_STX 0x03 store from register operations
BPF_ALU 0x04 32-bit arithmetic operations
BPF_JMP 0x05 64-bit jump operations
BPF_JMP32 0x06 32-bit jump operations
BPF_ALU64 0x07 64-bit arithmetic operations
========= ===== ===============================
========= ===== =============================== ===================================
class value description reference
========= ===== =============================== ===================================
BPF_LD 0x00 non-standard load operations `Load and store instructions`_
BPF_LDX 0x01 load into register operations `Load and store instructions`_
BPF_ST 0x02 store from immediate operations `Load and store instructions`_
BPF_STX 0x03 store from register operations `Load and store instructions`_
BPF_ALU 0x04 32-bit arithmetic operations `Arithmetic and jump instructions`_
BPF_JMP 0x05 64-bit jump operations `Arithmetic and jump instructions`_
BPF_JMP32 0x06 32-bit jump operations `Arithmetic and jump instructions`_
BPF_ALU64 0x07 64-bit arithmetic operations `Arithmetic and jump instructions`_
========= ===== =============================== ===================================
Arithmetic and jump instructions
================================
For arithmetic and jump instructions (BPF_ALU, BPF_ALU64, BPF_JMP and
BPF_JMP32), the 8-bit 'opcode' field is divided into three parts:
For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` and
``BPF_JMP32``), the 8-bit 'opcode' field is divided into three parts:
============== ====== =================
4 bits (MSB) 1 bit 3 bits (LSB)
============== ====== =================
operation code source instruction class
============== ====== =================
============== ====== =================
4 bits (MSB) 1 bit 3 bits (LSB)
============== ====== =================
operation code source instruction class
============== ====== =================
The 4th bit encodes the source operand:
@ -84,66 +89,66 @@ The four MSB bits store the operation code.
Arithmetic instructions
-----------------------
BPF_ALU uses 32-bit wide operands while BPF_ALU64 uses 64-bit wide operands for
``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
otherwise identical operations.
The code field encodes the operation as below:
The 'code' field encodes the operation as below:
======== ===== =================================================
code value description
======== ===== =================================================
BPF_ADD 0x00 dst += src
BPF_SUB 0x10 dst -= src
BPF_MUL 0x20 dst \*= src
BPF_DIV 0x30 dst /= src
BPF_OR 0x40 dst \|= src
BPF_AND 0x50 dst &= src
BPF_LSH 0x60 dst <<= src
BPF_RSH 0x70 dst >>= src
BPF_NEG 0x80 dst = ~src
BPF_MOD 0x90 dst %= src
BPF_XOR 0xa0 dst ^= src
BPF_MOV 0xb0 dst = src
BPF_ARSH 0xc0 sign extending shift right
BPF_END 0xd0 byte swap operations (see separate section below)
======== ===== =================================================
======== ===== ==========================================================
code value description
======== ===== ==========================================================
BPF_ADD 0x00 dst += src
BPF_SUB 0x10 dst -= src
BPF_MUL 0x20 dst \*= src
BPF_DIV 0x30 dst /= src
BPF_OR 0x40 dst \|= src
BPF_AND 0x50 dst &= src
BPF_LSH 0x60 dst <<= src
BPF_RSH 0x70 dst >>= src
BPF_NEG 0x80 dst = -dst
BPF_MOD 0x90 dst %= src
BPF_XOR 0xa0 dst ^= src
BPF_MOV 0xb0 dst = src
BPF_ARSH 0xc0 sign extending shift right
BPF_END 0xd0 byte swap operations (see `Byte swap instructions`_ below)
======== ===== ==========================================================
BPF_ADD | BPF_X | BPF_ALU means::
``BPF_ADD | BPF_X | BPF_ALU`` means::
dst_reg = (u32) dst_reg + (u32) src_reg;
BPF_ADD | BPF_X | BPF_ALU64 means::
``BPF_ADD | BPF_X | BPF_ALU64`` means::
dst_reg = dst_reg + src_reg
BPF_XOR | BPF_K | BPF_ALU means::
``BPF_XOR | BPF_K | BPF_ALU`` means::
dst_reg = (u32) dst_reg ^ (u32) imm32
BPF_XOR | BPF_K | BPF_ALU64 means::
``BPF_XOR | BPF_K | BPF_ALU64`` means::
dst_reg = dst_reg ^ imm32
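As a rough illustration of how the three opcode parts combine, the Python sketch below builds and splits arithmetic opcodes and models the 32-bit versus 64-bit operand width. The function names are hypothetical. The ``BPF_K`` (0x00) and ``BPF_X`` (0x08) source values are taken from the Linux UAPI headers, and only ``BPF_ADD`` and ``BPF_XOR`` are modelled.

```python
BPF_ALU, BPF_ALU64 = 0x04, 0x07   # instruction classes
BPF_K, BPF_X = 0x00, 0x08         # source operand: immediate or register
BPF_ADD, BPF_XOR = 0x00, 0xA0     # operation codes

U32 = (1 << 32) - 1
U64 = (1 << 64) - 1


def split_opcode(opcode):
    """Return (operation code, source bit, instruction class)."""
    return opcode & 0xF0, opcode & 0x08, opcode & 0x07


def alu(opcode, dst, operand):
    """Model dst = dst <op> operand for the two operations above.

    'operand' is the source register value for BPF_X or the immediate
    for BPF_K; a negative immediate is treated as two's complement.
    """
    code, _source, cls = split_opcode(opcode)
    if cls not in (BPF_ALU, BPF_ALU64):
        raise ValueError("not an arithmetic instruction class")
    mask = U32 if cls == BPF_ALU else U64
    a, b = dst & mask, operand & mask
    if code == BPF_ADD:
        result = a + b
    elif code == BPF_XOR:
        result = a ^ b
    else:
        raise NotImplementedError("operation not modelled in this sketch")
    return result & mask


add32 = BPF_ADD | BPF_X | BPF_ALU      # 0x0c
add64 = BPF_ADD | BPF_X | BPF_ALU64    # 0x0f
```

With the same inputs, ``alu(add32, 0xFFFFFFFF, 1)`` wraps around at 32 bits, while ``alu(add64, 0xFFFFFFFF, 1)`` carries into bit 32.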
Byte swap instructions
----------------------
~~~~~~~~~~~~~~~~~~~~~~
The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
code field of ``BPF_END``.
'code' field of ``BPF_END``.
The byte swap instructions operate on the destination register
only and do not use a separate source register or immediate value.
The 1-bit source operand field in the opcode is used to to select what byte
The 1-bit source operand field in the opcode is used to select what byte
order the operation converts from or to:
========= ===== =================================================
source value description
========= ===== =================================================
BPF_TO_LE 0x00 convert between host byte order and little endian
BPF_TO_BE 0x08 convert between host byte order and big endian
========= ===== =================================================
========= ===== =================================================
source value description
========= ===== =================================================
BPF_TO_LE 0x00 convert between host byte order and little endian
BPF_TO_BE 0x08 convert between host byte order and big endian
========= ===== =================================================
The imm field encodes the width of the swap operations. The following widths
The 'imm' field encodes the width of the swap operations. The following widths
are supported: 16, 32 and 64.
Examples:
@ -156,35 +161,31 @@ Examples:
dst_reg = htobe64(dst_reg)
``BPF_FROM_LE`` and ``BPF_FROM_BE`` exist as aliases for ``BPF_TO_LE`` and
``BPF_TO_BE`` respectively.
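The effect of the byte swap instructions can be modelled as follows. This is a hedged sketch rather than the kernel's implementation: ``bpf_end`` is a hypothetical name, the host byte order is passed in explicitly so the behaviour is reproducible, and the result is assumed to be truncated to the selected width, as the ``htole16()``-style helpers in the examples imply.

```python
import sys


def bpf_end(value, width, to_big_endian, host_byteorder=sys.byteorder):
    """Model BPF_END on a 64-bit register value (hypothetical helper).

    width is the 'imm' field (16, 32 or 64); to_big_endian selects
    BPF_TO_BE (True) or BPF_TO_LE (False).
    """
    if width not in (16, 32, 64):
        raise ValueError("supported widths are 16, 32 and 64")
    truncated = value & ((1 << width) - 1)
    target = "big" if to_big_endian else "little"
    if target == host_byteorder:
        return truncated  # already in the requested order
    # Reverse the bytes of the truncated value.
    return int.from_bytes(truncated.to_bytes(width // 8, "little"), "big")
```

On a little-endian host, ``BPF_TO_LE`` only truncates, while ``BPF_TO_BE`` also reverses the bytes. On a big-endian host the roles are swapped.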
Jump instructions
-----------------
BPF_JMP32 uses 32-bit wide operands while BPF_JMP uses 64-bit wide operands for
``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
otherwise identical operations.
The code field encodes the operation as below:
The 'code' field encodes the operation as below:
======== ===== ========================= ============
code value description notes
======== ===== ========================= ============
BPF_JA 0x00 PC += off BPF_JMP only
BPF_JEQ 0x10 PC += off if dst == src
BPF_JGT 0x20 PC += off if dst > src unsigned
BPF_JGE 0x30 PC += off if dst >= src unsigned
BPF_JSET 0x40 PC += off if dst & src
BPF_JNE 0x50 PC += off if dst != src
BPF_JSGT 0x60 PC += off if dst > src signed
BPF_JSGE 0x70 PC += off if dst >= src signed
BPF_CALL 0x80 function call
BPF_EXIT 0x90 function / program return BPF_JMP only
BPF_JLT 0xa0 PC += off if dst < src unsigned
BPF_JLE 0xb0 PC += off if dst <= src unsigned
BPF_JSLT 0xc0 PC += off if dst < src signed
BPF_JSLE 0xd0 PC += off if dst <= src signed
======== ===== ========================= ============
======== ===== ========================= ============
code value description notes
======== ===== ========================= ============
BPF_JA 0x00 PC += off BPF_JMP only
BPF_JEQ 0x10 PC += off if dst == src
BPF_JGT 0x20 PC += off if dst > src unsigned
BPF_JGE 0x30 PC += off if dst >= src unsigned
BPF_JSET 0x40 PC += off if dst & src
BPF_JNE 0x50 PC += off if dst != src
BPF_JSGT 0x60 PC += off if dst > src signed
BPF_JSGE 0x70 PC += off if dst >= src signed
BPF_CALL 0x80 function call
BPF_EXIT 0x90 function / program return BPF_JMP only
BPF_JLT 0xa0 PC += off if dst < src unsigned
BPF_JLE 0xb0 PC += off if dst <= src unsigned
BPF_JSLT 0xc0 PC += off if dst < src signed
BPF_JSLE 0xd0 PC += off if dst <= src signed
======== ===== ========================= ============
The eBPF program needs to store the return value into register R0 before doing a
BPF_EXIT.
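The difference between the signed and unsigned comparisons in the table can be shown with a small Python model. The names ``as_signed`` and ``jump_taken`` are hypothetical, and ``bits`` is 64 for ``BPF_JMP`` and 32 for ``BPF_JMP32``. The sketch only decides whether the branch condition holds and does not update a program counter.

```python
def as_signed(value, bits):
    """Interpret the low 'bits' bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def jump_taken(code, dst, src, bits=64):
    """Return True if the conditional jump 'code' would be taken."""
    mask = (1 << bits) - 1
    u_dst, u_src = dst & mask, src & mask
    s_dst, s_src = as_signed(dst, bits), as_signed(src, bits)
    conditions = {
        "BPF_JEQ": u_dst == u_src,
        "BPF_JNE": u_dst != u_src,
        "BPF_JSET": bool(u_dst & u_src),
        "BPF_JGT": u_dst > u_src,
        "BPF_JGE": u_dst >= u_src,
        "BPF_JLT": u_dst < u_src,
        "BPF_JLE": u_dst <= u_src,
        "BPF_JSGT": s_dst > s_src,
        "BPF_JSGE": s_dst >= s_src,
        "BPF_JSLT": s_dst < s_src,
        "BPF_JSLE": s_dst <= s_src,
    }
    return conditions[code]
```

For example, a register holding all ones compares as greater than 1 under ``BPF_JGT`` but as less than 1 under ``BPF_JSLT``, because the signed forms read it as -1.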
@ -193,14 +194,26 @@ BPF_EXIT.
Load and store instructions
===========================
For load and store instructions (BPF_LD, BPF_LDX, BPF_ST and BPF_STX), the
For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_STX``), the
8-bit 'opcode' field is divided as:
============ ====== =================
3 bits (MSB) 2 bits 3 bits (LSB)
============ ====== =================
mode size instruction class
============ ====== =================
============ ====== =================
3 bits (MSB) 2 bits 3 bits (LSB)
============ ====== =================
mode size instruction class
============ ====== =================
The mode modifier is one of:
============= ===== ==================================== =============
mode modifier value description reference
============= ===== ==================================== =============
BPF_IMM 0x00 64-bit immediate instructions `64-bit immediate instructions`_
BPF_ABS 0x20 legacy BPF packet access (absolute) `Legacy BPF Packet access instructions`_
BPF_IND 0x40 legacy BPF packet access (indirect) `Legacy BPF Packet access instructions`_
BPF_MEM 0x60 regular load and store operations `Regular load and store operations`_
BPF_ATOMIC 0xc0 atomic operations `Atomic operations`_
============= ===== ==================================== =============
The size modifier is one of:
@ -213,19 +226,6 @@ The size modifier is one of:
BPF_DW 0x18 double word (8 bytes)
============= ===== =====================
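To make the mode/size/class split concrete, here is a small decoding sketch in Python. The function name ``decode_ldst`` is hypothetical. The mode and class values come from the tables in this document, and the size values for ``BPF_W`` (0x00), ``BPF_H`` (0x08) and ``BPF_B`` (0x10) are taken from the Linux UAPI headers.

```python
MODES = {
    0x00: "BPF_IMM",
    0x20: "BPF_ABS",
    0x40: "BPF_IND",
    0x60: "BPF_MEM",
    0xC0: "BPF_ATOMIC",
}
SIZES = {0x00: 4, 0x08: 2, 0x10: 1, 0x18: 8}  # size modifier -> bytes
CLASSES = {0x00: "BPF_LD", 0x01: "BPF_LDX", 0x02: "BPF_ST", 0x03: "BPF_STX"}


def decode_ldst(opcode):
    """Split a load/store opcode into (mode, size in bytes, class)."""
    cls = CLASSES.get(opcode & 0x07)   # 3 LSB bits
    if cls is None:
        raise ValueError("not a load/store instruction class")
    mode = MODES.get(opcode & 0xE0)    # 3 MSB bits
    if mode is None:
        raise ValueError("unknown mode modifier")
    size = SIZES[opcode & 0x18]        # 2 middle bits
    return mode, size, cls
```

For instance, ``decode_ldst(0x61)`` describes a regular 4-byte load into a register, and ``decode_ldst(0xdb)`` a 64-bit atomic operation.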
The mode modifier is one of:
============= ===== ====================================
mode modifier value description
============= ===== ====================================
BPF_IMM 0x00 64-bit immediate instructions
BPF_ABS 0x20 legacy BPF packet access (absolute)
BPF_IND 0x40 legacy BPF packet access (indirect)
BPF_MEM 0x60 regular load and store operations
BPF_ATOMIC 0xc0 atomic operations
============= ===== ====================================
Regular load and store operations
---------------------------------
@ -256,44 +256,42 @@ by other eBPF programs or means outside of this specification.
All atomic operations supported by eBPF are encoded as store operations
that use the ``BPF_ATOMIC`` mode modifier as follows:
* ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
* ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
* 8-bit and 16-bit wide atomic operations are not supported.
* ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
* ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
* 8-bit and 16-bit wide atomic operations are not supported.
The imm field is used to encode the actual atomic operation.
The 'imm' field is used to encode the actual atomic operation.
Simple atomic operations use a subset of the values defined to encode
arithmetic operations in the imm field to encode the atomic operation:
arithmetic operations in the 'imm' field to encode the atomic operation:
======== ===== ===========
imm value description
======== ===== ===========
BPF_ADD 0x00 atomic add
BPF_OR 0x40 atomic or
BPF_AND 0x50 atomic and
BPF_XOR 0xa0 atomic xor
======== ===== ===========
======== ===== ===========
imm value description
======== ===== ===========
BPF_ADD 0x00 atomic add
BPF_OR 0x40 atomic or
BPF_AND 0x50 atomic and
BPF_XOR 0xa0 atomic xor
======== ===== ===========
``BPF_ATOMIC | BPF_W | BPF_STX`` with imm = BPF_ADD means::
``BPF_ATOMIC | BPF_W | BPF_STX`` with 'imm' = BPF_ADD means::
*(u32 *)(dst_reg + off16) += src_reg
``BPF_ATOMIC | BPF_DW | BPF_STX`` with imm = BPF_ADD means::
``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF_ADD means::
*(u64 *)(dst_reg + off16) += src_reg
``BPF_XADD`` is a deprecated name for ``BPF_ATOMIC | BPF_ADD``.
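As a rough illustration of the opcode encoding described above, the following
Python sketch composes the opcode byte of an atomic instruction. The values
``BPF_STX = 0x03`` and ``BPF_W = 0x00`` come from the instruction class and
size modifier tables of the instruction set rather than from this section, and
the helper name ``atomic_opcode`` is hypothetical.

```python
# Values from the eBPF instruction-set tables.
BPF_STX = 0x03      # instruction class: store from register
BPF_W = 0x00        # size modifier: word (4 bytes)
BPF_DW = 0x18       # size modifier: double word (8 bytes)
BPF_ATOMIC = 0xc0   # mode modifier: atomic operations


def atomic_opcode(size):
    """Return the opcode byte of an atomic operation (hypothetical helper)."""
    if size not in (BPF_W, BPF_DW):
        # 8-bit and 16-bit wide atomic operations are not supported.
        raise ValueError("atomic operations must be 32-bit or 64-bit wide")
    return BPF_ATOMIC | size | BPF_STX


print(hex(atomic_opcode(BPF_W)))   # 0xc3
print(hex(atomic_opcode(BPF_DW)))  # 0xdb
```

The operation itself (add, or, and, xor) is not part of the opcode byte; it is
carried in the 'imm' field of the instruction.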
In addition to the simple atomic operations, there is also a modifier and
two complex atomic operations:
=========== ================ ===========================
imm value description
=========== ================ ===========================
BPF_FETCH 0x01 modifier: return old value
BPF_XCHG 0xe0 | BPF_FETCH atomic exchange
BPF_CMPXCHG 0xf0 | BPF_FETCH atomic compare and exchange
=========== ================ ===========================
=========== ================ ===========================
imm value description
=========== ================ ===========================
BPF_FETCH 0x01 modifier: return old value
BPF_XCHG 0xe0 | BPF_FETCH atomic exchange
BPF_CMPXCHG 0xf0 | BPF_FETCH atomic compare and exchange
=========== ================ ===========================
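The 'imm' values in the tables above can be combined as in the following
Python sketch. It is only an illustration: the helper name ``atomic_imm`` is
hypothetical, and it assumes that ``BPF_FETCH`` is optional for the simple
operations and always set for ``BPF_XCHG`` and ``BPF_CMPXCHG``.

```python
# 'imm' values for atomic operations, as listed in the tables.
BPF_ADD = 0x00
BPF_OR = 0x40
BPF_AND = 0x50
BPF_XOR = 0xa0
BPF_FETCH = 0x01               # modifier: return old value
BPF_XCHG = 0xe0 | BPF_FETCH    # atomic exchange
BPF_CMPXCHG = 0xf0 | BPF_FETCH # atomic compare and exchange

SIMPLE_OPS = (BPF_ADD, BPF_OR, BPF_AND, BPF_XOR)
COMPLEX_OPS = (BPF_XCHG, BPF_CMPXCHG)


def atomic_imm(op, fetch=False):
    """Return the 'imm' value for an atomic operation (hypothetical helper)."""
    if op in COMPLEX_OPS:
        # The complex operations always carry BPF_FETCH.
        return op
    if op not in SIMPLE_OPS:
        raise ValueError("unknown atomic operation: %#x" % op)
    return (op | BPF_FETCH) if fetch else op


print(hex(atomic_imm(BPF_ADD)))              # 0x0
print(hex(atomic_imm(BPF_XOR, fetch=True)))  # 0xa1
print(hex(atomic_imm(BPF_CMPXCHG)))          # 0xf1
```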
The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
always set for the complex atomic operations. If the ``BPF_FETCH`` flag
@ -309,16 +307,10 @@ The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
value that was at ``dst_reg + off`` before the operation is zero-extended
and loaded back to ``R0``.
Clang can generate atomic instructions by default when ``-mcpu=v3`` is
enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction
Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable
the atomics features, while keeping a lower ``-mcpu`` version, you can use
``-Xclang -target-feature -Xclang +alu32``.
64-bit immediate instructions
-----------------------------
Instructions with the ``BPF_IMM`` mode modifier use the wide instruction
Instructions with the ``BPF_IMM`` 'mode' modifier use the wide instruction
encoding for an extra imm64 value.
There is currently only one such instruction.
@ -331,36 +323,6 @@ There is currently only one such instruction.
Legacy BPF Packet access instructions
-------------------------------------
eBPF has special instructions for access to packet data that have been
carried over from classic BPF to retain the performance of legacy socket
filters running in the eBPF interpreter.
The instructions come in two forms: ``BPF_ABS | <size> | BPF_LD`` and
``BPF_IND | <size> | BPF_LD``.
These instructions are used to access packet data and can only be used when
the program context is a pointer to a networking packet. ``BPF_ABS``
accesses packet data at an absolute offset specified by the immediate data
and ``BPF_IND`` accesses packet data at an offset that includes the value of
a register in addition to the immediate data.
These instructions have seven implicit operands:
* Register R6 is an implicit input that must contain a pointer to a
struct sk_buff.
* Register R0 is an implicit output which contains the data fetched from
the packet.
* Registers R1-R5 are scratch registers that are clobbered after a call to
``BPF_ABS | BPF_LD`` or ``BPF_IND | BPF_LD`` instructions.
These instructions have an implicit program exit condition as well. When an
eBPF program is trying to access the data beyond the packet boundary, the
program execution will be aborted.
``BPF_ABS | BPF_W | BPF_LD`` means::
R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + imm32))
``BPF_IND | BPF_W | BPF_LD`` means::
R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
eBPF previously introduced special instructions for access to packet data that were
carried over from classic BPF. However, these instructions are
deprecated and should no longer be used.

View File

@ -137,14 +137,22 @@ KF_ACQUIRE and KF_RET_NULL flags.
--------------------------
The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
indicates that all pointer arguments will always be refcounted, and have
their offset set to 0. It can be used to enforce that a pointer to a refcounted
object acquired from a kfunc or BPF helper is passed as an argument to this
kfunc without any modifications (e.g. pointer arithmetic) such that it is
trusted and points to the original object. This flag is often used for kfuncs
that operate (change some property, perform some operation) on an object that
was obtained using an acquire kfunc. Such kfuncs need an unchanged pointer to
ensure the integrity of the operation being performed on the expected object.
indicates that all pointer arguments will always have a guaranteed lifetime,
and pointers to kernel objects are always passed to helpers in their unmodified
form (as obtained from acquire kfuncs).
It can be used to enforce that a pointer to a refcounted object acquired from a
kfunc or BPF helper is passed as an argument to this kfunc without any
modifications (e.g. pointer arithmetic) such that it is trusted and points to
the original object.
Meanwhile, it is also allowed to pass pointers to normal memory to such kfuncs,
but those can have a non-zero offset.
This flag is often used for kfuncs that operate (change some property, perform
some operation) on an object that was obtained using an acquire kfunc. Such
kfuncs need an unchanged pointer to ensure the integrity of the operation being
performed on the expected object.
2.4.6 KF_SLEEPABLE flag
-----------------------

View File

@ -0,0 +1,53 @@
.. contents::
.. sectnum::
==========================
Linux implementation notes
==========================
This document provides more details specific to the Linux kernel implementation of the eBPF instruction set.
Byte swap instructions
======================
``BPF_FROM_LE`` and ``BPF_FROM_BE`` exist as aliases for ``BPF_TO_LE`` and ``BPF_TO_BE`` respectively.
Legacy BPF Packet access instructions
=====================================
As mentioned in the `ISA standard documentation <instruction-set.rst#legacy-bpf-packet-access-instructions>`_,
Linux has special eBPF instructions for access to packet data that have been
carried over from classic BPF to retain the performance of legacy socket
filters running in the eBPF interpreter.
The instructions come in two forms: ``BPF_ABS | <size> | BPF_LD`` and
``BPF_IND | <size> | BPF_LD``.
These instructions are used to access packet data and can only be used when
the program context is a pointer to a networking packet. ``BPF_ABS``
accesses packet data at an absolute offset specified by the immediate data
and ``BPF_IND`` accesses packet data at an offset that includes the value of
a register in addition to the immediate data.
These instructions have seven implicit operands:
* Register R6 is an implicit input that must contain a pointer to a
struct sk_buff.
* Register R0 is an implicit output which contains the data fetched from
the packet.
* Registers R1-R5 are scratch registers that are clobbered by the
instruction.
These instructions have an implicit program exit condition as well. If an
eBPF program attempts to access data beyond the packet boundary, the
program execution will be aborted.
``BPF_ABS | BPF_W | BPF_LD`` (0x20) means::
R0 = ntohl(*(u32 *) ((struct sk_buff *) R6->data + imm))
where ``ntohl()`` converts a 32-bit value from network byte order to host byte order.
``BPF_IND | BPF_W | BPF_LD`` (0x40) means::
R0 = ntohl(*(u32 *) ((struct sk_buff *) R6->data + src + imm))
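The semantics above can be modelled in a few lines of Python. This is a
simplified sketch, not the kernel implementation: the packet is a plain
``bytes`` object standing in for the ``data`` of the ``struct sk_buff`` in R6,
the names ``ld_abs_w``, ``ld_ind_w`` and ``ProgramAborted`` are hypothetical,
and negative offsets are treated as out of bounds, which is an assumption of
this model.

```python
class ProgramAborted(Exception):
    """Models the implicit program exit on an out-of-bounds packet access."""


def _load_word(packet, offset):
    # Abort if any of the four bytes lies outside the packet.
    if offset < 0 or offset + 4 > len(packet):
        raise ProgramAborted("access beyond the packet boundary")
    # Reading the bytes as big-endian is equivalent to ntohl() on the raw
    # 32-bit value, since network byte order is big-endian.
    return int.from_bytes(packet[offset:offset + 4], "big")


def ld_abs_w(packet, imm):
    """Model of BPF_ABS | BPF_W | BPF_LD: returns the value placed in R0."""
    return _load_word(packet, imm)


def ld_ind_w(packet, src, imm):
    """Model of BPF_IND | BPF_W | BPF_LD: returns the value placed in R0."""
    return _load_word(packet, src + imm)


packet = bytes([0x45, 0x00, 0x00, 0x54, 0xde, 0xad, 0xbe, 0xef])
print(hex(ld_abs_w(packet, 4)))     # 0xdeadbeef
print(hex(ld_ind_w(packet, 2, 2)))  # 0xdeadbeef
```

The model leaves out the clobbering of R1-R5; a real program must not rely on
their contents after either instruction.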

View File

@ -31,7 +31,7 @@ The map uses key of type of either ``__u64 cgroup_inode_id`` or
};
``cgroup_inode_id`` is the inode id of the cgroup directory.
``attach_type`` is the the program's attach type.
``attach_type`` is the program's attach type.
Linux 5.9 added support for type ``__u64 cgroup_inode_id`` as the key type.
When this key type is used, then all attach types of the particular cgroup and
@ -155,7 +155,7 @@ However, the BPF program can still only associate with one map of each type
``BPF_MAP_TYPE_CGROUP_STORAGE`` or more than one
``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE``.
In all versions, userspace may use the the attach parameters of cgroup and
In all versions, userspace may use the attach parameters of cgroup and
attach type pair in ``struct bpf_cgroup_storage_key`` as the key to the BPF map
APIs to read or update the storage for a given attachment. For Linux 5.9
attach type shared storages, only the first value in the struct, cgroup inode

View File

@ -15,6 +15,18 @@
import sys
import os
import sphinx
import shutil
# helper
# ------
def have_command(cmd):
"""Search ``cmd`` in the ``PATH`` environment.
If found, return True.
If not found, return False.
"""
return shutil.which(cmd) is not None
# Get Sphinx version
major, minor, patch = sphinx.version_info[:3]
@ -107,7 +119,32 @@ else:
autosectionlabel_prefix_document = True
autosectionlabel_maxdepth = 2
extensions.append("sphinx.ext.imgmath")
# Load math renderer:
# For html builder, load imgmath only when its dependencies are met.
# mathjax is the default math renderer since Sphinx 1.8.
have_latex = have_command('latex')
have_dvipng = have_command('dvipng')
load_imgmath = have_latex and have_dvipng
# Respect SPHINX_IMGMATH (for html docs only)
if 'SPHINX_IMGMATH' in os.environ:
env_sphinx_imgmath = os.environ['SPHINX_IMGMATH']
if 'yes' in env_sphinx_imgmath:
load_imgmath = True
elif 'no' in env_sphinx_imgmath:
load_imgmath = False
else:
sys.stderr.write("Unknown env SPHINX_IMGMATH=%s ignored.\n" % env_sphinx_imgmath)
# Always load imgmath for Sphinx <1.8 or for epub docs
load_imgmath = (load_imgmath or (major == 1 and minor < 8)
or 'epub' in sys.argv)
if load_imgmath:
extensions.append("sphinx.ext.imgmath")
math_renderer = 'imgmath'
else:
math_renderer = 'mathjax'
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
@ -333,7 +370,8 @@ html_static_path = ['sphinx-static']
html_use_smartypants = False
# Custom sidebar templates, maps document names to template names.
#html_sidebars = {}
# Note that the RTD theme ignores this.
html_sidebars = { '**': ['searchbox.html', 'localtoc.html', 'sourcelink.html']}
# Additional templates that should be rendered to pages, maps page names to
# template names.

View File

@ -43,10 +43,11 @@ annotated objects like this, tools can be run on them to generate more useful
information. In particular, on properly annotated objects, ``objtool`` can be
run to check and fix the object if needed. Currently, ``objtool`` can report
missing frame pointer setup/destruction in functions. It can also
automatically generate annotations for :doc:`ORC unwinder <x86/orc-unwinder>`
automatically generate annotations for the ORC unwinder
(Documentation/x86/orc-unwinder.rst)
for most code. Both of these are especially important to support reliable
stack traces which are in turn necessary for :doc:`Kernel live patching
<livepatch/livepatch>`.
stack traces which are in turn necessary for kernel live patching
(Documentation/livepatch/livepatch.rst).
Caveat and Discussion
---------------------

View File

@ -560,7 +560,7 @@ available:
* cpuhp_state_remove_instance(state, node)
* cpuhp_state_remove_instance_nocalls(state, node)
The arguments are the same as for the the cpuhp_state_add_instance*()
The arguments are the same as for the cpuhp_state_add_instance*()
variants above.
The functions differ in how the installed callbacks are treated:

View File

@ -23,6 +23,7 @@ it.
printk-formats
printk-index
symbol-namespaces
asm-annotations
Data structures and low-level utilities
=======================================
@ -44,6 +45,8 @@ Library functionality that is used throughout the kernel.
this_cpu_ops
timekeeping
errseq
wrappers/atomic_t
wrappers/atomic_bitops
Low level entry and exit
========================
@ -67,6 +70,7 @@ Documentation/locking/index.rst for more related documentation.
local_ops
padata
../RCU/index
wrappers/memory-barriers.rst
Low-level hardware management
=============================

View File

@ -71,7 +71,7 @@ variety of methods:
Note that irq domain lookups must happen in contexts that are
compatible with a RCU read-side critical section.
The irq_create_mapping() function must be called *atleast once*
The irq_create_mapping() function must be called *at least once*
before any call to irq_find_mapping(), lest the descriptor will not
be allocated.

View File

@ -625,6 +625,16 @@ Examples::
%p4cc Y10 little-endian (0x20303159)
%p4cc NV12 big-endian (0xb231564e)
Rust
----
::
%pA
Only intended to be used from Rust code to format ``core::fmt::Arguments``.
Do *not* use it from C.
Thanks
======

View File

@ -0,0 +1,18 @@
.. SPDX-License-Identifier: GPL-2.0
This is a simple wrapper to bring atomic_bitops.txt into the RST world
until such a time as that file can be converted directly.
=============
Atomic bitops
=============
.. raw:: latex
\footnotesize
.. include:: ../../atomic_bitops.txt
:literal:
.. raw:: latex
\normalsize

View File

@ -0,0 +1,19 @@
.. SPDX-License-Identifier: GPL-2.0
This is a simple wrapper to bring atomic_t.txt into the RST world
until such a time as that file can be converted directly.
============
Atomic types
============
.. raw:: latex
\footnotesize
.. include:: ../../atomic_t.txt
:literal:
.. raw:: latex
\normalsize

View File

@ -0,0 +1,18 @@
.. SPDX-License-Identifier: GPL-2.0
This is a simple wrapper to bring memory-barriers.txt into the RST world
until such a time as that file can be converted directly.
============================
Linux kernel memory barriers
============================
.. raw:: latex
\footnotesize
.. include:: ../../memory-barriers.txt
:literal:
.. raw:: latex
\normalsize

View File

@ -57,6 +57,11 @@ properties:
- description: interrupt ID for I2C event
- description: interrupt ID for I2C error
interrupt-names:
items:
- const: event
- const: error
resets:
maxItems: 1
@ -92,6 +97,8 @@ properties:
- description: register offset within syscfg
- description: register bitmask for FMP bit
wakeup-source: true
required:
- compatible
- reg

View File

@ -144,6 +144,12 @@ properties:
Mark the corresponding energy efficient ethernet mode as
broken and request the ethernet to stop advertising it.
pses:
$ref: /schemas/types.yaml#/definitions/phandle-array
maxItems: 1
description:
Specifies a reference to a node representing a Power Sourcing Equipment.
phy-is-integrated:
$ref: /schemas/types.yaml#/definitions/flag
description:

View File

@ -128,7 +128,7 @@ examples:
i2c-int-rising;
reset-n-io = <&gpio3 19 GPIO_ACTIVE_HIGH>;
reset-n-io = <&gpio3 19 GPIO_ACTIVE_LOW>;
};
};
@ -151,7 +151,7 @@ examples:
interrupt-parent = <&gpio1>;
interrupts = <17 IRQ_TYPE_EDGE_RISING>;
reset-n-io = <&gpio3 19 GPIO_ACTIVE_HIGH>;
reset-n-io = <&gpio3 19 GPIO_ACTIVE_LOW>;
};
};
@ -162,7 +162,7 @@ examples:
nfc {
compatible = "marvell,nfc-uart";
reset-n-io = <&gpio3 16 GPIO_ACTIVE_HIGH>;
reset-n-io = <&gpio3 16 GPIO_ACTIVE_LOW>;
hci-muxed;
flow-control;

View File

@ -0,0 +1,40 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/net/pse-pd/podl-pse-regulator.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Regulator based Power Sourcing Equipment
maintainers:
- Oleksij Rempel <o.rempel@pengutronix.de>
description: Regulator based PoDL PSE controller. The device must be referenced
by the PHY node to control power injection to the Ethernet cable.
allOf:
- $ref: "pse-controller.yaml#"
properties:
compatible:
const: podl-pse-regulator
'#pse-cells':
const: 0
pse-supply:
description: Power supply for the PSE controller
additionalProperties: false
required:
- compatible
- pse-supply
examples:
- |
ethernet-pse {
compatible = "podl-pse-regulator";
pse-supply = <&reg_t1l1>;
#pse-cells = <0>;
};

View File

@ -0,0 +1,33 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/net/pse-pd/pse-controller.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Power Sourcing Equipment (PSE).
description: Binding for the Power Sourcing Equipment (PSE) as defined in the
IEEE 802.3 specification. It is designed for hardware which is delivering
power over twisted pair/ethernet cable. The ethernet-pse nodes should be
used to describe the PSE controller and are referenced by the ethernet-phy node.
maintainers:
- Oleksij Rempel <o.rempel@pengutronix.de>
properties:
$nodename:
pattern: "^ethernet-pse(@.*)?$"
"#pse-cells":
description:
Used to uniquely identify a PSE instance within an IC. Will be
0 on PSE nodes with only a single output and at least 1 on nodes
controlling several outputs.
enum: [0, 1]
required:
- "#pse-cells"
additionalProperties: true
...

View File

@ -14,6 +14,9 @@ when it is embedded in source files.
reasons. The kernel source contains tens of thousands of kernel-doc
comments. Please stick to the style described here.
.. note:: kernel-doc does not cover Rust code: please see
Documentation/rust/general-information.rst instead.
The kernel-doc structure is extracted from the comments, and proper
`Sphinx C Domain`_ function and type descriptions with anchors are
generated from them. The descriptions are filtered for special kernel-doc

View File

@ -48,10 +48,6 @@ or ``virtualenv``, depending on how your distribution packaged Python 3.
on the Sphinx version, it should be installed separately,
with ``pip install sphinx_rtd_theme``.
#) Some ReST pages contain math expressions. Due to the way Sphinx works,
those expressions are written using LaTeX notation. It needs texlive
installed with amsfonts and amsmath in order to evaluate them.
In summary, if you want to install Sphinx version 2.4.4, you should do::
$ virtualenv sphinx_2.4.4
@ -86,6 +82,27 @@ Depending on the distribution, you may also need to install a series of
``texlive`` packages that provide the minimal set of functionalities
required for ``XeLaTeX`` to work.
Math Expressions in HTML
------------------------
Some ReST pages contain math expressions. Due to the way Sphinx works,
those expressions are written using LaTeX notation.
There are two options for Sphinx to render math expressions in html output.
One is an extension called `imgmath`_ which converts math expressions into
images and embeds them in html pages.
The other is an extension called `mathjax`_ which delegates math rendering
to JavaScript capable web browsers.
The former was the only option for pre-6.1 kernel documentation and it
requires quite a few texlive packages including amsfonts and amsmath among
others.
Since kernel release 6.1, html pages with math expressions can be built
without installing any texlive packages. See `Choice of Math Renderer`_ for
further info.
.. _imgmath: https://www.sphinx-doc.org/en/master/usage/extensions/math.html#module-sphinx.ext.imgmath
.. _mathjax: https://www.sphinx-doc.org/en/master/usage/extensions/math.html#module-sphinx.ext.mathjax
.. _sphinx-pre-install:
Checking for Sphinx dependencies
@ -164,6 +181,38 @@ To remove the generated documentation, run ``make cleandocs``.
as well would improve the quality of images embedded in PDF
documents, especially for kernel releases 5.18 and later.
Choice of Math Renderer
-----------------------
Since kernel release 6.1, mathjax works as a fallback math renderer for
html output.\ [#sph1_8]_
The math renderer is chosen depending on the available commands, as shown below:
.. table:: Math Renderer Choices for HTML
=============  =================  ============
Math renderer  Required commands  Image format
=============  =================  ============
imgmath        latex, dvipng      PNG (raster)
mathjax
=============  =================  ============
The choice can be overridden by setting an environment variable
``SPHINX_IMGMATH`` as shown below:
.. table:: Effect of Setting ``SPHINX_IMGMATH``
======================  ========
Setting                 Renderer
======================  ========
``SPHINX_IMGMATH=yes``  imgmath
``SPHINX_IMGMATH=no``   mathjax
======================  ========
.. [#sph1_8] Fallback of math renderer requires Sphinx >=1.8.
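The selection rules can be summarised by the following Python sketch, which
mirrors the logic this change adds to the Sphinx configuration. The function
name ``choose_math_renderer`` and its parameters are hypothetical; the real
configuration inspects ``PATH``, ``os.environ`` and ``sys.argv`` directly.

```python
def choose_math_renderer(have_latex, have_dvipng, env,
                         sphinx_version=(2, 4), argv=()):
    """Return 'imgmath' or 'mathjax' (hypothetical helper)."""
    # Default: use imgmath only when both required commands are available.
    load_imgmath = have_latex and have_dvipng

    # SPHINX_IMGMATH overrides the default; unknown values are ignored.
    setting = env.get("SPHINX_IMGMATH")
    if setting is not None:
        if "yes" in setting:
            load_imgmath = True
        elif "no" in setting:
            load_imgmath = False

    # Sphinx older than 1.8 and epub builds always use imgmath.
    major, minor = sphinx_version[:2]
    if (major == 1 and minor < 8) or "epub" in argv:
        load_imgmath = True

    return "imgmath" if load_imgmath else "mathjax"


print(choose_math_renderer(True, True, {}))                         # imgmath
print(choose_math_renderer(True, False, {}))                        # mathjax
print(choose_math_renderer(False, False, {"SPHINX_IMGMATH": "yes"}))  # imgmath
```

Note that forcing ``SPHINX_IMGMATH=yes`` without ``latex`` and ``dvipng``
installed selects imgmath regardless, so the build can be expected to fail when
it reaches the first math expression.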
Writing Documentation
=====================

View File

@ -301,6 +301,7 @@ IO region
devm_release_region()
devm_release_resource()
devm_request_mem_region()
devm_request_free_mem_region()
devm_request_region()
devm_request_resource()
@ -334,7 +335,7 @@ IRQ
devm_irq_alloc_descs_from()
devm_irq_alloc_generic_chip()
devm_irq_setup_generic_chip()
devm_irq_sim_init()
devm_irq_domain_create_sim()
LED
devm_led_classdev_register()
@ -392,7 +393,9 @@ PHY
PINCTRL
devm_pinctrl_get()
devm_pinctrl_put()
devm_pinctrl_get_select()
devm_pinctrl_register()
devm_pinctrl_register_and_init()
devm_pinctrl_unregister()
POWER
@ -427,6 +430,8 @@ SLAVE DMA ENGINE
devm_acpi_dma_controller_register()
SPI
devm_spi_alloc_master()
devm_spi_alloc_slave()
devm_spi_register_master()
WATCHDOG

View File

@ -100,7 +100,7 @@ I believe platform_data is available for this, but if rather not, moving
the isa_driver pointer to the private struct isa_dev is of course fine as
well.
Then, if the the driver did not provide a .match, it matches. If it did,
Then, if the driver did not provide a .match, it matches. If it did,
the driver match() method is called to determine a match.
If it did **not** match, dev->platform_data is reset to indicate this to

View File

@ -86,17 +86,24 @@ Module Options
Special configuration for udlfb is usually unnecessary. There are a few
options, however.
From the command line, pass options to modprobe
modprobe udlfb fb_defio=0 console=1 shadow=1
From the command line, pass options to modprobe::
Or modify options on the fly at /sys/module/udlfb/parameters directory via
sudo nano fb_defio
change the parameter in place, and save the file.
modprobe udlfb fb_defio=0 console=1 shadow=1
Unplug/replug USB device to apply with new settings
Or change options on the fly by editing
/sys/module/udlfb/parameters/PARAMETER_NAME ::
Or for permanent option, create file like /etc/modprobe.d/udlfb.conf with text
options udlfb fb_defio=0 console=1 shadow=1
cd /sys/module/udlfb/parameters
ls # to see a list of parameter names
sudo nano PARAMETER_NAME
# change the parameter in place, and save the file.
Unplug and replug the USB device to apply the new settings.
Or to apply options permanently, create a modprobe configuration file
like /etc/modprobe.d/udlfb.conf with text::
options udlfb fb_defio=0 console=1 shadow=1
Accepted boolean options:

View File

@ -122,7 +122,7 @@ volumes, calling::
to tell fscache that a volume has been withdrawn. This waits for all
outstanding accesses on the volume to complete before returning.
When the the cache is completely withdrawn, fscache should be notified by
When the cache is completely withdrawn, fscache should be notified by
calling::
void fscache_relinquish_cache(struct fscache_cache *cache);

View File

@ -456,15 +456,15 @@ The ext4 superblock is laid out as follows in
* - 0x277
- __u8
- s_lastcheck_hi
- Upper 8 bits of the s_lastcheck_hi field.
- Upper 8 bits of the s_lastcheck field.
* - 0x278
- __u8
- s_first_error_time_hi
- Upper 8 bits of the s_first_error_time_hi field.
- Upper 8 bits of the s_first_error_time field.
* - 0x279
- __u8
- s_last_error_time_hi
- Upper 8 bits of the s_last_error_time_hi field.
- Upper 8 bits of the s_last_error_time field.
* - 0x27A
- __u8
- s_pad[2]

View File

@ -286,9 +286,8 @@ compress_algorithm=%s:%d Control compress algorithm and its compress level, now,
algorithm level range
lz4 3 - 16
zstd 1 - 22
compress_log_size=%u Support configuring compress cluster size, the size will
be 4KB * (1 << %u), 16KB is minimum size, also it's
default size.
compress_log_size=%u Support configuring compress cluster size. The size will
be 4KB * (1 << %u). The default and minimum sizes are 16KB.
compress_extension=%s Support adding specified extension, so that f2fs can enable
compression on those corresponding files, e.g. if all files
with '.ext' have a high compression rate, we can set the '.ext'

View File

@ -661,7 +661,7 @@ idmappings::
mount idmapping: u0:k10000:r10000
Assume a file owned by ``u1000`` is read from disk. The filesystem maps this id
to ``k21000`` according to it's idmapping. This is what is stored in the
to ``k21000`` according to its idmapping. This is what is stored in the
inode's ``i_uid`` and ``i_gid`` fields.
When the caller queries the ownership of this file via ``stat()`` the kernel

View File

@ -176,7 +176,7 @@ Then userspace.
The requirement for a static, fixed preallocated system area comes from how
qnx6fs deals with writes.
Each superblock got it's own half of the system area. So superblock #1
Each superblock got its own half of the system area. So superblock #1
always uses blocks from the lower half while superblock #2 just writes to
blocks represented by the upper half bitmap system area bits.

View File

@ -227,7 +227,7 @@ Files
from the data buffer, updating the value of the specified signal
notification register. The signal notification register will
either be replaced with the input data or will be updated to the
bitwise OR or the old value and the input data, depending on the
bitwise OR of the old value and the input data, depending on the
contents of the signal1_type, or signal2_type respectively,
file.

View File

@ -100,7 +100,7 @@ transactions together::
ntp = xfs_trans_dup(tp);
xfs_trans_commit(tp);
xfs_log_reserve(ntp);
xfs_trans_reserve(ntp);
This results in a series of "rolling transactions" where the inode is locked
across the entire chain of transactions. Hence while this series of rolling
@ -191,7 +191,7 @@ transaction rolling mechanism to re-reserve space on every transaction roll. We
know from the implementation of the permanent transactions how many transaction
rolls are likely for the common modifications that need to be made.
For example, and inode allocation is typically two transactions - one to
For example, an inode allocation is typically two transactions - one to
physically allocate a free inode chunk on disk, and another to allocate an inode
from an inode chunk that has free inodes in it. Hence for an inode allocation
transaction, we might set the reservation log count to a value of 2 to indicate
@ -200,7 +200,7 @@ chain. Each time a permanent transaction rolls, it consumes an entire unit
reservation.
Hence when the permanent transaction is first allocated, the log space
reservation is increases from a single unit reservation to multiple unit
reservation is increased from a single unit reservation to multiple unit
reservations. That multiple is defined by the reservation log count, and this
means we can roll the transaction multiple times before we have to re-reserve
log space when we roll the transaction. This ensures that the common
@ -259,7 +259,7 @@ the next transaction in the sequeunce, but we have none remaining. We cannot
sleep during the transaction commit process waiting for new log space to become
available, as we may end up on the end of the FIFO queue and the items we have
locked while we sleep could end up pinning the tail of the log before there is
enough free space in the log to fulfil all of the pending reservations and
enough free space in the log to fulfill all of the pending reservations and
then wake up transaction commit in progress.
To take a new reservation without sleeping requires us to be able to take a
@ -551,14 +551,14 @@ Essentially, this shows that an item that is in the AIL can still be modified
and relogged, so any tracking must be separate to the AIL infrastructure. As
such, we cannot reuse the AIL list pointers for tracking committed items, nor
can we store state in any field that is protected by the AIL lock. Hence the
committed item tracking needs it's own locks, lists and state fields in the log
committed item tracking needs its own locks, lists and state fields in the log
item.
Similar to the AIL, tracking of committed items is done through a new list
called the Committed Item List (CIL). The list tracks log items that have been
committed and have formatted memory buffers attached to them. It tracks objects
in transaction commit order, so when an object is relogged it is removed from
it's place in the list and re-inserted at the tail. This is entirely arbitrary
its place in the list and re-inserted at the tail. This is entirely arbitrary
and done to make it easy for debugging - the last items in the list are the
ones that are most recently modified. Ordering of the CIL is not necessary for
transactional integrity (as discussed in the next section) so the ordering is
@ -615,7 +615,7 @@ those changes into the current checkpoint context. We then initialise a new
context and attach that to the CIL for aggregation of new transactions.
This allows us to unlock the CIL immediately after transfer of all the
committed items and effectively allow new transactions to be issued while we
committed items and effectively allows new transactions to be issued while we
are formatting the checkpoint into the log. It also allows concurrent
checkpoints to be written into the log buffers in the case of log force heavy
workloads, just like the existing transaction commit code does. This, however,
@ -884,9 +884,9 @@ pin the object the first time it is inserted into the CIL - if it is already in
the CIL during a transaction commit, then we do not pin it again. Because there
can be multiple outstanding checkpoint contexts, we can still see elevated pin
counts, but as each checkpoint completes the pin count will retain the correct
value according to it's context.
value according to its context.
Just to make matters more slightly more complex, this checkpoint level context
Just to make matters slightly more complex, this checkpoint level context
for the pin count means that the pinning of an item must take place under the
CIL commit/flush lock. If we pin the object outside this lock, we cannot
guarantee which context the pin count is associated with. This is because of

View File

@ -21,7 +21,7 @@ possible we decided to do following:
- Devices behind real busses where there is a connector resource
are represented as struct spi_device or struct i2c_device. Note
that standard UARTs are not busses so there is no struct uart_device,
although some of them may be represented by sturct serdev_device.
although some of them may be represented by struct serdev_device.
As both ACPI and Device Tree represent a tree of devices (and their
resources) this implementation follows the Device Tree way as much as
@ -205,7 +205,7 @@ Here is what the ACPI namespace for a SPI slave might look like::
}
...
The SPI device drivers only need to add ACPI IDs in a similar way than with
The SPI device drivers only need to add ACPI IDs in a similar way to
the platform device drivers. Below is an example where we add ACPI support
to at25 SPI eeprom driver (this is meant for the above ACPI snippet)::
@ -362,7 +362,7 @@ These GPIO numbers are controller relative and path "\\_SB.PCI0.GPI0"
specifies the path to the controller. In order to use these GPIOs in Linux
we need to translate them to the corresponding Linux GPIO descriptors.
There is a standard GPIO API for that and is documented in
There is a standard GPIO API for that and it is documented in
Documentation/admin-guide/gpio/.
In the above example we can get the corresponding two GPIO descriptors with
@ -538,8 +538,8 @@ information.
PCI hierarchy representation
============================
Sometimes could be useful to enumerate a PCI device, knowing its position on the
PCI bus.
Sometimes it could be useful to enumerate a PCI device, knowing its position on
the PCI bus.
For example, some systems use PCI devices soldered directly on the mother board,
in a fixed position (ethernet, Wi-Fi, serial ports, etc.). In these conditions it
@ -550,7 +550,7 @@ To identify a PCI device, a complete hierarchical description is required, from
the chipset root port to the final device, through all the intermediate
bridges/switches of the board.
For example, let us assume to have a system with a PCIe serial port, an
For example, let's assume we have a system with a PCIe serial port, an
Exar XR17V3521, soldered on the main board. This UART chip also includes
16 GPIOs and we want to add the property ``gpio-line-names`` [1] to these pins.
In this case, the ``lspci`` output for this component is::
@ -593,8 +593,8 @@ of the chipset bridge (also called "root port") with address::
Bus: 0 - Device: 14 - Function: 1
To find this information is necessary disassemble the BIOS ACPI tables, in
particular the DSDT (see also [2])::
To find this information, it is necessary to disassemble the BIOS ACPI tables,
in particular the DSDT (see also [2])::
mkdir ~/tables/
cd ~/tables/


@ -41,26 +41,23 @@ But it is likely that they will all eventually be added.
What should an OEM do if they want to support Linux and Windows
using the same BIOS image? Often they need to do something different
for Linux to deal with how Linux is different from Windows.
Here the BIOS should ask exactly what it wants to know:
In this case, the OEM should create custom ASL to be executed by the
Linux kernel and changes to Linux kernel drivers to execute this custom
ASL. The easiest way to accomplish this is to introduce a device specific
method (_DSM) that is called from the Linux kernel.
In the past the kernel used to support something like:
_OSI("Linux-OEM-my_interface_name")
where 'OEM' is needed if this is an OEM-specific hook,
and 'my_interface_name' describes the hook, which could be a
quirk, a bug, or a bug-fix.
In addition, the OEM should send a patch to upstream Linux
via the linux-acpi@vger.kernel.org mailing list. When that patch
is checked into Linux, the OS will answer "YES" when the BIOS
on the OEM's system uses _OSI to ask if the interface is supported
by the OS. Linux distributors can back-port that patch for Linux
pre-installs, and it will be included by all distributions that
re-base to upstream. If the distribution can not update the kernel binary,
they can also add an acpi_osi=Linux-OEM-my_interface_name
cmdline parameter to the boot loader, as needed.
If the string refers to a feature where the upstream kernel
eventually grows support, a patch should be sent to remove
the string when that support is added to the kernel.
However, this was discovered to be abused by other BIOS vendors to change
completely unrelated code on completely unrelated systems. This prompted
an evaluation of all of its uses. This uncovered that they aren't needed
for any of the original reasons. As such, the kernel will not respond to
any custom Linux-* strings by default.
That was easy. Read on, to find out how to do it wrong.


@ -1,11 +1,5 @@
.. SPDX-License-Identifier: GPL-2.0
.. The Linux Kernel documentation master file, created by
sphinx-quickstart on Fri Feb 12 13:51:46 2016.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
.. _linux_doc:
The Linux Kernel documentation
@ -18,26 +12,73 @@ documents into a coherent whole. Please note that improvements to the
documentation are welcome; join the linux-doc list at vger.kernel.org if
you want to help out.
Licensing documentation
-----------------------
Working with the development community
--------------------------------------
The following describes the license of the Linux kernel source code
(GPLv2), how to properly mark the license of individual files in the source
tree, as well as links to the full license text.
The essential guides for interacting with the kernel's development
community and getting your work upstream.
.. toctree::
:maxdepth: 1
process/development-process
process/submitting-patches
Code of conduct <process/code-of-conduct>
maintainer/index
All development-process docs <process/index>
Internal API manuals
--------------------
Manuals for use by developers working to interface with the rest of the
kernel.
.. toctree::
:maxdepth: 1
core-api/index
driver-api/index
subsystem-apis
Locking in the kernel <locking/index>
Development tools and processes
-------------------------------
Various other manuals with useful information for all kernel developers.
.. toctree::
:maxdepth: 1
process/license-rules
doc-guide/index
dev-tools/index
dev-tools/testing-overview
kernel-hacking/index
trace/index
fault-injection/index
livepatch/index
rust/index
* :ref:`kernel_licensing`
User-oriented documentation
---------------------------
The following manuals are written for *users* of the kernel — those who are
trying to get it to work optimally on a given system.
trying to get it to work optimally on a given system and application
developers seeking information on the kernel's user-space APIs.
.. toctree::
:maxdepth: 2
:maxdepth: 1
admin-guide/index
kbuild/index
The kernel build system <kbuild/index>
admin-guide/reporting-issues.rst
User-space tools <tools/index>
userspace-api/index
See also: the `Linux man pages <https://www.kernel.org/doc/man-pages/>`_,
which are kept separately from the kernel's own documentation.
Firmware-related documentation
------------------------------
@ -45,106 +86,11 @@ The following holds information on the kernel's expectations regarding the
platform firmwares.
.. toctree::
:maxdepth: 2
:maxdepth: 1
firmware-guide/index
devicetree/index
Application-developer documentation
-----------------------------------
The user-space API manual gathers together documents describing aspects of
the kernel interface as seen by application developers.
.. toctree::
:maxdepth: 2
userspace-api/index
Introduction to kernel development
----------------------------------
These manuals contain overall information about how to develop the kernel.
The kernel community is quite large, with thousands of developers
contributing over the course of a year. As with any large community,
knowing how things are done will make the process of getting your changes
merged much easier.
.. toctree::
:maxdepth: 2
process/index
dev-tools/index
doc-guide/index
kernel-hacking/index
trace/index
maintainer/index
fault-injection/index
livepatch/index
Kernel API documentation
------------------------
These books get into the details of how specific kernel subsystems work
from the point of view of a kernel developer. Much of the information here
is taken directly from the kernel source, with supplemental material added
as needed (or at least as we managed to add it — probably *not* all that is
needed).
.. toctree::
:maxdepth: 2
driver-api/index
core-api/index
locking/index
accounting/index
block/index
cdrom/index
cpu-freq/index
fb/index
fpga/index
hid/index
i2c/index
iio/index
isdn/index
infiniband/index
leds/index
netlabel/index
networking/index
pcmcia/index
power/index
target/index
timers/index
spi/index
w1/index
watchdog/index
virt/index
input/index
hwmon/index
gpu/index
security/index
sound/index
crypto/index
filesystems/index
mm/index
bpf/index
usb/index
PCI/index
scsi/index
misc-devices/index
scheduler/index
mhi/index
peci/index
Architecture-agnostic documentation
-----------------------------------
.. toctree::
:maxdepth: 2
asm-annotations
Architecture-specific documentation
-----------------------------------
@ -163,9 +109,8 @@ of the documentation body, or may require some adjustments and/or conversion
to ReStructured Text format, or are simply too old.
.. toctree::
:maxdepth: 2
:maxdepth: 1
tools/index
staging/index


@ -90,7 +90,11 @@ e.g., on Ubuntu for gcc-10::
Or on Fedora::
dnf install gcc-plugin-devel
dnf install gcc-plugin-devel libmpc-devel
Or on Fedora when using cross-compilers that include plugins::
dnf install libmpc-devel
Enable the GCC plugin infrastructure and some plugin(s) you want to use
in the kernel config::
@ -99,6 +103,19 @@ in the kernel config::
CONFIG_GCC_PLUGIN_LATENT_ENTROPY=y
...
Run gcc (native or cross-compiler) to ensure plugin headers are detected::
gcc -print-file-name=plugin
CROSS_COMPILE=arm-linux-gnu- ${CROSS_COMPILE}gcc -print-file-name=plugin
The word "plugin" means they are not detected::
plugin
A full path means they are detected::
/usr/lib/gcc/x86_64-redhat-linux/12/plugin
To compile the minimum tool set including the plugin(s)::
make scripts


@ -48,6 +48,10 @@ KCFLAGS
-------
Additional options to the C compiler (for built-in and modules).
KRUSTFLAGS
----------
Additional options to the Rust compiler (for built-in and modules).
CFLAGS_KERNEL
-------------
Additional options for $(CC) when used to compile
@ -57,6 +61,15 @@ CFLAGS_MODULE
-------------
Additional module specific options to use for $(CC).
RUSTFLAGS_KERNEL
----------------
Additional options for $(RUSTC) when used to compile
code that is compiled as built-in.
RUSTFLAGS_MODULE
----------------
Additional module specific options to use for $(RUSTC).
LDFLAGS_MODULE
--------------
Additional options used for $(LD) when linking modules.
@ -69,6 +82,10 @@ HOSTCXXFLAGS
------------
Additional flags to be passed to $(HOSTCXX) when building host programs.
HOSTRUSTFLAGS
-------------
Additional flags to be passed to $(HOSTRUSTC) when building host programs.
HOSTLDFLAGS
-----------
Additional flags to be passed when linking host programs.


@ -29,8 +29,9 @@ This document describes the Linux kernel Makefiles.
--- 4.1 Simple Host Program
--- 4.2 Composite Host Programs
--- 4.3 Using C++ for host programs
--- 4.4 Controlling compiler options for host programs
--- 4.5 When host programs are actually built
--- 4.4 Using Rust for host programs
--- 4.5 Controlling compiler options for host programs
--- 4.6 When host programs are actually built
=== 5 Userspace Program support
--- 5.1 Simple Userspace Program
@ -835,7 +836,24 @@ Both possibilities are described in the following.
qconf-cxxobjs := qconf.o
qconf-objs := check.o
4.4 Controlling compiler options for host programs
4.4 Using Rust for host programs
--------------------------------
Kbuild offers support for host programs written in Rust. However,
since a Rust toolchain is not mandatory for kernel compilation,
it may only be used in scenarios where Rust is required to be
available (e.g. when ``CONFIG_RUST`` is enabled).
Example::
hostprogs := target
target-rust := y
Kbuild will compile ``target`` using ``target.rs`` as the crate root,
located in the same directory as the ``Makefile``. The crate may
consist of several source files (see ``samples/rust/hostprogs``).
4.5 Controlling compiler options for host programs
--------------------------------------------------
When compiling host programs, it is possible to set specific flags.
@ -867,7 +885,7 @@ Both possibilities are described in the following.
When linking qconf, it will be passed the extra option
"-L$(QTDIR)/lib".
4.5 When host programs are actually built
4.6 When host programs are actually built
-----------------------------------------
Kbuild will only build host-programs when they are referenced
@ -1181,6 +1199,17 @@ When kbuild executes, the following steps are followed (roughly):
The first example utilises the trick that a config option expands
to 'y' when selected.
KBUILD_RUSTFLAGS
$(RUSTC) compiler flags
Default value - see top level Makefile
Append or modify as required per architecture.
Often, the KBUILD_RUSTFLAGS variable depends on the configuration.
Note that target specification file generation (for ``--target``)
is handled in ``scripts/generate_rust_target.rs``.
KBUILD_AFLAGS_KERNEL
Assembler options specific for built-in
@ -1208,6 +1237,19 @@ When kbuild executes, the following steps are followed (roughly):
are used for $(CC).
From commandline CFLAGS_MODULE shall be used (see kbuild.rst).
KBUILD_RUSTFLAGS_KERNEL
$(RUSTC) options specific for built-in
$(KBUILD_RUSTFLAGS_KERNEL) contains extra Rust compiler flags used to
compile resident kernel code.
KBUILD_RUSTFLAGS_MODULE
Options for $(RUSTC) when building modules
$(KBUILD_RUSTFLAGS_MODULE) is used to add arch-specific options that
are used for $(RUSTC).
From commandline RUSTFLAGS_MODULE shall be used (see kbuild.rst).
KBUILD_LDFLAGS_MODULE
Options for $(LD) when linking modules


@ -39,7 +39,7 @@ as the writer can invalidate a pointer that the reader is following.
Sequence counters (``seqcount_t``)
==================================
This is the the raw counting mechanism, which does not protect against
This is the raw counting mechanism, which does not protect against
multiple writers. Write side critical sections must thus be serialized
by an external lock.
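
As a rough user-space illustration of the counting scheme (this is not the
kernel's ``seqcount_t`` API), the idea can be sketched with C11 atomics. The
names ``seq_pair``, ``pair_write()`` and ``pair_read_sum()`` are hypothetical,
and the writer is assumed to be serialized by an external lock, as the text
above requires:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical data protected by a sequence counter. */
struct seq_pair {
	atomic_uint seq;	/* even: stable, odd: write in progress */
	atomic_int a;
	atomic_int b;
};

/* Write side: callers must already hold an external lock. */
void pair_write(struct seq_pair *p, int a, int b)
{
	unsigned int s = atomic_load_explicit(&p->seq, memory_order_relaxed);

	assert((s & 1) == 0);	/* no other writer may be active */
	atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
	/* Order the odd counter value before the data stores. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->a, a, memory_order_relaxed);
	atomic_store_explicit(&p->b, b, memory_order_relaxed);
	/* Order the data stores before the final even counter value. */
	atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/* Read side: lockless, retries if a write overlapped the read. */
int pair_read_sum(struct seq_pair *p)
{
	unsigned int s1, s2;
	int a, b;

	do {
		s1 = atomic_load_explicit(&p->seq, memory_order_acquire);
		a = atomic_load_explicit(&p->a, memory_order_relaxed);
		b = atomic_load_explicit(&p->b, memory_order_relaxed);
		/* Order the data loads before re-reading the counter. */
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&p->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);

	return a + b;
}
```

The reader never blocks the writer; it only detects that a write happened
and retries. Nothing in the counter itself stops two writers from
interleaving, which is why the external lock is needed.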


@ -52,7 +52,7 @@ CONTENTS
- Varieties of memory barrier.
- What may not be assumed about memory barriers?
- Data dependency barriers (historical).
- Address-dependency barriers (historical).
- Control dependencies.
- SMP barrier pairing.
- Examples of memory barrier sequences.
@ -187,9 +187,9 @@ As a further example, consider this sequence of events:
B = 4; Q = P;
P = &B; D = *Q;
There is an obvious data dependency here, as the value loaded into D depends on
the address retrieved from P by CPU 2. At the end of the sequence, any of the
following results are possible:
There is an obvious address dependency here, as the value loaded into D depends
on the address retrieved from P by CPU 2. At the end of the sequence, any of
the following results are possible:
(Q == &A) and (D == 1)
(Q == &B) and (D == 2)
@ -391,58 +391,62 @@ Memory barriers come in four basic varieties:
memory system as time progresses. All stores _before_ a write barrier
will occur _before_ all the stores after the write barrier.
[!] Note that write barriers should normally be paired with read or data
dependency barriers; see the "SMP barrier pairing" subsection.
[!] Note that write barriers should normally be paired with read or
address-dependency barriers; see the "SMP barrier pairing" subsection.
(2) Data dependency barriers.
(2) Address-dependency barriers (historical).
A data dependency barrier is a weaker form of read barrier. In the case
where two loads are performed such that the second depends on the result
of the first (eg: the first load retrieves the address to which the second
load will be directed), a data dependency barrier would be required to
make sure that the target of the second load is updated after the address
obtained by the first load is accessed.
An address-dependency barrier is a weaker form of read barrier. In the
case where two loads are performed such that the second depends on the
result of the first (eg: the first load retrieves the address to which
the second load will be directed), an address-dependency barrier would
be required to make sure that the target of the second load is updated
after the address obtained by the first load is accessed.
A data dependency barrier is a partial ordering on interdependent loads
only; it is not required to have any effect on stores, independent loads
or overlapping loads.
An address-dependency barrier is a partial ordering on interdependent
loads only; it is not required to have any effect on stores, independent
loads or overlapping loads.
As mentioned in (1), the other CPUs in the system can be viewed as
committing sequences of stores to the memory system that the CPU being
considered can then perceive. A data dependency barrier issued by the CPU
under consideration guarantees that for any load preceding it, if that
load touches one of a sequence of stores from another CPU, then by the
time the barrier completes, the effects of all the stores prior to that
touched by the load will be perceptible to any loads issued after the data
dependency barrier.
considered can then perceive. An address-dependency barrier issued by
the CPU under consideration guarantees that for any load preceding it,
if that load touches one of a sequence of stores from another CPU, then
by the time the barrier completes, the effects of all the stores prior to
that touched by the load will be perceptible to any loads issued after
the address-dependency barrier.
See the "Examples of memory barrier sequences" subsection for diagrams
showing the ordering constraints.
[!] Note that the first load really has to have a _data_ dependency and
[!] Note that the first load really has to have an _address_ dependency and
not a control dependency. If the address for the second load is dependent
on the first load, but the dependency is through a conditional rather than
actually loading the address itself, then it's a _control_ dependency and
a full read barrier or better is required. See the "Control dependencies"
subsection for more information.
[!] Note that data dependency barriers should normally be paired with
[!] Note that address-dependency barriers should normally be paired with
write barriers; see the "SMP barrier pairing" subsection.
[!] Kernel release v5.9 removed kernel APIs for explicit address-
dependency barriers. Nowadays, APIs for marking loads from shared
variables such as READ_ONCE() and rcu_dereference() provide implicit
address-dependency barriers.
(3) Read (or load) memory barriers.
A read barrier is a data dependency barrier plus a guarantee that all the
LOAD operations specified before the barrier will appear to happen before
all the LOAD operations specified after the barrier with respect to the
other components of the system.
A read barrier is an address-dependency barrier plus a guarantee that all
the LOAD operations specified before the barrier will appear to happen
before all the LOAD operations specified after the barrier with respect to
the other components of the system.
A read barrier is a partial ordering on loads only; it is not required to
have any effect on stores.
Read memory barriers imply data dependency barriers, and so can substitute
for them.
Read memory barriers imply address-dependency barriers, and so can
substitute for them.
[!] Note that read barriers should normally be paired with write barriers;
see the "SMP barrier pairing" subsection.
@ -550,17 +554,21 @@ There are certain things that the Linux kernel memory barriers do not guarantee:
Documentation/core-api/dma-api.rst
DATA DEPENDENCY BARRIERS (HISTORICAL)
-------------------------------------
ADDRESS-DEPENDENCY BARRIERS (HISTORICAL)
----------------------------------------
As of v4.15 of the Linux kernel, an smp_mb() was added to READ_ONCE() for
DEC Alpha, which means that about the only people who need to pay attention
to this section are those working on DEC Alpha architecture-specific code
and those working on READ_ONCE() itself. For those who need it, and for
those who are interested in the history, here is the story of
data-dependency barriers.
address-dependency barriers.
The usage requirements of data dependency barriers are a little subtle, and
[!] While address dependencies are observed in both load-to-load and
load-to-store relations, address-dependency barriers are not necessary
for load-to-store situations.
The requirement of address-dependency barriers is a little subtle, and
it's not always obvious that they're needed. To illustrate, consider the
following sequence of events:
@ -570,11 +578,14 @@ following sequence of events:
B = 4;
<write barrier>
WRITE_ONCE(P, &B);
Q = READ_ONCE(P);
Q = READ_ONCE_OLD(P);
D = *Q;
There's a clear data dependency here, and it would seem that by the end of the
sequence, Q must be either &A or &B, and that:
[!] READ_ONCE_OLD() corresponds to READ_ONCE() of pre-4.15 kernels, which
doesn't imply an address-dependency barrier.
There's a clear address dependency here, and it would seem that by the end of
the sequence, Q must be either &A or &B, and that:
(Q == &A) implies (D == 1)
(Q == &B) implies (D == 4)
@ -588,8 +599,8 @@ While this may seem like a failure of coherency or causality maintenance, it
isn't, and this behaviour can be observed on certain real CPUs (such as the DEC
Alpha).
To deal with this, a data dependency barrier or better must be inserted
between the address load and the data load:
To deal with this, READ_ONCE() provides an implicit address-dependency barrier
since kernel release v4.15:
CPU 1 CPU 2
=============== ===============
@ -598,7 +609,7 @@ between the address load and the data load:
<write barrier>
WRITE_ONCE(P, &B);
Q = READ_ONCE(P);
<data dependency barrier>
<implicit address-dependency barrier>
D = *Q;
This enforces the occurrence of one of the two implications, and prevents the
@ -615,13 +626,13 @@ odd-numbered bank is idle, one can see the new value of the pointer P (&B),
but the old value of the variable B (2).
A data-dependency barrier is not required to order dependent writes
because the CPUs that the Linux kernel supports don't do writes
until they are certain (1) that the write will actually happen, (2)
of the location of the write, and (3) of the value to be written.
An address-dependency barrier is not required to order dependent writes
because the CPUs that the Linux kernel supports don't do writes until they
are certain (1) that the write will actually happen, (2) of the location of
the write, and (3) of the value to be written.
But please carefully read the "CONTROL DEPENDENCIES" section and the
Documentation/RCU/rcu_dereference.rst file: The compiler can and does
break dependencies in a great many highly creative ways.
Documentation/RCU/rcu_dereference.rst file: The compiler can and does break
dependencies in a great many highly creative ways.
CPU 1 CPU 2
=============== ===============
@ -629,12 +640,12 @@ break dependencies in a great many highly creative ways.
B = 4;
<write barrier>
WRITE_ONCE(P, &B);
Q = READ_ONCE(P);
Q = READ_ONCE_OLD(P);
WRITE_ONCE(*Q, 5);
Therefore, no data-dependency barrier is required to order the read into
Therefore, no address-dependency barrier is required to order the read into
Q with the store into *Q. In other words, this outcome is prohibited,
even without a data-dependency barrier:
even without an implicit address-dependency barrier of modern READ_ONCE():
(Q == &B) && (B == 4)
@ -645,12 +656,12 @@ can be used to record rare error conditions and the like, and the CPUs'
naturally occurring ordering prevents such records from being lost.
Note well that the ordering provided by a data dependency is local to
Note well that the ordering provided by an address dependency is local to
the CPU containing it. See the section on "Multicopy atomicity" for
more information.
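
The pointer-publication sequence discussed in this section can be sketched in
user space with C11 atomics. This is only an analogue under stated
assumptions, not kernel code: ``cpu1_publish()`` and ``cpu2_read()`` are
hypothetical names, the release store stands in for the write barrier plus
WRITE_ONCE(), and an acquire load is used on the read side. Acquire is
stronger than an address dependency needs; ``memory_order_consume`` is the
closer C11 counterpart, but compilers commonly treat it as acquire anyway.

```c
#include <assert.h>
#include <stdatomic.h>

static int A = 1;
static int B = 2;
static _Atomic(int *) P = &A;	/* shared pointer, initially &A */

/* CPU 1: update B, then publish its address. */
void cpu1_publish(void)
{
	B = 4;
	/* Release ordering: the store to B is visible before P == &B. */
	atomic_store_explicit(&P, &B, memory_order_release);
}

/* CPU 2: load the pointer, then load through it (address dependency). */
int cpu2_read(void)
{
	int *q = atomic_load_explicit(&P, memory_order_acquire);

	assert(q == &A || q == &B);
	return *q;	/* 1 if q == &A, 4 if q == &B, never the stale 2 */
}
```

When the two functions run on different threads, the reader sees either the
old pointer with the old target, or the new pointer with the updated value
of B, which matches the two implications listed earlier.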
The data dependency barrier is very important to the RCU system,
The address-dependency barrier is very important to the RCU system,
for example. See rcu_assign_pointer() and rcu_dereference() in
include/linux/rcupdate.h. This permits the current target of an RCU'd
pointer to be replaced with a new modified target, without the replacement
@ -667,20 +678,21 @@ not understand them. The purpose of this section is to help you prevent
the compiler's ignorance from breaking your code.
A load-load control dependency requires a full read memory barrier, not
simply a data dependency barrier to make it work correctly. Consider the
following bit of code:
simply an (implicit) address-dependency barrier to make it work correctly.
Consider the following bit of code:
q = READ_ONCE(a);
<implicit address-dependency barrier>
if (q) {
<data dependency barrier> /* BUG: No data dependency!!! */
/* BUG: No address dependency!!! */
p = READ_ONCE(b);
}
This will not have the desired effect because there is no actual data
This will not have the desired effect because there is no actual address
dependency, but rather a control dependency that the CPU may short-circuit
by attempting to predict the outcome in advance, so that other CPUs see
the load from b as having happened before the load from a. In such a
case what's actually required is:
the load from b as having happened before the load from a. In such a case
what's actually required is:
q = READ_ONCE(a);
if (q) {
@ -927,9 +939,9 @@ General barriers pair with each other, though they also pair with most
other types of barriers, albeit without multicopy atomicity. An acquire
barrier pairs with a release barrier, but both may also pair with other
barriers, including of course general barriers. A write barrier pairs
with a data dependency barrier, a control dependency, an acquire barrier,
with an address-dependency barrier, a control dependency, an acquire barrier,
a release barrier, a read barrier, or a general barrier. Similarly a
read barrier, control dependency, or a data dependency barrier pairs
read barrier, control dependency, or an address-dependency barrier pairs
with a write barrier, an acquire barrier, a release barrier, or a
general barrier:
@ -948,7 +960,7 @@ Or:
a = 1;
<write barrier>
WRITE_ONCE(b, &a); x = READ_ONCE(b);
<data dependency barrier>
<implicit address-dependency barrier>
y = *x;
Or even:
@ -968,8 +980,8 @@ Basically, the read barrier always has to be there, even though it can be of
the "weaker" type.
[!] Note that the stores before the write barrier would normally be expected to
match the loads after the read barrier or the data dependency barrier, and vice
versa:
match the loads after the read barrier or the address-dependency barrier, and
vice versa:
CPU 1 CPU 2
=================== ===================
@ -1021,8 +1033,8 @@ STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E
V
Secondly, data dependency barriers act as partial orderings on data-dependent
loads. Consider the following sequence of events:
Secondly, address-dependency barriers act as partial orderings on address-
dependent loads. Consider the following sequence of events:
CPU 1 CPU 2
======================= =======================
@ -1067,8 +1079,8 @@ effectively random order, despite the write barrier issued by CPU 1:
In the above example, CPU 2 perceives that B is 7, despite the load of *C
(which would be B) coming after the LOAD of C.
If, however, a data dependency barrier were to be placed between the load of C
and the load of *C (ie: B) on CPU 2:
If, however, an address-dependency barrier were to be placed between the load
of C and the load of *C (ie: B) on CPU 2:
CPU 1 CPU 2
======================= =======================
@ -1078,7 +1090,7 @@ and the load of *C (ie: B) on CPU 2:
<write barrier>
STORE C = &B LOAD X
STORE D = 4 LOAD C (gets &B)
<data dependency barrier>
<address-dependency barrier>
LOAD *C (reads B)
then the following will occur:
@ -1101,7 +1113,7 @@ then the following will occur:
| +-------+ | |
| | X->9 |------>| |
| +-------+ | |
Makes sure all effects ---> \ ddddddddddddddddd | |
Makes sure all effects ---> \ aaaaaaaaaaaaaaaaa | |
prior to the store of C \ +-------+ | |
are perceptible to ----->| B->2 |------>| |
subsequent loads +-------+ | |
@ -1292,7 +1304,7 @@ Which might appear as this:
LOAD with immediate effect : : +-------+
Placing a read barrier or a data dependency barrier just before the second
Placing a read barrier or an address-dependency barrier just before the second
load:
CPU 1 CPU 2
@ -1816,20 +1828,20 @@ which may then reorder things however it wishes.
CPU MEMORY BARRIERS
-------------------
The Linux kernel has eight basic CPU memory barriers:
The Linux kernel has seven basic CPU memory barriers:
TYPE MANDATORY SMP CONDITIONAL
=============== ======================= ===========================
GENERAL mb() smp_mb()
WRITE wmb() smp_wmb()
READ rmb() smp_rmb()
DATA DEPENDENCY READ_ONCE()
TYPE MANDATORY SMP CONDITIONAL
======================= =============== ===============
GENERAL mb() smp_mb()
WRITE wmb() smp_wmb()
READ rmb() smp_rmb()
ADDRESS DEPENDENCY READ_ONCE()
All memory barriers except the data dependency barriers imply a compiler
barrier. Data dependencies do not impose any additional compiler ordering.
All memory barriers except the address-dependency barriers imply a compiler
barrier. Address dependencies do not impose any additional compiler ordering.
Aside: In the case of data dependencies, the compiler would be expected
Aside: In the case of address dependencies, the compiler would be expected
to issue the loads in the correct order (eg. `a[b]` would have to load
the value of b before loading a[b]), however there is no guarantee in
the C specification that the compiler may not speculate the value of b
@ -2749,7 +2761,8 @@ is discarded from the CPU's cache and reloaded. To deal with this, the
appropriate part of the kernel must invalidate the overlapping bits of the
cache on each CPU.
See Documentation/core-api/cachetlb.rst for more information on cache management.
See Documentation/core-api/cachetlb.rst for more information on cache
management.
CACHE COHERENCY VS MMIO
@ -2889,8 +2902,8 @@ AND THEN THERE'S THE ALPHA
The DEC Alpha CPU is one of the most relaxed CPUs there is. Not only that,
some versions of the Alpha CPU have a split data cache, permitting them to have
two semantically-related cache lines updated at separate times. This is where
the data dependency barrier really becomes necessary as this synchronises both
caches with the memory coherence system, thus making it seem like pointer
the address-dependency barrier really becomes necessary as this synchronises
both caches with the memory coherence system, thus making it seem like pointer
changes vs new data occur in the right order.
The Alpha defines the Linux kernel's memory model, although as of v4.15


@ -197,7 +197,7 @@ unevictable list for the memory cgroup and node being scanned.
There may be situations where a page is mapped into a VM_LOCKED VMA, but the
page is not marked as PG_mlocked. Such pages will make it all the way to
shrink_active_list() or shrink_page_list() where they will be detected when
vmscan walks the reverse map in page_referenced() or try_to_unmap(). The page
vmscan walks the reverse map in folio_referenced() or try_to_unmap(). The page
is culled to the unevictable list when it is released by the shrinker.
To "cull" an unevictable page, vmscan simply puts the page back on the LRU list
@ -267,7 +267,7 @@ the LRU. Such pages can be "noticed" by memory management in several places:
(4) in the fault path and when a VM_LOCKED stack segment is expanded; or
(5) as mentioned above, in vmscan:shrink_page_list() when attempting to
reclaim a page in a VM_LOCKED VMA by page_referenced() or try_to_unmap().
reclaim a page in a VM_LOCKED VMA by folio_referenced() or try_to_unmap().
mlocked pages become unlocked and rescued from the unevictable list when:
@ -547,7 +547,7 @@ vmscan's shrink_inactive_list() and shrink_page_list() also divert obviously
unevictable pages found on the inactive lists to the appropriate memory cgroup
and node unevictable list.
rmap's page_referenced_one(), called via vmscan's shrink_active_list() or
rmap's folio_referenced_one(), called via vmscan's shrink_active_list() or
shrink_page_list(), and rmap's try_to_unmap_one() called via shrink_page_list(),
check for (3) pages still mapped into VM_LOCKED VMAs, and call mlock_vma_page()
to correct them. Such pages are culled to the unevictable list when released


@ -220,6 +220,8 @@ Userspace to kernel:
``ETHTOOL_MSG_PHC_VCLOCKS_GET`` get PHC virtual clocks info
``ETHTOOL_MSG_MODULE_SET`` set transceiver module parameters
``ETHTOOL_MSG_MODULE_GET`` get transceiver module parameters
``ETHTOOL_MSG_PSE_SET`` set PSE parameters
``ETHTOOL_MSG_PSE_GET`` get PSE parameters
===================================== =================================
Kernel to userspace:
@ -260,6 +262,7 @@ Kernel to userspace:
``ETHTOOL_MSG_STATS_GET_REPLY`` standard statistics
``ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY`` PHC virtual clocks info
``ETHTOOL_MSG_MODULE_GET_REPLY`` transceiver module parameters
``ETHTOOL_MSG_PSE_GET_REPLY`` PSE parameters
======================================== =================================
``GET`` requests are sent by userspace applications to retrieve device
@ -1627,6 +1630,62 @@ For SFF-8636 modules, low power mode is forced by the host according to table
For CMIS modules, low power mode is forced by the host according to table 6-12
in revision 5.0 of the specification.
PSE_GET
=======
Gets PSE attributes.
Request contents:
===================================== ====== ==========================
``ETHTOOL_A_PSE_HEADER`` nested request header
===================================== ====== ==========================
Kernel response contents:
====================================== ====== =============================
``ETHTOOL_A_PSE_HEADER`` nested reply header
``ETHTOOL_A_PODL_PSE_ADMIN_STATE`` u32 Operational state of the PoDL
PSE functions
``ETHTOOL_A_PODL_PSE_PW_D_STATUS`` u32 power detection status of the
PoDL PSE.
====================================== ====== =============================
When set, the optional ``ETHTOOL_A_PODL_PSE_ADMIN_STATE`` attribute identifies
the operational state of the PoDL PSE functions. The operational state of the
PSE function can be changed using the ``ETHTOOL_A_PODL_PSE_ADMIN_CONTROL``
action. This option corresponds to ``IEEE 802.3-2018`` 30.15.1.1.2
aPoDLPSEAdminState. Possible values are:
.. kernel-doc:: include/uapi/linux/ethtool.h
:identifiers: ethtool_podl_pse_admin_state
When set, the optional ``ETHTOOL_A_PODL_PSE_PW_D_STATUS`` attribute identifies
the power detection status of the PoDL PSE. The status depends on the internal
PSE state machine and automatic PD classification support. This option
corresponds to ``IEEE 802.3-2018`` 30.15.1.1.3 aPoDLPSEPowerDetectionStatus.
Possible values are:
.. kernel-doc:: include/uapi/linux/ethtool.h
:identifiers: ethtool_podl_pse_pw_d_status
PSE_SET
=======
Sets PSE parameters.
Request contents:
====================================== ====== =============================
``ETHTOOL_A_PSE_HEADER`` nested request header
``ETHTOOL_A_PODL_PSE_ADMIN_CONTROL`` u32 Control PoDL PSE Admin state
====================================== ====== =============================
When set, the optional ``ETHTOOL_A_PODL_PSE_ADMIN_CONTROL`` attribute is used
to control PoDL PSE Admin functions. This option implements
``IEEE 802.3-2018`` 30.15.1.2.1 acPoDLPSEAdminControl. See
``ETHTOOL_A_PODL_PSE_ADMIN_STATE`` for supported values.
Request translation
===================


@ -256,8 +256,10 @@ The tags in common use are:
- Cc: the named person received a copy of the patch and had the
opportunity to comment on it.
Be careful in the addition of tags to your patches: only Cc: is appropriate
for addition without the explicit permission of the person named.
Be careful in the addition of tags to your patches, as only Cc: is appropriate
for addition without the explicit permission of the person named; using
Reported-by: is fine most of the time as well, but ask for permission if
the bug was reported in private.
Sending the patch


@ -31,6 +31,8 @@ you probably needn't concern yourself with pcmciautils.
====================== =============== ========================================
GNU C 5.1 gcc --version
Clang/LLVM (optional) 11.0.0 clang --version
Rust (optional) 1.62.0 rustc --version
bindgen (optional) 0.56.0 bindgen --version
GNU make 3.81 make --version
bash 4.2 bash --version
binutils 2.23 ld -v
@ -80,6 +82,29 @@ kernels. Older releases aren't guaranteed to work, and we may drop workarounds
from the kernel that were used to support older versions. Please see additional
docs on :ref:`Building Linux with Clang/LLVM <kbuild_llvm>`.
Rust (optional)
---------------
A particular version of the Rust toolchain is required. Newer versions may or
may not work because the kernel depends on some unstable Rust features, for
the moment.
Each Rust toolchain comes with several "components", some of which are required
(like ``rustc``) and some that are optional. The ``rust-src`` component (which
is optional) needs to be installed to build the kernel. Other components are
useful for developing.
Please see Documentation/rust/quick-start.rst for instructions on how to
satisfy the build requirements of Rust support. In particular, the ``Makefile``
target ``rustavailable`` is useful to check why the Rust toolchain may not
be detected.
bindgen (optional)
------------------
``bindgen`` is used to generate the Rust bindings to the C side of the kernel.
It depends on ``libclang``.
Make
----
@ -348,6 +373,12 @@ Sphinx
Please see :ref:`sphinx_install` in :ref:`Documentation/doc-guide/sphinx.rst <sphinxdoc>`
for details about Sphinx requirements.
rustdoc
-------
``rustdoc`` is used to generate the documentation for Rust code. Please see
Documentation/rust/general-information.rst for more information.
Getting updated software
========================
@ -364,6 +395,16 @@ Clang/LLVM
- :ref:`Getting LLVM <getting_llvm>`.
Rust
----
- Documentation/rust/quick-start.rst.
bindgen
-------
- Documentation/rust/quick-start.rst.
Make
----


@ -51,7 +51,7 @@ the Technical Advisory Board (TAB) or other maintainers if you're
uncertain how to handle situations that come up. It will not be
considered a violation report unless you want it to be. If you are
uncertain about approaching the TAB or any other maintainers, please
reach out to our conflict mediator, Mishi Choudhary <mishi@linux.com>.
reach out to our conflict mediator, Joanna Lee <joanna.lee@gesmer.com>.
In the end, "be kind to each other" is really what the end goal is for
everybody. We know everyone is human and we all fail at times, but the
@ -127,10 +127,12 @@ are listed at https://kernel.org/code-of-conduct.html. Members can not
access reports made before they joined or after they have left the
committee.
The initial Code of Conduct Committee consists of volunteer members of
the TAB, as well as a professional mediator acting as a neutral third
party. The first task of the committee is to establish documented
processes, which will be made public.
The Code of Conduct Committee consists of volunteer community members
appointed by the TAB, as well as a professional mediator acting as a
neutral third party. The processes the Code of Conduct Committee will
use to address reports are varied and will depend on the individual
circumstances; however, this file serves as documentation for the
general process used.
Any member of the committee, including the mediator, can be contacted
directly if a reporter does not wish to include the full committee in a
@ -141,16 +143,16 @@ processes (see above) and consults with the TAB as needed and
appropriate, for instance to request and receive information about the
kernel community.
Any decisions by the committee will be brought to the TAB, for
implementation of enforcement with the relevant maintainers if needed.
A decision by the Code of Conduct Committee can be overturned by the TAB
by a two-thirds vote.
Any decisions regarding enforcement recommendations will be brought to
the TAB for implementation of enforcement with the relevant maintainers
if needed. A decision by the Code of Conduct Committee can be overturned
by the TAB by a two-thirds vote.
At quarterly intervals, the Code of Conduct Committee and TAB will
provide a report summarizing the anonymised reports that the Code of
Conduct committee has received and their status, as well as details of any
overridden decisions including complete and identifiable voting details.
We expect to establish a different process for Code of Conduct Committee
staffing beyond the bootstrap period. This document will be updated
with that information when this occurs.
Because how we interpret and enforce the Code of Conduct will evolve over
time, this document will be updated when necessary to reflect any
changes.


@ -1186,6 +1186,68 @@ expression used. For instance:
#endif /* CONFIG_SOMETHING */
22) Do not crash the kernel
---------------------------
In general, the decision to crash the kernel belongs to the user, rather
than to the kernel developer.
Avoid panic()
*************
panic() should be used with care and primarily only during system boot.
panic() is, for example, acceptable when running out of memory during boot and
not being able to continue.
Use WARN() rather than BUG()
****************************
Do not add new code that uses any of the BUG() variants, such as BUG(),
BUG_ON(), or VM_BUG_ON(). Instead, use a WARN*() variant, preferably
WARN_ON_ONCE(), and possibly with recovery code. Recovery code is not
required if there is no reasonable way to at least partially recover.
"I'm too lazy to do error handling" is not an excuse for using BUG(). Major
internal corruptions with no way of continuing may still use BUG(), but need
good justification.
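The pattern above can be sketched in userspace C. This is only an illustration under stated assumptions: ``warn_on_once()`` is a hypothetical stand-in for the kernel's WARN_ON_ONCE() macro, and ``struct demo_queue`` and ``demo_queue_push()`` are invented for the example. In real kernel code the check would simply read ``if (WARN_ON_ONCE(cond))``.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace stand-in for WARN_ON_ONCE(), for illustration only: it reports
 * the first time the condition is true and always returns the condition.
 */
static bool warn_on_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Hypothetical fixed-size queue used only for this example. */
struct demo_queue {
	unsigned int count;
	unsigned int capacity;
};

/* Returns 0 on success, -ENOSPC when full, -EINVAL on inconsistent state. */
static int demo_queue_push(struct demo_queue *q)
{
	static bool warned;

	/* count > capacity should never happen: warn once, then recover. */
	if (warn_on_once(q->count > q->capacity, &warned, "queue overrun")) {
		q->count = q->capacity;	/* Partial recovery: clamp the state. */
		return -EINVAL;
	}

	/* A full queue is an expected condition, so it does not warn. */
	if (q->count == q->capacity)
		return -ENOSPC;

	q->count++;
	return 0;
}
```

The caller gets an error code in both failure cases, but only the "should never happen" case produces a warning, and the system keeps running instead of crashing.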
Use WARN_ON_ONCE() rather than WARN() or WARN_ON()
**************************************************
WARN_ON_ONCE() is generally preferred over WARN() or WARN_ON(), because it
is common for a given warning condition, if it occurs at all, to occur
multiple times. This can fill up and wrap the kernel log, and can even slow
the system enough that the excessive logging turns into its own, additional
problem.
Do not WARN lightly
*******************
WARN*() is intended for unexpected, this-should-never-happen situations.
WARN*() macros are not to be used for anything that is expected to happen
during normal operation. These are not pre- or post-condition asserts, for
example. Again: WARN*() must not be used for a condition that is expected
to trigger easily, for example, by user space actions. pr_warn_once() is a
possible alternative, if you need to notify the user of a problem.
Do not worry about panic_on_warn users
**************************************
A few more words about panic_on_warn: Remember that ``panic_on_warn`` is an
available kernel option, and that many users set this option. This is why
there is a "Do not WARN lightly" writeup, above. However, the existence of
panic_on_warn users is not a valid reason to avoid the judicious use of
WARN*(). That is because whoever enables panic_on_warn has explicitly
asked the kernel to crash if a WARN*() fires, and such users must be
prepared to deal with the consequences of a system that is somewhat more
likely to crash.
Use BUILD_BUG_ON() for compile-time assertions
**********************************************
The use of BUILD_BUG_ON() is acceptable and encouraged, because it is a
compile-time assertion that has no effect at runtime.
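As a rough userspace illustration, a compile-time check of a structure layout might look like the sketch below. The ``BUILD_BUG_ON()`` definition here is an approximation built on C11 ``_Static_assert`` and is not the kernel's own definition; ``struct demo_hdr`` and ``demo_hdr_size()`` are hypothetical names chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace approximation of the kernel macro, for illustration only. */
#define BUILD_BUG_ON(cond) \
	_Static_assert(!(cond), "BUILD_BUG_ON failed: " #cond)

/* Hypothetical on-wire header; the layout is an assumption for the example. */
struct demo_hdr {
	uint16_t type;
	uint16_t len;
	uint32_t seq;
};

static size_t demo_hdr_size(void)
{
	/* Both checks are evaluated by the compiler and emit no runtime code. */
	BUILD_BUG_ON(sizeof(struct demo_hdr) != 8);
	BUILD_BUG_ON(offsetof(struct demo_hdr, seq) != 4);

	return sizeof(struct demo_hdr);
}
```

If someone later changes the structure so that an assumption no longer holds, the build fails immediately rather than the problem surfacing on a running system.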
Appendix I) References
----------------------


@ -138,17 +138,20 @@ be NUL terminated. This can lead to various linear read overflows and
other misbehavior due to the missing termination. It also NUL-pads
the destination buffer if the source contents are shorter than the
destination buffer size, which may be a needless performance penalty
for callers using only NUL-terminated strings. The safe replacement is
for callers using only NUL-terminated strings.
When the destination is required to be NUL-terminated, the replacement is
strscpy(), though care must be given to any cases where the return value
of strncpy() was used, since strscpy() does not return a pointer to the
destination, but rather a count of non-NUL bytes copied (or negative
errno when it truncates). Any cases still needing NUL-padding should
instead use strscpy_pad().
If a caller is using non-NUL-terminated strings, strncpy() can
still be used, but destinations should be marked with the `__nonstring
If a caller is using non-NUL-terminated strings, strtomem() should be
used, and the destinations should be marked with the `__nonstring
<https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html>`_
attribute to avoid future compiler warnings.
attribute to avoid future compiler warnings. For cases still needing
NUL-padding, strtomem_pad() can be used.
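The return-value difference mentioned above is the part most likely to surprise callers converting from strncpy(). The following userspace sketch mimics the documented strscpy() contract (always NUL-terminate, return the number of bytes copied, or a negative errno on truncation). ``demo_strscpy()`` is a hypothetical, simplified stand-in written for this example and is not the kernel implementation.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/*
 * Simplified userspace stand-in that follows the documented strscpy()
 * contract; for illustration only.
 */
static ssize_t demo_strscpy(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -E2BIG;

	/* Length of src, but never look past what dst could hold. */
	for (len = 0; len < size && src[len] != '\0'; len++)
		;

	if (len == size) {
		/* Source does not fit: copy what fits and terminate. */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -E2BIG;
	}

	memcpy(dst, src, len + 1);	/* Includes the NUL terminator. */
	return (ssize_t)len;
}
```

Code that previously used the pointer returned by strncpy() therefore needs to be reworked to check for a negative return value instead, which also makes truncation visible to the caller.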
strlcpy()
---------


@ -5,6 +5,7 @@
.. _process_index:
=============================================
Working with the kernel development community
=============================================


@ -121,57 +121,56 @@ edit your ``~/.gnupg/gpg-agent.conf`` file to set your own values::
to remove anything you had in place for older versions of GnuPG, as
it may not be doing the right thing any more.
Set up a refresh cronjob
~~~~~~~~~~~~~~~~~~~~~~~~
.. _protect_your_key:
You will need to regularly refresh your keyring in order to get the
latest changes on other people's public keys, which is best done with a
daily cronjob::
@daily /usr/bin/gpg2 --refresh >/dev/null 2>&1
Check the full path to your ``gpg`` or ``gpg2`` command and use the
``gpg2`` command if regular ``gpg`` for you is the legacy GnuPG v.1.
.. _master_key:
Protect your master PGP key
===========================
Protect your PGP key
====================
This guide assumes that you already have a PGP key that you use for Linux
kernel development purposes. If you do not yet have one, please see the
"`Protecting Code Integrity`_" document mentioned earlier for guidance
on how to create a new one.
You should also make a new key if your current one is weaker than 2048 bits
(RSA).
You should also make a new key if your current one is weaker than 2048
bits (RSA).
Master key vs. Subkeys
----------------------
Understanding PGP Subkeys
-------------------------
Subkeys are fully independent PGP keypairs that are tied to the "master"
key using certifying key signatures (certificates). It is important to
understand the following:
A PGP key rarely consists of a single keypair -- usually it is a
collection of independent subkeys that can be used for different
purposes based on their capabilities, assigned at their creation time.
PGP defines four capabilities that a key can have:
1. There are no technical differences between the "master key" and "subkeys."
2. At creation time, we assign functional limitations to each key by
giving it specific capabilities.
3. A PGP key can have 4 capabilities:
- **[S]** keys can be used for signing
- **[E]** keys can be used for encryption
- **[A]** keys can be used for authentication
- **[C]** keys can be used for certifying other keys
- **[S]** key can be used for signing
- **[E]** key can be used for encryption
- **[A]** key can be used for authentication
- **[C]** key can be used for certifying other keys
The key with the **[C]** capability is often called the "master" key,
but this terminology is misleading because it implies that the Certify
key can be used in place of any other subkey on the same chain (like
a physical "master key" can be used to open the locks made for other
keys). Since this is not the case, this guide will refer to it as "the
Certify key" to avoid any ambiguity.
4. A single key may have multiple capabilities.
5. A subkey is fully independent from the master key. A message
encrypted to a subkey cannot be decrypted with the master key. If you
lose your private subkey, it cannot be recreated from the master key
in any way.
It is critical to fully understand the following:
The key carrying the **[C]** (certify) capability is considered the
"master" key because it is the only key that can be used to indicate
relationship with other keys. Only the **[C]** key can be used to:
1. All subkeys are fully independent from each other. If you lose a
private subkey, it cannot be restored or recreated from any other
private key on your chain.
2. With the exception of the Certify key, there can be multiple subkeys
with identical capabilities (e.g. you can have 2 valid encryption
subkeys, 3 valid signing subkeys, but only one valid certification
subkey). All subkeys are fully independent -- a message encrypted to
one **[E]** subkey cannot be decrypted with any other **[E]** subkey
you may also have.
3. A single subkey may have multiple capabilities (e.g. your **[C]** key
can also be your **[S]** key).
The key carrying the **[C]** (certify) capability is the only key that
can be used to indicate relationship with other keys. Only the **[C]**
key can be used to:
- add or revoke other keys (subkeys) with S/E/A capabilities
- add, change or revoke identities (uids) associated with the key
@ -180,7 +179,7 @@ relationship with other keys. Only the **[C]** key can be used to:
By default, GnuPG creates the following when generating new keys:
- A master key carrying both Certify and Sign capabilities (**[SC]**)
- One subkey carrying both Certify and Sign capabilities (**[SC]**)
- A separate subkey with the Encryption capability (**[E]**)
If you used the default parameters when generating your key, then that
@ -192,9 +191,6 @@ for example::
uid [ultimate] Alice Dev <adev@kernel.org>
ssb rsa2048 2018-01-23 [E] [expires: 2020-01-23]
Any key carrying the **[C]** capability is your master key, regardless
of any other capabilities it may have assigned to it.
The long line under the ``sec`` entry is your key fingerprint --
whenever you see ``[fpr]`` in the examples below, that 40-character
string is what it refers to.
@ -215,37 +211,30 @@ strong passphrase. To set it or change it, use::
Create a separate Signing subkey
--------------------------------
Our goal is to protect your master key by moving it to offline media, so
if you only have a combined **[SC]** key, then you should create a separate
signing subkey::
Our goal is to protect your Certify key by moving it to offline media,
so if you only have a combined **[SC]** key, then you should create a
separate signing subkey::
$ gpg --quick-addkey [fpr] ed25519 sign
Remember to tell the keyservers about this change, so others can pull down
your new subkey::
$ gpg --send-key [fpr]
.. note:: ECC support in GnuPG
GnuPG 2.1 and later has full support for Elliptic Curve
Cryptography, with ability to combine ECC subkeys with traditional
RSA master keys. The main upside of ECC cryptography is that it is
much faster computationally and creates much smaller signatures when
RSA keys. The main upside of ECC cryptography is that it is much
faster computationally and creates much smaller signatures when
compared byte for byte with 2048+ bit RSA keys. Unless you plan on
using a smartcard device that does not support ECC operations, we
recommend that you create an ECC signing subkey for your kernel
work.
If for some reason you prefer to stay with RSA subkeys, just replace
"ed25519" with "rsa2048" in the above command. Additionally, if you
plan to use a hardware device that does not support ED25519 ECC
keys, like Nitrokey Pro or a Yubikey, then you should use
"nistp256" instead or "ed25519."
Note that if you plan to use a hardware device that does not
support ED25519 ECC keys, you should choose "nistp256" instead of
"ed25519."
Back up your master key for disaster recovery
---------------------------------------------
Back up your Certify key for disaster recovery
----------------------------------------------
The more signatures you have on your PGP key from other developers, the
more reasons you have to create a backup version that lives on something
@ -277,9 +266,7 @@ home, such as your bank vault.
Your printer is probably no longer a simple dumb device connected to
your parallel port, but since the output is still encrypted with
your passphrase, printing out even to "cloud-integrated" modern
printers should remain a relatively safe operation. One option is to
change the passphrase on your master key immediately after you are
done with paperkey.
printers should remain a relatively safe operation.
Back up your whole GnuPG directory
----------------------------------
@ -300,7 +287,7 @@ will use for backup purposes. You will need to encrypt them using LUKS
-- refer to your distro's documentation on how to accomplish this.
For the encryption passphrase, you can use the same one as on your
master key.
PGP key.
Once the encryption process is over, re-insert the USB drive and make
sure it gets properly mounted. Copy your entire ``.gnupg`` directory
@ -319,7 +306,7 @@ far away, because you'll need to use it every now and again for things
like editing identities, adding or revoking subkeys, or signing other
people's keys.
Remove the master key from your homedir
Remove the Certify key from your homedir
----------------------------------------
The files in our home directory are not as well protected as we like to
@ -334,7 +321,7 @@ think. They can be leaked or stolen via many different means:
Protecting your key with a good passphrase greatly helps reduce the risk
of any of the above, but passphrases can be discovered via keyloggers,
shoulder-surfing, or any number of other means. For this reason, the
recommended setup is to remove your master key from your home directory
recommended setup is to remove your Certify key from your home directory
and store it on offline storage.
.. warning::
@ -343,7 +330,7 @@ and store it on offline storage.
your GnuPG directory in its entirety. What we are about to do will
render your key useless if you do not have a usable backup!
First, identify the keygrip of your master key::
First, identify the keygrip of your Certify key::
$ gpg --with-keygrip --list-key [fpr]
@ -359,7 +346,7 @@ The output will be something like this::
Keygrip = 3333000000000000000000000000000000000000
Find the keygrip entry that is beneath the ``pub`` line (right under the
master key fingerprint). This will correspond directly to a file in your
Certify key fingerprint). This will correspond directly to a file in your
``~/.gnupg`` directory::
$ cd ~/.gnupg/private-keys-v1.d
@ -369,13 +356,13 @@ master key fingerprint). This will correspond directly to a file in your
3333000000000000000000000000000000000000.key
All you have to do is simply remove the .key file that corresponds to
the master keygrip::
the Certify key keygrip::
$ cd ~/.gnupg/private-keys-v1.d
$ rm 1111000000000000000000000000000000000000.key
Now, if you issue the ``--list-secret-keys`` command, it will show that
the master key is missing (the ``#`` indicates it is not available)::
the Certify key is missing (the ``#`` indicates it is not available)::
$ gpg --list-secret-keys
sec# rsa2048 2018-01-24 [SC] [expires: 2020-01-24]
@ -404,7 +391,7 @@ file, which still contains your private keys.
Move the subkeys to a dedicated crypto device
=============================================
Even though the master key is now safe from being leaked or stolen, the
Even though the Certify key is now safe from being leaked or stolen, the
subkeys are still in your home directory. Anyone who manages to get
their hands on those will be able to decrypt your communication or fake
your signatures (if they know the passphrase). Furthermore, each time a
@ -447,7 +434,8 @@ functionality. There are several options available:
- `Yubikey 5`_: proprietary hardware and software, but cheaper than
Nitrokey Pro and comes available in the USB-C form that is more useful
with newer laptops. Offers additional security features such as FIDO
U2F, among others, and now finally supports ECC keys (NISTP).
U2F, among others, and now finally supports NISTP and ED25519 ECC
keys.
`LWN has a good review`_ of some of the above models, as well as several
others. Your choice will depend on cost, shipping availability in your
@ -460,7 +448,7 @@ geographical region, and open/proprietary hardware considerations.
Foundation.
.. _`Nitrokey Start`: https://shop.nitrokey.com/shop/product/nitrokey-start-6
.. _`Nitrokey Pro 2`: https://shop.nitrokey.com/shop/product/nitrokey-pro-2-3
.. _`Nitrokey Pro 2`: https://shop.nitrokey.com/shop/product/nkpr2-nitrokey-pro-2-3
.. _`Yubikey 5`: https://www.yubico.com/products/yubikey-5-overview/
.. _Gnuk: https://www.fsij.org/doc-gnuk/
.. _`LWN has a good review`: https://lwn.net/Articles/736231/
@ -627,10 +615,10 @@ Other common GnuPG operations
Here is a quick reference for some common operations you'll need to do
with your PGP key.
Mounting your master key offline storage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mounting your safe offline storage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You will need your master key for any of the operations below, so you
You will need your Certify key for any of the operations below, so you
will first need to mount your backup offline storage and tell GnuPG to
use it::
@ -644,7 +632,7 @@ your regular home directory location).
Extending key expiration date
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The master key has the default expiration date of 2 years from the date
The Certify key has the default expiration date of 2 years from the date
of creation. This is done both for security reasons and to make obsolete
keys eventually disappear from keyservers.
@ -685,6 +673,7 @@ remote end.
.. _`Agent Forwarding over SSH`: https://wiki.gnupg.org/AgentForwarding
.. _pgp_with_git:
Using PGP with Git
==================
@ -828,6 +817,63 @@ You can tell git to always sign commits::
.. _verify_identities:
How to work with signed patches
-------------------------------
It is possible to use your PGP key to sign patches sent to kernel
developer mailing lists. Since existing email signature mechanisms
(PGP-Mime or PGP-inline) tend to cause problems with regular code
review tasks, you should use the tool kernel.org created for this
purpose that puts cryptographic attestation signatures into message
headers (a-la DKIM):
- `Patatt Patch Attestation`_
.. _`Patatt Patch Attestation`: https://pypi.org/project/patatt/
Installing and configuring patatt
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Patatt is packaged for many distributions already, so please check there
first. You can also install it from pypi using "``pip install patatt``".
If you already have your PGP key configured with git (via the
``user.signingKey`` configuration parameter), then patatt requires no
further configuration. You can start signing your patches by installing
the git-send-email hook in the repository you want::
patatt install-hook
Now any patches you send with ``git send-email`` will be automatically
signed with your cryptographic signature.
Checking patatt signatures
~~~~~~~~~~~~~~~~~~~~~~~~~~
If you are using ``b4`` to retrieve and apply patches, then it will
automatically attempt to verify all DKIM and patatt signatures it
encounters, for example::
$ b4 am 20220720205013.890942-1-broonie@kernel.org
[...]
Checking attestation on all messages, may take a moment...
---
✓ [PATCH v1 1/3] kselftest/arm64: Correct buffer allocation for SVE Z registers
✓ [PATCH v1 2/3] arm64/sve: Document our actual ABI for clearing registers on syscall
✓ [PATCH v1 3/3] kselftest/arm64: Enforce actual ABI for SVE syscalls
---
✓ Signed: openpgp/broonie@kernel.org
✓ Signed: DKIM/kernel.org
.. note::
Patatt and b4 are still in active development and you should check
the latest documentation for these projects for any new or updated
features.
.. _kernel_identities:
How to verify kernel developer identities
=========================================
@ -899,65 +945,17 @@ the new default in GnuPG v2). To set it, add (or modify) the
trust-model tofu+pgp
How to use keyservers (more) safely
-----------------------------------
Using the kernel.org web of trust repository
--------------------------------------------
If you get a "No public key" error when trying to validate someone's
tag, then you should attempt to lookup that key using a keyserver. It is
important to keep in mind that there is absolutely no guarantee that the
key you retrieve from PGP keyservers belongs to the actual person --
that much is by design. You are supposed to use the Web of Trust to
establish key validity.
Kernel.org maintains a git repository with developers' public keys as a
replacement for replicating keyserver networks that have gone mostly
dark in the past few years. The full documentation for how to set up
that repository as your source of public keys can be found here:
How to properly maintain the Web of Trust is beyond the scope of this
document, simply because doing it properly requires both effort and
dedication that tends to be beyond the caring threshold of most human
beings. Here are some shortcuts that will help you reduce the risk of
importing a malicious key.
- `Kernel developer PGP Keyring`_
First, let's say you've tried to run ``git verify-tag`` but it returned
an error saying the key is not found::
If you are a kernel developer, please consider submitting your key for
inclusion into that keyring.
$ git verify-tag sunxi-fixes-for-4.15-2
gpg: Signature made Sun 07 Jan 2018 10:51:55 PM EST
gpg: using RSA key DA73759BF8619E484E5A3B47389A54219C0F2430
gpg: issuer "wens@...org"
gpg: Can't check signature: No public key
Let's query the keyserver for more info about that key fingerprint (the
fingerprint probably belongs to a subkey, so we can't use it directly
without finding out the ID of the master key it is associated with)::
$ gpg --search DA73759BF8619E484E5A3B47389A54219C0F2430
gpg: data source: hkp://keys.gnupg.net
(1) Chen-Yu Tsai <wens@...org>
4096 bit RSA key C94035C21B4F2AEB, created: 2017-03-14, expires: 2019-03-15
Keys 1-1 of 1 for "DA73759BF8619E484E5A3B47389A54219C0F2430". Enter number(s), N)ext, or Q)uit > q
Locate the ID of the master key in the output, in our example
``C94035C21B4F2AEB``. Now display the key of Linus Torvalds that you
have on your keyring::
$ gpg --list-key torvalds@kernel.org
pub rsa2048 2011-09-20 [SC]
ABAF11C65A2970B130ABE3C479BE3E4300411886
uid [ unknown] Linus Torvalds <torvalds@kernel.org>
sub rsa2048 2011-09-20 [E]
Next, find a trust path from Linus Torvalds to the key-id you found via ``gpg
--search`` of the unknown key. For this, you can use several tools including
https://github.com/mricon/wotmate,
https://git.kernel.org/pub/scm/docs/kernel/pgpkeys.git/tree/graphs, and
https://the.earth.li/~noodles/pathfind.html.
If you get a few decent trust paths, then it's a pretty good indication
that it is a valid key. You can add it to your keyring from the
keyserver now::
$ gpg --recv-key C94035C21B4F2AEB
This process is not perfect, and you are obviously trusting the
administrators of the PGP Pathfinder service to not be malicious (in
fact, this goes against :ref:`devs_not_infra`). However, if you
do not carefully maintain your own web of trust, then it is a marked
improvement over blindly trusting keyservers.
.. _`Kernel developer PGP Keyring`: https://korg.docs.kernel.org/pgpkeys.html


@ -97,6 +97,12 @@ text, like this:
commit <sha1> upstream.
or alternatively:
.. code-block:: none
[ Upstream commit <sha1> ]
Additionally, some patches submitted via :ref:`option_1` may have additional
patch prerequisites which can be cherry-picked. This can be specified in the
following format in the sign-off area:


@ -715,8 +715,8 @@ references.
.. _backtraces:
Backtraces in commit mesages
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Backtraces in commit messages
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Backtraces help document the call chain leading to a problem. However,
not all backtraces are helpful. For example, early boot call chains are


@ -0,0 +1,19 @@
.. SPDX-License-Identifier: GPL-2.0
Arch Support
============
Currently, the Rust compiler (``rustc``) uses LLVM for code generation,
which limits the supported architectures that can be targeted. In addition,
support for building the kernel with LLVM/Clang varies (please see
Documentation/kbuild/llvm.rst). This support is needed for ``bindgen``
which uses ``libclang``.
Below is a general summary of architectures that currently work. Level of
support corresponds to ``S`` values in the ``MAINTAINERS`` file.
============ ================ ==============================================
Architecture Level of support Constraints
============ ================ ==============================================
``x86`` Maintained ``x86_64`` only.
============ ================ ==============================================


@ -0,0 +1,216 @@
.. SPDX-License-Identifier: GPL-2.0
Coding Guidelines
=================
This document describes how to write Rust code in the kernel.
Style & formatting
------------------
The code should be formatted using ``rustfmt``. In this way, a person
contributing from time to time to the kernel does not need to learn and
remember one more style guide. More importantly, reviewers and maintainers
do not need to spend time pointing out style issues anymore, and thus
fewer patch roundtrips may be needed to land a change.
.. note:: Conventions on comments and documentation are not checked by
``rustfmt``. Thus, those still need to be taken care of.
The default settings of ``rustfmt`` are used. This means the idiomatic Rust
style is followed. For instance, 4 spaces are used for indentation rather
than tabs.
It is convenient to instruct editors/IDEs to format while typing,
when saving or at commit time. However, if for some reason reformatting
the entire kernel Rust sources is needed at some point, the following can be
run::
make LLVM=1 rustfmt
It is also possible to check if everything is formatted (printing a diff
otherwise), for instance for a CI, with::
make LLVM=1 rustfmtcheck
Like ``clang-format`` for the rest of the kernel, ``rustfmt`` works on
individual files, and does not require a kernel configuration. Sometimes it may
even work with broken code.
Comments
--------
"Normal" comments (i.e. ``//``, rather than code documentation which starts
with ``///`` or ``//!``) are written in Markdown the same way as documentation
comments are, even though they will not be rendered. This improves consistency,
simplifies the rules and allows content to be moved between the two kinds of
comments more easily. For instance:
.. code-block:: rust
// `object` is ready to be handled now.
f(object);
Furthermore, just like documentation, comments are capitalized at the beginning
of a sentence and ended with a period (even if it is a single sentence). This
includes ``// SAFETY:``, ``// TODO:`` and other "tagged" comments, e.g.:
.. code-block:: rust
// FIXME: The error should be handled properly.
Comments should not be used for documentation purposes: comments are intended
for implementation details, not users. This distinction is useful even if the
reader of the source file is both an implementor and a user of an API. In fact,
sometimes it is useful to use both comments and documentation at the same time.
For instance, for a ``TODO`` list or to comment on the documentation itself.
For the latter case, comments can be inserted in the middle; that is, closer to
the line of documentation to be commented. For any other case, comments are
written after the documentation, e.g.:
.. code-block:: rust

    /// Returns a new [`Foo`].
    ///
    /// # Examples
    ///
    // TODO: Find a better example.
    /// ```
    /// let foo = f(42);
    /// ```
    // FIXME: Use fallible approach.
    pub fn f(x: i32) -> Foo {
        // ...
    }

One special kind of comments are the ``// SAFETY:`` comments. These must appear
before every ``unsafe`` block, and they explain why the code inside the block is
correct/sound, i.e. why it cannot trigger undefined behavior in any case, e.g.:
.. code-block:: rust
// SAFETY: `p` is valid by the safety requirements.
unsafe { *p = 0; }
``// SAFETY:`` comments are not to be confused with the ``# Safety`` sections
in code documentation. ``# Safety`` sections specify the contract that callers
(for functions) or implementors (for traits) need to abide by. ``// SAFETY:``
comments show why a call (for functions) or implementation (for traits) actually
respects the preconditions stated in a ``# Safety`` section or the language
reference.
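The difference between the two can be seen side by side in a minimal sketch. The function names ``read_value`` and ``read_and_increment`` are hypothetical and only serve as an illustration: the ``# Safety`` section states the contract, and each ``// SAFETY:`` comment explains why that contract holds at the point of use.

```rust
/// Reads the `i32` that `p` points to.
///
/// # Safety
///
/// `p` must be non-null, properly aligned and point to an initialized `i32`.
pub unsafe fn read_value(p: *const i32) -> i32 {
    // SAFETY: The caller guarantees that `p` is valid for reads.
    unsafe { *p }
}

/// Returns the value of `x` plus one.
pub fn read_and_increment(x: i32) -> i32 {
    let p: *const i32 = &x;
    // SAFETY: `p` was derived from a reference to `x`, which is alive,
    // aligned and initialized for the whole duration of the call.
    let value = unsafe { read_value(p) };
    value + 1
}
```

Callers of ``read_and_increment`` never see ``unsafe``: the justification lives next to the ``unsafe`` block, while the obligations live in the documentation of ``read_value``.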
Code documentation
------------------
Rust kernel code is not documented like C kernel code (i.e. via kernel-doc).
Instead, the usual system for documenting Rust code is used: the ``rustdoc``
tool, which uses Markdown (a lightweight markup language).
To learn Markdown, there are many guides available out there. For instance,
the one at:
https://commonmark.org/help/
This is what a well-documented Rust function may look like:
.. code-block:: rust

    /// Returns the contained [`Some`] value, consuming the `self` value,
    /// without checking that the value is not [`None`].
    ///
    /// # Safety
    ///
    /// Calling this method on [`None`] is *[undefined behavior]*.
    ///
    /// [undefined behavior]: https://doc.rust-lang.org/reference/behavior-considered-undefined.html
    ///
    /// # Examples
    ///
    /// ```
    /// let x = Some("air");
    /// assert_eq!(unsafe { x.unwrap_unchecked() }, "air");
    /// ```
    pub unsafe fn unwrap_unchecked(self) -> T {
        match self {
            Some(val) => val,
            // SAFETY: The safety contract must be upheld by the caller.
            None => unsafe { hint::unreachable_unchecked() },
        }
    }

This example showcases a few ``rustdoc`` features and some conventions followed
in the kernel:
- The first paragraph must be a single sentence briefly describing what
the documented item does. Further explanations must go in extra paragraphs.
- Unsafe functions must document their safety preconditions under
a ``# Safety`` section.
- While not shown here, if a function may panic, the conditions under which
that happens must be described under a ``# Panics`` section.
Please note that panicking should be very rare and used only with a good
reason. In almost all cases, a fallible approach should be used, typically
returning a ``Result``.
- If providing examples of usage would help readers, they must be written in
a section called ``# Examples``.
- Rust items (functions, types, constants...) must be linked appropriately
(``rustdoc`` will create a link automatically).
- Any ``unsafe`` block must be preceded by a ``// SAFETY:`` comment
describing why the code inside is sound.
While sometimes the reason might look trivial and therefore unneeded,
writing these comments is not just a good way of documenting what has been
taken into account, but most importantly, it provides a way to know that
there are no *extra* implicit constraints.
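As a complement to the ``unsafe`` example above, the following minimal sketch shows the fallible approach recommended in the list: instead of panicking, the function returns a ``Result``. The names ``divide`` and ``DivisionByZero`` are hypothetical and are not part of any kernel API.

```rust
/// Error returned by [`divide`] when the divisor is zero.
#[derive(Debug, PartialEq, Eq)]
pub struct DivisionByZero;

/// Divides `dividend` by `divisor`.
///
/// Unlike the `/` operator, this function never panics: a zero `divisor`
/// is reported to the caller as an error instead.
pub fn divide(dividend: u32, divisor: u32) -> Result<u32, DivisionByZero> {
    if divisor == 0 {
        return Err(DivisionByZero);
    }
    Ok(dividend / divisor)
}
```

Since there is no panicking path, no ``# Panics`` section is needed; the first paragraph of the documentation stays a single sentence and the details go in a second paragraph.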
To learn more about how to write documentation for Rust and extra features,
please take a look at the ``rustdoc`` book at:
https://doc.rust-lang.org/rustdoc/how-to-write-documentation.html
Naming
------
Rust kernel code follows the usual Rust naming conventions:
https://rust-lang.github.io/api-guidelines/naming.html
When existing C concepts (e.g. macros, functions, objects...) are wrapped into
a Rust abstraction, a name as close as reasonably possible to the C side should
be used in order to avoid confusion and to improve readability when switching
back and forth between the C and Rust sides. For instance, macros such as
``pr_info`` from C are named the same in the Rust side.
Having said that, casing should be adjusted to follow the Rust naming
conventions, and namespacing introduced by modules and types should not be
repeated in the item names. For instance, when wrapping constants like:
.. code-block:: c
#define GPIO_LINE_DIRECTION_IN 0
#define GPIO_LINE_DIRECTION_OUT 1
The equivalent in Rust may look like (ignoring documentation):
.. code-block:: rust

    pub mod gpio {
        pub enum LineDirection {
            In = bindings::GPIO_LINE_DIRECTION_IN as _,
            Out = bindings::GPIO_LINE_DIRECTION_OUT as _,
        }
    }

That is, the equivalent of ``GPIO_LINE_DIRECTION_IN`` would be referred to as
``gpio::LineDirection::In``. In particular, it should not be named
``gpio::gpio_line_direction::GPIO_LINE_DIRECTION_IN``.
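To try the naming scheme outside the kernel tree, the sketch below replaces the generated bindings with a small stand-in module. The ``bindings`` module here and the ``from_raw`` helper are assumptions made for illustration; only the constant names and ``gpio::LineDirection`` come from the text above.

```rust
/// Stand-in for the generated C bindings (hypothetical, for this sketch only).
mod bindings {
    pub const GPIO_LINE_DIRECTION_IN: u32 = 0;
    pub const GPIO_LINE_DIRECTION_OUT: u32 = 1;
}

pub mod gpio {
    use super::bindings;

    /// Direction of a GPIO line.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    #[repr(u32)]
    pub enum LineDirection {
        In = bindings::GPIO_LINE_DIRECTION_IN,
        Out = bindings::GPIO_LINE_DIRECTION_OUT,
    }

    impl LineDirection {
        /// Converts a raw C value into a [`LineDirection`], if it is known.
        pub fn from_raw(raw: u32) -> Option<Self> {
            match raw {
                bindings::GPIO_LINE_DIRECTION_IN => Some(Self::In),
                bindings::GPIO_LINE_DIRECTION_OUT => Some(Self::Out),
                _ => None,
            }
        }
    }
}
```

Users then write ``gpio::LineDirection::Out``, while the C spelling stays confined to the bindings.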

View File

@ -0,0 +1,79 @@
.. SPDX-License-Identifier: GPL-2.0
General Information
===================
This document contains useful information to know when working with
the Rust support in the kernel.
Code documentation
------------------
Rust kernel code is documented using ``rustdoc``, its built-in documentation
generator.
The generated HTML docs include integrated search, linked items (e.g. types,
functions, constants), source code, etc. They may be read at (TODO: link when
in mainline and generated alongside the rest of the documentation):
http://kernel.org/
The docs can also be easily generated and read locally. This is quite fast
(same order as compiling the code itself) and no special tools or environment
are needed. This has the added advantage that they will be tailored to
the particular kernel configuration used. To generate them, use the ``rustdoc``
target with the same invocation used for compilation, e.g.::
make LLVM=1 rustdoc
To read the docs locally in your web browser, run e.g.::
xdg-open rust/doc/kernel/index.html
To learn about how to write the documentation, please see coding-guidelines.rst.
Extra lints
-----------
While ``rustc`` is a very helpful compiler, some extra lints and analyses are
available via ``clippy``, a Rust linter. To enable it, pass ``CLIPPY=1`` to
the same invocation used for compilation, e.g.::
make LLVM=1 CLIPPY=1
Please note that Clippy may change code generation, thus it should not be
enabled while building a production kernel.
Abstractions vs. bindings
-------------------------
Abstractions are Rust code wrapping kernel functionality from the C side.
In order to use functions and types from the C side, bindings are created.
Bindings are the declarations for Rust of those functions and types from
the C side.
For instance, one may write a ``Mutex`` abstraction in Rust which wraps
a ``struct mutex`` from the C side and calls its functions through the bindings.
Abstractions are not available for all the kernel internal APIs and concepts,
but it is intended that coverage is expanded as time goes on. "Leaf" modules
(e.g. drivers) should not use the C bindings directly. Instead, subsystems
should provide as-safe-as-possible abstractions as needed.
Conditional compilation
-----------------------
Rust code has access to conditional compilation based on the kernel
configuration:
.. code-block:: rust
#[cfg(CONFIG_X)] // Enabled (`y` or `m`)
#[cfg(CONFIG_X="y")] // Enabled as a built-in (`y`)
#[cfg(CONFIG_X="m")] // Enabled as a module (`m`)
#[cfg(not(CONFIG_X))] // Disabled
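These attributes are typically attached to items. A minimal sketch, assuming the placeholder option ``CONFIG_X`` from the list above and a hypothetical function ``config_x_state``, selects one of three definitions at compile time:

```rust
/// Describes how the placeholder option `CONFIG_X` was configured.
#[cfg(CONFIG_X = "y")]
pub fn config_x_state() -> &'static str {
    "built-in"
}

/// Describes how the placeholder option `CONFIG_X` was configured.
#[cfg(CONFIG_X = "m")]
pub fn config_x_state() -> &'static str {
    "module"
}

/// Describes how the placeholder option `CONFIG_X` was configured.
#[cfg(not(CONFIG_X))]
pub fn config_x_state() -> &'static str {
    "disabled"
}
```

When this is compiled outside the kernel build, where none of these ``cfg`` values are set, only the last definition is kept. Recent compilers may warn that ``CONFIG_X`` is an unexpected ``cfg`` name in that situation.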

View File

@ -0,0 +1,22 @@
.. SPDX-License-Identifier: GPL-2.0
Rust
====
Documentation related to Rust within the kernel. To start using Rust
in the kernel, please read the quick-start.rst guide.
.. toctree::
:maxdepth: 1
quick-start
general-information
coding-guidelines
arch-support
.. only:: subproject and html
Indices
=======
* :ref:`genindex`

View File

@ -0,0 +1,232 @@
.. SPDX-License-Identifier: GPL-2.0
Quick Start
===========
This document describes how to get started with kernel development in Rust.
Requirements: Building
----------------------
This section explains how to fetch the tools needed for building.
Some of these requirements might be available from Linux distributions
under names like ``rustc``, ``rust-src``, ``rust-bindgen``, etc. However,
at the time of writing, they are likely not to be recent enough unless
the distribution tracks the latest releases.
To easily check whether the requirements are met, the following target
can be used::
make LLVM=1 rustavailable
This triggers the same logic used by Kconfig to determine whether
``RUST_IS_AVAILABLE`` should be enabled; but it also explains why not
if that is the case.
rustc
*****
A particular version of the Rust compiler is required. Newer versions may or
may not work because, for the moment, the kernel depends on some unstable
Rust features.
If ``rustup`` is being used, enter the checked out source code directory
and run::
rustup override set $(scripts/min-tool-version.sh rustc)
Otherwise, fetch a standalone installer or install ``rustup`` from:
https://www.rust-lang.org
Rust standard library source
****************************
The Rust standard library source is required because the build system will
cross-compile ``core`` and ``alloc``.
If ``rustup`` is being used, run::
rustup component add rust-src
The components are installed per toolchain, thus upgrading the Rust compiler
version later on requires re-adding the component.
Otherwise, if a standalone installer is used, the Rust repository may be cloned
into the installation folder of the toolchain::
git clone --recurse-submodules \
--branch $(scripts/min-tool-version.sh rustc) \
https://github.com/rust-lang/rust \
$(rustc --print sysroot)/lib/rustlib/src/rust
In this case, upgrading the Rust compiler version later on requires manually
updating this clone.
libclang
********
``libclang`` (part of LLVM) is used by ``bindgen`` to understand the C code
in the kernel, which means LLVM needs to be installed; like when the kernel
is compiled with ``CC=clang`` or ``LLVM=1``.
Linux distributions are likely to have a suitable one available, so it is
best to check that first.
There are also some binaries for several systems and architectures uploaded at:
https://releases.llvm.org/download.html
Otherwise, building LLVM takes quite a while, but it is not a complex process:
https://llvm.org/docs/GettingStarted.html#getting-the-source-code-and-building-llvm
Please see Documentation/kbuild/llvm.rst for more information and further ways
to fetch pre-built releases and distribution packages.
bindgen
*******
The bindings to the C side of the kernel are generated at build time using
the ``bindgen`` tool. A particular version is required.
Install it via (note that this will download and build the tool from source)::
cargo install --locked --version $(scripts/min-tool-version.sh bindgen) bindgen
Requirements: Developing
------------------------
This section explains how to fetch the tools needed for developing. That is,
they are not needed when just building the kernel.
rustfmt
*******
The ``rustfmt`` tool is used to automatically format all the Rust kernel code,
including the generated C bindings (for details, please see
coding-guidelines.rst).
If ``rustup`` is being used, its ``default`` profile already installs the tool,
thus nothing needs to be done. If another profile is being used, the component
can be installed manually::
rustup component add rustfmt
The standalone installers also come with ``rustfmt``.
clippy
******
``clippy`` is a Rust linter. Running it provides extra warnings for Rust code.
It can be run by passing ``CLIPPY=1`` to ``make`` (for details, please see
general-information.rst).
If ``rustup`` is being used, its ``default`` profile already installs the tool,
thus nothing needs to be done. If another profile is being used, the component
can be installed manually::
rustup component add clippy
The standalone installers also come with ``clippy``.
cargo
*****
``cargo`` is the Rust native build system. It is currently required to run
the tests since it is used to build a custom standard library that contains
the facilities provided by the custom ``alloc`` in the kernel. The tests can
be run using the ``rusttest`` Make target.
If ``rustup`` is being used, all the profiles already install the tool,
thus nothing needs to be done.
The standalone installers also come with ``cargo``.
rustdoc
*******
``rustdoc`` is the documentation tool for Rust. It generates pretty HTML
documentation for Rust code (for details, please see
general-information.rst).
``rustdoc`` is also used to test the examples provided in documented Rust code
(called doctests or documentation tests). The ``rusttest`` Make target uses
this feature.
If ``rustup`` is being used, all the profiles already install the tool,
thus nothing needs to be done.
The standalone installers also come with ``rustdoc``.
rust-analyzer
*************
The `rust-analyzer <https://rust-analyzer.github.io/>`_ language server can
be used with many editors to enable syntax highlighting, completion, go to
definition, and other features.
``rust-analyzer`` needs a configuration file, ``rust-project.json``, which
can be generated by the ``rust-analyzer`` Make target.
Configuration
-------------
``Rust support`` (``CONFIG_RUST``) needs to be enabled in the ``General setup``
menu. The option is only shown if a suitable Rust toolchain is found (see
above), as long as the other requirements are met. In turn, this will make
the rest of the options that depend on Rust visible.
Afterwards, go to::
Kernel hacking
-> Sample kernel code
-> Rust samples
And enable some sample modules either as built-in or as loadable.
Building
--------
Building a kernel with a complete LLVM toolchain is the best supported setup
at the moment. That is::
make LLVM=1
For architectures that do not support a full LLVM toolchain, use::
make CC=clang
Using GCC also works for some configurations, but it is very experimental at
the moment.
Hacking
-------
To dive deeper, take a look at the source code of the samples
at ``samples/rust/``, the Rust support code under ``rust/`` and
the ``Rust hacking`` menu under ``Kernel hacking``.
If GDB/Binutils is used and Rust symbols are not getting demangled, the reason
is that the toolchain does not support Rust's new v0 mangling scheme yet.
There are a few ways out:
- Install a newer release (GDB >= 10.2, Binutils >= 2.36).
- Some versions of GDB (e.g. vanilla GDB 10.1) are able to use
the pre-demangled names embedded in the debug info (``CONFIG_DEBUG_INFO``).

View File

@ -94,7 +94,7 @@ other HZ detail. Thus the CFS scheduler has no notion of "timeslices" in the
way the previous scheduler had, and has no heuristics whatsoever. There is
only one central tunable (you have to switch on CONFIG_SCHED_DEBUG):
/proc/sys/kernel/sched_min_granularity_ns
/sys/kernel/debug/sched/min_granularity_ns
which can be used to tune the scheduler from "desktop" (i.e., low latencies) to
"server" (i.e., good batching) workloads. It defaults to a setting suitable

View File

@ -7,7 +7,7 @@ Landlock LSM: kernel documentation
==================================
:Author: Mickaël Salaün
:Date: May 2022
:Date: September 2022
Landlock's goal is to create scoped access-control (i.e. sandboxing). To
harden a whole system, this feature should be available to any process,
@ -49,13 +49,13 @@ Filesystem access rights
------------------------
All access rights are tied to an inode and what can be accessed through it.
Reading the content of a directory doesn't imply to be allowed to read the
Reading the content of a directory does not imply being allowed to read the
content of a listed inode. Indeed, a file name is local to its parent
directory, and an inode can be referenced by multiple file names thanks to
(hard) links. Being able to unlink a file only has a direct impact on the
directory, not the unlinked inode. This is the reason why
`LANDLOCK_ACCESS_FS_REMOVE_FILE` or `LANDLOCK_ACCESS_FS_REFER` are not allowed
to be tied to files but only to directories.
``LANDLOCK_ACCESS_FS_REMOVE_FILE`` or ``LANDLOCK_ACCESS_FS_REFER`` are not
allowed to be tied to files but only to directories.
Tests
=====

View File

@ -14,45 +14,3 @@ Unsorted Documentation
static-keys
tee
xz
Atomic Types
============
.. raw:: latex
\footnotesize
.. include:: ../atomic_t.txt
:literal:
.. raw:: latex
\normalsize
Atomic bitops
=============
.. raw:: latex
\footnotesize
.. include:: ../atomic_bitops.txt
:literal:
.. raw:: latex
\normalsize
Memory Barriers
===============
.. raw:: latex
\footnotesize
.. include:: ../memory-barriers.txt
:literal:
.. raw:: latex
\normalsize

View File

@ -0,0 +1,58 @@
.. SPDX-License-Identifier: GPL-2.0
==============================
Kernel subsystem documentation
==============================
These books get into the details of how specific kernel subsystems work
from the point of view of a kernel developer. Much of the information here
is taken directly from the kernel source, with supplemental material added
as needed (or at least as we managed to add it — probably *not* all that is
needed).
**Fixme**: much more organizational work is needed here.
.. toctree::
:maxdepth: 1
driver-api/index
core-api/index
locking/index
accounting/index
block/index
cdrom/index
cpu-freq/index
fb/index
fpga/index
hid/index
i2c/index
iio/index
isdn/index
infiniband/index
leds/index
netlabel/index
networking/index
pcmcia/index
power/index
target/index
timers/index
spi/index
w1/index
watchdog/index
virt/index
input/index
hwmon/index
gpu/index
security/index
sound/index
crypto/index
filesystems/index
mm/index
bpf/index
usb/index
PCI/index
scsi/index
misc-devices/index
scheduler/index
mhi/index
peci/index

View File

@ -412,7 +412,7 @@ Extended error information
Because the default sort key above is 'hitcount', the above shows
the list of call_sites by increasing hitcount, so that at the bottom
we see the functions that made the most kmalloc calls during the
run. If instead we we wanted to see the top kmalloc callers in
run. If instead we wanted to see the top kmalloc callers in
terms of the number of bytes requested rather than the number of
calls, and we wanted the top caller to appear at the top, we can use
the 'sort' parameter, along with the 'descending' modifier::

View File

@ -328,8 +328,8 @@ Configuring Kprobes
===================
When configuring the kernel using make menuconfig/xconfig/oldconfig,
ensure that CONFIG_KPROBES is set to "y". Under "General setup", look
for "Kprobes".
ensure that CONFIG_KPROBES is set to "y"; look for "Kprobes" under
"General architecture-dependent options".
So that you can load and unload Kprobes-based instrumentation modules,
make sure "Loadable module support" (CONFIG_MODULES) and "Module

View File

@ -20,7 +20,7 @@ For example::
[root@f32 ~]# cd /sys/kernel/tracing/
[root@f32 tracing]# echo timerlat > current_tracer
It is possible to follow the trace by reading the trace trace file::
It is possible to follow the trace by reading the trace file::
[root@f32 tracing]# cat trace
# tracer: timerlat

View File

@ -1,39 +0,0 @@
Chinese translated version of Documentation/core-api/irq/index.rst
If you have any comment or update to the content, please contact the
original document maintainer directly. However, if you have a problem
communicating in English you can also ask the Chinese maintainer for
help. Contact the Chinese maintainer if this translation is outdated
or if there is a problem with the translation.
Maintainer: Eric W. Biederman <ebiederman@xmission.com>
Chinese maintainer: Fu Wei <tekkamanninja@gmail.com>
---------------------------------------------------------------------
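Chinese translation of Documentation/core-api/irq/index.rst
English maintainer: Eric W. Biederman <ebiederman@xmission.com>
Chinese maintainer: 傅炜 Fu Wei <tekkamanninja@gmail.com>
Chinese translator: 傅炜 Fu Wei <tekkamanninja@gmail.com>
Chinese proofreader: 傅炜 Fu Wei <tekkamanninja@gmail.com>
The text follows.
---------------------------------------------------------------------
What is an IRQ?
An IRQ is an interrupt request from a device. Currently, they can come in
over a hardware pin or over a packet. Several devices may be connected to
the same pin and thus share an IRQ.
An IRQ number is a kernel identifier used to talk about a hardware interrupt
source. Typically, this is an index into the global irq_desc array, but
except for what linux/interrupt.h implements, the details are architecture
specific.
An IRQ number is an enumeration of the possible interrupt sources on a
machine. Typically, what is enumerated is the number of the pin among all
the input pins of the interrupt controllers in the system. In the case of
ISA, what is enumerated are the 16 input pins on the two i8259 interrupt
controllers.
Architectures can assign additional meaning to the IRQ numbers, and are
encouraged to do so where any manual configuration of the hardware is
involved. The ISA IRQs are a classic example of assigning this kind of
additional meaning.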

View File

@ -0,0 +1,139 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/PCI/acpi-info.rst
:Translator:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:Proofreader:
========================================
ACPI considerations for PCI host bridges
========================================
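The general rule is that the ACPI namespace should describe everything the
OS might use unless there is another way for the OS to find it [1, 2].
For example, there is no standard hardware mechanism for enumerating PCI
host bridges, so the ACPI namespace must describe each host bridge, the
method for accessing PCI config space below it, the address space windows
the host bridge forwards to PCI (using _CRS), and the routing of legacy INTx
interrupts (using _PRT).
PCI devices, which are below the host bridge, generally do not need to be
described via ACPI. The OS can discover them via the standard PCI
enumeration mechanism, using config accesses to discover and identify
devices and to read and size their BARs. However, ACPI may describe PCI
devices if it provides power management or hotplug functionality for them,
or if the device has INTx interrupts connected by platform interrupt
controllers and a _PRT is needed to describe those connections.
ACPI resource description is done via _CRS objects of devices in the ACPI
namespace [2]. The _CRS is like a generalized PCI BAR: the OS can read _CRS
and figure out what resource is being consumed even if it does not have a
driver for the device [3]. That is important because it means an old OS can
work correctly even on a system with new devices unknown to the OS. The new
devices might not do anything, but the OS can at least make sure no
resources conflict with them.
Static tables like MCFG, HPET, ECDT, etc., are not mechanisms for reserving
address space. The static tables are for things the OS needs to know early
in boot, before it can parse the ACPI namespace. If a new table is defined,
an old OS needs to operate correctly even though it ignores the table. _CRS
allows that because it is generic and understood by the old OS; a static
table does not.
If the OS is expected to manage a non-discoverable device described via
ACPI, that device will have a specific _HID/_CID that tells the OS what
driver to bind to it, and the _CRS tells the OS and the driver where the
device's registers are.
PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should describe
all the address space they consume. This includes all the windows they
forward down to the PCI bus, as well as registers of the host bridge itself
that are not forwarded to PCI. The host bridge registers include things like
secondary/subordinate bus registers that determine the bus range below the
bridge, window registers that describe the apertures, etc. These are all
device-specific, non-architected things, so the only way a PNP0A03/PNP0A08
driver can manage them is via _PRS/_CRS/_SRS, which contain the
device-specific details. The host bridge registers also include ECAM space,
since it is consumed by the host bridge.
ACPI defines a Consumer/Producer bit to distinguish the bridge registers
("Consumer") from the bridge apertures ("Producer") [4, 5], but early BIOSes
did not use that bit correctly. The result is that the current ACPI spec
defines Consumer/Producer only for the Extended Address Space descriptors;
the bit should be ignored in the older QWord/DWord/Word Address Space
descriptors. Consequently, OSes have to assume all QWord/DWord/Word
descriptors are windows.
Prior to the addition of Extended Address Space descriptors, the failure of
Consumer/Producer meant there was no way to describe bridge registers in the
PNP0A03/PNP0A08 device itself. The workaround was to describe the bridge
registers (including ECAM space) in PNP0C02 catch-all devices [6]. With the
exception of ECAM, the bridge register space is device-specific anyway, so
the generic PNP0A03/PNP0A08 driver (pci_root.c) has no need to know about
it.
New architectures should be able to use "Consumer" Extended Address Space
descriptors in the PNP0A03 device for bridge registers, including ECAM,
although a strict interpretation of [6] might prohibit this. Old x86 and
ia64 kernels assume all address space descriptors, including "Consumer"
Extended Address Space ones, are windows, so it would not be safe to
describe bridge registers this way on those architectures.
PNP0C02 "motherboard" devices are basically a catch-all. There is no
programming model for them other than "don't use these resources for
anything else." So a PNP0C02 _CRS should claim any address space that is
(1) not claimed by _CRS under any other device object in the ACPI namespace
and (2) should not be assigned by the OS to something else.
The PCIe spec requires the Enhanced Configuration Access Method (ECAM)
unless there is a standard firmware interface for config access, e.g., the
ia64 SAL interface [7]. A host bridge consumes ECAM memory address space and
converts memory accesses into PCI configuration accesses. The spec defines
the ECAM address space layout and functionality; only the base of the
address space is device-specific. An ACPI OS learns the base address from
either the static MCFG table or a _CBA method in the PNP0A03 device.
The MCFG table must describe the ECAM space of non-hot pluggable host
bridges [8]. Since MCFG is a static table and cannot be updated by hotplug,
a _CBA method in the PNP0A03 device describes the ECAM space of a
hot-pluggable host bridge [9]. Note that for both MCFG and _CBA, the base
address always corresponds to bus 0, even if the bus range below the bridge
(which is reported via _CRS) does not start at 0.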
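[1] ACPI 6.2, sec 6.1:
For any device that is on a non-enumerable type of bus (for example, an ISA
bus), OSPM enumerates the devices' identifier(s) and the ACPI system
firmware must supply an _HID object ... for each device to enable OSPM to
do that.
[2] ACPI 6.2, sec 3.7:
The OS enumerates motherboard devices simply by reading through the ACPI
Namespace looking for devices with hardware IDs.
Each device enumerated by ACPI includes ACPI-defined objects in the ACPI
Namespace that report the hardware resources the device could occupy
[_PRS], an object that reports the resources that are currently used by the
device [_CRS], and objects for configuring those resources [_SRS]. The
information is used by the Plug and Play OS (OSPM) to configure the devices.
[3] ACPI 6.2, sec 6.2:
OSPM uses device configuration objects to configure hardware resources for
devices enumerated via ACPI. Device configuration objects provide
information about current and possible resource requirements, the
relationship between shared resources, and methods for configuring hardware
resources.
When OSPM enumerates a device, it calls _PRS to determine the resource
requirements of the device. It may also call _CRS to find the current
resource settings for the device. Using this information, the Plug and Play
system determines what resources the device should consume and sets those
resources by calling the device's _SRS control method.
In ACPI, devices can consume resources (for example, legacy keyboards),
provide resources (for example, a proprietary PCI bridge), or do both.
Unless otherwise specified, resources for a device are assumed to be taken
from the nearest matching resource above the device in the device hierarchy.
[4] ACPI 6.2, sec 6.4.3.5.1, 2, 3, 4:
QWord/DWord/Word Address Space Descriptor (.1, .2, .3)
General Flags: Bit [0] Ignored
Extended Address Space Descriptor (.4)
General Flags: Bit [0] Consumer/Producer:
* 1 This device consumes this resource
* 0 This device produces and consumes this resource
[5] ACPI 6.2, sec 19.6.43:
ResourceUsage specifies whether the Memory range is consumed by this device
(ResourceConsumer) or passed on to child devices (ResourceProducer). If
nothing is specified, then ResourceConsumer is assumed.
[6] PCI Firmware 3.2, sec 4.1.2:
If the operating system does not natively comprehend reserving the MMCFG
region, the MMCFG region must be reserved by firmware. The address range
reported in the MCFG table or by _CBA method (see Section 4.1.3) must be
reserved by declaring a motherboard resource. For most systems, the
motherboard resource would appear at the root of the ACPI namespace (under
\_SB) in a node with a _HID of EISAID (PNP0C02), and the resources in this
case should not be claimed in the root PCI bus's _CRS. The resources can
optionally be returned in Int15 E820 or EFIGetMemoryMap as reserved memory
but must always be reported through ACPI as a motherboard resource.
[7] PCI Express 4.0, sec 7.2.2:
For systems that are PC-compatible, or that do not implement a
processor-architecture-specific firmware interface standard that allows
access to the Configuration Space, the ECAM is required as defined in this
section.
[8] PCI Firmware 3.2, sec 4.1.2:
The MCFG table is an ACPI table that is used to communicate the base
addresses corresponding to the non-hot removable PCI Segment Groups range
within a PCI Segment Group available to the operating system at boot. This
is required for the PC-compatible systems.
The MCFG table is only used to communicate the base addresses corresponding
to the PCI Segment Groups available to the system at boot.
[9] PCI Firmware 3.2, sec 4.1.3:
The _CBA (Memory mapped Configuration Base Address) control method is an
optional ACPI object that returns the 64-bit memory mapped configuration
base address for the hot plug capable host bridge. The base address
returned by _CBA is a processor-relative address. The _CBA control method
evaluates to an Integer.
This control method appears under a host bridge object. When the _CBA
method appears under an active host bridge object, the operating system
evaluates this structure to identify the memory mapped configuration base
address corresponding to the PCI Segment Group for the bus number range
specified in the _CRS method. An ACPI namespace object that contains the
_CBA method must also contain a corresponding _SEG method.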

View File

@ -10,9 +10,6 @@
:Proofreader:
.. _cn_PCI_index.rst:
=======================
Linux PCI Bus Subsystem
=======================
@ -26,12 +23,12 @@ Linux PCI Bus Subsystem
pci-iov-howto
msi-howto
sysfs-pci
acpi-info
Todolist:
acpi-info
pci-error-recovery
pcieaer-howto
endpoint/index
boot-interrupts
* pci-error-recovery
* pcieaer-howto
* endpoint/index
* boot-interrupts

View File

@ -6,10 +6,10 @@
吴想成 Wu XiangCheng <bobwxc@email.cn>
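Linux kernel release 5.x <http://kernel.org/>
Linux kernel release 6.x <http://kernel.org/>
=============================================
These are the release notes for Linux version 5. Read them carefully,
These are the release notes for Linux version 6. Read them carefully,
as they tell you what this is all about, explain how to install the kernel,
and what to do if something goes wrong.
What is Linux?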
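@ -61,27 +61,27 @@ Linux kernel release 5.x <http://kernel.org/>
- If you install the full sources, put the kernel tarball in a directory
where you have permissions (e.g. your home directory) and unpack it::
xz -cd linux-5.x.tar.xz | tar xvf -
xz -cd linux-6.x.tar.xz | tar xvf -
Replace "X" with the version number of the latest kernel.
Do NOT use the /usr/src/linux area! This area has a (usually incomplete)
set of kernel headers that are used by the library header files. They
should match the library, and not get messed up by changes to the kernel.
- You can also upgrade between 5.x releases by patching. Patches are
distributed in the xz format. To install by patching, get all the newer
patch files, enter the top level directory of the kernel source (linux-5.x)
and
- You can also upgrade between 6.x releases by patching. Patches are
distributed in the xz format. To install by patching, get all the newer
patch files, enter the top level directory of the kernel source (linux-6.x)
and
execute::
xz -cd ../patch-5.x.xz | patch -p1
xz -cd ../patch-6.x.xz | patch -p1
Replace "x" for all versions bigger than the version of your current source
tree, IN ORDER, and you should be OK. You may want to remove the backup
files (named like xxx~ or xxx.orig), and make sure that there are no failed
patches (named like xxx# or xxx.rej). If there are, either you or I have
made a mistake.
Unlike patches for the 5.x kernels, patches for the 5.x.y kernels (also
known as the stable kernels) are not incremental but instead apply directly
to the base 5.x kernel. For example, if your base kernel is 5.0 and you
want to apply the 5.0.3 patch, you must not first apply the 5.0.1 and 5.0.2
patches. Similarly, if you are running kernel version 5.0.2 and want to
jump to 5.0.3, you must first reverse the 5.0.2 patch before applying the
5.0.3 patch
Unlike patches for the 6.x kernels, patches for the 6.x.y kernels (also
known as the stable kernels) are not incremental but instead apply directly
to the base 6.x kernel. For example, if your base kernel is 6.0 and you
want to apply the 6.0.3 patch, you must not first apply the 6.0.1 and 6.0.2
patches. Similarly, if you are running kernel version 6.0.2 and want to
jump to 6.0.3, you must first reverse the 6.0.2 patch before applying the
6.0.3 patch
(that is, patch -R). You can read more on this in
:ref:`Documentation/process/applying-patches.rst <applying_patches>`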
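@ -103,7 +103,7 @@ Linux kernel release 5.x <http://kernel.org/>
Software requirements
---------------------
Compiling and running the 5.x kernels requires up-to-date versions of various software packages. Consult
Compiling and running the 6.x kernels requires up-to-date versions of various software packages. Consult
:ref:`Documentation/process/changes.rst <changes>`
for the minimum version numbers required and how to get updates for these
packages. Beware that using excessively old versions of these packages can
cause indirect errors that are very difficult to track down, so do not
assume that you can just update packages when obvious problems arise during
build or operation.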
@ -116,12 +116,12 @@ Linux kernel release 5.x <http://kernel.org/>
The ``make O=output/dir`` option allows you to specify an alternate place
for the output files (including .config).
Example::
kernel source code: /usr/src/linux-5.x
kernel source code: /usr/src/linux-6.x
build directory: /home/name/build/kernel
To configure and build the kernel, use::
cd /usr/src/linux-5.x
cd /usr/src/linux-6.x
make O=/home/name/build/kernel menuconfig
make O=/home/name/build/kernel
sudo make O=/home/name/build/kernel modules_install install
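@ -227,8 +227,6 @@ Linux kernel release 5.x <http://kernel.org/>
- Make sure you have at least gcc 5.1 available.
For more information, refer to :ref:`Documentation/process/changes.rst <changes>`
Please note that you can still run a.out user programs with this kernel.
- Do a ``make`` to create a compressed kernel image. It is also possible to
do ``make install`` if you have lilo installed to suit the kernel makefiles,
but you may want to check your particular lilo setup first.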
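@ -282,67 +280,12 @@ Linux kernel release 5.x <http://kernel.org/>
If something goes wrong
-----------------------
- If you have problems that seem to be due to kernel bugs, please check the
file MAINTAINERS to see if there is a particular person associated with the
part of the kernel that you are having trouble with. If there isn't anyone
listed there, then the second best thing is to mail them to me
(torvalds@linux-foundation.org), and possibly to any other relevant
mailing list or newsgroup.
If you have problems that seem to be due to kernel bugs, please see:
Documentation/translations/zh_CN/admin-guide/reporting-issues.rst .
- In all bug reports, PLEASE tell us what kernel you are talking about, how
to reproduce the problem, and what your setup is (use your common sense).
If the problem is new, tell me so, and if the problem is old, please try to
tell me when you first noticed it.
For hints on understanding kernel bug reports, please see:
Documentation/translations/zh_CN/admin-guide/bug-hunting.rst .
- If the bug results in a message like::
unable to handle kernel paging request at address C0000010
Oops: 0002
EIP: 0010:XXXXXXXX
eax: xxxxxxxx ebx: xxxxxxxx ecx: xxxxxxxx edx: xxxxxxxx
esi: xxxxxxxx edi: xxxxxxxx ebp: xxxxxxxx
ds: xxxx es: xxxx fs: xxxx gs: xxxx
Pid: xx, process nr: xx
xx xx xx xx xx xx xx xx xx xx
or similar kernel debugging information on your screen or in your system
log, please duplicate it EXACTLY.
The dump may look incomprehensible to you, but it does contain information
that may help debugging the problem. The text above the dump is also
important: it tells something about why the kernel dumped code (in the
above example, it is due to a bad kernel pointer). More information on
making sense of the dump is in
Documentation/admin-guide/bug-hunting.rst。
- If you compiled the kernel with CONFIG_KALLSYMS you can send the dump as
is, otherwise you will have to use the ``ksymoops`` program to make sense of
the dump (but compiling with CONFIG_KALLSYMS is usually preferred).
This utility can be downloaded from
https://www.kernel.org/pub/linux/utils/kernel/ksymoops/ .
Alternatively, you can do the dump lookup by hand:
- In debugging dumps like the above, it helps enormously if you can look up
what the EIP value means. The hex value as such does not help me or anybody
else very much: it will depend on your particular kernel setup. What you
should do is take the hex value from the EIP line (ignore the ``0010:``),
and look it up in the kernel namelist to see which kernel function contains
the offending address.
To find out the kernel function name, you will need to find the system
binary associated with the kernel that exhibited the symptom. This is the
file "linux/vmlinux". To extract the namelist and match it against the EIP
from the kernel crash, do::
nm vmlinux | sort | less
This will give you a list of kernel addresses sorted in ascending order,
from which it is simple to find the function that contains the offending
address. Note that the address given by the kernel debugging messages will
not necessarily match exactly with the function addresses (in fact, that is
very unlikely), so you cannot just "grep" the list. The list will, however,
give you the starting point of each kernel function, so by looking for the
function that has a starting address lower than the one you are searching
for but is followed by a function with a higher address, you will find the
one you want. In fact, it may be a good idea to include a bit of "context"
in your problem report, giving a few lines around the interesting one.
If you for some reason cannot do the above (you have a pre-compiled kernel
image or similar), telling me as much about your setup as possible will
help. Please read
Documentation/admin-guide/reporting-issues.rst for details.
- Alternatively, you can use gdb on a running kernel (read-only; i.e. you
cannot change values or set break points). To do this, first compile the
kernel with -g; edit arch/x86/Makefile appropriately, then do a ``make
clean``. You will also need to enable CONFIG_PROC_FS (via ``make config``).
After you have rebooted with the new kernel, do ``gdb vmlinux /proc/kcore``.
You can now use all the usual gdb commands. The command to look up the
point where your system crashed is ``l *0xXXXXXXXX`` (replace the XXXes with
the EIP value).
gdb'ing a non-running kernel currently fails because gdb (wrongly)
disregards the starting offset for which the kernel is compiled.
For more on debugging the kernel with GDB, please see:
Documentation/translations/zh_CN/dev-tools/gdb-kernel-debugging.rst
and Documentation/dev-tools/kgdb.rst .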

View File

@ -0,0 +1,293 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/admin-guide/bootconfig.rst
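:Translator: 吴想成 Wu XiangCheng <bobwxc@email.cn>
==================
Boot Configuration
==================
:Author: Masami Hiramatsu <mhiramat@kernel.org>
Overview
========
The boot configuration expands the current kernel command line to support
additional key-value data when booting the kernel in an efficient way.
This allows administrators to pass a structured-key config file.
Config File Syntax
==================
The boot config syntax is a simple structured key-value. Each key consists
of dot-connected words, and key and value are connected by ``=``. The value
has to be terminated by a semicolon (``;``) or a newline (``\n``).
For an array value, array entries are separated by a comma (``,``). ::
KEY[.WORD[...]] = VALUE[, VALUE2[...]][;]
Unlike the kernel command line syntax, spaces are OK around the comma and ``=``.
Each key word must contain only letters, digits, dashes (``-``) or
underscores (``_``). Each value may contain printable characters and spaces,
except for delimiters such as the semicolon (``;``), newline (``\n``),
comma (``,``), hash (``#``) and closing brace (``}``).
If you want to use those delimiters in a value, you can quote it with either
double quotes (``"VALUE"``) or single quotes (``'VALUE'``). Note that these
quotes cannot be escaped.
A key may have an empty value or no value at all. Such keys are used for
checking whether the key exists or not (like a boolean).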
--------
引导配置文件语法允许用户通过大括号合并键名部分相同的关键字。例如::
foo.bar.baz = value1
foo.bar.qux.quux = value2
也可以写成::
foo.bar {
baz = value1
qux.quux = value2
}
或者更紧凑一些,写成::
foo.bar { baz = value1; qux.quux = value2 }
在这两种样式中,引导解析时相同的关键字都会自动合并。因此可以追加类似的树或
键值。
相同关键字的值
--------------
禁止两个或多个值或数组共享同一个关键字。例如::
foo = bar, baz
foo = qux # !错误! 我们不可以重定义相同的关键字
如果你想要更新值,必须显式使用覆盖操作符 ``:=`` 。例如::
foo = bar, baz
foo := qux
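The value of the ``foo`` key then becomes ``qux``. This is useful for
overriding default values by adding a (partial) custom boot config, without
having to parse the default boot config.

If you want to append a value to an existing key as an array member, you can
use the ``+=`` operator. For example::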
foo = bar, baz
foo += qux
In this case, the key ``foo`` has ``bar``, ``baz`` and ``qux``.

Moreover, sub-keys and a value can coexist under a parent key.
For example, the following config is allowed. ::
foo = value1
foo.bar = value2
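foo := value3  # This will update foo's value.

Note that a raw value cannot be put directly inside a structured key; it has
to be defined outside of the braces. For example::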
foo {
bar = value1
bar {
baz = value2
qux = value3
}
}
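Also, the order of the value node under a key is fixed. If a value and
sub-keys both exist, the value is always the first child node of the key.
Thus if the user specifies the sub-keys first, e.g.::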
foo.bar = value1
foo = value2
then in the program (and in /proc/bootconfig) it will be shown as below::
foo = value2
foo.bar = value1
Comments
--------

The config syntax accepts shell-script style comments. A comment starts with
a hash (``#``) and runs until the newline (``\n``).
::
# comment line
foo = value # value is set to foo.
bar = 1, # 1st element
2, # 2nd element
3 # 3rd element
This is parsed as below::
foo = value
bar = 1, 2, 3
Note that you cannot put a comment between a value and its delimiter (``,``
or ``;``). The following config is therefore a syntax error::
key = 1 # comment
,2
/proc/bootconfig
================
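/proc/bootconfig is the user-space interface to the boot config. Unlike
/proc/cmdline, this file shows its contents as a key-value list.

Each key-value pair is shown on its own line in the following style::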
KEY[.WORDS...] = "[VALUE]"[,"VALUE2"...]
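Boot Kernel With a Boot Config
==============================

There are two ways to boot the kernel with a boot config: attaching the boot
config to the initrd image, or embedding it in the kernel itself.

*initrd: initial RAM disk*

Attaching a Boot Config to Initrd
---------------------------------

Since the boot config file is loaded with the initrd by default, it is added
to the end of the initrd (initramfs) image file together with padding, size,
checksum and a 12-byte magic word, as shown below::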
[initrd][bootconfig][padding][size(le32)][checksum(le32)][#BOOTCONFIG\n]
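The size and checksum fields are unsigned 32-bit little-endian values.

When the boot config is added to the initrd image, the total file size is
aligned to 4 bytes. Null characters (``\0``) fill the alignment gap, so
``size`` is the length of the boot config file plus the padding bytes.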
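The footer layout above can be sketched in userspace C. This is an illustration under stated assumptions, not the code of the ``bootconfig`` tool: the helper names are hypothetical, and the checksum is assumed to be a plain byte-wise sum of the config data (with that definition the NUL padding bytes do not change the result).

```c
#include <stdint.h>
#include <string.h>

#define XBC_MAGIC	"#BOOTCONFIG\n"
#define XBC_MAGIC_LEN	12

/* Byte-wise sum of the data (assumed checksum definition). */
uint32_t xbc_sum(const unsigned char *data, uint32_t len)
{
	uint32_t sum = 0;

	while (len--)
		sum += *data++;
	return sum;
}

/* NUL bytes needed so that the whole file ends on a 4-byte boundary. */
uint32_t xbc_pad_len(uint32_t initrd_len, uint32_t cfg_len)
{
	uint32_t total = initrd_len + cfg_len + 8 + XBC_MAGIC_LEN;

	return (4 - (total & 3)) & 3;
}

/* Store v as a little-endian 32-bit value. */
static void put_le32(unsigned char *p, uint32_t v)
{
	p[0] = v & 0xff;
	p[1] = (v >> 8) & 0xff;
	p[2] = (v >> 16) & 0xff;
	p[3] = (v >> 24) & 0xff;
}

/*
 * Write [padding][size(le32)][checksum(le32)][magic] into out, which must
 * hold at least 23 bytes, and return the number of bytes written.
 */
size_t xbc_build_footer(unsigned char *out, uint32_t initrd_len,
			const unsigned char *cfg, uint32_t cfg_len)
{
	uint32_t pad = xbc_pad_len(initrd_len, cfg_len);

	memset(out, 0, pad);
	put_le32(out + pad, cfg_len + pad);	/* size includes padding */
	put_le32(out + pad + 4, xbc_sum(cfg, cfg_len));
	memcpy(out + pad + 8, XBC_MAGIC, XBC_MAGIC_LEN);
	return pad + 8 + XBC_MAGIC_LEN;
}
```

In practice the ``bootconfig`` command should be used to produce the image; the sketch only makes the byte layout concrete.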
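The Linux kernel decodes the last part of the initrd image in memory to get
the boot config data. Because of this "piggyback" method, there is no need to
change or update the boot loader or the kernel image itself as long as the
boot loader passes the correct initrd file size. If the boot loader passes a
longer size by mistake, the kernel fails to find the boot config data.

To do this operation, the Linux kernel provides the ``bootconfig`` command
under tools/bootconfig, which allows administrators to append a config file
to, or delete it from, an initrd image. You can build it with the following
command::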
# make -C tools/bootconfig
To add your boot config file to an initrd image, run bootconfig as below
(old data is removed automatically if it exists)::
# tools/bootconfig/bootconfig -a your-config /boot/initrd.img-X.Y.Z
To remove the config from the image, use the -d option as below::
# tools/bootconfig/bootconfig -d /boot/initrd.img-X.Y.Z
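Then add ``bootconfig`` to the kernel command line to tell the kernel to look
for the boot config at the end of the initrd file.

Embedding a Boot Config into Kernel
-----------------------------------

If you cannot use an initrd, you can also embed the boot config file in the
kernel by Kconfig options. In this case, you need to recompile the kernel
with the following configs::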
CONFIG_BOOT_CONFIG_EMBED=y
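CONFIG_BOOT_CONFIG_EMBED_FILE="/PATH/TO/BOOTCONFIG/FILE"

``CONFIG_BOOT_CONFIG_EMBED_FILE`` requires an absolute path, or a path
relative to the kernel source tree or object tree, to the boot config file.
The kernel embeds it as the default boot config.

Just as when attaching the boot config to the initrd, you also need to add
``bootconfig`` to the kernel command line to enable the embedded boot config.

Note that even if you set this option, you can still override the embedded
boot config with another boot config attached to the initrd.

Kernel Parameters via Boot Config
=================================

In addition to the kernel command line, the boot config can be used for
passing kernel parameters. All key-value pairs under the ``kernel`` key are
passed to the kernel command line directly. Moreover, the key-value pairs
under ``init`` are passed to the init process via the command line. The
parameters are concatenated with the user-given kernel command line string in
the following order, so that command line parameters can override boot config
parameters (this depends on how the subsystem handles parameters, but in
general an earlier parameter is overwritten by a later one)::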
[bootconfig params][cmdline params] -- [bootconfig init params][cmdline init params]
Here is an example of a boot config file for kernel/init parameters::
kernel {
root = 01234567-89ab-cdef-0123-456789abcd
}
init {
splash
}
This is copied into the kernel command line string as follows::
root="01234567-89ab-cdef-0123-456789abcd" -- splash
If the user gives some other command line like::
ro bootconfig -- quiet
then the final kernel command line is the following::
root="01234567-89ab-cdef-0123-456789abcd" ro bootconfig -- splash quiet
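The concatenation order above can be restated as a short userspace C sketch. ``build_cmdline()`` is a hypothetical helper written for this illustration, not a kernel function, and it assumes all four parts are non-empty.

```c
#include <stdio.h>

/*
 * Join the four parts in the documented order:
 *   [bootconfig params][cmdline params] -- [bootconfig init][cmdline init]
 * Returns the snprintf() result (length needed, excluding the NUL).
 */
int build_cmdline(char *out, size_t len,
		  const char *xbc_kernel, const char *cmd_kernel,
		  const char *xbc_init, const char *cmd_init)
{
	return snprintf(out, len, "%s %s -- %s %s",
			xbc_kernel, cmd_kernel, xbc_init, cmd_init);
}
```

Because the boot config parameters come first, a later parameter from the user-given command line usually takes precedence.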
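Config File Limitation
======================

Currently the maximum config size is 32KB and the total number of key-words
(not key-value entries) must be under 1024 nodes. Note: this is not the number
of entries but of nodes; an entry consumes at least 2 nodes (a key-word and a
value), so theoretically there can be up to 512 key-value pairs. If keys
contain 3 words on average, the file can hold 256 key-value pairs. In most
cases the number of config items will be under 100 entries and smaller than
8KB, so this should be enough. If the node number exceeds 1024, the parser
returns an error even if the file size is smaller than 32KB. (Note that this
maximum size does not include the padding null characters.)

Anyway, since the ``bootconfig`` command verifies the file when appending a
boot config to the initrd image, the user can notice the problem before boot.

Bootconfig APIs
===============

The user can query or loop over key-value pairs, and can also find a root
(prefix) key node and then look up key-values under that node.

If you have a key string, you can query the value directly with the key using
xbc_find_value(). If you want to know which keys exist in the boot config, you
can use xbc_for_each_key_value() to iterate over key-value pairs. Note that
you need to use xbc_array_for_each_value() to access each value of an array,
e.g.::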
vnode = NULL;
xbc_find_value("key.word", &vnode);
if (vnode && xbc_node_is_array(vnode))
xbc_array_for_each_value(vnode, value) {
printk("%s ", value);
}
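If you want to find keys that share a prefix string, you can use
xbc_find_node() to find the node for the prefix string, and then iterate over
the keys under the prefix node with xbc_node_for_each_key_value().

But the most typical usage is to get a named value or a named array under a
prefix, e.g.::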
root = xbc_find_node("key.prefix");
value = xbc_node_find_value(root, "option", &vnode);
...
xbc_node_for_each_array_value(root, "array-option", value, anode) {
...
}
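This accesses the value of "key.prefix.option" and the array of
"key.prefix.array-option".

Locking is not needed, since the configuration is read-only after
initialization. All data and keys must be copied if you need to modify them.

Functions and structures
========================

For the kernel-doc of the related definitions, see: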
- include/linux/bootconfig.h
- lib/bootconfig.c


@ -63,6 +63,7 @@ Todolist:
.. toctree::
:maxdepth: 1
bootconfig
clearing-warn-once
cpu-load
cputopology
@ -80,7 +81,6 @@ Todolist:
* binderfs
* binfmt-misc
* blockdev/index
* bootconfig
* braille-console
* btmrvl
* cgroup-v1/index


@ -0,0 +1,210 @@
.. SPDX-License-Identifier: GPL-2.0+
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/core-api/circular-buffers.rst
:Translator:
周彬彬 Binbin Zhou <zhoubinbin@loongson.cn>
:Proofreaders:
司延腾 Yanteng Si <siyanteng@loongson.cn>
吴想成 Wu Xiangcheng <bobwxc@email.cn>
时奎亮 Alex Shi <alexs@kernel.org>

================
Circular Buffers
================

:Author: David Howells <dhowells@redhat.com>
:Author: Paul E. McKenney <paulmck@linux.ibm.com>
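Linux provides a number of features that can be used to implement circular
buffering. There are two sets of such features:

(1) Convenience functions for determining information about power-of-2 sized
buffers.
(2) Memory barriers for when the producer and the consumer of objects in the
buffer do not want to share a lock.

To use these facilities, as discussed below, there needs to be just one
producer and just one consumer. Multiple producers can be handled by
serialising them, and multiple consumers by serialising them.

.. Contents:
(*) What is a circular buffer?
(*) Measuring power-of-2 buffers.
(*) Using memory barriers with circular buffers.
- The producer.
- The consumer.
(*) Further reading.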
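What is a circular buffer?
==========================

First of all, what is a circular buffer? A circular buffer is a buffer of
fixed, finite size into which there are two indices:

(1) A 'head' index - the point at which the producer inserts items into the
buffer.
(2) A 'tail' index - the point at which the consumer finds the next item in
the buffer.

Typically, the buffer is empty when the tail pointer is equal to the head
pointer, and full when the head pointer is one less than the tail pointer.

The head index is incremented when items are added, and the tail index when
items are removed. The tail index should never jump the head index, and both
indices should be wrapped to 0 when they reach the end of the buffer, thus
allowing an unbounded amount of data to flow through the buffer.

Typically, items will all be of the same unit size, but this is not strictly
required to use the techniques below. The indices can be increased by more
than 1 if multiple items or variable-sized items are to be included in the
buffer, provided that neither index overtakes the other. The implementer must
be careful, however, as a region more than one unit in size may wrap the end
of the buffer and be broken into two segments.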
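Measuring power-of-2 buffers
============================

Calculating the occupancy or the remaining capacity of an arbitrarily sized
circular buffer would normally be a slow operation, requiring the use of a
modulus (divide) instruction. However, if the buffer is of a power-of-2 size,
a much quicker bitwise-AND instruction can be used instead.

Linux provides a set of macros for handling power-of-2 circular buffers.
These can be made use of by::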
#include <linux/circ_buf.h>
The macros are:

(#) Measure the remaining capacity of a buffer::
CIRC_SPACE(head_index, tail_index, buffer_size);
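This returns the amount of space left in the buffer[1] into which items can
be inserted.

(#) Measure the maximum consecutive immediate space in a buffer::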
CIRC_SPACE_TO_END(head_index, tail_index, buffer_size);
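This returns the amount of consecutive space left in the buffer[1] into which
items can be immediately inserted without having to wrap back to the
beginning of the buffer.

(#) Measure the occupancy of a buffer::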
CIRC_CNT(head_index, tail_index, buffer_size);
This returns the number of items currently occupying a buffer[2].

(#) Measure the non-wrapping occupancy of a buffer::
CIRC_CNT_TO_END(head_index, tail_index, buffer_size);
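This returns the number of consecutive items[2] that can be extracted from
the buffer without having to wrap back to the beginning of the buffer.

Each of these macros will nominally return a value between 0 and
buffer_size-1, however:

(1) CIRC_SPACE*() are intended to be used in the producer. To the producer
they return a lower bound, as the producer controls the head index but the
consumer may still be depleting the buffer on another CPU and moving the
tail index.
To the consumer they show an upper bound, as the producer may be busy
depleting the space.
(2) CIRC_CNT*() are intended to be used in the consumer. To the consumer they
return a lower bound, as the consumer controls the tail index but the
producer may still be filling the buffer on another CPU and moving the head
index.
To the producer they show an upper bound, as the consumer may be busy
emptying the buffer.
(3) To a third party, the order in which the writes to the indices by the
producer and consumer become visible cannot be guaranteed, as they are
independent and may be made on different CPUs, so the result in such a
situation is merely a guess and may even be wrong.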
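To make the arithmetic behind the four macros concrete, here is a userspace restatement as plain C functions. The lower-case names are hypothetical and chosen for this sketch; kernel code should use the macros from ``<linux/circ_buf.h>``. As with the macros, ``size`` is assumed to be a power of two.

```c
/* Number of items in the buffer. */
static inline int circ_cnt(int head, int tail, int size)
{
	return (head - tail) & (size - 1);
}

/* Free slots; one slot always stays empty so that full != empty. */
static inline int circ_space(int head, int tail, int size)
{
	return circ_cnt(tail, head + 1, size);
}

/* Items that can be read without wrapping past the end of the buffer. */
static inline int circ_cnt_to_end(int head, int tail, int size)
{
	int end = size - tail;
	int n = (head + end) & (size - 1);

	return n < end ? n : end;
}

/* Free slots that can be written without wrapping past the end. */
static inline int circ_space_to_end(int head, int tail, int size)
{
	int end = size - 1 - head;
	int n = (end + tail) & (size - 1);

	return n <= end ? n : end + 1;
}
```

For an 8-slot buffer with head = 6 and tail = 2, four items are stored, three slots are free, and only two of the free slots (6 and 7) are reachable without wrapping.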
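Using memory barriers with circular buffers
===========================================

By using memory barriers in conjunction with circular buffers, you can avoid
the need to:

(1) use a single lock to govern access to both ends of the buffer, thus
allowing the buffer to be filled and emptied at the same time; and
(2) use atomic counter operations.

There are two sides to this: the producer that fills the buffer, and the
consumer that empties it. Only one thing should be filling a buffer at any
one time, and only one thing should be emptying a buffer at any one time, but
the two sides can operate simultaneously.

The producer
------------

The producer will look something like this::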
	spin_lock(&producer_lock);

	unsigned long head = buffer->head;
	/* The spin_unlock() and next spin_lock() provide needed ordering. */
	unsigned long tail = READ_ONCE(buffer->tail);

	if (CIRC_SPACE(head, tail, buffer->size) >= 1) {
		/* insert one item into the buffer */
		struct item *item = buffer[head];

		produce_item(item);

		smp_store_release(&buffer->head,
				  (head + 1) & (buffer->size - 1));

		/* wake_up() will make sure that the head is committed before
		 * waking anyone up */
		wake_up(consumer);
	}

	spin_unlock(&producer_lock);
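This instructs the CPU that the contents of the new item must be written
before the head index makes it available to the consumer, and then that the
revised head index must be written before the consumer is woken.

Note that wake_up() does not guarantee any sort of barrier unless something
is actually awakened. We therefore cannot rely on it for ordering. However,
there is always one element of the array left empty, so the producer must
produce two elements before it could possibly corrupt the element currently
being read by the consumer. Therefore, the unlock-lock pair between
consecutive invocations of the consumer provides the necessary ordering
between the read of the index indicating that the consumer has vacated a
given element and the write by the producer to that same element.

The consumer
------------

The consumer will look something like this::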
	spin_lock(&consumer_lock);

	/* Read index before reading contents at that index. */
	unsigned long head = smp_load_acquire(&buffer->head);
	unsigned long tail = buffer->tail;

	if (CIRC_CNT(head, tail, buffer->size) >= 1) {
		/* extract one item from the buffer */
		struct item *item = buffer[tail];

		consume_item(item);

		/* Finish reading descriptor before incrementing tail. */
		smp_store_release(&buffer->tail,
				  (tail + 1) & (buffer->size - 1));
	}

	spin_unlock(&consumer_lock);
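This instructs the CPU to make sure the index is up to date before reading
the new item, and then to make sure the CPU has finished reading the item
before it writes the new tail pointer, which will erase the item.

Note the use of READ_ONCE() and smp_load_acquire() to read the opposition
index. This prevents the compiler from discarding and reloading its cached
value. This is not strictly needed if you can be sure that the opposition
index will only be used once. The smp_load_acquire() additionally forces the
CPU to order against subsequent memory references. Similarly,
smp_store_release() is used in both algorithms to write the thread's own
index. This documents the fact that we are writing to something that can be
read concurrently, prevents the compiler from tearing the store, and enforces
ordering against previous accesses.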
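The producer and consumer above use kernel-only primitives. As a rough userspace analogue, the same acquire/release pairing can be sketched with C11 atomics. This is an assumption-laden illustration: ``struct ring``, ``ring_put()`` and ``ring_get()`` are hypothetical names, exactly one producer thread and one consumer thread are assumed, and the producer reads the tail with an acquire load instead of relying on the unlock-lock pair described in the text.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 8	/* must be a power of two */

struct ring {
	_Atomic unsigned long head;	/* written only by the producer */
	_Atomic unsigned long tail;	/* written only by the consumer */
	int slots[RING_SIZE];
};

/* Producer side: returns false when the ring is full. */
bool ring_put(struct ring *r, int v)
{
	unsigned long head = atomic_load_explicit(&r->head, memory_order_relaxed);
	unsigned long tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	/* Same test as CIRC_SPACE(head, tail, RING_SIZE) == 0. */
	if (((tail - (head + 1)) & (RING_SIZE - 1)) == 0)
		return false;

	r->slots[head] = v;
	/* Publish the item before the new head becomes visible. */
	atomic_store_explicit(&r->head, (head + 1) & (RING_SIZE - 1),
			      memory_order_release);
	return true;
}

/* Consumer side: returns false when the ring is empty. */
bool ring_get(struct ring *r, int *v)
{
	/* Read the index before reading the contents at that index. */
	unsigned long head = atomic_load_explicit(&r->head, memory_order_acquire);
	unsigned long tail = atomic_load_explicit(&r->tail, memory_order_relaxed);

	/* Same test as CIRC_CNT(head, tail, RING_SIZE) == 0. */
	if (((head - tail) & (RING_SIZE - 1)) == 0)
		return false;

	*v = r->slots[tail];
	/* Finish reading the slot before handing it back to the producer. */
	atomic_store_explicit(&r->tail, (tail + 1) & (RING_SIZE - 1),
			      memory_order_release);
	return true;
}
```

Each side writes only its own index with a release store and reads the other side's index with an acquire load, which mirrors the smp_store_release() and smp_load_acquire() pairing in the kernel examples.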
Further reading
===============

See Documentation/memory-barriers.txt for a description of Linux's memory
barrier facilities.


@ -0,0 +1,23 @@
.. SPDX-License-Identifier: GPL-2.0+
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/core-api/generic-radix-tree.rst
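:Translator:
周彬彬 Binbin Zhou <zhoubinbin@loongson.cn>

=================================
Generic radix trees/sparse arrays
=================================

For generic radix trees/sparse arrays, see the
"DOC: Generic radix trees/sparse arrays" section in
include/linux/generic-radix-tree.h.

Generic radix tree functions
----------------------------

This API is defined in the following kernel code:
include/linux/generic-radix-tree.h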


@ -0,0 +1,80 @@
.. SPDX-License-Identifier: GPL-2.0+
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/core-api/idr.rst
:Translator:
周彬彬 Binbin Zhou <zhoubinbin@loongson.cn>
:Proofreaders:
司延腾 Yanteng Si <siyanteng@loongson.cn>
吴想成 Wu Xiangcheng <bobwxc@email.cn>
时奎亮 Alex Shi <alexs@kernel.org>

=============
ID Allocation
=============

:Author: Matthew Wilcox
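Overview
========

A common problem to solve is allocating identifiers (IDs), generally small
numbers which identify a thing. Examples include file descriptors, process
IDs, packet identifiers in networking protocols, SCSI tags and device
instance numbers. The IDR and the IDA provide a reasonable solution to the
problem, to avoid everybody inventing their own. The IDR provides the ability
to map an ID to a pointer, while the IDA provides only ID allocation and as a
result is much more memory-efficient.

The IDR interface is deprecated; please use the ``XArray`` instead.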
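IDR usage
=========

Start by initialising an IDR, either with DEFINE_IDR() for statically
allocated IDRs or idr_init() for dynamically allocated IDRs.

You can call idr_alloc() to allocate an unused ID. Look up the pointer
associated with the ID by calling idr_find(), and free the ID by calling
idr_remove().

If you need to change the pointer associated with an ID, you can call
idr_replace(). One common reason to do this is to reserve an ID by passing a
``NULL`` pointer to the allocation function, initialise the object with the
reserved ID, and finally insert the initialised object into the IDR.

Some users need to allocate IDs larger than ``INT_MAX``. So far all of these
users have been content with a ``UINT_MAX`` limit, and they use
idr_alloc_u32(). If you need IDs that will not fit in a u32, we will work
with you to address your needs.

If you need to allocate IDs sequentially, you can use idr_alloc_cyclic(). The
IDR becomes less efficient when dealing with larger IDs, so using this
function comes at a slight cost.

To perform an action on all pointers used by the IDR, you can use either the
callback-based idr_for_each() or the iterator-style idr_for_each_entry(). You
may need to use idr_for_each_entry_continue() to continue an iteration. You
can also use idr_get_next() if the iterator does not fit your needs.

When you have finished using an IDR, you can call idr_destroy() to release
the memory used by the IDR. This does not free the objects pointed to from
the IDR; if you want to do that, use one of the iterators to do it.

You can use idr_is_empty() to find out whether any IDs are currently
allocated.

If you need to take a lock while allocating a new ID from the IDR, you may
need to pass a restrictive set of GFP flags, which can lead to the IDR being
unable to allocate memory. To work around this, you can call idr_preload()
before taking the lock, and then idr_preload_end() after the allocation.

For IDR synchronisation, see the "DOC: idr sync" section in
include/linux/idr.h.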
IDA usage
=========

For IDA usage, see the "DOC: IDA description" section in lib/idr.c.

Functions and structures
========================

This API is defined in the following kernel code:
include/linux/idr.h
lib/idr.c


@ -44,15 +44,15 @@
assoc_array
xarray
rbtree
idr
circular-buffers
generic-radix-tree
packing
Todolist:
idr
circular-buffers
generic-radix-tree
packing
this_cpu_ops
timekeeping
errseq


@ -0,0 +1,160 @@
.. SPDX-License-Identifier: GPL-2.0+
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/core-api/packing.rst
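:Translator:
周彬彬 Binbin Zhou <zhoubinbin@loongson.cn>
:Proofreaders:
司延腾 Yanteng Si <siyanteng@loongson.cn>
吴想成 Wu Xiangcheng <bobwxc@email.cn>
时奎亮 Alex Shi <alexs@kernel.org>

================================================
Generic bitfield packing and unpacking functions
================================================

Problem statement
-----------------

When working with hardware, one has to choose between several approaches of
interfacing with it.

One can memory-map a pointer to a carefully crafted struct over the hardware
device's memory region, and access its fields as struct members (potentially
declared as bitfields). But writing code this way makes it less portable, due
to potential endianness mismatches between the CPU and the hardware device.

Additionally, one has to pay close attention when translating register
definitions from the hardware documentation into bitfield indices for the
structs. Also, some hardware (typically networking equipment) tends to group
its register fields in ways that violate any reasonable word boundaries
(sometimes even 64-bit ones). This creates the inconvenience of having to
define "high" and "low" portions of register fields within the struct.

A more robust alternative to struct field definitions is to extract the
required fields by shifting the appropriate number of bits. But this still
does not protect against endianness mismatches, unless all memory accesses
are performed byte by byte. Also, the code can easily get cluttered, and the
high-level idea might get lost among the many bit shifts required.

Many drivers take the bit-shifting approach and then attempt to reduce the
clutter with tailored macros, but more often than not these macros take
shortcuts that still prevent the code from being truly portable.

The solution
------------

This API deals with 2 basic operations:

- Packing a CPU-usable number into a memory buffer (with hardware
  constraints/quirks).
- Unpacking a memory buffer (with hardware constraints/quirks) into a
  CPU-usable number.

The API offers an abstraction over said hardware constraints and quirks and
over CPU endianness, and therefore over possible mismatches between the two.

The basic unit of these API functions is the u64. From the CPU's perspective,
bit 63 always means bit offset 7 of byte 7, albeit only logically. The
question is: where do we lay this bit out in memory?

The following examples cover the memory layout of a packed u64 field. The
byte offsets in the packed buffer are always implicitly 0, 1, ... 7. What the
examples show is where the logical bytes and bits sit.

1. Normally (no quirks), we would do it like this: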
::
63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
7 6 5 4
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
3 2 1 0
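That is, the MSByte (7) of the CPU-usable u64 sits at memory offset 0, and
the LSByte (0) of the u64 sits at memory offset 7.
This corresponds to what most people would regard as "big endian", where bit
i corresponds to the number 2^i. This is also referred to in the code
comments as "logical" notation.

2. If QUIRK_MSB_ON_THE_RIGHT is set, we do it like this: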
::
56 57 58 59 60 61 62 63 48 49 50 51 52 53 54 55 40 41 42 43 44 45 46 47 32 33 34 35 36 37 38 39
7 6 5 4
24 25 26 27 28 29 30 31 16 17 18 19 20 21 22 23 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7
3 2 1 0
That is, QUIRK_MSB_ON_THE_RIGHT does not affect byte positioning, but inverts
bit offsets inside a byte.

3. If QUIRK_LITTLE_ENDIAN is set, we do it like this:
::
39 38 37 36 35 34 33 32 47 46 45 44 43 42 41 40 55 54 53 52 51 50 49 48 63 62 61 60 59 58 57 56
4 5 6 7
7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8 23 22 21 20 19 18 17 16 31 30 29 28 27 26 25 24
0 1 2 3
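Therefore, QUIRK_LITTLE_ENDIAN means that inside the memory region, every
byte of each 4-byte word is placed at its mirrored position compared to the
boundary of that word.

4. If QUIRK_MSB_ON_THE_RIGHT and QUIRK_LITTLE_ENDIAN are both set, we do it
like this: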
::
32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
4 5 6 7
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
0 1 2 3
5. If just QUIRK_LSW32_IS_FIRST is set, we do it like this:
::
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
3 2 1 0
63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
7 6 5 4
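In this case the 8-byte memory region is interpreted as follows: the first 4
bytes correspond to the least significant 4-byte word, and the next 4 bytes
to the more significant 4-byte word.

6. If QUIRK_LSW32_IS_FIRST and QUIRK_MSB_ON_THE_RIGHT are set, we do it like
this: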
::
24 25 26 27 28 29 30 31 16 17 18 19 20 21 22 23 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7
3 2 1 0
56 57 58 59 60 61 62 63 48 49 50 51 52 53 54 55 40 41 42 43 44 45 46 47 32 33 34 35 36 37 38 39
7 6 5 4
7. If QUIRK_LSW32_IS_FIRST and QUIRK_LITTLE_ENDIAN are set, it looks like
this:
::
7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8 23 22 21 20 19 18 17 16 31 30 29 28 27 26 25 24
0 1 2 3
39 38 37 36 35 34 33 32 47 46 45 44 43 42 41 40 55 54 53 52 51 50 49 48 63 62 61 60 59 58 57 56
4 5 6 7
8. If QUIRK_LSW32_IS_FIRST, QUIRK_LITTLE_ENDIAN and QUIRK_MSB_ON_THE_RIGHT
are set, it looks like this:
::
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
0 1 2 3
32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
4 5 6 7
We always think of our offsets as if there were no quirk, and we translate
them afterwards, before accessing the memory region.
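The translation from a quirk-free logical offset to a memory position can be sketched as below. This is derived from the eight layout diagrams only, not copied from lib/packing.c: ``struct bit_pos`` and ``locate_bit()`` are hypothetical names, and the numeric values given to the three QUIRK_* flags are assumptions made so that the sketch compiles on its own.

```c
#define QUIRK_MSB_ON_THE_RIGHT	(1u << 0)	/* value assumed for the sketch */
#define QUIRK_LITTLE_ENDIAN	(1u << 1)	/* value assumed for the sketch */
#define QUIRK_LSW32_IS_FIRST	(1u << 2)	/* value assumed for the sketch */

struct bit_pos {
	int byte;	/* memory offset inside the 8-byte buffer, 0..7 */
	int bit;	/* position inside that byte, 7 = leftmost */
};

/* Map logical bit 0..63 of a u64 to its place in the packed buffer. */
struct bit_pos locate_bit(int logical_bit, unsigned int quirks)
{
	struct bit_pos p;

	/* No quirks: logical byte 7 sits at memory offset 0. */
	p.byte = 7 - logical_bit / 8;
	p.bit = logical_bit % 8;

	/* Mirror the byte inside its 4-byte word. */
	if (quirks & QUIRK_LITTLE_ENDIAN)
		p.byte = (p.byte & ~3) + (3 - (p.byte & 3));

	/* Swap the two 4-byte words. */
	if (quirks & QUIRK_LSW32_IS_FIRST)
		p.byte ^= 4;

	/* Invert the bit offset inside the byte. */
	if (quirks & QUIRK_MSB_ON_THE_RIGHT)
		p.bit = 7 - p.bit;

	return p;
}
```

For example, with all three quirks set, logical bit 0 lands in the leftmost position of memory offset 0, which matches diagram 8.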
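Intended use
------------

Drivers that opt to use this API first need to identify which of the above
combinations of the 3 quirks (8 in total) matches what the hardware
documentation describes. Then they should wrap the packing() function,
creating a new xxx_packing() that calls it with the proper QUIRK_* one-hot
bits set.

The packing() function returns an int-encoded error code, which protects the
programmer against incorrect API use. These errors are not expected to occur
at runtime, so it is reasonable for xxx_packing() to return void and simply
swallow them. Optionally it can dump the stack or print the error
description.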


@ -0,0 +1,37 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/Devicetree/changesets.rst
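:Translator:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:Proofreader:

=====================
Devicetree Changesets
=====================

A Devicetree changeset is a method which allows one to apply changes to the
live tree in such a way that either the full set of changes is applied, or
none of them are. If an error occurs partway through applying the changeset,
the tree is rolled back to its previous state. A changeset can also be
removed after it has been applied.

When a changeset is applied, all of the changes are applied to the tree at
once before the OF_RECONFIG notifiers are emitted. This is so that the
receiver sees a complete and consistent state of the tree when it receives
the notification.

The sequence of a changeset is as follows.

1. of_changeset_init() - initializes a changeset.
2. A number of DT tree change calls, of_changeset_attach_node(),
of_changeset_detach_node(), of_changeset_add_property(),
of_changeset_remove_property, of_changeset_update_property(), to prepare a
set of changes. No changes to the active tree are made at this point. All
the change operations are recorded in the of_changeset `entries` list.
3. of_changeset_apply() - applies the changes to the tree. Either the entire
changeset is applied or, if there is an error, the tree is restored to its
previous state. The core ensures proper serialization through locking. An
unlocked version, __of_changeset_apply, is available if needed.

If a successfully applied changeset needs to be removed, it can be done with
of_changeset_revert().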


@ -0,0 +1,31 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/Devicetree/dynamic-resolution-notes.rst
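:Translator:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:Proofreader:

=================================
Devicetree Dynamic Resolver Notes
=================================

This document describes the implementation of the in-kernel DeviceTree
resolver, residing in drivers/of/resolver.c.

How the resolver works
----------------------

The resolver is given as input an arbitrary tree compiled with the proper dtc
option and having a /plugin/ tag. This generates the appropriate __fixups__
and __local_fixups__ nodes.

In sequence, the resolver works through the following steps:

1. Get the maximum device tree phandle value from the live tree + 1.
2. Adjust all the local phandles of the tree to resolve by that amount.
3. Using the __local_fixups__ node information, adjust all local references
by the same amount.
4. For each property in the __fixups__ node, locate the node it references in
the live tree. This is the label used to tag the node.
5. Retrieve the phandle of the target of the fixup.
6. For each fixup in the property, locate the node:property:offset location
and replace it with the phandle value.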
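Steps 1 to 3 above amount to shifting every overlay-local phandle past the largest phandle already used in the live tree. The sketch below restates just that arithmetic on plain arrays. It is not the code in drivers/of/resolver.c: both function names are hypothetical, and special values (such as a phandle of 0) that a real resolver would have to skip are ignored here.

```c
#include <stddef.h>
#include <stdint.h>

/* Step 1: delta = highest phandle in the live tree + 1. */
uint32_t phandle_delta(const uint32_t *live, size_t n)
{
	uint32_t max = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (live[i] > max)
			max = live[i];
	return max + 1;
}

/* Steps 2-3: shift local phandles, and references to them, by delta. */
void shift_phandles(uint32_t *vals, size_t n, uint32_t delta)
{
	size_t i;

	for (i = 0; i < n; i++)
		vals[i] += delta;
}
```

Applying the same delta to the phandle definitions and to the references listed in __local_fixups__ keeps the overlay internally consistent while avoiding collisions with the live tree.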


@ -24,21 +24,16 @@ Open Firmware 和 Devicetree
usage-model
of_unittest
Todolist:
* kernel-api
kernel-api
Devicetree Overlays
===================
.. toctree::
:maxdepth: 1
Todolist:
* changesets
* dynamic-resolution-notes
* overlay-notes
changesets
dynamic-resolution-notes
overlay-notes
Devicetree Bindings
===================


@ -0,0 +1,58 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/Devicetree/kernel-api.rst
:Translator:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:Proofreader:

=====================
DeviceTree Kernel API
=====================

Core functions
--------------

This API is defined in the following kernel code:
drivers/of/base.c
include/linux/of.h
drivers/of/property.c
include/linux/of_graph.h
drivers/of/address.c
drivers/of/irq.c
drivers/of/fdt.c
Driver model functions
----------------------

This API is defined in the following kernel code:
include/linux/of_device.h
drivers/of/device.c
include/linux/of_platform.h
drivers/of/platform.c
Overlay and Dynamic DT functions
--------------------------------

This API is defined in the following kernel code:
drivers/of/resolver.c
drivers/of/dynamic.c
drivers/of/overlay.c


@ -0,0 +1,140 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/Devicetree/overlay-notes.rst
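:Translator:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:Proofreader:

========================
Devicetree Overlay Notes
========================

This document describes the implementation of the in-kernel device tree
overlay functionality residing in drivers/of/overlay.c and is a companion
document to Documentation/devicetree/dynamic-resolution-notes.rst[1].

How overlays work
-----------------

The purpose of a Devicetree overlay is to modify the kernel's live tree, and
to have the modification affect the state of the kernel in a way that
reflects the changes.
Since the kernel mainly deals with devices, any new device node that results
in an active device should have that device created, while if a device node
is either disabled or removed altogether, the affected device should be
deregistered.

Let's take an example where we have a foo board with the following base
tree::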
    ---- foo.dts ---------------------------------------------------------------
        /* FOO platform */
        /dts-v1/;
        / {
                compatible = "corp,foo";

                /* shared resources */
                res: res {
                };

                /* On chip peripherals */
                ocp: ocp {
                        /* peripherals that are always instantiated */
                        peripheral1 { ... };
                };
        };
    ---- foo.dts ---------------------------------------------------------------

The overlay bar.dts,
::

    ---- bar.dts - overlay target location by label ----------------------------
        /dts-v1/;
        /plugin/;
        &ocp {
                /* bar peripheral */
                bar {
                        compatible = "corp,bar";
                        ... /* various properties and child nodes */
                };
        };
    ---- bar.dts ---------------------------------------------------------------

when loaded (and resolved as described in [1]) should result in foo+bar.dts::
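    ---- foo+bar.dts -----------------------------------------------------------
        /* FOO platform + bar peripheral */
        / {
                compatible = "corp,foo";

                /* shared resources */
                res: res {
                };

                /* On chip peripherals */
                ocp: ocp {
                        /* peripherals that are always instantiated */
                        peripheral1 { ... };

                        /* bar peripheral */
                        bar {
                                compatible = "corp,bar";
                                ... /* various properties and child nodes */
                        };
                };
        };
    ---- foo+bar.dts -----------------------------------------------------------

As a result of the overlay, a new device node (bar) has been created, so a
bar platform device will be registered, and if a matching device driver is
loaded the device will be created as expected.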
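If the base DT was not compiled with the -@ option, then the "&ocp" label
will not be available to resolve the overlay node(s) to the proper location
in the base DT. In this case, the target path can be provided instead. The
target-location-by-label syntax is preferred, because the overlay can be
applied to any base DT containing the label, no matter where the label occurs
in the DT.

The above bar.dts example, modified to use the target path syntax, is::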
    ---- bar.dts - overlay target location by explicit path --------------------
        /dts-v1/;
        /plugin/;
        &{/ocp} {
                /* bar peripheral */
                bar {
                        compatible = "corp,bar";
                        ... /* various properties and child nodes */
                };
        };
    ---- bar.dts ---------------------------------------------------------------
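Overlay in-kernel API
---------------------

The API is quite easy to use.

1) Call of_overlay_fdt_apply() to create and apply an overlay changeset. The
return value is an error or a cookie identifying this overlay.
2) Call of_overlay_remove() to remove and clean up the overlay changeset
previously created via the call to of_overlay_fdt_apply(). Removal of an
overlay changeset that is stacked by another is not permitted.

Finally, if you need to remove all overlays in one go, just call
of_overlay_remove_all(), which removes every single one in the correct order.

You can optionally register notifiers that get called on overlay operations.
See of_overlay_notifier_register/unregister and enum of_overlay_notify_action
for details.

A notifier callback for OF_OVERLAY_PRE_APPLY, OF_OVERLAY_POST_APPLY or
OF_OVERLAY_PRE_REMOVE may store pointers to a device tree node in the overlay
or to its content, but these pointers must not persist past the notifier
callback for OF_OVERLAY_POST_REMOVE. The memory containing the overlay is
kfree()ed after the OF_OVERLAY_POST_REMOVE notifiers are called. Note that
the memory is kfree()ed even if the notifier for OF_OVERLAY_POST_REMOVE
returns an error.

The changeset notifiers in drivers/of/dynamic.c are a second type of notifier
that can be triggered by applying or removing an overlay. These notifiers are
not allowed to store pointers to a device tree node in the overlay or to its
content. The overlay code does not protect against such pointers remaining
active when the memory containing the overlay is freed as a result of
removing the overlay.

Any other code that retains a pointer to the overlay nodes or data is
considered to be a bug, because after the overlay is removed the pointer
refers to freed memory.

Users of overlays must be especially aware of the overall operations that
occur on the system, to ensure that other kernel code does not retain any
pointers to the overlay nodes or data. An example of an inadvertent use of
such pointers is a driver or subsystem module that is loaded after an overlay
has been applied and that scans the entire devicetree, or a large portion of
it, including the overlay nodes.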


@ -0,0 +1,69 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../../disclaimer-zh_CN.rst
:Original: Documentation/driver-api/gpio/index.rst
:Translator:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:Proofreader:

===================================
General Purpose Input/Output (GPIO)
===================================

Contents:
.. toctree::
:maxdepth: 2
legacy
Todolist:
* intro
* using-gpio
* driver
* consumer
* board
* drivers-on-gpio
* bt8xxgpio
Core
====

This API is defined in the following kernel code:
include/linux/gpio/driver.h
drivers/gpio/gpiolib.c
ACPI support
============

This API is defined in the following kernel code:
drivers/gpio/gpiolib-acpi.c
Device tree support
===================

This API is defined in the following kernel code:
drivers/gpio/gpiolib-of.c
Device-managed API
==================

This API is defined in the following kernel code:
drivers/gpio/gpiolib-devres.c
sysfs helpers
=============

This API is defined in the following kernel code:
drivers/gpio/gpiolib-sysfs.c
