It's been a relatively calm cycle in docsland. We do have:
- Some initial page-table documentation from Linus (the other Linus)
- Regression-handling documentation improvements from Thorsten
- Addition of kerneldoc documentation for the ERR_PTR() and related macros from James Seo
...and the usual collection of fixes and updates.
-----BEGIN PGP SIGNATURE-----

iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmSbC9wPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5Yw7YH/Rcd2oVQ/B8ui9TYcXTQid0ly5GvLl/ot0zf
pml725bZSKodcdtmLvQ6CzMGRdzxhQpVfzy21zHAlQWiBMdheWeu0Etmpspn8fCI
wnJIlUbGdp5Aq4ZtoJPTtE3vXvWEQ32gVytGjbTVNtSLRLXQ1bc+A/IvmRj3jdkV
dwPfN7hPLVhBt5770pHMywlFVBQ9FUjUNC+uX0JkcNZJ3598c4ZzndBEaLdqfPHC
DtWucRdnHubTncKECgYbspsfH6zuntFk8FgsD1gZ1K9izMAwVBsKSS+MeOz8oxx8
rPq4Tscqs/9mpist/PqxEu0fvTC3xsyMbxLA4hAORmgpdnbWIaQ=
=q2B4
-----END PGP SIGNATURE-----

Merge tag 'docs-6.5' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:

 "It's been a relatively calm cycle in docsland. We do have:
   - Some initial page-table documentation from Linus (the other Linus)
   - Regression-handling documentation improvements from Thorsten
   - Addition of kerneldoc documentation for the ERR_PTR() and related macros from James Seo
  ... and the usual collection of fixes and updates"

* tag 'docs-6.5' of git://git.lwn.net/linux:
  docs: consolidate storage interfaces
  Documentation: update git configuration for Link: tag
  Documentation: KVM: make corrections to vcpu-requests.rst
  Documentation: KVM: make corrections to ppc-pv.rst
  Documentation: KVM: make corrections to locking.rst
  Documentation: KVM: make corrections to halt-polling.rst
  Documentation: virt: correct location of haltpoll module params
  Documentation/mm: Initial page table documentation
  docs: crypto: async-tx-api: fix typo in struct name
  docs/doc-guide: Clarify how to write tables
  docs: handling-regressions: rework section about fixing procedures
  docs: process: fix a typoed cross-reference
  docs: submitting-patches: Discuss interleaved replies
  MAINTAINERS: direct process doc changes to a dedicated ML
  Documentation: core-api: Add error pointer functions to kernel-api
  err.h: Add missing kerneldocs for error pointer functions
  Documentation: conf.py: Add __force to c_id_attributes
  docs: clarify KVM related kernel parameters' descriptions
  docs: consolidate human interface subsystems
  docs: admin-guide: Add information about intel_pstate active mode
commit a354049532

@@ -2112,6 +2112,16 @@
 disable
 Do not enable intel_pstate as the default
 scaling driver for the supported processors
+active
+Use intel_pstate driver to bypass the scaling
+governors layer of cpufreq and provides it own
+algorithms for p-state selection. There are two
+P-state selection algorithms provided by
+intel_pstate in the active mode: powersave and
+performance. The way they both operate depends
+on whether or not the hardware managed P-states
+(HWP) feature has been enabled in the processor
+and possibly on the processor model.
 passive
 Use intel_pstate as a scaling driver, but configure it
 to work with generic cpufreq governors (instead of
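The active/passive choice described in this hunk is made at boot through `intel_pstate=`. To check which mode a running system ended up in, a tool can read the driver's status attribute. This is a minimal sketch. It assumes the attribute lives at `/sys/devices/system/cpu/intel_pstate/status` and contains `active`, `passive` or `off`. The enum and helper names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of the intel_pstate operation mode. */
enum pstate_mode { PSTATE_UNKNOWN, PSTATE_ACTIVE, PSTATE_PASSIVE, PSTATE_OFF };

/* Map a status string such as "active\n" to an enum value. */
enum pstate_mode parse_pstate_mode(const char *s)
{
	size_t len;

	assert(s != NULL);
	len = strcspn(s, "\n");		/* ignore a trailing newline */
	if (len == 6 && strncmp(s, "active", 6) == 0)
		return PSTATE_ACTIVE;
	if (len == 7 && strncmp(s, "passive", 7) == 0)
		return PSTATE_PASSIVE;
	if (len == 3 && strncmp(s, "off", 3) == 0)
		return PSTATE_OFF;
	return PSTATE_UNKNOWN;
}

/* Read the first line of a sysfs-style file; PSTATE_UNKNOWN on any error. */
enum pstate_mode read_pstate_mode(const char *path)
{
	char buf[32];
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return PSTATE_UNKNOWN;
	ok = fgets(buf, sizeof(buf), f) != NULL;
	fclose(f);
	return ok ? parse_pstate_mode(buf) : PSTATE_UNKNOWN;
}
```

On a machine without intel_pstate the attribute is absent. In that case `read_pstate_mode()` returns `PSTATE_UNKNOWN` instead of failing.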
@@ -2546,12 +2556,13 @@
 If the value is 0 (the default), KVM will pick a period based
 on the ratio, such that a page is zapped after 1 hour on average.
 
-kvm-amd.nested= [KVM,AMD] Allow nested virtualization in KVM/SVM.
-Default is 1 (enabled)
+kvm-amd.nested= [KVM,AMD] Control nested virtualization feature in
+KVM/SVM. Default is 1 (enabled).
 
-kvm-amd.npt= [KVM,AMD] Disable nested paging (virtualized MMU)
-for all guests.
-Default is 1 (enabled) if in 64-bit or 32-bit PAE mode.
+kvm-amd.npt= [KVM,AMD] Control KVM's use of Nested Page Tables,
+a.k.a. Two-Dimensional Page Tables. Default is 1
+(enabled). Disable by KVM if hardware lacks support
+for NPT.
 
 kvm-arm.mode=
 [KVM,ARM] Select one of KVM/arm64's modes of operation.
@@ -2597,30 +2608,33 @@
 Format: <integer>
 Default: 5
 
-kvm-intel.ept= [KVM,Intel] Disable extended page tables
-(virtualized MMU) support on capable Intel chips.
-Default is 1 (enabled)
+kvm-intel.ept= [KVM,Intel] Control KVM's use of Extended Page Tables,
+a.k.a. Two-Dimensional Page Tables. Default is 1
+(enabled). Disable by KVM if hardware lacks support
+for EPT.
 
 kvm-intel.emulate_invalid_guest_state=
-[KVM,Intel] Disable emulation of invalid guest state.
-Ignored if kvm-intel.enable_unrestricted_guest=1, as
-guest state is never invalid for unrestricted guests.
-This param doesn't apply to nested guests (L2), as KVM
-never emulates invalid L2 guest state.
-Default is 1 (enabled)
+[KVM,Intel] Control whether to emulate invalid guest
+state. Ignored if kvm-intel.enable_unrestricted_guest=1,
+as guest state is never invalid for unrestricted
+guests. This param doesn't apply to nested guests (L2),
+as KVM never emulates invalid L2 guest state.
+Default is 1 (enabled).
 
 kvm-intel.flexpriority=
-[KVM,Intel] Disable FlexPriority feature (TPR shadow).
-Default is 1 (enabled)
+[KVM,Intel] Control KVM's use of FlexPriority feature
+(TPR shadow). Default is 1 (enabled). Disalbe by KVM if
+hardware lacks support for it.
 
 kvm-intel.nested=
-[KVM,Intel] Enable VMX nesting (nVMX).
-Default is 0 (disabled)
+[KVM,Intel] Control nested virtualization feature in
+KVM/VMX. Default is 1 (enabled).
 
 kvm-intel.unrestricted_guest=
-[KVM,Intel] Disable unrestricted guest feature
-(virtualized real and unpaged mode) on capable
-Intel chips. Default is 1 (enabled)
+[KVM,Intel] Control KVM's use of unrestricted guest
+feature (virtualized real and unpaged mode). Default
+is 1 (enabled). Disable by KVM if EPT is disabled or
+hardware lacks support for it.
 
 kvm-intel.vmentry_l1d_flush=[KVM,Intel] Mitigation for L1 Terminal Fault
 CVE-2018-3620.
@@ -2634,9 +2648,10 @@
 
 Default is cond (do L1 cache flush in specific instances)
 
-kvm-intel.vpid= [KVM,Intel] Disable Virtual Processor Identification
-feature (tagged TLBs) on capable Intel chips.
-Default is 1 (enabled)
+kvm-intel.vpid= [KVM,Intel] Control KVM's use of Virtual Processor
+Identification feature (tagged TLBs). Default is 1
+(enabled). Disable by KVM if hardware lacks support
+for it.
 
 l1d_flush= [X86,INTEL]
 Control mitigation for L1D based snooping vulnerability.
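The reworded KVM entries stress that KVM may itself disable a feature when hardware lacks support. This means the value requested on the command line is not necessarily the one in effect. This minimal sketch reads back the effective setting. It assumes that boolean module parameters appear as `Y` or `N` in files such as `/sys/module/kvm_intel/parameters/ept`. The path and the helper names are illustrative assumptions.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse the text of a boolean module parameter file.
 * Returns 1 for enabled, 0 for disabled, -1 if unrecognized.
 */
int parse_bool_param(const char *s)
{
	assert(s != NULL);
	switch (s[0]) {
	case 'Y': case 'y': case '1':
		return 1;
	case 'N': case 'n': case '0':
		return 0;
	default:
		return -1;
	}
}

/* Read a boolean parameter file; returns -1 if it cannot be read. */
int read_bool_param(const char *path)
{
	char buf[8];
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fgets(buf, sizeof(buf), f) != NULL;
	fclose(f);
	return ok ? parse_bool_param(buf) : -1;
}
```

Comparing `read_bool_param()` with the value passed as `kvm-intel.ept=` shows whether KVM overrode the request.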
@@ -74,6 +74,7 @@ if major >= 3:
 "__percpu",
 "__rcu",
 "__user",
+"__force",
 
 # include/linux/compiler_attributes.h:
 "__alias",
@@ -96,6 +96,12 @@ Command-line Parsing
 .. kernel-doc:: lib/cmdline.c
    :export:
 
+Error Pointers
+--------------
+
+.. kernel-doc:: include/linux/err.h
+   :internal:
+
 Sorting
 -------
 
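The error-pointer helpers that this hunk adds to the kernel-api pages let one pointer return value carry either a valid object or a negative errno. This is a userspace re-creation for illustration only. It mirrors the kernel's `MAX_ERRNO` of 4095 and the unsigned "top of the address space" comparison. It is not `include/linux/err.h` itself. The `assert` in `ERR_PTR()` and the caller `open_thing()` are hypothetical additions for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno (-1 .. -MAX_ERRNO) as a pointer value. */
static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);	/* sketch-only check */
	return (void *)error;
}

/* Recover the errno from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* True if ptr falls in the last MAX_ERRNO bytes of the address space. */
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical caller: returns a buffer, or an encoded error. */
void *open_thing(size_t size)
{
	void *p;

	if (size == 0)
		return ERR_PTR(-EINVAL);
	p = malloc(size);
	if (!p)
		return ERR_PTR(-ENOMEM);
	return p;
}
```

Callers test with `IS_ERR()` before dereferencing and use `PTR_ERR()` to propagate the code. Note that `NULL` is not treated as an error by `IS_ERR()`.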
@@ -66,7 +66,7 @@ features surfaced as a result:
 ::
 
   struct dma_async_tx_descriptor *
-  async_<operation>(<op specific parameters>, struct async_submit ctl *submit)
+  async_<operation>(<op specific parameters>, struct async_submit_ctl *submit)
 
 3.2 Supported operations
 ------------------------
@@ -313,9 +313,18 @@ the documentation build system will automatically turn a reference to
 function name exists. If you see ``c:func:`` use in a kernel document,
 please feel free to remove it.
 
+Tables
+------
+
+ReStructuredText provides several options for table syntax. Kernel style for
+tables is to prefer *simple table* syntax or *grid table* syntax. See the
+`reStructuredText user reference for table syntax`_ for more details.
+
+.. _reStructuredText user reference for table syntax:
+   https://docutils.sourceforge.io/docs/user/rst/quickref.html#tables
 
 list tables
------------
+~~~~~~~~~~~
 
 The list-table formats can be useful for tables that are not easily laid
 out in the usual Sphinx ASCII-art formats. These formats are nearly
@@ -56,7 +56,7 @@ by adding the following hook into your git:
 $ cat >.git/hooks/applypatch-msg <<'EOF'
 #!/bin/sh
 . git-sh-setup
-perl -pi -e 's|^Message-Id:\s*<?([^>]+)>?$|Link: https://lore.kernel.org/r/$1|g;' "$1"
+perl -pi -e 's|^Message-I[dD]:\s*<?([^>]+)>?$|Link: https://lore.kernel.org/r/$1|g;' "$1"
 test -x "$GIT_DIR/hooks/commit-msg" &&
 exec "$GIT_DIR/hooks/commit-msg" ${1+"$@"}
 :
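The fix in this hunk widens the pattern so that both `Message-Id:` and `Message-ID:` headers are rewritten into `Link:` tags. To make the matching rule explicit, here is a hypothetical, simplified C equivalent of that perl substitution. It works on a single header line. It does not enforce the regex's end-of-line anchor and is not part of the documented hook. The function name is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Rewrite "Message-Id: <id>" or "Message-ID: <id>" into
 * "Link: https://lore.kernel.org/r/id" (angle brackets dropped).
 * Returns 1 and fills out on success, 0 if the line does not match.
 */
int msgid_to_link(const char *line, char *out, size_t outlen)
{
	const char *p, *end;
	int n;

	assert(line != NULL && out != NULL);
	/* Only the final letter may vary in case, as in "Message-I[dD]:". */
	if (strncmp(line, "Message-I", 9) != 0 ||
	    (line[9] != 'd' && line[9] != 'D') || line[10] != ':')
		return 0;

	p = line + 11;
	while (*p == ' ' || *p == '\t')		/* like \s* */
		p++;
	if (*p == '<')				/* optional opening bracket */
		p++;

	end = p + strcspn(p, ">\r\n");		/* id runs up to '>' or EOL */
	if (end == p)
		return 0;			/* [^>]+ needs one character */

	n = snprintf(out, outlen, "Link: https://lore.kernel.org/r/%.*s",
		     (int)(end - p), p);
	return n >= 0 && (size_t)n < outlen;
}
```

Before the fix, only the first spelling matched. The second spelling, which many mail clients emit, was left untouched.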
@@ -3,3 +3,152 @@
 ===========
 Page Tables
 ===========
+
+Paged virtual memory was invented along with virtual memory as a concept in
+1962 on the Ferranti Atlas Computer which was the first computer with paged
+virtual memory. The feature migrated to newer computers and became a de facto
+feature of all Unix-like systems as time went by. In 1985 the feature was
+included in the Intel 80386, which was the CPU Linux 1.0 was developed on.
+
+Page tables map virtual addresses as seen by the CPU into physical addresses
+as seen on the external memory bus.
+
+Linux defines page tables as a hierarchy which is currently five levels in
+height. The architecture code for each supported architecture will then
+map this to the restrictions of the hardware.
+
+The physical address corresponding to the virtual address is often referenced
+by the underlying physical page frame. The **page frame number** or **pfn**
+is the physical address of the page (as seen on the external memory bus)
+divided by `PAGE_SIZE`.
+
+Physical memory address 0 will be *pfn 0* and the highest pfn will be
+the last page of physical memory the external address bus of the CPU can
+address.
+
+With a page granularity of 4KB and a address range of 32 bits, pfn 0 is at
+address 0x00000000, pfn 1 is at address 0x00001000, pfn 2 is at 0x00002000
+and so on until we reach pfn 0xfffff at 0xfffff000. With 16KB pages pfs are
+at 0x00004000, 0x00008000 ... 0xffffc000 and pfn goes from 0 to 0x3fffff.
+
+As you can see, with 4KB pages the page base address uses bits 12-31 of the
+address, and this is why `PAGE_SHIFT` in this case is defined as 12 and
+`PAGE_SIZE` is usually defined in terms of the page shift as `(1 << PAGE_SHIFT)`
+
+Over time a deeper hierarchy has been developed in response to increasing memory
+sizes. When Linux was created, 4KB pages and a single page table called
+`swapper_pg_dir` with 1024 entries was used, covering 4MB which coincided with
+the fact that Torvald's first computer had 4MB of physical memory. Entries in
+this single table were referred to as *PTE*:s - page table entries.
+
+The software page table hierarchy reflects the fact that page table hardware has
+become hierarchical and that in turn is done to save page table memory and
+speed up mapping.
+
+One could of course imagine a single, linear page table with enormous amounts
+of entries, breaking down the whole memory into single pages. Such a page table
+would be very sparse, because large portions of the virtual memory usually
+remains unused. By using hierarchical page tables large holes in the virtual
+address space does not waste valuable page table memory, because it will suffice
+to mark large areas as unmapped at a higher level in the page table hierarchy.
+
+Additionally, on modern CPUs, a higher level page table entry can point directly
+to a physical memory range, which allows mapping a contiguous range of several
+megabytes or even gigabytes in a single high-level page table entry, taking
+shortcuts in mapping virtual memory to physical memory: there is no need to
+traverse deeper in the hierarchy when you find a large mapped range like this.
+
+The page table hierarchy has now developed into this::
+
+  +-----+
+  | PGD |
+  +-----+
+     |
+     |   +-----+
+     +-->| P4D |
+         +-----+
+            |
+            |   +-----+
+            +-->| PUD |
+                +-----+
+                   |
+                   |   +-----+
+                   +-->| PMD |
+                       +-----+
+                          |
+                          |   +-----+
+                          +-->| PTE |
+                              +-----+
+
+
+Symbols on the different levels of the page table hierarchy have the following
+meaning beginning from the bottom:
+
+- **pte**, `pte_t`, `pteval_t` = **Page Table Entry** - mentioned earlier.
+The *pte* is an array of `PTRS_PER_PTE` elements of the `pteval_t` type, each
+mapping a single page of virtual memory to a single page of physical memory.
+The architecture defines the size and contents of `pteval_t`.
+
+A typical example is that the `pteval_t` is a 32- or 64-bit value with the
+upper bits being a **pfn** (page frame number), and the lower bits being some
+architecture-specific bits such as memory protection.
+
+The **entry** part of the name is a bit confusing because while in Linux 1.0
+this did refer to a single page table entry in the single top level page
+table, it was retrofitted to be an array of mapping elements when two-level
+page tables were first introduced, so the *pte* is the lowermost page
+*table*, not a page table *entry*.
+
+- **pmd**, `pmd_t`, `pmdval_t` = **Page Middle Directory**, the hierarchy right
+above the *pte*, with `PTRS_PER_PMD` references to the *pte*:s.
+
+- **pud**, `pud_t`, `pudval_t` = **Page Upper Directory** was introduced after
+the other levels to handle 4-level page tables. It is potentially unused,
+or *folded* as we will discuss later.
+
+- **p4d**, `p4d_t`, `p4dval_t` = **Page Level 4 Directory** was introduced to
+handle 5-level page tables after the *pud* was introduced. Now it was clear
+that we needed to replace *pgd*, *pmd*, *pud* etc with a figure indicating the
+directory level and that we cannot go on with ad hoc names any more. This
+is only used on systems which actually have 5 levels of page tables, otherwise
+it is folded.
+
+- **pgd**, `pgd_t`, `pgdval_t` = **Page Global Directory** - the Linux kernel
+main page table handling the PGD for the kernel memory is still found in
+`swapper_pg_dir`, but each userspace process in the system also has its own
+memory context and thus its own *pgd*, found in `struct mm_struct` which
+in turn is referenced to in each `struct task_struct`. So tasks have memory
+context in the form of a `struct mm_struct` and this in turn has a
+`struct pgt_t *pgd` pointer to the corresponding page global directory.
+
+To repeat: each level in the page table hierarchy is a *array of pointers*, so
+the **pgd** contains `PTRS_PER_PGD` pointers to the next level below, **p4d**
+contains `PTRS_PER_P4D` pointers to **pud** items and so on. The number of
+pointers on each level is architecture-defined.::
+
+        PMD
+  --> +-----+           PTE
+      | ptr |-------> +-----+
+      | ptr |-        | ptr |-------> PAGE
+      | ptr | \       | ptr |
+      | ptr |  \        ...
+      | ... |   \
+      | ptr |    \         PTE
+      +-----+     +----> +-----+
+                         | ptr |-------> PAGE
+                         | ptr |
+                           ...
+
+
+Page Table Folding
+==================
+
+If the architecture does not use all the page table levels, they can be *folded*
+which means skipped, and all operations performed on page tables will be
+compile-time augmented to just skip a level when accessing the next lower
+level.
+
+Page table handling code that wishes to be architecture-neutral, such as the
+virtual memory manager, will need to be written so that it traverses all of the
+currently five levels. This style should also be preferred for
+architecture-specific code, so as to be robust to future changes.
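The pfn arithmetic and the per-level split described in the new page table document can be made concrete with a small userspace calculation. This is a sketch under stated assumptions. The shift values model x86-64 with 4KB pages and 5-level paging, where each level indexes 9 address bits (`PMD_SHIFT` 21, `PUD_SHIFT` 30, `P4D_SHIFT` 39, `PGDIR_SHIFT` 48). Other architectures choose different values. The `va_split` type and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)	/* 4096 bytes */

/* Assumed x86-64 5-level layout: every level indexes 9 address bits. */
#define PMD_SHIFT	21
#define PUD_SHIFT	30
#define P4D_SHIFT	39
#define PGDIR_SHIFT	48
#define PTRS_PER_LEVEL	512

/* pfn = physical address / page size, i.e. a right shift by page_shift. */
uint64_t phys_to_pfn(uint64_t paddr, unsigned int page_shift)
{
	return paddr >> page_shift;
}

/* Index used at each level of the hierarchy, plus the in-page offset. */
struct va_split {
	unsigned int pgd, p4d, pud, pmd, pte;
	uint64_t offset;
};

/* Break a virtual address into per-level table indices. */
struct va_split split_va(uint64_t va)
{
	struct va_split s;

	s.pgd = (va >> PGDIR_SHIFT) & (PTRS_PER_LEVEL - 1);
	s.p4d = (va >> P4D_SHIFT) & (PTRS_PER_LEVEL - 1);
	s.pud = (va >> PUD_SHIFT) & (PTRS_PER_LEVEL - 1);
	s.pmd = (va >> PMD_SHIFT) & (PTRS_PER_LEVEL - 1);
	s.pte = (va >> PAGE_SHIFT) & (PTRS_PER_LEVEL - 1);
	s.offset = va & (PAGE_SIZE - 1);
	return s;
}
```

With these definitions, `phys_to_pfn(0xfffff000, 12)` is `0xfffff`. For 16KB pages, `phys_to_pfn(0xffffc000, 14)` evaluates to `0x3ffff`. When a level is folded, for example the P4D on a system running only four levels, Linux treats it as a table with a single entry, so a walker written for all five levels keeps working.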
@@ -434,9 +434,10 @@ There are a few hints which can help with linux-kernel survival:
 questions. Some developers can get impatient with people who clearly
 have not done their homework.
 
-- Avoid top-posting (the practice of putting your answer above the quoted
-text you are responding to). It makes your response harder to read and
-makes a poor impression.
+- Use interleaved ("inline") replies, which makes your response easier to
+read. (i.e. avoid top-posting -- the practice of putting your answer above
+the quoted text you are responding to.) For more details, see
+:ref:`Documentation/process/submitting-patches.rst <interleaved_replies>`.
 
 - Ask on the correct mailing list. Linux-kernel may be the general meeting
 point, but it is not the best place to find developers from all
@@ -129,88 +129,132 @@ tools and scripts used by other kernel developers or Linux distributions; one of
 these tools is regzbot, which heavily relies on the "Link:" tags to associate
 reports for regression with changes resolving them.
 
-Prioritize work on fixing regressions
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Expectations and best practices for fixing regressions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-You should fix any reported regression as quickly as possible, to provide
-affected users with a solution in a timely manner and prevent more users from
-running into the issue; nevertheless developers need to take enough time and
-care to ensure regression fixes do not cause additional damage.
+As a Linux kernel developer, you are expected to give your best to prevent
+situations where a regression caused by a recent change of yours leaves users
+only these options:
 
-In the end though, developers should give their best to prevent users from
-running into situations where a regression leaves them only three options: "run
-a kernel with a regression that seriously impacts usage", "continue running an
-outdated and thus potentially insecure kernel version for more than two weeks
-after a regression's culprit was identified", and "downgrade to a still
-supported kernel series that lack required features".
+* Run a kernel with a regression that impacts usage.
 
-How to realize this depends a lot on the situation. Here are a few rules of
-thumb for you, in order or importance:
+* Switch to an older or newer kernel series.
 
-* Prioritize work on handling regression reports and fixing regression over all
-other Linux kernel work, unless the latter concerns acute security issues or
-bugs causing data loss or damage.
+* Continue running an outdated and thus potentially insecure kernel for more
+than three weeks after the regression's culprit was identified. Ideally it
+should be less than two. And it ought to be just a few days, if the issue is
+severe or affects many users -- either in general or in prevalent
+environments.
 
-* Always consider reverting the culprit commits and reapplying them later
-together with necessary fixes, as this might be the least dangerous and
-quickest way to fix a regression.
+How to realize that in practice depends on various factors. Use the following
+rules of thumb as a guide.
 
-* Developers should handle regressions in all supported kernel series, but are
-free to delegate the work to the stable team, if the issue probably at no
-point in time occurred with mainline.
+In general:
 
-* Try to resolve any regressions introduced in the current development before
-its end. If you fear a fix might be too risky to apply only days before a new
-mainline release, let Linus decide: submit the fix separately to him as soon
-as possible with the explanation of the situation. He then can make a call
-and postpone the release if necessary, for example if multiple such changes
-show up in his inbox.
+* Prioritize work on regressions over all other Linux kernel work, unless the
+latter concerns a severe issue (e.g. acute security vulnerability, data loss,
+bricked hardware, ...).
 
-* Address regressions in stable, longterm, or proper mainline releases with
-more urgency than regressions in mainline pre-releases. That changes after
-the release of the fifth pre-release, aka "-rc5": mainline then becomes as
-important, to ensure all the improvements and fixes are ideally tested
-together for at least one week before Linus releases a new mainline version.
+* Expedite fixing mainline regressions that recently made it into a proper
+mainline, stable, or longterm release (either directly or via backport).
 
-* Fix regressions within two or three days, if they are critical for some
-reason -- for example, if the issue is likely to affect many users of the
-kernel series in question on all or certain architectures. Note, this
-includes mainline, as issues like compile errors otherwise might prevent many
-testers or continuous integration systems from testing the series.
+* Do not consider regressions from the current cycle as something that can wait
+till the end of the cycle, as the issue might discourage or prevent users and
+CI systems from testing mainline now or generally.
 
-* Aim to fix regressions within one week after the culprit was identified, if
-the issue was introduced in either:
+* Work with the required care to avoid additional or bigger damage, even if
+resolving an issue then might take longer than outlined below.
 
-* a recent stable/longterm release
+On timing once the culprit of a regression is known:
 
-* the development cycle of the latest proper mainline release
+* Aim to mainline a fix within two or three days, if the issue is severe or
+bothering many users -- either in general or in prevalent conditions like a
+particular hardware environment, distribution, or stable/longterm series.
 
-In the latter case (say Linux v5.14), try to address regressions even
-quicker, if the stable series for the predecessor (v5.13) will be abandoned
-soon or already was stamped "End-of-Life" (EOL) -- this usually happens about
-three to four weeks after a new mainline release.
+* Aim to mainline a fix by Sunday after the next, if the culprit made it
+into a recent mainline, stable, or longterm release (either directly or via
+backport); if the culprit became known early during a week and is simple to
+resolve, try to mainline the fix within the same week.
 
-* Try to fix all other regressions within two weeks after the culprit was
-found. Two or three additional weeks are acceptable for performance
-regressions and other issues which are annoying, but don't prevent anyone
-from running Linux (unless it's an issue in the current development cycle,
-as those should ideally be addressed before the release). A few weeks in
-total are acceptable if a regression can only be fixed with a risky change
-and at the same time is affecting only a few users; as much time is
-also okay if the regression is already present in the second newest longterm
-kernel series.
+* For other regressions, aim to mainline fixes before the hindmost Sunday
+within the next three weeks. One or two Sundays later are acceptable, if the
+regression is something people can live with easily for a while -- like a
+mild performance regression.
 
-Note: The aforementioned time frames for resolving regressions are meant to
-include getting the fix tested, reviewed, and merged into mainline, ideally with
-the fix being in linux-next at least briefly. This leads to delays you need to
-account for.
+* It's strongly discouraged to delay mainlining regression fixes till the next
+merge window, except when the fix is extraordinarily risky or when the
+culprit was mainlined more than a year ago.
 
-Subsystem maintainers are expected to assist in reaching those periods by doing
-timely reviews and quick handling of accepted patches. They thus might have to
-send git-pull requests earlier or more often than usual; depending on the fix,
-it might even be acceptable to skip testing in linux-next. Especially fixes for
-regressions in stable and longterm kernels need to be handled quickly, as fixes
-need to be merged in mainline before they can be backported to older series.
+On procedure:
+
+* Always consider reverting the culprit, as it's often the quickest and least
+dangerous way to fix a regression. Don't worry about mainlining a fixed
+variant later: that should be straight-forward, as most of the code went
+through review once already.
+
+* Try to resolve any regressions introduced in mainline during the past
+twelve months before the current development cycle ends: Linus wants such
+regressions to be handled like those from the current cycle, unless fixing
+bears unusual risks.
+
+* Consider CCing Linus on discussions or patch review, if a regression seems
+tangly. Do the same in precarious or urgent cases -- especially if the
+subsystem maintainer might be unavailable. Also CC the stable team, when you
+know such a regression made it into a mainline, stable, or longterm release.
+
+* For urgent regressions, consider asking Linus to pick up the fix straight
+from the mailing list: he is totally fine with that for uncontroversial
+fixes. Ideally though such requests should happen in accordance with the
+subsystem maintainers or come directly from them.
+
+* In case you are unsure if a fix is worth the risk applying just days before
+a new mainline release, send Linus a mail with the usual lists and people in
+CC; in it, summarize the situation while asking him to consider picking up
+the fix straight from the list. He then himself can make the call and when
+needed even postpone the release. Such requests again should ideally happen
+in accordance with the subsystem maintainers or come directly from them.
+
+Regarding stable and longterm kernels:
+
+* You are free to leave regressions to the stable team, if they at no point in
+time occurred with mainline or were fixed there already.
+
+* If a regression made it into a proper mainline release during the past
+twelve months, ensure to tag the fix with "Cc: stable@vger.kernel.org", as a
+"Fixes:" tag alone does not guarantee a backport. Please add the same tag,
+in case you know the culprit was backported to stable or longterm kernels.
+
+* When receiving reports about regressions in recent stable or longterm kernel
+series, please evaluate at least briefly if the issue might happen in current
+mainline as well -- and if that seems likely, take hold of the report. If in
+doubt, ask the reporter to check mainline.
+
+* Whenever you want to swiftly resolve a regression that recently also made it
+into a proper mainline, stable, or longterm release, fix it quickly in
+mainline; when appropriate thus involve Linus to fast-track the fix (see
+above). That's because the stable team normally does neither revert nor fix
+any changes that cause the same problems in mainline.
+
+* In case of urgent regression fixes you might want to ensure prompt
+backporting by dropping the stable team a note once the fix was mainlined;
+this is especially advisable during merge windows and shortly thereafter, as
+the fix otherwise might land at the end of a huge patch queue.
+
+On patch flow:
+
+* Developers, when trying to reach the time periods mentioned above, remember
+to account for the time it takes to get fixes tested, reviewed, and merged by
+Linus, ideally with them being in linux-next at least briefly. Hence, if a
+fix is urgent, make it obvious to ensure others handle it appropriately.
+
+* Reviewers, you are kindly asked to assist developers in reaching the time
+periods mentioned above by reviewing regression fixes in a timely manner.
+
+* Subsystem maintainers, you likewise are encouraged to expedite the handling
+of regression fixes. Thus evaluate if skipping linux-next is an option for
+the particular fix. Also consider sending git pull requests more often than
+usual when needed. And try to avoid holding onto regression fixes over
+weekends -- especially when the fix is marked for backporting.
 
 
 More aspects regarding regressions developers should be aware of
@@ -331,6 +331,31 @@ explaining difference against previous submission (see
 See Documentation/process/email-clients.rst for recommendations on email
 clients and mailing list etiquette.
 
+.. _interleaved_replies:
+
+Use trimmed interleaved replies in email discussions
+----------------------------------------------------
+Top-posting is strongly discouraged in Linux kernel development
+discussions. Interleaved (or "inline") replies make conversations much
+easier to follow. For more details see:
+https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
+
+As is frequently quoted on the mailing list::
+
+  A: http://en.wikipedia.org/wiki/Top_post
+  Q: Were do I find info about this thing called top-posting?
+  A: Because it messes up the order in which people normally read text.
+  Q: Why is top-posting such a bad thing?
+  A: Top-posting.
+  Q: What is the most annoying thing in e-mail?
+
+Similarly, please trim all unneeded quotations that aren't relevant
+to your reply. This makes responses easier to find, and saves time and
+space. For more details see: http://daringfireball.net/2007/07/on_top ::
+
+  A: No.
+  Q: Should I include quotations after my reply?
+
 .. _resend_reminders:
 
 Don't get discouraged - or impatient
@@ -10,6 +10,30 @@ is taken directly from the kernel source, with supplemental material added
 as needed (or at least as we managed to add it — probably *not* all that is
 needed).
 
+Human interfaces
+----------------
+
+.. toctree::
+   :maxdepth: 1
+
+   input/index
+   hid/index
+   sound/index
+   gpu/index
+   fb/index
+
+Storage interfaces
+------------------
+
+.. toctree::
+   :maxdepth: 1
+
+   filesystems/index
+   block/index
+   cdrom/index
+   scsi/index
+   target/index
+
 **Fixme**: much more organizational work is needed here.
 
 .. toctree::
@@ -19,12 +43,8 @@ needed).
    core-api/index
    locking/index
    accounting/index
-   block/index
-   cdrom/index
    cpu-freq/index
-   fb/index
    fpga/index
-   hid/index
    i2c/index
    iio/index
    isdn/index
@@ -34,25 +54,19 @@ needed).
    networking/index
    pcmcia/index
    power/index
-   target/index
    timers/index
    spi/index
    w1/index
    watchdog/index
    virt/index
-   input/index
    hwmon/index
-   gpu/index
    accel/index
    security/index
-   sound/index
    crypto/index
-   filesystems/index
    mm/index
    bpf/index
    usb/index
    PCI/index
-   scsi/index
    misc-devices/index
    scheduler/index
    mhi/index

@@ -72,7 +72,7 @@ high once achieves global guest_halt_poll_ns value).

 Default: Y

-The module parameters can be set from the debugfs files in::
+The module parameters can be set from the sysfs files in::

 	/sys/module/haltpoll/parameters/


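As a rough illustration of reading one of these parameters from sysfs, the helper below parses a numeric value from a file under /sys/module/haltpoll/parameters/. It is a userspace sketch, not part of the patch: `read_module_param()` is a hypothetical name, and it assumes the file you pass (for example guest_halt_poll_ns) exists on your kernel and holds a decimal integer. Boolean parameters printed as `Y`/`N` are rejected with -EINVAL.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read a numeric module parameter from a sysfs file
 * such as /sys/module/haltpoll/parameters/guest_halt_poll_ns.
 * Returns 0 and stores the value in *out, or a negative errno on failure.
 */
static int read_module_param(const char *path, unsigned long *out)
{
	char buf[64];
	char *end;
	unsigned long val;
	FILE *f = fopen(path, "r");

	if (!f)
		return -errno;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);

	errno = 0;
	val = strtoul(buf, &end, 10);
	/* Reject empty input and trailing garbage; sysfs values end in '\n'. */
	if (errno != 0 || end == buf || (*end != '\n' && *end != '\0'))
		return -EINVAL;

	*out = val;
	return 0;
}
```

If the haltpoll module is not loaded, the directory is likely absent and the helper returns a negative errno such as -ENOENT.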
@@ -112,11 +112,11 @@ powerpc kvm-hv case.
 |                       | function.                 |                         |
 +-----------------------+---------------------------+-------------------------+

-These module parameters can be set from the debugfs files in:
+These module parameters can be set from the sysfs files in:

 	/sys/module/kvm/parameters/

-Note: that these module parameters are system wide values and are not able to
+Note: these module parameters are system-wide values and are not able to
 be tuned on a per vm basis.

 Any changes to these parameters will be picked up by new and existing vCPUs the
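Because these KVM parameters are host-wide, changing one is a single write to its sysfs file, and every VM on the host sees the new value. A minimal sketch, assuming root privileges and a writable numeric parameter such as halt_poll_ns; `write_module_param()` is a hypothetical helper, not a kernel or libc API.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: write a new value to a module parameter file such
 * as /sys/module/kvm/parameters/halt_poll_ns (needs root). Returns 0 on
 * success or a negative errno.
 */
static int write_module_param(const char *path, unsigned long value)
{
	int ret = 0;
	FILE *f = fopen(path, "w");

	if (!f)
		return -errno;
	if (fprintf(f, "%lu\n", value) < 0)
		ret = -EIO;
	/*
	 * stdio buffers the text, so a value rejected by the kernel may only
	 * be reported when fclose() flushes it.
	 */
	if (fclose(f) != 0 && ret == 0)
		ret = -errno;
	return ret;
}
```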
@@ -142,12 +142,12 @@ Further Notes
   global max polling interval (halt_poll_ns) then the host will always poll for the
   entire block time and thus cpu utilisation will go to 100%.

-- Halt polling essentially presents a trade off between power usage and latency and
+- Halt polling essentially presents a trade-off between power usage and latency and
   the module parameters should be used to tune the affinity for this. Idle cpu time is
   essentially converted to host kernel time with the aim of decreasing latency when
   entering the guest.

 - Halt polling will only be conducted by the host when no other tasks are runnable on
   that cpu, otherwise the polling will cease immediately and schedule will be invoked to
-  allow that other task to run. Thus this doesn't allow a guest to denial of service the
-  cpu.
+  allow that other task to run. Thus this doesn't allow a guest to cause denial of service
+  of the cpu.

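The notes above boil down to a simple rule for how much host CPU time one halt costs. The function below is a simplified model written for illustration, not KVM's implementation; the name and parameters are hypothetical, and it assumes a competing task, if any, is runnable from the start of the halt.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model of host CPU time spent polling for one guest halt:
 *   poll_ns         - current polling window (bounded by halt_poll_ns)
 *   wakeup_after_ns - time until the vCPU's wakeup event arrives
 *   other_runnable  - another task is runnable on this CPU
 */
static uint64_t halt_poll_cpu_ns(uint64_t poll_ns, uint64_t wakeup_after_ns,
				 bool other_runnable)
{
	/* Polling ceases at once so the scheduler can run the other task. */
	if (other_runnable)
		return 0;
	/* Wakeup lands inside the window: the whole block time is polled. */
	if (wakeup_after_ns <= poll_ns)
		return wakeup_after_ns;
	/* Window expires first: polling stops and the vCPU really sleeps. */
	return poll_ns;
}
```

When every wakeup falls inside the window, the polled time equals the block time, which is the 100% CPU utilisation case described above.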
@@ -67,7 +67,7 @@ following two cases:
 2. Write-Protection: The SPTE is present and the fault is caused by
    write-protect. That means we just need to change the W bit of the spte.

-What we use to avoid all the race is the Host-writable bit and MMU-writable bit
+What we use to avoid all the races is the Host-writable bit and MMU-writable bit
 on the spte:

 - Host-writable means the gfn is writable in the host kernel page tables and in
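The lockless write-protection fix described here can be sketched in userspace with C11 atomics: make the spte writable only if both software bits allow it, and only via a compare-and-exchange so a concurrent change is never overwritten. The bit positions and the function name are hypothetical and do not reflect KVM's real SPTE layout or code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen for illustration only. */
#define SPTE_W              (UINT64_C(1) << 1)   /* hardware writable bit */
#define SPTE_HOST_WRITABLE  (UINT64_C(1) << 57)
#define SPTE_MMU_WRITABLE   (UINT64_C(1) << 58)

/*
 * Fast-path fix for a write-protection fault without holding mmu-lock.
 * Returns true if the spte is writable afterwards.
 */
static bool fast_pf_fix_write(_Atomic uint64_t *sptep)
{
	uint64_t old = atomic_load(sptep);

	/* Without both bits the fault must go to the slow path. */
	if (!(old & SPTE_HOST_WRITABLE) || !(old & SPTE_MMU_WRITABLE))
		return false;
	if (old & SPTE_W)
		return true;	/* already fixed by another CPU */

	/* Fails if the spte changed since it was read. */
	return atomic_compare_exchange_strong(sptep, &old, old | SPTE_W);
}
```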
@@ -130,7 +130,7 @@ to gfn. For indirect sp, we disabled fast page fault for simplicity.
 A solution for indirect sp could be to pin the gfn, for example via
 kvm_vcpu_gfn_to_pfn_atomic, before the cmpxchg. After the pinning:

-- We have held the refcount of pfn that means the pfn can not be freed and
+- We have held the refcount of pfn; that means the pfn can not be freed and
   be reused for another gfn.
 - The pfn is writable and therefore it cannot be shared between different gfns
   by KSM.
@@ -186,22 +186,22 @@ writable between reading spte and updating spte. Like below case:
 The Dirty bit is lost in this case.

 In order to avoid this kind of issue, we always treat the spte as "volatile"
-if it can be updated out of mmu-lock, see spte_has_volatile_bits(), it means,
+if it can be updated out of mmu-lock [see spte_has_volatile_bits()]; it means
 the spte is always atomically updated in this case.

 3) flush tlbs due to spte updated

-If the spte is updated from writable to readonly, we should flush all TLBs,
+If the spte is updated from writable to read-only, we should flush all TLBs,
 otherwise rmap_write_protect will find a read-only spte, even though the
 writable spte might be cached on a CPU's TLB.

 As mentioned before, the spte can be updated to writable out of mmu-lock on
-fast page fault path, in order to easily audit the path, we see if TLBs need
-be flushed caused by this reason in mmu_spte_update() since this is a common
+fast page fault path. In order to easily audit the path, we see if TLBs needing
+to be flushed caused this reason in mmu_spte_update() since this is a common
 function to update spte (present -> present).

 Since the spte is "volatile" if it can be updated out of mmu-lock, we always
-atomically update the spte, the race caused by fast page fault can be avoided,
+atomically update the spte and the race caused by fast page fault can be avoided.
 See the comments in spte_has_volatile_bits() and mmu_spte_update().

 Lockless Access Tracking:
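The two rules in this hunk (update a "volatile" spte atomically so a concurrently set Dirty bit is not lost, and flush TLBs when an spte goes from writable to read-only) can be combined in one small userspace sketch. The bit layout and function name are hypothetical; this is not the code of mmu_spte_update().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for illustration; not the real SPTE format. */
#define SPTE_W      (UINT64_C(1) << 1)   /* writable */
#define SPTE_DIRTY  (UINT64_C(1) << 6)   /* may be set by hardware at any time */

/*
 * Swap in new_spte atomically. The exchange returns the value that was
 * really there, so a Dirty bit set concurrently is reported in *old_out
 * instead of being lost. Returns true if TLBs must be flushed.
 */
static bool spte_update_atomic(_Atomic uint64_t *sptep, uint64_t new_spte,
			       uint64_t *old_out)
{
	uint64_t old = atomic_exchange(sptep, new_spte);

	*old_out = old;
	/* Writable -> read-only: a stale writable TLB entry may remain. */
	return (old & SPTE_W) && !(new_spte & SPTE_W);
}
```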
@@ -283,9 +283,9 @@ time it will be set using the Dirty tracking mechanism described above.
 :Arch: x86
 :Protects: wakeup_vcpus_on_cpu
 :Comment: This is a per-CPU lock and it is used for VT-d posted-interrupts.
-          When VT-d posted-interrupts is supported and the VM has assigned
+          When VT-d posted-interrupts are supported and the VM has assigned
           devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu
-          protected by blocked_vcpu_on_cpu_lock, when VT-d hardware issues
+          protected by blocked_vcpu_on_cpu_lock. When VT-d hardware issues
           wakeup notification event since external interrupts from the
           assigned devices happens, we will find the vCPU on the list to
           wakeup.

@@ -89,7 +89,7 @@ also define a new hypercall feature to indicate that the host can give you more
 registers. Only if the host supports the additional features, make use of them.

 The magic page layout is described by struct kvm_vcpu_arch_shared
-in arch/powerpc/include/asm/kvm_para.h.
+in arch/powerpc/include/uapi/asm/kvm_para.h.

 Magic page features
 ===================
@@ -112,7 +112,7 @@ Magic page flags
 ================

 In addition to features that indicate whether a host is capable of a particular
-feature we also have a channel for a guest to tell the guest whether it's capable
+feature we also have a channel for a guest to tell the host whether it's capable
 of something. This is what we call "flags".

 Flags are passed to the host in the low 12 bits of the Effective Address.
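Passing flags "in the low 12 bits of the Effective Address" works because a page-aligned address (4 KiB alignment is assumed here, matching 12 bits) has those bits clear. A minimal sketch of packing and unpacking; the mask name, the helper names and the example address are hypothetical.

```c
#include <stdint.h>

/* Low 12 bits of a 4 KiB-aligned address are free to carry flags. */
#define MAGIC_PAGE_FLAG_MASK UINT64_C(0xfff)

/* Combine a page-aligned effective address with guest capability flags. */
static uint64_t magic_ea_with_flags(uint64_t page_ea, uint64_t flags)
{
	return (page_ea & ~MAGIC_PAGE_FLAG_MASK) | (flags & MAGIC_PAGE_FLAG_MASK);
}

/* Host side: recover the page address and the flags. */
static uint64_t magic_ea_page(uint64_t ea)
{
	return ea & ~MAGIC_PAGE_FLAG_MASK;
}

static uint64_t magic_ea_flags(uint64_t ea)
{
	return ea & MAGIC_PAGE_FLAG_MASK;
}
```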
@@ -139,7 +139,7 @@ Patched instructions
 ====================

 The "ld" and "std" instructions are transformed to "lwz" and "stw" instructions
-respectively on 32 bit systems with an added offset of 4 to accommodate for big
+respectively on 32-bit systems with an added offset of 4 to accommodate for big
 endianness.

 The following is a list of mapping the Linux kernel performs when running as
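Why the offset is 4: in big-endian memory the most significant byte comes first, so the low 32 bits of a 64-bit field start 4 bytes into it. The byte-level sketch below (helper names are illustrative) shows that a 32-bit load at offset + 4 returns exactly the low word that a 64-bit load would have put in the lower half of a register.

```c
#include <stdint.h>

/* Store a 64-bit value in big-endian byte order. */
static void store_be64(uint8_t *p, uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		p[i] = (uint8_t)v;	/* least significant byte goes last */
		v >>= 8;
	}
}

/* Load a 32-bit big-endian value, as "lwz" does on a big-endian CPU. */
static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

Storing 0x1122334455667788 and loading 32 bits at offset 4 yields 0x55667788, the low word; offset 0 would yield the high word instead.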
@@ -210,7 +210,7 @@ available on all targets.
 2) PAPR hypercalls

 PAPR hypercalls are needed to run server PowerPC PAPR guests (-M pseries in QEMU).
-These are the same hypercalls that pHyp, the POWER hypervisor implements. Some of
+These are the same hypercalls that pHyp, the POWER hypervisor, implements. Some of
 them are handled in the kernel, some are handled in user space. This is only
 available on book3s_64.


@@ -101,7 +101,7 @@ also be used, e.g. ::

 However, VCPU request users should refrain from doing so, as it would
 break the abstraction. The first 8 bits are reserved for architecture
-independent requests, all additional bits are available for architecture
+independent requests; all additional bits are available for architecture
 dependent requests.

 Architecture Independent Requests
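The bitmap layout described above (bits 0-7 for architecture independent requests, architecture dependent requests numbered from bit 8 upward) can be modelled in a few lines. Every name here is hypothetical and only mirrors the described layout; C11 atomics stand in for the kernel's atomic bitops.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names mirroring the layout described above. */
#define DEMO_REQ_ARCH_BASE 8	/* bits 0-7: architecture independent */
#define DEMO_ARCH_REQ(nr)  ((nr) + DEMO_REQ_ARCH_BASE)

struct demo_vcpu {
	_Atomic uint64_t requests;	/* one bit per pending request */
};

/* Mark a request pending. */
static void demo_make_request(struct demo_vcpu *v, unsigned int req)
{
	atomic_fetch_or(&v->requests, UINT64_C(1) << req);
}

/* Consume a request: clear its bit and report whether it was set. */
static bool demo_check_request(struct demo_vcpu *v, unsigned int req)
{
	uint64_t bit = UINT64_C(1) << req;

	return (atomic_fetch_and(&v->requests, ~bit) & bit) != 0;
}
```

Architecture dependent request 2 therefore lands on bit 10, leaving bits 0-7 untouched.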
@@ -151,8 +151,8 @@ KVM_REQUEST_NO_WAKEUP

 This flag is applied to requests that only need immediate attention
 from VCPUs running in guest mode. That is, sleeping VCPUs do not need
-to be awaken for these requests. Sleeping VCPUs will handle the
-requests when they are awaken later for some other reason.
+to be awakened for these requests. Sleeping VCPUs will handle the
+requests when they are awakened later for some other reason.

 KVM_REQUEST_WAIT


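The effect of KVM_REQUEST_NO_WAKEUP on how a VCPU is notified can be summarised as a small decision function. This is a simplified model with hypothetical names, not KVM's kick logic: a VCPU in guest mode is forced out, a sleeping VCPU is only woken for requests without the flag, and a VCPU already outside guest mode is assumed to see the request before its next guest entry.

```c
/* Hypothetical model; none of these names exist in KVM. */
enum demo_vcpu_state { DEMO_IN_GUEST, DEMO_IN_HOST, DEMO_SLEEPING };
enum demo_action { DEMO_NOTHING, DEMO_SEND_IPI, DEMO_WAKE_UP };

#define DEMO_REQ_NO_WAKEUP (1u << 0)

/* Decide how to notify a VCPU after a request bit has been set. */
static enum demo_action demo_notify(enum demo_vcpu_state s, unsigned int flags)
{
	switch (s) {
	case DEMO_IN_GUEST:
		return DEMO_SEND_IPI;	/* force an exit so it is seen now */
	case DEMO_SLEEPING:
		/* NO_WAKEUP requests wait until something else wakes it. */
		return (flags & DEMO_REQ_NO_WAKEUP) ? DEMO_NOTHING : DEMO_WAKE_UP;
	default:
		return DEMO_NOTHING;	/* checked before the next guest entry */
	}
}
```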
@@ -6239,6 +6239,12 @@ X: Documentation/power/
 X: Documentation/spi/
 X: Documentation/userspace-api/media/

+DOCUMENTATION PROCESS
+M: Jonathan Corbet <corbet@lwn.net>
+S: Maintained
+F: Documentation/process/
+L: workflows@vger.kernel.org
+
 DOCUMENTATION REPORTING ISSUES
 M: Thorsten Leemhuis <linux@leemhuis.info>
 L: linux-doc@vger.kernel.org

@@ -19,23 +19,54 @@

 #ifndef __ASSEMBLY__

+/**
+ * IS_ERR_VALUE - Detect an error pointer.
+ * @x: The pointer to check.
+ *
+ * Like IS_ERR(), but does not generate a compiler warning if result is unused.
+ */
 #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)

+/**
+ * ERR_PTR - Create an error pointer.
+ * @error: A negative error code.
+ *
+ * Encodes @error into a pointer value. Users should consider the result
+ * opaque and not assume anything about how the error is encoded.
+ *
+ * Return: A pointer with @error encoded within its value.
+ */
 static inline void * __must_check ERR_PTR(long error)
 {
 	return (void *) error;
 }

+/**
+ * PTR_ERR - Extract the error code from an error pointer.
+ * @ptr: An error pointer.
+ * Return: The error code within @ptr.
+ */
 static inline long __must_check PTR_ERR(__force const void *ptr)
 {
 	return (long) ptr;
 }

+/**
+ * IS_ERR - Detect an error pointer.
+ * @ptr: The pointer to check.
+ * Return: true if @ptr is an error pointer, false otherwise.
+ */
 static inline bool __must_check IS_ERR(__force const void *ptr)
 {
 	return IS_ERR_VALUE((unsigned long)ptr);
 }

+/**
+ * IS_ERR_OR_NULL - Detect an error pointer or a null pointer.
+ * @ptr: The pointer to check.
+ *
+ * Like IS_ERR(), but also returns true for a null pointer.
+ */
 static inline bool __must_check IS_ERR_OR_NULL(__force const void *ptr)
 {
 	return unlikely(!ptr) || IS_ERR_VALUE((unsigned long)ptr);
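To experiment with the encoding these kerneldocs describe outside the kernel, the sketch below re-implements it in userspace: error codes occupy the top MAX_ERRNO (4095) pointer values, so a single comparison separates them from valid pointers. The lowercase names and the `lookup()` function are hypothetical stand-ins, not the kernel's definitions; they assume a platform where `long` and pointers have the same width.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ERRNO 4095	/* same limit the kernel uses */

static inline void *err_ptr(long error) { return (void *)error; }
static inline long ptr_err(const void *ptr) { return (long)ptr; }

/* Error values are the top MAX_ERRNO addresses. */
static inline bool is_err_value(unsigned long x)
{
	return x >= (unsigned long)-MAX_ERRNO;
}

static inline bool is_err(const void *ptr)
{
	return is_err_value((unsigned long)ptr);
}

static inline bool is_err_or_null(const void *ptr)
{
	return !ptr || is_err_value((unsigned long)ptr);
}

/* Hypothetical lookup returning either an object or an error pointer. */
static int table[4] = { 10, 20, 30, 40 };

static int *lookup(long idx)
{
	if (idx < 0)
		return err_ptr(-EINVAL);
	if (idx >= 4)
		return err_ptr(-ENOENT);
	return &table[idx];
}
```

Callers test with `is_err()` before dereferencing and recover the code with `ptr_err()`, which keeps the error and the object in one return value.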
@@ -54,6 +85,23 @@ static inline void * __must_check ERR_CAST(__force const void *ptr)
 	return (void *) ptr;
 }

+/**
+ * PTR_ERR_OR_ZERO - Extract the error code from a pointer if it has one.
+ * @ptr: A potential error pointer.
+ *
+ * Convenience function that can be used inside a function that returns
+ * an error code to propagate errors received as error pointers.
+ * For example, ``return PTR_ERR_OR_ZERO(ptr);`` replaces:
+ *
+ * .. code-block:: c
+ *
+ *	if (IS_ERR(ptr))
+ *		return PTR_ERR(ptr);
+ *	else
+ *		return 0;
+ *
+ * Return: The error code within @ptr if it is an error pointer; 0 otherwise.
+ */
 static inline int __must_check PTR_ERR_OR_ZERO(__force const void *ptr)
 {
 	if (IS_ERR(ptr))
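The propagation pattern in the PTR_ERR_OR_ZERO() kerneldoc can be tried in userspace with stand-ins. In this sketch `ptr_err_or_zero()` mirrors the documented behaviour (it is not the kernel's definition), while `open_thing()` and `setup()` are hypothetical functions: one returns an object or an error pointer, the other returns an int error code.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_ERRNO 4095	/* same limit the kernel uses */

static void *err_ptr(long error) { return (void *)error; }

/* Negative error code if ptr encodes one, 0 otherwise. */
static int ptr_err_or_zero(const void *ptr)
{
	if ((unsigned long)ptr >= (unsigned long)-MAX_ERRNO)
		return (int)(long)ptr;
	return 0;
}

static int thing;

/* Hypothetical constructor returning an object or an error pointer. */
static void *open_thing(int fail)
{
	return fail ? err_ptr(-ENODEV) : &thing;
}

/* An int-returning caller forwards the error in a single statement. */
static int setup(int fail)
{
	void *t = open_thing(fail);

	return ptr_err_or_zero(t);
}
```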