Merge tag 'docs-5.8' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "A fair amount of stuff this time around, dominated by yet another
  massive set from Mauro toward the completion of the RST conversion. I
  *really* hope we are getting close to the end of this. Meanwhile,
  those patches reach pretty far afield to update document references
  around the tree; there should be no actual code changes there. There
  will be, alas, more of the usual trivial merge conflicts.

  Beyond that we have more translations, improvements to the sphinx
  scripting, a number of additions to the sysctl documentation, and lots
  of fixes"

* tag 'docs-5.8' of git://git.lwn.net/linux: (130 commits)
  Documentation: fixes to the maintainer-entry-profile template
  zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rst
  tracing: Fix events.rst section numbering
  docs: acpi: fix old http link and improve document format
  docs: filesystems: add info about efivars content
  Documentation: LSM: Correct the basic LSM description
  mailmap: change email for Ricardo Ribalda
  docs: sysctl/kernel: document unaligned controls
  Documentation: admin-guide: update bug-hunting.rst
  docs: sysctl/kernel: document ngroups_max
  nvdimm: fixes to maintainter-entry-profile
  Documentation/features: Correct RISC-V kprobes support entry
  Documentation/features: Refresh the arch support status files
  Revert "docs: sysctl/kernel: document ngroups_max"
  docs: move locking-specific documents to locking/
  docs: move digsig docs to the security book
  docs: move the kref doc into the core-api book
  docs: add IRQ documentation at the core-api book
  docs: debugging-via-ohci1394.txt: add it to the core-api book
  docs: fix references for ipmi.rst file
  ...
commit
b23c4771ff
5
.mailmap
@ -152,6 +152,7 @@ Krzysztof Kozlowski <krzk@kernel.org> <k.kozlowski.k@gmail.com>
|
|||||||
Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
|
Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
|
||||||
Leon Romanovsky <leon@kernel.org> <leon@leon.nu>
|
Leon Romanovsky <leon@kernel.org> <leon@leon.nu>
|
||||||
Leon Romanovsky <leon@kernel.org> <leonro@mellanox.com>
|
Leon Romanovsky <leon@kernel.org> <leonro@mellanox.com>
|
||||||
|
Leonardo Bras <leobras.c@gmail.com> <leonardo@linux.ibm.com>
|
||||||
Leonid I Ananiev <leonid.i.ananiev@intel.com>
|
Leonid I Ananiev <leonid.i.ananiev@intel.com>
|
||||||
Linas Vepstas <linas@austin.ibm.com>
|
Linas Vepstas <linas@austin.ibm.com>
|
||||||
Linus Lüssing <linus.luessing@c0d3.blue> <linus.luessing@web.de>
|
Linus Lüssing <linus.luessing@c0d3.blue> <linus.luessing@web.de>
|
||||||
@ -234,7 +235,9 @@ Ralf Baechle <ralf@linux-mips.org>
|
|||||||
Ralf Wildenhues <Ralf.Wildenhues@gmx.de>
|
Ralf Wildenhues <Ralf.Wildenhues@gmx.de>
|
||||||
Randy Dunlap <rdunlap@infradead.org> <rdunlap@xenotime.net>
|
Randy Dunlap <rdunlap@infradead.org> <rdunlap@xenotime.net>
|
||||||
Rémi Denis-Courmont <rdenis@simphalempin.com>
|
Rémi Denis-Courmont <rdenis@simphalempin.com>
|
||||||
Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
|
Ricardo Ribalda <ribalda@kernel.org> <ricardo.ribalda@gmail.com>
|
||||||
|
Ricardo Ribalda <ribalda@kernel.org> <ricardo@ribalda.com>
|
||||||
|
Ricardo Ribalda <ribalda@kernel.org> Ricardo Ribalda Delgado <ribalda@kernel.org>
|
||||||
Ross Zwisler <zwisler@kernel.org> <ross.zwisler@linux.intel.com>
|
Ross Zwisler <zwisler@kernel.org> <ross.zwisler@linux.intel.com>
|
||||||
Rudolf Marek <R.Marek@sh.cvut.cz>
|
Rudolf Marek <R.Marek@sh.cvut.cz>
|
||||||
Rui Saraiva <rmps@joel.ist.utl.pt>
|
Rui Saraiva <rmps@joel.ist.utl.pt>
|
||||||
|
6
CREDITS
@ -3104,14 +3104,16 @@ W: http://www.qsl.net/dl1bke/
|
|||||||
D: Generic Z8530 driver, AX.25 DAMA slave implementation
|
D: Generic Z8530 driver, AX.25 DAMA slave implementation
|
||||||
D: Several AX.25 hacks
|
D: Several AX.25 hacks
|
||||||
|
|
||||||
N: Ricardo Ribalda Delgado
|
N: Ricardo Ribalda
|
||||||
E: ricardo.ribalda@gmail.com
|
E: ribalda@kernel.org
|
||||||
W: http://ribalda.com
|
W: http://ribalda.com
|
||||||
D: PLX USB338x driver
|
D: PLX USB338x driver
|
||||||
D: PCA9634 driver
|
D: PCA9634 driver
|
||||||
D: Option GTM671WFS
|
D: Option GTM671WFS
|
||||||
D: Fintek F81216A
|
D: Fintek F81216A
|
||||||
D: AD5761 iio driver
|
D: AD5761 iio driver
|
||||||
|
D: TI DAC7612 driver
|
||||||
|
D: Sony IMX214 driver
|
||||||
D: Various kernel hacks
|
D: Various kernel hacks
|
||||||
S: Qtechnology A/S
|
S: Qtechnology A/S
|
||||||
S: Valby Langgade 142
|
S: Valby Langgade 142
|
||||||
|
@ -54,7 +54,7 @@ Date: October 2002
|
|||||||
Contact: Linux Memory Management list <linux-mm@kvack.org>
|
Contact: Linux Memory Management list <linux-mm@kvack.org>
|
||||||
Description:
|
Description:
|
||||||
Provides information about the node's distribution and memory
|
Provides information about the node's distribution and memory
|
||||||
utilization. Similar to /proc/meminfo, see Documentation/filesystems/proc.txt
|
utilization. Similar to /proc/meminfo, see Documentation/filesystems/proc.rst
|
||||||
|
|
||||||
What: /sys/devices/system/node/nodeX/numastat
|
What: /sys/devices/system/node/nodeX/numastat
|
||||||
Date: October 2002
|
Date: October 2002
|
||||||
|
@ -11,7 +11,7 @@ Description:
|
|||||||
Additionally, the fields Pss_Anon, Pss_File and Pss_Shmem
|
Additionally, the fields Pss_Anon, Pss_File and Pss_Shmem
|
||||||
are not present in /proc/pid/smaps. These fields represent
|
are not present in /proc/pid/smaps. These fields represent
|
||||||
the sum of the Pss field of each type (anon, file, shmem).
|
the sum of the Pss field of each type (anon, file, shmem).
|
||||||
For more details, see Documentation/filesystems/proc.txt
|
For more details, see Documentation/filesystems/proc.rst
|
||||||
and the procfs man page.
|
and the procfs man page.
|
||||||
|
|
||||||
Typical output looks like this:
|
Typical output looks like this:
|
||||||
|
@ -98,7 +98,11 @@ else # HAVE_PDFLATEX
|
|||||||
|
|
||||||
pdfdocs: latexdocs
|
pdfdocs: latexdocs
|
||||||
@$(srctree)/scripts/sphinx-pre-install --version-check
|
@$(srctree)/scripts/sphinx-pre-install --version-check
|
||||||
$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX="$(PDFLATEX)" LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit;)
|
$(foreach var,$(SPHINXDIRS), \
|
||||||
|
$(MAKE) PDFLATEX="$(PDFLATEX)" LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit; \
|
||||||
|
mkdir -p $(BUILDDIR)/$(var)/pdf; \
|
||||||
|
mv $(subst .tex,.pdf,$(wildcard $(BUILDDIR)/$(var)/latex/*.tex)) $(BUILDDIR)/$(var)/pdf/; \
|
||||||
|
)
|
||||||
|
|
||||||
endif # HAVE_PDFLATEX
|
endif # HAVE_PDFLATEX
|
||||||
|
|
||||||
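For readers less used to make functions, the file-moving step that the new ``pdfdocs`` recipe adds can be sketched in Python. This is only an illustration, not part of the patch: ``collect_pdfs`` is a hypothetical name, and unlike the recipe (where ``mv`` would fail) the sketch silently skips a PDF that was never produced.

```python
from pathlib import Path
import shutil


def collect_pdfs(builddir, var):
    """Move the PDF built from each latex/*.tex file into <builddir>/<var>/pdf/.

    Roughly mirrors the recipe lines:
        mkdir -p $(BUILDDIR)/$(var)/pdf
        mv $(subst .tex,.pdf,$(wildcard .../latex/*.tex)) .../pdf/
    """
    latex_dir = Path(builddir) / var / "latex"
    pdf_dir = Path(builddir) / var / "pdf"
    pdf_dir.mkdir(parents=True, exist_ok=True)    # mkdir -p

    moved = []
    for tex in sorted(latex_dir.glob("*.tex")):   # $(wildcard .../latex/*.tex)
        pdf = tex.with_suffix(".pdf")             # foo.tex -> foo.pdf
        if pdf.exists():                          # assumption: skip missing PDFs
            shutil.move(str(pdf), str(pdf_dir / pdf.name))
            moved.append(pdf.name)
    return moved
```

The effect is that the finished PDFs end up in a ``pdf/`` directory next to ``latex/``, away from the intermediate LaTeX build files.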
|
@ -32,12 +32,13 @@ interrupt goes unhandled over time, they are tracked by the Linux kernel as
|
|||||||
Spurious Interrupts. The IRQ will be disabled by the Linux kernel after it
|
Spurious Interrupts. The IRQ will be disabled by the Linux kernel after it
|
||||||
reaches a specific count with the error "nobody cared". This disabled IRQ
|
reaches a specific count with the error "nobody cared". This disabled IRQ
|
||||||
now prevents valid usage by an existing interrupt which may happen to share
|
now prevents valid usage by an existing interrupt which may happen to share
|
||||||
the IRQ line.
|
the IRQ line::
|
||||||
|
|
||||||
irq 19: nobody cared (try booting with the "irqpoll" option)
|
irq 19: nobody cared (try booting with the "irqpoll" option)
|
||||||
CPU: 0 PID: 2988 Comm: irq/34-nipalk Tainted: 4.14.87-rt49-02410-g4a640ec-dirty #1
|
CPU: 0 PID: 2988 Comm: irq/34-nipalk Tainted: 4.14.87-rt49-02410-g4a640ec-dirty #1
|
||||||
Hardware name: National Instruments NI PXIe-8880/NI PXIe-8880, BIOS 2.1.5f1 01/09/2020
|
Hardware name: National Instruments NI PXIe-8880/NI PXIe-8880, BIOS 2.1.5f1 01/09/2020
|
||||||
Call Trace:
|
Call Trace:
|
||||||
|
|
||||||
<IRQ>
|
<IRQ>
|
||||||
? dump_stack+0x46/0x5e
|
? dump_stack+0x46/0x5e
|
||||||
? __report_bad_irq+0x2e/0xb0
|
? __report_bad_irq+0x2e/0xb0
|
||||||
@ -85,15 +86,18 @@ Mitigations
|
|||||||
The mitigations take the form of PCI quirks. The preference has been to
|
The mitigations take the form of PCI quirks. The preference has been to
|
||||||
first identify and make use of a means to disable the routing to the PCH.
|
first identify and make use of a means to disable the routing to the PCH.
|
||||||
In such a case a quirk to disable boot interrupt generation can be
|
In such a case a quirk to disable boot interrupt generation can be
|
||||||
added.[1]
|
added. [1]_
|
||||||
|
|
||||||
Intel® 6300ESB I/O Controller Hub
|
Intel® 6300ESB I/O Controller Hub
|
||||||
Alternate Base Address Register:
|
Alternate Base Address Register:
|
||||||
BIE: Boot Interrupt Enable
|
BIE: Boot Interrupt Enable
|
||||||
0 = Boot interrupt is enabled.
|
|
||||||
1 = Boot interrupt is disabled.
|
|
||||||
|
|
||||||
Intel® Sandy Bridge through Sky Lake based Xeon servers:
|
== ===========================
|
||||||
|
0 Boot interrupt is enabled.
|
||||||
|
1 Boot interrupt is disabled.
|
||||||
|
== ===========================
|
||||||
|
|
||||||
|
Intel® Sandy Bridge through Sky Lake based Xeon servers:
|
||||||
Coherent Interface Protocol Interrupt Control
|
Coherent Interface Protocol Interrupt Control
|
||||||
dis_intx_route2pch/dis_intx_route2ich/dis_intx_route2dmi2:
|
dis_intx_route2pch/dis_intx_route2ich/dis_intx_route2dmi2:
|
||||||
When this bit is set. Local INTx messages received from the
|
When this bit is set. Local INTx messages received from the
|
||||||
@ -109,12 +113,12 @@ line by default. Therefore, on chipsets where this INTx routing cannot be
|
|||||||
disabled, the Linux kernel will reroute the valid interrupt to its legacy
|
disabled, the Linux kernel will reroute the valid interrupt to its legacy
|
||||||
interrupt. This redirection of the handler will prevent the occurrence of
|
interrupt. This redirection of the handler will prevent the occurrence of
|
||||||
the spurious interrupt detection which would ordinarily disable the IRQ
|
the spurious interrupt detection which would ordinarily disable the IRQ
|
||||||
line due to excessive unhandled counts.[2]
|
line due to excessive unhandled counts. [2]_
|
||||||
|
|
||||||
The config option X86_REROUTE_FOR_BROKEN_BOOT_IRQS exists to enable (or
|
The config option X86_REROUTE_FOR_BROKEN_BOOT_IRQS exists to enable (or
|
||||||
disable) the redirection of the interrupt handler to the PCH interrupt
|
disable) the redirection of the interrupt handler to the PCH interrupt
|
||||||
line. The option can be overridden by either pci=ioapicreroute or
|
line. The option can be overridden by either pci=ioapicreroute or
|
||||||
pci=noioapicreroute.[3]
|
pci=noioapicreroute. [3]_
|
||||||
|
|
||||||
|
|
||||||
More Documentation
|
More Documentation
|
||||||
@ -127,19 +131,19 @@ into the evolution of its handling with chipsets.
|
|||||||
Example of disabling of the boot interrupt
|
Example of disabling of the boot interrupt
|
||||||
------------------------------------------
|
------------------------------------------
|
||||||
|
|
||||||
Intel® 6300ESB I/O Controller Hub (Document # 300641-004US)
|
- Intel® 6300ESB I/O Controller Hub (Document # 300641-004US)
|
||||||
5.7.3 Boot Interrupt
|
5.7.3 Boot Interrupt
|
||||||
https://www.intel.com/content/dam/doc/datasheet/6300esb-io-controller-hub-datasheet.pdf
|
https://www.intel.com/content/dam/doc/datasheet/6300esb-io-controller-hub-datasheet.pdf
|
||||||
|
|
||||||
Intel® Xeon® Processor E5-1600/2400/2600/4600 v3 Product Families
|
- Intel® Xeon® Processor E5-1600/2400/2600/4600 v3 Product Families
|
||||||
Datasheet - Volume 2: Registers (Document # 330784-003)
|
Datasheet - Volume 2: Registers (Document # 330784-003)
|
||||||
6.6.41 cipintrc Coherent Interface Protocol Interrupt Control
|
6.6.41 cipintrc Coherent Interface Protocol Interrupt Control
|
||||||
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf
|
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf
|
||||||
|
|
||||||
Example of handler rerouting
|
Example of handler rerouting
|
||||||
----------------------------
|
----------------------------
|
||||||
|
|
||||||
Intel® 6700PXH 64-bit PCI Hub (Document # 302628)
|
- Intel® 6700PXH 64-bit PCI Hub (Document # 302628)
|
||||||
2.15.2 PCI Express Legacy INTx Support and Boot Interrupt
|
2.15.2 PCI Express Legacy INTx Support and Boot Interrupt
|
||||||
https://www.intel.com/content/dam/doc/datasheet/6700pxh-64-bit-pci-hub-datasheet.pdf
|
https://www.intel.com/content/dam/doc/datasheet/6700pxh-64-bit-pci-hub-datasheet.pdf
|
||||||
|
|
||||||
@ -150,6 +154,6 @@ Cheers,
|
|||||||
Sean V Kelley
|
Sean V Kelley
|
||||||
sean.v.kelley@linux.intel.com
|
sean.v.kelley@linux.intel.com
|
||||||
|
|
||||||
[1] https://lore.kernel.org/r/12131949181903-git-send-email-sassmann@suse.de/
|
.. [1] https://lore.kernel.org/r/12131949181903-git-send-email-sassmann@suse.de/
|
||||||
[2] https://lore.kernel.org/r/12131949182094-git-send-email-sassmann@suse.de/
|
.. [2] https://lore.kernel.org/r/12131949182094-git-send-email-sassmann@suse.de/
|
||||||
[3] https://lore.kernel.org/r/487C8EA7.6020205@suse.de/
|
.. [3] https://lore.kernel.org/r/487C8EA7.6020205@suse.de/
|
||||||
|
@ -63,7 +63,7 @@ which can then be compiled to AML binary format::
|
|||||||
ASL Input: minnomax.asl - 30 lines, 614 bytes, 7 keywords
|
ASL Input: minnomax.asl - 30 lines, 614 bytes, 7 keywords
|
||||||
AML Output: minnowmax.aml - 165 bytes, 6 named objects, 1 executable opcodes
|
AML Output: minnowmax.aml - 165 bytes, 6 named objects, 1 executable opcodes
|
||||||
|
|
||||||
[1] http://wiki.minnowboard.org/MinnowBoard_MAX#Low_Speed_Expansion_Connector_.28Top.29
|
[1] https://www.elinux.org/Minnowboard:MinnowMax#Low_Speed_Expansion_.28Top.29
|
||||||
|
|
||||||
The resulting AML code can then be loaded by the kernel using one of the methods
|
The resulting AML code can then be loaded by the kernel using one of the methods
|
||||||
below.
|
below.
|
||||||
|
@ -49,15 +49,19 @@ the issue, it may also contain the word **Oops**, as on this one::
|
|||||||
|
|
||||||
Despite being an **Oops** or some other sort of stack trace, the offended
|
Despite being an **Oops** or some other sort of stack trace, the offended
|
||||||
line is usually required to identify and handle the bug. Along this chapter,
|
line is usually required to identify and handle the bug. Along this chapter,
|
||||||
we'll refer to "Oops" for all kinds of stack traces that need to be analized.
|
we'll refer to "Oops" for all kinds of stack traces that need to be analyzed.
|
||||||
|
|
||||||
.. note::
|
If the kernel is compiled with ``CONFIG_DEBUG_INFO``, you can enhance the
|
||||||
|
quality of the stack trace by using file:`scripts/decode_stacktrace.sh`.
|
||||||
|
|
||||||
|
Modules linked in
|
||||||
|
-----------------
|
||||||
|
|
||||||
|
Modules that are tainted or are being loaded or unloaded are marked with
|
||||||
|
"(...)", where the taint flags are described in
|
||||||
|
file:`Documentation/admin-guide/tainted-kernels.rst`, "being loaded" is
|
||||||
|
annotated with "+", and "being unloaded" is annotated with "-".
|
||||||
|
|
||||||
``ksymoops`` is useless on 2.6 or upper. Please use the Oops in its original
|
|
||||||
format (from ``dmesg``, etc). Ignore any references in this or other docs to
|
|
||||||
"decoding the Oops" or "running it through ksymoops".
|
|
||||||
If you post an Oops from 2.6+ that has been run through ``ksymoops``,
|
|
||||||
people will just tell you to repost it.
|
|
||||||
|
|
||||||
Where is the Oops message is located?
|
Where is the Oops message is located?
|
||||||
-------------------------------------
|
-------------------------------------
|
||||||
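The "Modules linked in" annotations added in the hunk above (taint flags in parentheses, "+" for a module being loaded, "-" for one being unloaded) can be picked apart mechanically. The following is a minimal sketch, assuming the modules appear as whitespace-separated ``name(flags)`` tokens after a ``Modules linked in:`` prefix; ``parse_modules`` is a hypothetical helper and the sample line in the usage note is made up for illustration.

```python
import re

# name, optionally followed by "(flags)"
MODULE_RE = re.compile(r"^(?P<name>[^\s()]+)(?:\((?P<flags>[^)]*)\))?$")


def parse_modules(line):
    """Split a 'Modules linked in:' line into per-module annotations."""
    _, _, rest = line.partition("Modules linked in:")
    rest = rest.split("[", 1)[0]      # assumption: ignore any trailing [...] note
    modules = []
    for token in rest.split():
        m = MODULE_RE.match(token)
        if not m:
            continue
        flags = m.group("flags") or ""
        modules.append({
            "name": m.group("name"),
            # everything except +/- is treated as taint flags
            "taint": "".join(c for c in flags if c not in "+-"),
            "loading": "+" in flags,      # "+": being loaded
            "unloading": "-" in flags,    # "-": being unloaded
        })
    return modules
```

For example, ``parse_modules("Modules linked in: foo(PO) bar(+) baz")`` reports ``foo`` with taint flags ``PO``, ``bar`` as being loaded, and ``baz`` with no annotation. The meaning of each taint letter is described in Documentation/admin-guide/tainted-kernels.rst.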
@ -71,7 +75,7 @@ by running ``journalctl`` command.
|
|||||||
Sometimes ``klogd`` dies, in which case you can run ``dmesg > file`` to
|
Sometimes ``klogd`` dies, in which case you can run ``dmesg > file`` to
|
||||||
read the data from the kernel buffers and save it. Or you can
|
read the data from the kernel buffers and save it. Or you can
|
||||||
``cat /proc/kmsg > file``, however you have to break in to stop the transfer,
|
``cat /proc/kmsg > file``, however you have to break in to stop the transfer,
|
||||||
``kmsg`` is a "never ending file".
|
since ``kmsg`` is a "never ending file".
|
||||||
|
|
||||||
If the machine has crashed so badly that you cannot enter commands or
|
If the machine has crashed so badly that you cannot enter commands or
|
||||||
the disk is not available then you have three options:
|
the disk is not available then you have three options:
|
||||||
@ -81,9 +85,9 @@ the disk is not available then you have three options:
|
|||||||
planned for a crash. Alternatively, you can take a picture of
|
planned for a crash. Alternatively, you can take a picture of
|
||||||
the screen with a digital camera - not nice, but better than
|
the screen with a digital camera - not nice, but better than
|
||||||
nothing. If the messages scroll off the top of the console, you
|
nothing. If the messages scroll off the top of the console, you
|
||||||
may find that booting with a higher resolution (eg, ``vga=791``)
|
may find that booting with a higher resolution (e.g., ``vga=791``)
|
||||||
will allow you to read more of the text. (Caveat: This needs ``vesafb``,
|
will allow you to read more of the text. (Caveat: This needs ``vesafb``,
|
||||||
so won't help for 'early' oopses)
|
so won't help for 'early' oopses.)
|
||||||
|
|
||||||
(2) Boot with a serial console (see
|
(2) Boot with a serial console (see
|
||||||
:ref:`Documentation/admin-guide/serial-console.rst <serial_console>`),
|
:ref:`Documentation/admin-guide/serial-console.rst <serial_console>`),
|
||||||
@ -104,7 +108,7 @@ Kernel source file. There are two methods for doing that. Usually, using
|
|||||||
gdb
|
gdb
|
||||||
^^^
|
^^^
|
||||||
|
|
||||||
The GNU debug (``gdb``) is the best way to figure out the exact file and line
|
The GNU debugger (``gdb``) is the best way to figure out the exact file and line
|
||||||
number of the OOPS from the ``vmlinux`` file.
|
number of the OOPS from the ``vmlinux`` file.
|
||||||
|
|
||||||
The usage of gdb works best on a kernel compiled with ``CONFIG_DEBUG_INFO``.
|
The usage of gdb works best on a kernel compiled with ``CONFIG_DEBUG_INFO``.
|
||||||
@ -165,7 +169,7 @@ If you have a call trace, such as::
|
|||||||
[<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
|
[<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
|
||||||
...
|
...
|
||||||
|
|
||||||
this shows the problem likely in the :jbd: module. You can load that module
|
this shows the problem likely is in the :jbd: module. You can load that module
|
||||||
in gdb and list the relevant code::
|
in gdb and list the relevant code::
|
||||||
|
|
||||||
$ gdb fs/jbd/jbd.ko
|
$ gdb fs/jbd/jbd.ko
|
||||||
@ -199,8 +203,9 @@ in the kernel hacking menu of the menu configuration.) For example::
|
|||||||
You need to be at the top level of the kernel tree for this to pick up
|
You need to be at the top level of the kernel tree for this to pick up
|
||||||
your C files.
|
your C files.
|
||||||
|
|
||||||
If you don't have access to the code you can also debug on some crash dumps
|
If you don't have access to the source code you can still debug some crash
|
||||||
e.g. crash dump output as shown by Dave Miller::
|
dumps using the following method (example crash dump output as shown by
|
||||||
|
Dave Miller)::
|
||||||
|
|
||||||
EIP is at +0x14/0x4c0
|
EIP is at +0x14/0x4c0
|
||||||
...
|
...
|
||||||
@ -230,6 +235,9 @@ e.g. crash dump output as shown by Dave Miller::
|
|||||||
mov 0x8(%ebp), %ebx ! %ebx = skb->sk
|
mov 0x8(%ebp), %ebx ! %ebx = skb->sk
|
||||||
mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt
|
mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt
|
||||||
|
|
||||||
|
file:`scripts/decodecode` can be used to automate most of this, depending
|
||||||
|
on what CPU architecture is being debugged.
|
||||||
|
|
||||||
Reporting the bug
|
Reporting the bug
|
||||||
-----------------
|
-----------------
|
||||||
|
|
||||||
@ -241,7 +249,7 @@ used for the development of the affected code. This can be done by using
|
|||||||
the ``get_maintainer.pl`` script.
|
the ``get_maintainer.pl`` script.
|
||||||
|
|
||||||
For example, if you find a bug at the gspca's sonixj.c file, you can get
|
For example, if you find a bug at the gspca's sonixj.c file, you can get
|
||||||
their maintainers with::
|
its maintainers with::
|
||||||
|
|
||||||
$ ./scripts/get_maintainer.pl -f drivers/media/usb/gspca/sonixj.c
|
$ ./scripts/get_maintainer.pl -f drivers/media/usb/gspca/sonixj.c
|
||||||
Hans Verkuil <hverkuil@xs4all.nl> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%)
|
Hans Verkuil <hverkuil@xs4all.nl> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%)
|
||||||
@ -253,16 +261,17 @@ their maintainers with::
|
|||||||
|
|
||||||
Please notice that it will point to:
|
Please notice that it will point to:
|
||||||
|
|
||||||
- The last developers that touched on the source code. On the above example,
|
- The last developers that touched the source code (if this is done inside
|
||||||
Tejun and Bhaktipriya (in this specific case, none really envolved on the
|
a git tree). On the above example, Tejun and Bhaktipriya (in this
|
||||||
development of this file);
|
specific case, none really envolved on the development of this file);
|
||||||
- The driver maintainer (Hans Verkuil);
|
- The driver maintainer (Hans Verkuil);
|
||||||
- The subsystem maintainer (Mauro Carvalho Chehab);
|
- The subsystem maintainer (Mauro Carvalho Chehab);
|
||||||
- The driver and/or subsystem mailing list (linux-media@vger.kernel.org);
|
- The driver and/or subsystem mailing list (linux-media@vger.kernel.org);
|
||||||
- the Linux Kernel mailing list (linux-kernel@vger.kernel.org).
|
- the Linux Kernel mailing list (linux-kernel@vger.kernel.org).
|
||||||
|
|
||||||
Usually, the fastest way to have your bug fixed is to report it to mailing
|
Usually, the fastest way to have your bug fixed is to report it to mailing
|
||||||
list used for the development of the code (linux-media ML) copying the driver maintainer (Hans).
|
list used for the development of the code (linux-media ML) copying the
|
||||||
|
driver maintainer (Hans).
|
||||||
|
|
||||||
If you are totally stumped as to whom to send the report, and
|
If you are totally stumped as to whom to send the report, and
|
||||||
``get_maintainer.pl`` didn't provide you anything useful, send it to
|
``get_maintainer.pl`` didn't provide you anything useful, send it to
|
||||||
@ -303,9 +312,9 @@ protection fault message can be simply cut out of the message files
|
|||||||
and forwarded to the kernel developers.
|
and forwarded to the kernel developers.
|
||||||
|
|
||||||
Two types of address resolution are performed by ``klogd``. The first is
|
Two types of address resolution are performed by ``klogd``. The first is
|
||||||
static translation and the second is dynamic translation. Static
|
static translation and the second is dynamic translation.
|
||||||
translation uses the System.map file in much the same manner that
|
Static translation uses the System.map file.
|
||||||
ksymoops does. In order to do static translation the ``klogd`` daemon
|
In order to do static translation the ``klogd`` daemon
|
||||||
must be able to find a system map file at daemon initialization time.
|
must be able to find a system map file at daemon initialization time.
|
||||||
See the klogd man page for information on how ``klogd`` searches for map
|
See the klogd man page for information on how ``klogd`` searches for map
|
||||||
files.
|
files.
|
||||||
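The idea behind static translation can be illustrated with a short Python sketch: given the ``address type name`` lines of a System.map file, find the closest symbol at or below an address and report the offset from it. This is not how ``klogd`` is implemented; ``load_system_map`` and ``resolve`` are hypothetical helpers written only to show the lookup.

```python
import bisect


def load_system_map(lines):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue                      # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return 'symbol+0xoffset' for address, or None if it is below all symbols."""
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None
    base, name = symbols[index]
    return "%s+0x%x" % (name, address - base)
```

Typical use would be ``resolve(load_system_map(open("System.map")), addr)``. The map must match the running kernel exactly, otherwise the reported symbols are meaningless.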
|
@ -105,7 +105,7 @@ References
|
|||||||
----------
|
----------
|
||||||
|
|
||||||
- http://lkml.org/lkml/2007/2/12/6
|
- http://lkml.org/lkml/2007/2/12/6
|
||||||
- Documentation/filesystems/proc.txt (1.8)
|
- Documentation/filesystems/proc.rst (1.8)
|
||||||
|
|
||||||
|
|
||||||
Thanks
|
Thanks
|
||||||
|
@ -268,7 +268,7 @@ Guest mitigation mechanisms
|
|||||||
/proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
|
/proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
|
||||||
available at:
|
available at:
|
||||||
|
|
||||||
https://www.kernel.org/doc/Documentation/IRQ-affinity.txt
|
https://www.kernel.org/doc/Documentation/core-api/irq/irq-affinity.rst
|
||||||
|
|
||||||
.. _smt_control:
|
.. _smt_control:
|
||||||
|
|
||||||
|
@ -1,52 +1,48 @@
|
|||||||
Explaining the dreaded "No init found." boot hang message
|
Explaining the "No working init found." boot hang message
|
||||||
=========================================================
|
=========================================================
|
||||||
|
:Authors: Andreas Mohr <andi at lisas period de>
|
||||||
|
Cristian Souza <cristianmsbr at gmail period com>
|
||||||
|
|
||||||
OK, so you've got this pretty unintuitive message (currently located
|
This document provides some high-level reasons for failure
|
||||||
in init/main.c) and are wondering what the H*** went wrong.
|
(listed roughly in order of execution) to load the init binary.
|
||||||
Some high-level reasons for failure (listed roughly in order of execution)
|
|
||||||
to load the init binary are:
|
|
||||||
|
|
||||||
A) Unable to mount root FS
|
1) **Unable to mount root FS**: Set "debug" kernel parameter (in bootloader
|
||||||
B) init binary doesn't exist on rootfs
|
config file or CONFIG_CMDLINE) to get more detailed kernel messages.
|
||||||
C) broken console device
|
|
||||||
D) binary exists but dependencies not available
|
|
||||||
E) binary cannot be loaded
|
|
||||||
|
|
||||||
Detailed explanations:
|
2) **init binary doesn't exist on rootfs**: Make sure you have the correct
|
||||||
|
root FS type (and ``root=`` kernel parameter points to the correct
|
||||||
|
partition), required drivers such as storage hardware (such as SCSI or
|
||||||
|
USB!) and filesystem (ext3, jffs2, etc.) are builtin (alternatively as
|
||||||
|
modules, to be pre-loaded by an initrd).
|
||||||
|
|
||||||
A) Set "debug" kernel parameter (in bootloader config file or CONFIG_CMDLINE)
|
3) **Broken console device**: Possibly a conflict in ``console= setup``
|
||||||
to get more detailed kernel messages.
|
--> initial console unavailable. E.g. some serial consoles are unreliable
|
||||||
B) make sure you have the correct root FS type
|
due to serial IRQ issues (e.g. missing interrupt-based configuration).
|
||||||
(and ``root=`` kernel parameter points to the correct partition),
|
|
||||||
required drivers such as storage hardware (such as SCSI or USB!)
|
|
||||||
and filesystem (ext3, jffs2 etc.) are builtin (alternatively as modules,
|
|
||||||
to be pre-loaded by an initrd)
|
|
||||||
C) Possibly a conflict in ``console= setup`` --> initial console unavailable.
|
|
||||||
E.g. some serial consoles are unreliable due to serial IRQ issues (e.g.
|
|
||||||
missing interrupt-based configuration).
|
|
||||||
Try using a different ``console= device`` or e.g. ``netconsole=``.
|
Try using a different ``console= device`` or e.g. ``netconsole=``.
|
||||||
D) e.g. required library dependencies of the init binary such as
|
|
||||||
``/lib/ld-linux.so.2`` missing or broken. Use
|
4) **Binary exists but dependencies not available**: E.g. required library
|
||||||
``readelf -d <INIT>|grep NEEDED`` to find out which libraries are required.
|
dependencies of the init binary such as ``/lib/ld-linux.so.2`` missing or
|
||||||
E) make sure the binary's architecture matches your hardware.
|
broken. Use ``readelf -d <INIT>|grep NEEDED`` to find out which libraries
|
||||||
E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware.
|
are required.
|
||||||
In case you tried loading a non-binary file here (shell script?),
|
|
||||||
you should make sure that the script specifies an interpreter in its shebang
|
5) **Binary cannot be loaded**: Make sure the binary's architecture matches
|
||||||
header line (``#!/...``) that is fully working (including its library
|
your hardware. E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM
|
||||||
dependencies). And before tackling scripts, better first test a simple
|
hardware. In case you tried loading a non-binary file here (shell script?),
|
||||||
non-script binary such as ``/bin/sh`` and confirm its successful execution.
|
you should make sure that the script specifies an interpreter in its
|
||||||
To find out more, add code ``to init/main.c`` to display kernel_execve()s
|
shebang header line (``#!/...``) that is fully working (including its
|
||||||
return values.
|
library dependencies). And before tackling scripts, better first test a
|
||||||
|
simple non-script binary such as ``/bin/sh`` and confirm its successful
|
||||||
|
execution. To find out more, add code ``to init/main.c`` to display
|
||||||
|
kernel_execve()s return values.
|
||||||
|
|
||||||
Please extend this explanation whenever you find new failure causes
|
Please extend this explanation whenever you find new failure causes
|
||||||
(after all loading the init binary is a CRITICAL and hard transition step
|
(after all loading the init binary is a CRITICAL and hard transition step
|
||||||
which needs to be made as painless as possible), then submit patch to LKML.
|
which needs to be made as painless as possible), then submit a patch to LKML.
|
||||||
Further TODOs:
|
Further TODOs:
|
||||||
|
|
||||||
- Implement the various ``run_init_process()`` invocations via a struct array
|
- Implement the various ``run_init_process()`` invocations via a struct array
|
||||||
which can then store the ``kernel_execve()`` result value and on failure
|
which can then store the ``kernel_execve()`` result value and on failure
|
||||||
log it all by iterating over **all** results (very important usability fix).
|
log it all by iterating over **all** results (very important usability fix).
|
||||||
- try to make the implementation itself more helpful in general,
|
- Try to make the implementation itself more helpful in general, e.g. by
|
||||||
e.g. by providing additional error messages at affected places.
|
providing additional error messages at affected places.
|
||||||
|
|
||||||
Andreas Mohr <andi at lisas period de>
|
|
||||||
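Some of the checks from items 2 and 5 of the new list can be automated. The sketch below is a rough illustration under stated assumptions: ``describe_init`` is a hypothetical helper, the machine table covers only a few architectures, and the interpreter path of a script is checked on the system where the sketch runs, so it should be run from within (or chrooted into) the target root filesystem. It reads the standard ELF header fields (magic, byte order, ``e_machine``) and does not check library dependencies; ``readelf -d`` remains the tool for that.

```python
import os

# A few e_machine values from the ELF specification; not exhaustive.
ELF_MACHINES = {3: "i386", 40: "ARM", 62: "x86_64", 183: "AArch64"}


def describe_init(path):
    """Classify an init candidate: missing, ELF (with architecture), or script."""
    if not os.path.exists(path):
        return "missing"
    with open(path, "rb") as f:
        head = f.read(20)
    if head[:4] == b"\x7fELF" and len(head) >= 20:
        # e_ident[5] is the data encoding: 1 = little endian, 2 = big endian
        byteorder = "little" if head[5] == 1 else "big"
        machine = int.from_bytes(head[18:20], byteorder)   # e_machine
        return "ELF binary for " + ELF_MACHINES.get(machine, "machine %d" % machine)
    if head[:2] == b"#!":
        with open(path, "rb") as f:
            first_line = f.readline().decode(errors="replace")
        words = first_line[2:].split()
        if not words:
            return "script without interpreter"
        state = "present" if os.path.exists(words[0]) else "missing"
        return "script, interpreter %s (%s)" % (words[0], state)
    return "neither ELF nor script"
```

Comparing the reported architecture with the hardware catches the i386 vs. x86_64 or x86-on-ARM mismatches mentioned above, and a "missing" interpreter points at a broken shebang line.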
|
@ -3336,7 +3336,7 @@
|
|||||||
See Documentation/admin-guide/sysctl/vm.rst for details.
|
See Documentation/admin-guide/sysctl/vm.rst for details.
|
||||||
|
|
||||||
ohci1394_dma=early [HW] enable debugging via the ohci1394 driver.
|
ohci1394_dma=early [HW] enable debugging via the ohci1394 driver.
|
||||||
See Documentation/debugging-via-ohci1394.txt for more
|
See Documentation/core-api/debugging-via-ohci1394.rst for more
|
||||||
info.
|
info.
|
||||||
|
|
||||||
olpc_ec_timeout= [OLPC] ms delay when issuing EC commands
|
olpc_ec_timeout= [OLPC] ms delay when issuing EC commands
|
||||||
|
@ -10,7 +10,7 @@ them to a "housekeeping" CPU dedicated to such work.
|
|||||||
References
|
References
|
||||||
==========
|
==========
|
||||||
|
|
||||||
- Documentation/IRQ-affinity.txt: Binding interrupts to sets of CPUs.
|
- Documentation/core-api/irq/irq-affinity.rst: Binding interrupts to sets of CPUs.
|
||||||
|
|
||||||
- Documentation/admin-guide/cgroup-v1: Using cgroups to bind tasks to sets of CPUs.
|
- Documentation/admin-guide/cgroup-v1: Using cgroups to bind tasks to sets of CPUs.
|
||||||
|
|
||||||
|
@ -12,107 +12,107 @@ and more generally they allow userland to take control of various
|
|||||||
memory page faults, something otherwise only the kernel code could do.
|
memory page faults, something otherwise only the kernel code could do.
|
||||||
|
|
||||||
For example userfaults allows a proper and more optimal implementation
|
For example userfaults allows a proper and more optimal implementation
|
||||||
of the PROT_NONE+SIGSEGV trick.
|
of the ``PROT_NONE+SIGSEGV`` trick.
|
||||||
|
|
||||||
Design
|
Design
|
||||||
======
|
======
|
||||||
|
|
||||||
Userfaults are delivered and resolved through the userfaultfd syscall.
|
Userfaults are delivered and resolved through the ``userfaultfd`` syscall.
|
||||||
|
|
||||||
The userfaultfd (aside from registering and unregistering virtual
|
The ``userfaultfd`` (aside from registering and unregistering virtual
|
||||||
memory ranges) provides two primary functionalities:
|
memory ranges) provides two primary functionalities:
|
||||||
|
|
||||||
1) read/POLLIN protocol to notify a userland thread of the faults
|
1) ``read/POLLIN`` protocol to notify a userland thread of the faults
|
||||||
happening
|
happening
|
||||||
|
|
||||||
2) various UFFDIO_* ioctls that can manage the virtual memory regions
|
2) various ``UFFDIO_*`` ioctls that can manage the virtual memory regions
|
||||||
registered in the userfaultfd that allows userland to efficiently
|
registered in the ``userfaultfd`` that allows userland to efficiently
|
||||||
resolve the userfaults it receives via 1) or to manage the virtual
|
resolve the userfaults it receives via 1) or to manage the virtual
|
||||||
memory in the background
|
memory in the background
|
||||||
|
|
||||||
The real advantage of userfaults if compared to regular virtual memory
|
The real advantage of userfaults if compared to regular virtual memory
|
||||||
management of mremap/mprotect is that the userfaults in all their
|
management of mremap/mprotect is that the userfaults in all their
|
||||||
operations never involve heavyweight structures like vmas (in fact the
|
operations never involve heavyweight structures like vmas (in fact the
|
||||||
userfaultfd runtime load never takes the mmap_sem for writing).
|
``userfaultfd`` runtime load never takes the mmap_sem for writing).
|
||||||
|
|
||||||
Vmas are not suitable for page- (or hugepage) granular fault tracking
|
Vmas are not suitable for page- (or hugepage) granular fault tracking
|
||||||
when dealing with virtual address spaces that could span
|
when dealing with virtual address spaces that could span
|
||||||
Terabytes. Too many vmas would be needed for that.
|
Terabytes. Too many vmas would be needed for that.
|
||||||
|
|
||||||
The userfaultfd once opened by invoking the syscall, can also be
|
The ``userfaultfd`` once opened by invoking the syscall, can also be
|
||||||
passed using unix domain sockets to a manager process, so the same
|
passed using unix domain sockets to a manager process, so the same
|
||||||
manager process could handle the userfaults of a multitude of
|
manager process could handle the userfaults of a multitude of
|
||||||
different processes without them being aware about what is going on
|
different processes without them being aware about what is going on
|
||||||
(well of course unless they later try to use the userfaultfd
|
(well of course unless they later try to use the ``userfaultfd``
|
||||||
themselves on the same region the manager is already tracking, which
|
themselves on the same region the manager is already tracking, which
|
||||||
is a corner case that would currently return -EBUSY).
|
is a corner case that would currently return ``-EBUSY``).
|
||||||
|
|
||||||
API
|
API
|
||||||
===
|
===
|
||||||
|
|
||||||
When first opened the userfaultfd must be enabled invoking the
|
When first opened the ``userfaultfd`` must be enabled invoking the
|
||||||
UFFDIO_API ioctl specifying a uffdio_api.api value set to UFFD_API (or
|
``UFFDIO_API`` ioctl specifying a ``uffdio_api.api`` value set to ``UFFD_API`` (or
|
||||||
a later API version) which will specify the read/POLLIN protocol
|
a later API version) which will specify the ``read/POLLIN`` protocol
|
||||||
userland intends to speak on the UFFD and the uffdio_api.features
|
userland intends to speak on the ``UFFD`` and the ``uffdio_api.features``
|
||||||
userland requires. The UFFDIO_API ioctl if successful (i.e. if the
|
userland requires. The ``UFFDIO_API`` ioctl if successful (i.e. if the
|
||||||
requested uffdio_api.api is spoken also by the running kernel and the
|
requested ``uffdio_api.api`` is spoken also by the running kernel and the
|
||||||
requested features are going to be enabled) will return into
|
requested features are going to be enabled) will return into
|
||||||
uffdio_api.features and uffdio_api.ioctls two 64bit bitmasks of
|
``uffdio_api.features`` and ``uffdio_api.ioctls`` two 64bit bitmasks of
|
||||||
respectively all the available features of the read(2) protocol and
|
respectively all the available features of the read(2) protocol and
|
||||||
the generic ioctl available.
|
the generic ioctl available.
|
||||||
|
|
||||||
The uffdio_api.features bitmask returned by the UFFDIO_API ioctl
|
The ``uffdio_api.features`` bitmask returned by the ``UFFDIO_API`` ioctl
|
||||||
defines what memory types are supported by the userfaultfd and what
|
defines what memory types are supported by the ``userfaultfd`` and what
|
||||||
events, except page fault notifications, may be generated.
|
events, except page fault notifications, may be generated.
|
||||||
|
|
||||||
If the kernel supports registering userfaultfd ranges on hugetlbfs
|
If the kernel supports registering ``userfaultfd`` ranges on hugetlbfs
|
||||||
virtual memory areas, UFFD_FEATURE_MISSING_HUGETLBFS will be set in
|
virtual memory areas, ``UFFD_FEATURE_MISSING_HUGETLBFS`` will be set in
|
||||||
uffdio_api.features. Similarly, UFFD_FEATURE_MISSING_SHMEM will be
|
``uffdio_api.features``. Similarly, ``UFFD_FEATURE_MISSING_SHMEM`` will be
|
||||||
set if the kernel supports registering userfaultfd ranges on shared
|
set if the kernel supports registering ``userfaultfd`` ranges on shared
|
||||||
memory (covering all shmem APIs, i.e. tmpfs, IPCSHM, /dev/zero
|
memory (covering all shmem APIs, i.e. tmpfs, ``IPCSHM``, ``/dev/zero``,
|
||||||
MAP_SHARED, memfd_create, etc).
|
``MAP_SHARED``, ``memfd_create``, etc).
|
||||||
|
|
||||||
The userland application that wants to use userfaultfd with hugetlbfs
|
The userland application that wants to use ``userfaultfd`` with hugetlbfs
|
||||||
or shared memory needs to set the corresponding flag in
|
or shared memory needs to set the corresponding flag in
|
||||||
uffdio_api.features to enable those features.
|
``uffdio_api.features`` to enable those features.
|
||||||
|
|
||||||
If the userland desires to receive notifications for events other than
|
If the userland desires to receive notifications for events other than
|
||||||
page faults, it has to verify that uffdio_api.features has appropriate
|
page faults, it has to verify that ``uffdio_api.features`` has appropriate
|
||||||
UFFD_FEATURE_EVENT_* bits set. These events are described in more
|
``UFFD_FEATURE_EVENT_*`` bits set. These events are described in more
|
||||||
detail below in "Non-cooperative userfaultfd" section.
|
detail below in `Non-cooperative userfaultfd`_ section.
|
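The feature check described above is a plain bitmask test. The following sketch assumes the bit values below, which were copied by hand from ``include/uapi/linux/userfaultfd.h`` and should be verified against the headers of the target kernel; the helper name `missing_features` is made up for this example.

```python
# Assumed values from include/uapi/linux/userfaultfd.h; verify before use.
UFFD_FEATURES = {
    "UFFD_FEATURE_EVENT_FORK": 1 << 1,
    "UFFD_FEATURE_EVENT_REMAP": 1 << 2,
    "UFFD_FEATURE_EVENT_REMOVE": 1 << 3,
    "UFFD_FEATURE_MISSING_HUGETLBFS": 1 << 4,
    "UFFD_FEATURE_MISSING_SHMEM": 1 << 5,
    "UFFD_FEATURE_EVENT_UNMAP": 1 << 6,
}


def missing_features(returned_features, wanted):
    """Return the names in `wanted` whose bit is not set in the bitmask
    that UFFDIO_API returned in uffdio_api.features."""
    return [name for name in wanted
            if not returned_features & UFFD_FEATURES[name]]
```

A caller would treat a non-empty result as "this kernel cannot deliver the events I need" and fall back or bail out before registering any range.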
||||||
|
|
||||||
Once the userfaultfd has been enabled the UFFDIO_REGISTER ioctl should
|
Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should
|
||||||
be invoked (if present in the returned uffdio_api.ioctls bitmask) to
|
be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to
|
||||||
register a memory range in the userfaultfd by setting the
|
register a memory range in the ``userfaultfd`` by setting the
|
||||||
uffdio_register structure accordingly. The uffdio_register.mode
|
uffdio_register structure accordingly. The ``uffdio_register.mode``
|
||||||
bitmask will specify to the kernel which kind of faults to track for
|
bitmask will specify to the kernel which kind of faults to track for
|
||||||
the range (UFFDIO_REGISTER_MODE_MISSING would track missing
|
the range (``UFFDIO_REGISTER_MODE_MISSING`` would track missing
|
||||||
pages). The UFFDIO_REGISTER ioctl will return the
|
pages). The ``UFFDIO_REGISTER`` ioctl will return the
|
||||||
uffdio_register.ioctls bitmask of ioctls that are suitable to resolve
|
``uffdio_register.ioctls`` bitmask of ioctls that are suitable to resolve
|
||||||
userfaults on the range registered. Not all ioctls will necessarily be
|
userfaults on the range registered. Not all ioctls will necessarily be
|
||||||
supported for all memory types depending on the underlying virtual
|
supported for all memory types depending on the underlying virtual
|
||||||
memory backend (anonymous memory vs tmpfs vs real filebacked
|
memory backend (anonymous memory vs tmpfs vs real filebacked
|
||||||
mappings).
|
mappings).
|
||||||
|
|
||||||
Userland can use the uffdio_register.ioctls to manage the virtual
|
Userland can use the ``uffdio_register.ioctls`` to manage the virtual
|
||||||
address space in the background (to add or potentially also remove
|
address space in the background (to add or potentially also remove
|
||||||
memory from the userfaultfd registered range). This means a userfault
|
memory from the ``userfaultfd`` registered range). This means a userfault
|
||||||
could be triggering just before userland maps in the background the
|
could be triggering just before userland maps in the background the
|
||||||
user-faulted page.
|
user-faulted page.
|
||||||
|
|
||||||
The primary ioctl to resolve userfaults is UFFDIO_COPY. That
|
The primary ioctl to resolve userfaults is ``UFFDIO_COPY``. That
|
||||||
atomically copies a page into the userfault registered range and wakes
|
atomically copies a page into the userfault registered range and wakes
|
||||||
up the blocked userfaults (unless uffdio_copy.mode &
|
up the blocked userfaults
|
||||||
UFFDIO_COPY_MODE_DONTWAKE is set). Other ioctl works similarly to
|
(unless ``uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE`` is set).
|
||||||
UFFDIO_COPY. They're atomic as in guaranteeing that nothing can see an
|
Other ioctls work similarly to ``UFFDIO_COPY``. They're atomic as in
|
||||||
half copied page since it'll keep userfaulting until the copy has
|
guaranteeing that nothing can see a half-copied page since it'll
|
||||||
finished.
|
keep userfaulting until the copy has finished.
|
||||||
|
|
||||||
Notes:
|
Notes:
|
||||||
|
|
||||||
- If you requested UFFDIO_REGISTER_MODE_MISSING when registering then
|
- If you requested ``UFFDIO_REGISTER_MODE_MISSING`` when registering then
|
||||||
you must provide some kind of page in your thread after reading from
|
you must provide some kind of page in your thread after reading from
|
||||||
the uffd. You must provide either UFFDIO_COPY or UFFDIO_ZEROPAGE.
|
the uffd. You must provide either ``UFFDIO_COPY`` or ``UFFDIO_ZEROPAGE``.
|
||||||
The normal behavior of the OS automatically providing a zero page on
|
The normal behavior of the OS automatically providing a zero page on
|
||||||
an anonymous mapping is not in place.
|
an anonymous mapping is not in place.
|
||||||
|
|
||||||
@ -122,13 +122,13 @@ Notes:
|
|||||||
|
|
||||||
- You get the address of the access that triggered the missing page
|
- You get the address of the access that triggered the missing page
|
||||||
event out of a struct uffd_msg that you read in the thread from the
|
event out of a struct uffd_msg that you read in the thread from the
|
||||||
uffd. You can supply as many pages as you want with UFFDIO_COPY or
|
uffd. You can supply as many pages as you want with ``UFFDIO_COPY`` or
|
||||||
UFFDIO_ZEROPAGE. Keep in mind that unless you used DONTWAKE then
|
``UFFDIO_ZEROPAGE``. Keep in mind that unless you used DONTWAKE then
|
||||||
the first of any of those IOCTLs wakes up the faulting thread.
|
the first of any of those IOCTLs wakes up the faulting thread.
|
||||||
|
|
||||||
- Be sure to test for all errors including (pollfd[0].revents &
|
- Be sure to test for all errors including
|
||||||
POLLERR). This can happen, e.g. when ranges supplied were
|
(``pollfd[0].revents & POLLERR``). This can happen, e.g. when ranges
|
||||||
incorrect.
|
supplied were incorrect.
|
||||||
|
|
||||||
Write Protect Notifications
|
Write Protect Notifications
|
||||||
---------------------------
|
---------------------------
|
||||||
@ -136,41 +136,42 @@ Write Protect Notifications
|
|||||||
This is equivalent to (but faster than) using mprotect and a SIGSEGV
|
This is equivalent to (but faster than) using mprotect and a SIGSEGV
|
||||||
signal handler.
|
signal handler.
|
||||||
|
|
||||||
Firstly you need to register a range with UFFDIO_REGISTER_MODE_WP.
|
Firstly you need to register a range with ``UFFDIO_REGISTER_MODE_WP``.
|
||||||
Instead of using mprotect(2) you use ioctl(uffd, UFFDIO_WRITEPROTECT,
|
Instead of using mprotect(2) you use
|
||||||
struct *uffdio_writeprotect) while mode = UFFDIO_WRITEPROTECT_MODE_WP
|
``ioctl(uffd, UFFDIO_WRITEPROTECT, struct *uffdio_writeprotect)``
|
||||||
|
while ``mode = UFFDIO_WRITEPROTECT_MODE_WP``
|
||||||
in the struct passed in. The range does not default to and does not
|
in the struct passed in. The range does not default to and does not
|
||||||
have to be identical to the range you registered with. You can write
|
have to be identical to the range you registered with. You can write
|
||||||
protect as many ranges as you like (inside the registered range).
|
protect as many ranges as you like (inside the registered range).
|
||||||
Then, in the thread reading from uffd the struct will have
|
Then, in the thread reading from uffd the struct will have
|
||||||
msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP set. Now you send
|
``msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP`` set. Now you send
|
||||||
ioctl(uffd, UFFDIO_WRITEPROTECT, struct *uffdio_writeprotect) again
|
``ioctl(uffd, UFFDIO_WRITEPROTECT, struct *uffdio_writeprotect)``
|
||||||
while pagefault.mode does not have UFFDIO_WRITEPROTECT_MODE_WP set.
|
again while ``pagefault.mode`` does not have ``UFFDIO_WRITEPROTECT_MODE_WP``
|
||||||
This wakes up the thread which will continue to run with writes. This
|
set. This wakes up the thread which will continue to run with writes. This
|
||||||
allows you to do the bookkeeping about the write in the uffd reading
|
allows you to do the bookkeeping about the write in the uffd reading
|
||||||
thread before the ioctl.
|
thread before the ioctl.
|
||||||
|
|
||||||
If you registered with both UFFDIO_REGISTER_MODE_MISSING and
|
If you registered with both ``UFFDIO_REGISTER_MODE_MISSING`` and
|
||||||
UFFDIO_REGISTER_MODE_WP then you need to think about the sequence in
|
``UFFDIO_REGISTER_MODE_WP`` then you need to think about the sequence in
|
||||||
which you supply a page and undo write protect. Note that there is a
|
which you supply a page and undo write protect. Note that there is a
|
||||||
difference between writes into a WP area and into a !WP area. The
|
difference between writes into a WP area and into a !WP area. The
|
||||||
former will have UFFD_PAGEFAULT_FLAG_WP set, the latter
|
former will have ``UFFD_PAGEFAULT_FLAG_WP`` set, the latter
|
||||||
UFFD_PAGEFAULT_FLAG_WRITE. The latter did not fail on protection but
|
``UFFD_PAGEFAULT_FLAG_WRITE``. The latter did not fail on protection but
|
||||||
you still need to supply a page when UFFDIO_REGISTER_MODE_MISSING was
|
you still need to supply a page when ``UFFDIO_REGISTER_MODE_MISSING`` was
|
||||||
used.
|
used.
|
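The distinction between the two kinds of write fault can be sketched as a small decision function. The flag values are assumed to match ``include/uapi/linux/userfaultfd.h`` and should be checked there; the function name and its string results are hypothetical and exist only to illustrate the ordering described above.

```python
# Assumed values from include/uapi/linux/userfaultfd.h; verify before use.
UFFD_PAGEFAULT_FLAG_WRITE = 1 << 0
UFFD_PAGEFAULT_FLAG_WP = 1 << 1


def plan_fault_resolution(flags, page_is_zero=False):
    """Name the ioctl a handler would issue for one pagefault message.

    `flags` is msg.arg.pagefault.flags as read from the uffd.
    """
    if flags & UFFD_PAGEFAULT_FLAG_WP:
        # Write hit a write-protected page: do the bookkeeping first, then
        # clear the protection with UFFDIO_WRITEPROTECT (mode without
        # UFFDIO_WRITEPROTECT_MODE_WP) to wake the faulting thread.
        return "UFFDIO_WRITEPROTECT"
    # Missing page, whether the access was a read or a write: a page has
    # to be supplied.
    return "UFFDIO_ZEROPAGE" if page_is_zero else "UFFDIO_COPY"
```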
||||||
|
|
||||||
QEMU/KVM
|
QEMU/KVM
|
||||||
========
|
========
|
||||||
|
|
||||||
QEMU/KVM is using the userfaultfd syscall to implement postcopy live
|
QEMU/KVM is using the ``userfaultfd`` syscall to implement postcopy live
|
||||||
migration. Postcopy live migration is one form of memory
|
migration. Postcopy live migration is one form of memory
|
||||||
externalization consisting of a virtual machine running with part or
|
externalization consisting of a virtual machine running with part or
|
||||||
all of its memory residing on a different node in the cloud. The
|
all of its memory residing on a different node in the cloud. The
|
||||||
userfaultfd abstraction is generic enough that not a single line of
|
``userfaultfd`` abstraction is generic enough that not a single line of
|
||||||
KVM kernel code had to be modified in order to add postcopy live
|
KVM kernel code had to be modified in order to add postcopy live
|
||||||
migration to QEMU.
|
migration to QEMU.
|
||||||
|
|
||||||
Guest async page faults, FOLL_NOWAIT and all other GUP features work
|
Guest async page faults, ``FOLL_NOWAIT`` and all other ``GUP*`` features work
|
||||||
just fine in combination with userfaults. Userfaults trigger async
|
just fine in combination with userfaults. Userfaults trigger async
|
||||||
page faults in the guest scheduler so those guest processes that
|
page faults in the guest scheduler so those guest processes that
|
||||||
aren't waiting for userfaults (i.e. network bound) can keep running in
|
aren't waiting for userfaults (i.e. network bound) can keep running in
|
||||||
@ -183,19 +184,19 @@ generating userfaults for readonly guest regions.
|
|||||||
The implementation of postcopy live migration currently uses one
|
The implementation of postcopy live migration currently uses one
|
||||||
single bidirectional socket but in the future two different sockets
|
single bidirectional socket but in the future two different sockets
|
||||||
will be used (to reduce the latency of the userfaults to the minimum
|
will be used (to reduce the latency of the userfaults to the minimum
|
||||||
possible without having to decrease /proc/sys/net/ipv4/tcp_wmem).
|
possible without having to decrease ``/proc/sys/net/ipv4/tcp_wmem``).
|
||||||
|
|
||||||
The QEMU in the source node writes all pages that it knows are missing
|
The QEMU in the source node writes all pages that it knows are missing
|
||||||
in the destination node, into the socket, and the migration thread of
|
in the destination node, into the socket, and the migration thread of
|
||||||
the QEMU running in the destination node runs UFFDIO_COPY|ZEROPAGE
|
the QEMU running in the destination node runs ``UFFDIO_COPY|ZEROPAGE``
|
||||||
ioctls on the userfaultfd in order to map the received pages into the
|
ioctls on the ``userfaultfd`` in order to map the received pages into the
|
||||||
guest (UFFDIO_ZEROPAGE is used if the source page was a zero page).
|
guest (``UFFDIO_ZEROPAGE`` is used if the source page was a zero page).
|
||||||
|
|
||||||
A different postcopy thread in the destination node listens with
|
A different postcopy thread in the destination node listens with
|
||||||
poll() to the userfaultfd in parallel. When a POLLIN event is
|
poll() to the ``userfaultfd`` in parallel. When a ``POLLIN`` event is
|
||||||
generated after a userfault triggers, the postcopy thread read() from
|
generated after a userfault triggers, the postcopy thread read() from
|
||||||
the userfaultfd and receives the fault address (or -EAGAIN in case the
|
the ``userfaultfd`` and receives the fault address (or ``-EAGAIN`` in case the
|
||||||
userfault was already resolved and woken by a UFFDIO_COPY|ZEROPAGE run
|
userfault was already resolved and woken by a ``UFFDIO_COPY|ZEROPAGE`` run
|
||||||
by the parallel QEMU migration thread).
|
by the parallel QEMU migration thread).
|
||||||
|
|
||||||
After the QEMU postcopy thread (running in the destination node) gets
|
After the QEMU postcopy thread (running in the destination node) gets
|
||||||
@ -206,7 +207,7 @@ remaining missing pages from that new page offset. Soon after that
|
|||||||
(just the time to flush the tcp_wmem queue through the network) the
|
(just the time to flush the tcp_wmem queue through the network) the
|
||||||
migration thread in the QEMU running in the destination node will
|
migration thread in the QEMU running in the destination node will
|
||||||
receive the page that triggered the userfault and it'll map it as
|
receive the page that triggered the userfault and it'll map it as
|
||||||
usual with the UFFDIO_COPY|ZEROPAGE (without actually knowing if it
|
usual with the ``UFFDIO_COPY|ZEROPAGE`` (without actually knowing if it
|
||||||
was spontaneously sent by the source or if it was an urgent page
|
was spontaneously sent by the source or if it was an urgent page
|
||||||
requested through a userfault).
|
requested through a userfault).
|
||||||
|
|
||||||
@ -219,74 +220,74 @@ checked to find which missing pages to send in round robin and we seek
|
|||||||
over it when receiving incoming userfaults. After sending each page of
|
over it when receiving incoming userfaults. After sending each page of
|
||||||
course the bitmap is updated accordingly. It's also useful to avoid
|
course the bitmap is updated accordingly. It's also useful to avoid
|
||||||
sending the same page twice (in case the userfault is read by the
|
sending the same page twice (in case the userfault is read by the
|
||||||
postcopy thread just before UFFDIO_COPY|ZEROPAGE runs in the migration
|
postcopy thread just before ``UFFDIO_COPY|ZEROPAGE`` runs in the migration
|
||||||
thread).
|
thread).
|
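The bitmap bookkeeping described above can be illustrated with a toy model. This is not QEMU's implementation; the class and method names are invented for the example, and a plain list of booleans stands in for the real bitmap.

```python
class MissingPages:
    """Toy model of a source-side bitmap of pages not yet sent."""

    def __init__(self, npages):
        self.missing = [True] * npages
        self.cursor = 0  # where the round-robin scan resumes

    def seek(self, page):
        """An incoming userfault moves the scan to the requested page."""
        self.cursor = page

    def next_to_send(self):
        """Return the next missing page at or after the cursor, wrapping
        around, and mark it as sent; None when nothing is left."""
        n = len(self.missing)
        for i in range(n):
            page = (self.cursor + i) % n
            if self.missing[page]:
                self.missing[page] = False
                self.cursor = (page + 1) % n
                return page
        return None
```

Because a page is cleared in the bitmap as soon as it is sent, a userfault that arrives for an already-sent page only moves the cursor and does not cause a second transfer.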
||||||
|
|
||||||
Non-cooperative userfaultfd
|
Non-cooperative userfaultfd
|
||||||
===========================
|
===========================
|
||||||
|
|
||||||
When the userfaultfd is monitored by an external manager, the manager
|
When the ``userfaultfd`` is monitored by an external manager, the manager
|
||||||
must be able to track changes in the process virtual memory
|
must be able to track changes in the process virtual memory
|
||||||
layout. Userfaultfd can notify the manager about such changes using
|
layout. Userfaultfd can notify the manager about such changes using
|
||||||
the same read(2) protocol as for the page fault notifications. The
|
the same read(2) protocol as for the page fault notifications. The
|
||||||
manager has to explicitly enable these events by setting appropriate
|
manager has to explicitly enable these events by setting appropriate
|
||||||
bits in uffdio_api.features passed to UFFDIO_API ioctl:
|
bits in ``uffdio_api.features`` passed to ``UFFDIO_API`` ioctl:
|
||||||
|
|
||||||
UFFD_FEATURE_EVENT_FORK
|
``UFFD_FEATURE_EVENT_FORK``
|
||||||
enable userfaultfd hooks for fork(). When this feature is
|
enable ``userfaultfd`` hooks for fork(). When this feature is
|
||||||
enabled, the userfaultfd context of the parent process is
|
enabled, the ``userfaultfd`` context of the parent process is
|
||||||
duplicated into the newly created process. The manager
|
duplicated into the newly created process. The manager
|
||||||
receives UFFD_EVENT_FORK with file descriptor of the new
|
receives ``UFFD_EVENT_FORK`` with file descriptor of the new
|
||||||
userfaultfd context in the uffd_msg.fork.
|
``userfaultfd`` context in the ``uffd_msg.fork``.
|
||||||
|
|
||||||
UFFD_FEATURE_EVENT_REMAP
|
``UFFD_FEATURE_EVENT_REMAP``
|
||||||
enable notifications about mremap() calls. When the
|
enable notifications about mremap() calls. When the
|
||||||
non-cooperative process moves a virtual memory area to a
|
non-cooperative process moves a virtual memory area to a
|
||||||
different location, the manager will receive
|
different location, the manager will receive
|
||||||
UFFD_EVENT_REMAP. The uffd_msg.remap will contain the old and
|
``UFFD_EVENT_REMAP``. The ``uffd_msg.remap`` will contain the old and
|
||||||
new addresses of the area and its original length.
|
new addresses of the area and its original length.
|
||||||
|
|
||||||
UFFD_FEATURE_EVENT_REMOVE
|
``UFFD_FEATURE_EVENT_REMOVE``
|
||||||
enable notifications about madvise(MADV_REMOVE) and
|
enable notifications about madvise(MADV_REMOVE) and
|
||||||
madvise(MADV_DONTNEED) calls. The event UFFD_EVENT_REMOVE will
|
madvise(MADV_DONTNEED) calls. The event ``UFFD_EVENT_REMOVE`` will
|
||||||
be generated upon these calls to madvise. The uffd_msg.remove
|
be generated upon these calls to madvise(). The ``uffd_msg.remove``
|
||||||
will contain start and end addresses of the removed area.
|
will contain start and end addresses of the removed area.
|
||||||
|
|
||||||
UFFD_FEATURE_EVENT_UNMAP
|
``UFFD_FEATURE_EVENT_UNMAP``
|
||||||
enable notifications about memory unmapping. The manager will
|
enable notifications about memory unmapping. The manager will
|
||||||
get UFFD_EVENT_UNMAP with uffd_msg.remove containing start and
|
get ``UFFD_EVENT_UNMAP`` with ``uffd_msg.remove`` containing start and
|
||||||
end addresses of the unmapped area.
|
end addresses of the unmapped area.
|
||||||
|
|
||||||
Although the UFFD_FEATURE_EVENT_REMOVE and UFFD_FEATURE_EVENT_UNMAP
|
Although the ``UFFD_FEATURE_EVENT_REMOVE`` and ``UFFD_FEATURE_EVENT_UNMAP``
|
||||||
are pretty similar, they quite differ in the action expected from the
|
are pretty similar, they quite differ in the action expected from the
|
||||||
userfaultfd manager. In the former case, the virtual memory is
|
``userfaultfd`` manager. In the former case, the virtual memory is
|
||||||
removed, but the area is not, the area remains monitored by the
|
removed, but the area is not, the area remains monitored by the
|
||||||
userfaultfd, and if a page fault occurs in that area it will be
|
``userfaultfd``, and if a page fault occurs in that area it will be
|
||||||
delivered to the manager. The proper resolution for such page fault is
|
delivered to the manager. The proper resolution for such page fault is
|
||||||
to zeromap the faulting address. However, in the latter case, when an
|
to zeromap the faulting address. However, in the latter case, when an
|
||||||
area is unmapped, either explicitly (with munmap() system call), or
|
area is unmapped, either explicitly (with munmap() system call), or
|
||||||
implicitly (e.g. during mremap()), the area is removed and in turn the
|
implicitly (e.g. during mremap()), the area is removed and in turn the
|
||||||
userfaultfd context for such area disappears too and the manager will
|
``userfaultfd`` context for such area disappears too and the manager will
|
||||||
not get further userland page faults from the removed area. Still, the
|
not get further userland page faults from the removed area. Still, the
|
||||||
notification is required in order to prevent manager from using
|
notification is required in order to prevent manager from using
|
||||||
UFFDIO_COPY on the unmapped area.
|
``UFFDIO_COPY`` on the unmapped area.
|
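The different reactions to the two events can be sketched as manager-side bookkeeping. The class below is a hypothetical illustration: it tracks half-open address ranges and shows that a remove event leaves the range tracked while an unmap event cuts it out.

```python
class TrackedRanges:
    """Registered [start, end) ranges as seen by a hypothetical manager."""

    def __init__(self):
        self.ranges = []

    def register(self, start, end):
        self.ranges.append((start, end))

    def on_remove(self, start, end):
        # UFFD_EVENT_REMOVE: the memory is gone but the area is still
        # monitored, so nothing is dropped here. Later faults in it would
        # be resolved by zeromapping the faulting address.
        pass

    def on_unmap(self, start, end):
        # UFFD_EVENT_UNMAP: the area itself is gone; stop tracking it so
        # that no UFFDIO_COPY is attempted there.
        kept = []
        for s, e in self.ranges:
            if e <= start or s >= end:
                kept.append((s, e))  # no overlap with the unmapped span
                continue
            if s < start:
                kept.append((s, start))  # part below the hole survives
            if e > end:
                kept.append((end, e))  # part above the hole survives
        self.ranges = kept

    def may_copy(self, addr):
        return any(s <= addr < e for s, e in self.ranges)
```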
||||||
|
|
||||||
Unlike userland page faults which have to be synchronous and require
|
Unlike userland page faults which have to be synchronous and require
|
||||||
explicit or implicit wakeup, all the events are delivered
|
explicit or implicit wakeup, all the events are delivered
|
||||||
asynchronously and the non-cooperative process resumes execution as
|
asynchronously and the non-cooperative process resumes execution as
|
||||||
soon as manager executes read(). The userfaultfd manager should
|
soon as manager executes read(). The ``userfaultfd`` manager should
|
||||||
carefully synchronize calls to UFFDIO_COPY with the events
|
carefully synchronize calls to ``UFFDIO_COPY`` with the events
|
||||||
processing. To aid the synchronization, the UFFDIO_COPY ioctl will
|
processing. To aid the synchronization, the ``UFFDIO_COPY`` ioctl will
|
||||||
return -ENOSPC when the monitored process exits at the time of
|
return ``-ENOSPC`` when the monitored process exits at the time of
|
||||||
UFFDIO_COPY, and -ENOENT, when the non-cooperative process has changed
|
``UFFDIO_COPY``, and ``-ENOENT``, when the non-cooperative process has changed
|
||||||
its virtual memory layout simultaneously with outstanding UFFDIO_COPY
|
its virtual memory layout simultaneously with outstanding ``UFFDIO_COPY``
|
||||||
operation.
|
operation.
|
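One way a manager might act on these two error codes is shown below. The function name and the returned action strings are assumptions of this sketch; only the meaning of ``ENOSPC`` and ``ENOENT`` comes from the text above.

```python
import errno


def classify_copy_error(err):
    """Map the errno of a failed UFFDIO_COPY to a manager action.

    `err` is the positive errno value (e.g. OSError.errno).
    """
    if err == errno.ENOSPC:
        # The monitored process exited during the copy.
        return "stop-serving-process"
    if err == errno.ENOENT:
        # The address space layout changed under the copy: read the
        # pending events before deciding whether to retry.
        return "drain-events"
    return "unexpected"
```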
||||||
|
|
||||||
The current asynchronous model of the event delivery is optimal for
|
The current asynchronous model of the event delivery is optimal for
|
||||||
single threaded non-cooperative userfaultfd manager implementations. A
|
single threaded non-cooperative ``userfaultfd`` manager implementations. A
|
||||||
synchronous event delivery model can be added later as a new
|
synchronous event delivery model can be added later as a new
|
||||||
userfaultfd feature to facilitate multithreading enhancements of the
|
``userfaultfd`` feature to facilitate multithreading enhancements of the
|
||||||
non cooperative manager, for example to allow UFFDIO_COPY ioctls to
|
non cooperative manager, for example to allow ``UFFDIO_COPY`` ioctls to
|
||||||
run in parallel to the event reception. Single threaded
|
run in parallel to the event reception. Single threaded
|
||||||
implementations should continue to use the current async event
|
implementations should continue to use the current async event
|
||||||
delivery model instead.
|
delivery model instead.
|
||||||
|
@ -18,7 +18,7 @@ Mounting the root filesystem via NFS (nfsroot)
|
|||||||
In order to use a diskless system, such as an X-terminal or printer server for
|
In order to use a diskless system, such as an X-terminal or printer server for
|
||||||
example, it is necessary for the root filesystem to be present on a non-disk
|
example, it is necessary for the root filesystem to be present on a non-disk
|
||||||
device. This may be an initramfs (see
|
device. This may be an initramfs (see
|
||||||
Documentation/filesystems/ramfs-rootfs-initramfs.txt), a ramdisk (see
|
Documentation/filesystems/ramfs-rootfs-initramfs.rst), a ramdisk (see
|
||||||
Documentation/admin-guide/initrd.rst) or a filesystem mounted via NFS. The
|
Documentation/admin-guide/initrd.rst) or a filesystem mounted via NFS. The
|
||||||
following text describes how to use NFS for the root filesystem. For the rest
|
following text describes how to use NFS for the root filesystem. For the rest
|
||||||
of this text 'client' means the diskless system, and 'server' means the NFS
|
of this text 'client' means the diskless system, and 'server' means the NFS
|
||||||
|
@ -6,6 +6,21 @@ Numa policy hit/miss statistics
|
|||||||
|
|
||||||
All units are pages. Hugepages have separate counters.
|
All units are pages. Hugepages have separate counters.
|
||||||
|
|
||||||
|
The numa_hit, numa_miss and numa_foreign counters reflect how well processes
|
||||||
|
are able to allocate memory from nodes they prefer. If they succeed, numa_hit
|
||||||
|
is incremented on the preferred node, otherwise numa_foreign is incremented on
|
||||||
|
the preferred node and numa_miss on the node where allocation succeeded.
|
||||||
|
|
||||||
|
Usually the preferred node is the one local to the CPU where the process executes,
|
||||||
|
but restrictions such as mempolicies can change that, so there are also two
|
||||||
|
counters based on CPU local node. local_node is similar to numa_hit and is
|
||||||
|
incremented on allocation from a node by CPU on the same node. other_node is
|
||||||
|
similar to numa_miss and is incremented on the node where allocation succeeds
|
||||||
|
from a CPU from a different node. Note there is no counter analogous to
|
||||||
|
numa_foreign.
|
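The accounting rules in the two paragraphs above can be written down as a short model. This is a sketch of the described semantics, not the kernel's code; node numbers and the function names are illustrative.

```python
from collections import defaultdict

COUNTERS = ("numa_hit", "numa_miss", "numa_foreign", "local_node", "other_node")


def new_stats():
    """Per-node counters, all starting at zero."""
    return defaultdict(lambda: dict.fromkeys(COUNTERS, 0))


def account(stats, preferred, allocated, cpu_node):
    """Account one page allocation.

    preferred: node the process wanted memory from
    allocated: node the memory actually came from
    cpu_node:  node of the CPU the process was running on
    """
    if allocated == preferred:
        stats[preferred]["numa_hit"] += 1
    else:
        stats[preferred]["numa_foreign"] += 1
        stats[allocated]["numa_miss"] += 1
    # The CPU-local pair is always counted on the node that supplied memory.
    if allocated == cpu_node:
        stats[allocated]["local_node"] += 1
    else:
        stats[allocated]["other_node"] += 1
```

For instance, a process running on node 0 with a mempolicy preferring node 1 that gets its page from node 1 produces a numa_hit and an other_node increment, both on node 1.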
||||||
|
|
||||||
|
In more detail:
|
||||||
|
|
||||||
=============== ============================================================
|
=============== ============================================================
|
||||||
numa_hit A process wanted to allocate memory from this node,
|
numa_hit A process wanted to allocate memory from this node,
|
||||||
and succeeded.
|
and succeeded.
|
||||||
@ -14,11 +29,13 @@ numa_miss A process wanted to allocate memory from another node,
|
|||||||
but ended up with memory from this node.
|
but ended up with memory from this node.
|
||||||
|
|
||||||
numa_foreign A process wanted to allocate on this node,
|
numa_foreign A process wanted to allocate on this node,
|
||||||
but ended up with memory from another one.
|
but ended up with memory from another node.
|
||||||
|
|
||||||
local_node A process ran on this node and got memory from it.
|
local_node A process ran on this node's CPU,
|
||||||
|
and got memory from this node.
|
||||||
|
|
||||||
other_node A process ran on this node and got memory from another node.
|
other_node A process ran on a different node's CPU
|
||||||
|
and got memory from this node.
|
||||||
|
|
||||||
interleave_hit Interleaving wanted to allocate from this node
|
interleave_hit Interleaving wanted to allocate from this node
|
||||||
and succeeded.
|
and succeeded.
|
||||||
@ -28,3 +45,11 @@ For easier reading you can use the numastat utility from the numactl package
|
|||||||
(http://oss.sgi.com/projects/libnuma/). Note that it only works
|
(http://oss.sgi.com/projects/libnuma/). Note that it only works
|
||||||
well right now on machines with a small number of CPUs.
|
well right now on machines with a small number of CPUs.
|
||||||
|
|
||||||
|
Note that on systems with memoryless nodes (where a node has CPUs but no
|
||||||
|
memory) the numa_hit, numa_miss and numa_foreign statistics can be skewed
|
||||||
|
heavily. In the current kernel implementation, if a process prefers a
|
||||||
|
memoryless node (e.g. because it is running on one of its local CPUs), the
|
||||||
|
implementation actually treats one of the nearest nodes with memory as the
|
||||||
|
preferred node. As a result, such an allocation will not increase the numa_foreign
|
||||||
|
counter on the memoryless node, and will skew the numa_hit, numa_miss and
|
||||||
|
numa_foreign statistics of the nearest node.
|
||||||
|
@ -156,11 +156,11 @@ the labels provided by the BIOS won't match the real ones.
|
|||||||
ECC memory
|
ECC memory
|
||||||
----------
|
----------
|
||||||
|
|
||||||
As mentioned on the previous section, ECC memory has extra bits to be
|
As mentioned in the previous section, ECC memory has extra bits to be
|
||||||
used for error correction. So, on 64 bit systems, a memory module
|
used for error correction. In the above example, a memory module has
|
||||||
has 64 bits of *data width*, and 74 bits of *total width*. So, there are
|
64 bits of *data width*, and 72 bits of *total width*. The extra 8
|
||||||
8 bits extra bits to be used for the error detection and correction
|
bits which are used for the error detection and correction mechanisms
|
||||||
mechanisms. Those extra bits are called *syndrome*\ [#f1]_\ [#f2]_.
|
are referred to as the *syndrome*\ [#f1]_\ [#f2]_.
|
||||||
|
|
||||||
So, when the cpu requests the memory controller to write a word with
|
So, when the cpu requests the memory controller to write a word with
|
||||||
*data width*, the memory controller calculates the *syndrome* in real time,
|
*data width*, the memory controller calculates the *syndrome* in real time,
|
||||||
@ -212,7 +212,7 @@ EDAC - Error Detection And Correction
|
|||||||
purposes.
|
purposes.
|
||||||
|
|
||||||
When the subsystem was pushed upstream for the first time, on
|
When the subsystem was pushed upstream for the first time, on
|
||||||
Kernel 2.6.16, for the first time, it was renamed to ``EDAC``.
|
Kernel 2.6.16, it was renamed to ``EDAC``.
|
||||||
|
|
||||||
Purpose
|
Purpose
|
||||||
-------
|
-------
|
||||||
@ -351,15 +351,17 @@ controllers. The following example will assume 2 channels:
|
|||||||
+------------+-----------+-----------+
|
+------------+-----------+-----------+
|
||||||
| | ``ch0`` | ``ch1`` |
|
| | ``ch0`` | ``ch1`` |
|
||||||
+============+===========+===========+
|
+============+===========+===========+
|
||||||
| ``csrow0`` | DIMM_A0 | DIMM_B0 |
|
| |**DIMM_A0**|**DIMM_B0**|
|
||||||
| | rank0 | rank0 |
|
+------------+-----------+-----------+
|
||||||
+------------+ - | - |
|
| ``csrow0`` | rank0 | rank0 |
|
||||||
|
+------------+-----------+-----------+
|
||||||
| ``csrow1`` | rank1 | rank1 |
|
| ``csrow1`` | rank1 | rank1 |
|
||||||
+------------+-----------+-----------+
|
+------------+-----------+-----------+
|
||||||
| ``csrow2`` | DIMM_A1 | DIMM_B1 |
|
| |**DIMM_A1**|**DIMM_B1**|
|
||||||
| | rank0 | rank0 |
|
+------------+-----------+-----------+
|
||||||
+------------+ - | - |
|
| ``csrow2`` | rank0 | rank0 |
|
||||||
| ``csrow3`` | rank1 | rank1 |
|
+------------+-----------+-----------+
|
||||||
|
| ``csrow3`` | rank1 | rank1 |
|
||||||
+------------+-----------+-----------+
|
+------------+-----------+-----------+
|
||||||
|
|
||||||
In the above example, there are 4 physical slots on the motherboard
|
In the above example, there are 4 physical slots on the motherboard
|
||||||
|
@ -102,6 +102,30 @@ See the ``type_of_loader`` and ``ext_loader_ver`` fields in
|
|||||||
:doc:`/x86/boot` for additional information.
|
:doc:`/x86/boot` for additional information.
|
||||||
|
|
||||||
|
|
||||||
|
bpf_stats_enabled
|
||||||
|
=================
|
||||||
|
|
||||||
|
Controls whether the kernel should collect statistics on BPF programs
|
||||||
|
(total time spent running, number of times run...). Enabling
|
||||||
|
statistics causes a slight reduction in performance on each program
|
||||||
|
run. The statistics can be seen using ``bpftool``.
|
||||||
|
|
||||||
|
= ===================================
|
||||||
|
0 Don't collect statistics (default).
|
||||||
|
1 Collect statistics.
|
||||||
|
= ===================================
|
||||||
|
|
||||||
|
|
||||||
|
cad_pid
|
||||||
|
=======
|
||||||
|
|
||||||
|
This is the pid which will be signalled on reboot (notably, by
|
||||||
|
Ctrl-Alt-Delete). Writing a value to this file which doesn't
|
||||||
|
correspond to a running process will result in ``-ESRCH``.
|
||||||
|
|
||||||
|
See also `ctrl-alt-del`_.
|
||||||
|
|
||||||
|
|
||||||
cap_last_cap
|
cap_last_cap
|
||||||
============
|
============
|
||||||
|
|
||||||
@ -241,6 +265,40 @@ domain names are in general different. For a detailed discussion
|
|||||||
see the ``hostname(1)`` man page.
|
see the ``hostname(1)`` man page.
|
||||||
|
|
||||||
|
|
||||||
|
firmware_config
|
||||||
|
===============
|
||||||
|
|
||||||
|
See :doc:`/driver-api/firmware/fallback-mechanisms`.
|
||||||
|
|
||||||
|
The entries in this directory allow the firmware loader helper
|
||||||
|
fallback to be controlled:
|
||||||
|
|
||||||
|
* ``force_sysfs_fallback``, when set to 1, forces the use of the
|
||||||
|
fallback;
|
||||||
|
* ``ignore_sysfs_fallback``, when set to 1, ignores any fallback.
|
||||||
|
|
||||||
|
|
||||||
|
ftrace_dump_on_oops
|
||||||
|
===================
|
||||||
|
|
||||||
|
Determines whether ``ftrace_dump()`` should be called on an oops (or
|
||||||
|
kernel panic). This will output the contents of the ftrace buffers to
|
||||||
|
the console. This is very useful for capturing traces that lead to
|
||||||
|
crashes and outputting them to a serial console.
|
||||||
|
|
||||||
|
= ===================================================
|
||||||
|
0 Disabled (default).
|
||||||
|
1 Dump buffers of all CPUs.
|
||||||
|
2 Dump the buffer of the CPU that triggered the oops.
|
||||||
|
= ===================================================
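
As a rough illustration of how the table above might be consumed from userspace, the following Python sketch reads the entry and maps it to the descriptions given here. The helper name and the ``sysctl_root`` parameter are assumptions for the example; the ``/proc/sys/kernel/`` location follows the path convention used elsewhere in this file, and values outside the table are simply reported as unrecognised.

```python
from pathlib import Path

# Descriptions copied from the table above.
FTRACE_DUMP_MODES = {
    0: "Disabled (default).",
    1: "Dump buffers of all CPUs.",
    2: "Dump the buffer of the CPU that triggered the oops.",
}


def describe_ftrace_dump_on_oops(sysctl_root="/proc/sys"):
    """Read kernel/ftrace_dump_on_oops and describe the current setting.

    sysctl_root is a hypothetical parameter so that the helper can be
    pointed at a test directory instead of the real /proc/sys.
    """
    path = Path(sysctl_root) / "kernel" / "ftrace_dump_on_oops"
    raw = path.read_text().strip()
    try:
        value = int(raw)
    except ValueError:
        return f"Unrecognised value {raw!r}"
    return FTRACE_DUMP_MODES.get(value, f"Unrecognised value {raw!r}")
```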
|
||||||
|
|
||||||
|
|
||||||
|
ftrace_enabled, stack_tracer_enabled
|
||||||
|
====================================
|
||||||
|
|
||||||
|
See :doc:`/trace/ftrace`.
|
||||||
|
|
||||||
|
|
||||||
hardlockup_all_cpu_backtrace
|
hardlockup_all_cpu_backtrace
|
||||||
============================
|
============================
|
||||||
|
|
||||||
@ -344,6 +402,25 @@ Controls whether the panic kmsg data should be reported to Hyper-V.
|
|||||||
= =========================================================
|
= =========================================================
|
||||||
|
|
||||||
|
|
||||||
|
ignore-unaligned-usertrap
|
||||||
|
=========================
|
||||||
|
|
||||||
|
On architectures where unaligned accesses cause traps, and where this
|
||||||
|
feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN``;
|
||||||
|
currently, ``arc`` and ``ia64``), controls whether all unaligned traps
|
||||||
|
are logged.
|
||||||
|
|
||||||
|
= =============================================================
|
||||||
|
0 Log all unaligned accesses.
|
||||||
|
1 Only warn the first time a process traps. This is the default
|
||||||
|
setting.
|
||||||
|
= =============================================================
|
||||||
|
|
||||||
|
See also `unaligned-trap`_ and `unaligned-dump-stack`_. On ``ia64``,
|
||||||
|
this allows system administrators to override the
|
||||||
|
``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.
|
||||||
|
|
||||||
|
|
||||||
kexec_load_disabled
|
kexec_load_disabled
|
||||||
===================
|
===================
|
||||||
|
|
||||||
@ -459,6 +536,15 @@ Notes:
|
|||||||
successful IPC object allocation. If an IPC object allocation syscall
|
successful IPC object allocation. If an IPC object allocation syscall
|
||||||
fails, it is undefined if the value remains unmodified or is reset to -1.
|
fails, it is undefined if the value remains unmodified or is reset to -1.
|
||||||
|
|
||||||
|
|
||||||
|
ngroups_max
|
||||||
|
===========
|
||||||
|
|
||||||
|
Maximum number of supplementary groups, *i.e.* the maximum size which
|
||||||
|
``setgroups`` will accept. Exports ``NGROUPS_MAX`` from the kernel.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
nmi_watchdog
|
nmi_watchdog
|
||||||
============
|
============
|
||||||
|
|
||||||
@ -877,7 +963,7 @@ this sysctl interface anymore.
|
|||||||
pty
|
pty
|
||||||
===
|
===
|
||||||
|
|
||||||
See Documentation/filesystems/devpts.txt.
|
See Documentation/filesystems/devpts.rst.
|
||||||
|
|
||||||
|
|
||||||
randomize_va_space
|
randomize_va_space
|
||||||
@ -1173,6 +1259,65 @@ If a value outside of this range is written to ``threads-max`` an
|
|||||||
``EINVAL`` error occurs.
|
``EINVAL`` error occurs.
|
||||||
|
|
||||||
|
|
||||||
|
traceoff_on_warning
|
||||||
|
===================
|
||||||
|
|
||||||
|
When set, disables tracing (see :doc:`/trace/ftrace`) when a
|
||||||
|
``WARN()`` is hit.
|
||||||
|
|
||||||
|
|
||||||
|
tracepoint_printk
|
||||||
|
=================
|
||||||
|
|
||||||
|
When tracepoints are sent to printk() (enabled by the ``tp_printk``
|
||||||
|
boot parameter), this entry provides runtime control::
|
||||||
|
|
||||||
|
echo 0 > /proc/sys/kernel/tracepoint_printk
|
||||||
|
|
||||||
|
will stop tracepoints from being sent to printk(), and::
|
||||||
|
|
||||||
|
echo 1 > /proc/sys/kernel/tracepoint_printk
|
||||||
|
|
||||||
|
will send them to printk() again.
|
||||||
|
|
||||||
|
This only works if the kernel was booted with ``tp_printk`` enabled.
|
||||||
|
|
||||||
|
See :doc:`/admin-guide/kernel-parameters` and
|
||||||
|
:doc:`/trace/boottime-trace`.
|
||||||
|
|
||||||
|
|
||||||
|
.. _unaligned-dump-stack:
|
||||||
|
|
||||||
|
unaligned-dump-stack (ia64)
|
||||||
|
===========================
|
||||||
|
|
||||||
|
When logging unaligned accesses, controls whether the stack is
|
||||||
|
dumped.
|
||||||
|
|
||||||
|
= ===================================================
|
||||||
|
0 Do not dump the stack. This is the default setting.
|
||||||
|
1 Dump the stack.
|
||||||
|
= ===================================================
|
||||||
|
|
||||||
|
See also `ignore-unaligned-usertrap`_.
|
||||||
|
|
||||||
|
|
||||||
|
unaligned-trap
|
||||||
|
==============
|
||||||
|
|
||||||
|
On architectures where unaligned accesses cause traps, and where this
|
||||||
|
feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_ALLOW``; currently,
|
||||||
|
``arc`` and ``parisc``), controls whether unaligned traps are caught
|
||||||
|
and emulated (instead of failing).
|
||||||
|
|
||||||
|
= ========================================================
|
||||||
|
0 Do not emulate unaligned accesses.
|
||||||
|
1 Emulate unaligned accesses. This is the default setting.
|
||||||
|
= ========================================================
|
||||||
|
|
||||||
|
See also `ignore-unaligned-usertrap`_.
|
||||||
|
|
||||||
|
|
||||||
unknown_nmi_panic
|
unknown_nmi_panic
|
||||||
=================
|
=================
|
||||||
|
|
||||||
@ -1184,6 +1329,16 @@ NMI switch that most IA32 servers have fires unknown NMI up, for
|
|||||||
example. If a system hangs up, try pressing the NMI switch.
|
example. If a system hangs up, try pressing the NMI switch.
|
||||||
|
|
||||||
|
|
||||||
|
unprivileged_bpf_disabled
|
||||||
|
=========================
|
||||||
|
|
||||||
|
Writing 1 to this entry will disable unprivileged calls to ``bpf()``;
|
||||||
|
once disabled, calling ``bpf()`` without ``CAP_SYS_ADMIN`` will return
|
||||||
|
``-EPERM``.
|
||||||
|
|
||||||
|
Once set, this can't be cleared.
|
||||||
|
|
||||||
|
|
||||||
watchdog
|
watchdog
|
||||||
========
|
========
|
||||||
|
|
||||||
|
@ -24,13 +24,13 @@ optional external memory-mapped interface.
|
|||||||
Version 1 of the Activity Monitors architecture implements a counter group
|
Version 1 of the Activity Monitors architecture implements a counter group
|
||||||
of four fixed and architecturally defined 64-bit event counters.
|
of four fixed and architecturally defined 64-bit event counters.
|
||||||
|
|
||||||
- CPU cycle counter: increments at the frequency of the CPU.
|
- CPU cycle counter: increments at the frequency of the CPU.
|
||||||
- Constant counter: increments at the fixed frequency of the system
|
- Constant counter: increments at the fixed frequency of the system
|
||||||
clock.
|
clock.
|
||||||
- Instructions retired: increments with every architecturally executed
|
- Instructions retired: increments with every architecturally executed
|
||||||
instruction.
|
instruction.
|
||||||
- Memory stall cycles: counts instruction dispatch stall cycles caused by
|
- Memory stall cycles: counts instruction dispatch stall cycles caused by
|
||||||
misses in the last level cache within the clock domain.
|
misses in the last level cache within the clock domain.
|
||||||
|
|
||||||
When in WFI or WFE these counters do not increment.
|
When in WFI or WFE these counters do not increment.
|
||||||
|
|
||||||
@ -59,11 +59,11 @@ counters, only the presence of the extension.
|
|||||||
Firmware (code running at higher exception levels, e.g. arm-tf) support is
|
Firmware (code running at higher exception levels, e.g. arm-tf) support is
|
||||||
needed to:
|
needed to:
|
||||||
|
|
||||||
- Enable access for lower exception levels (EL2 and EL1) to the AMU
|
- Enable access for lower exception levels (EL2 and EL1) to the AMU
|
||||||
registers.
|
registers.
|
||||||
- Enable the counters. If not enabled these will read as 0.
|
- Enable the counters. If not enabled these will read as 0.
|
||||||
- Save/restore the counters before/after the CPU is being put/brought up
|
- Save/restore the counters before/after the CPU is being put/brought up
|
||||||
from the 'off' power state.
|
from the 'off' power state.
|
||||||
|
|
||||||
When using kernels that have this feature enabled but boot with broken
|
When using kernels that have this feature enabled but boot with broken
|
||||||
firmware the user may experience panics or lockups when accessing the
|
firmware the user may experience panics or lockups when accessing the
|
||||||
@ -81,10 +81,10 @@ are not trapped in EL2/EL3.
|
|||||||
The fixed counters of AMUv1 are accessible through the following system
|
The fixed counters of AMUv1 are accessible through the following system
|
||||||
register definitions:
|
register definitions:
|
||||||
|
|
||||||
- SYS_AMEVCNTR0_CORE_EL0
|
- SYS_AMEVCNTR0_CORE_EL0
|
||||||
- SYS_AMEVCNTR0_CONST_EL0
|
- SYS_AMEVCNTR0_CONST_EL0
|
||||||
- SYS_AMEVCNTR0_INST_RET_EL0
|
- SYS_AMEVCNTR0_INST_RET_EL0
|
||||||
- SYS_AMEVCNTR0_MEM_STALL_EL0
|
- SYS_AMEVCNTR0_MEM_STALL_EL0
|
||||||
|
|
||||||
Auxiliary platform specific counters can be accessed using
|
Auxiliary platform specific counters can be accessed using
|
||||||
SYS_AMEVCNTR1_EL0(n), where n is a value between 0 and 15.
|
SYS_AMEVCNTR1_EL0(n), where n is a value between 0 and 15.
|
||||||
@ -97,9 +97,9 @@ Userspace access
|
|||||||
|
|
||||||
Currently, access from userspace to the AMU registers is disabled due to:
|
Currently, access from userspace to the AMU registers is disabled due to:
|
||||||
|
|
||||||
- Security reasons: they might expose information about code executed in
|
- Security reasons: they might expose information about code executed in
|
||||||
secure mode.
|
secure mode.
|
||||||
- Purpose: AMU counters are intended for system management use.
|
- Purpose: AMU counters are intended for system management use.
|
||||||
|
|
||||||
Also, the presence of the feature is not visible to userspace.
|
Also, the presence of the feature is not visible to userspace.
|
||||||
|
|
||||||
@ -110,8 +110,8 @@ Virtualization
|
|||||||
Currently, access from userspace (EL0) and kernelspace (EL1) on the KVM
|
Currently, access from userspace (EL0) and kernelspace (EL1) on the KVM
|
||||||
guest side is disabled due to:
|
guest side is disabled due to:
|
||||||
|
|
||||||
- Security reasons: they might expose information about code executed
|
- Security reasons: they might expose information about code executed
|
||||||
by other guests or the host.
|
by other guests or the host.
|
||||||
|
|
||||||
Any attempt to access the AMU registers will result in an UNDEFINED
|
Any attempt to access the AMU registers will result in an UNDEFINED
|
||||||
exception being injected into the guest.
|
exception being injected into the guest.
|
||||||
|
@ -173,8 +173,10 @@ Before jumping into the kernel, the following conditions must be met:
|
|||||||
- Caches, MMUs
|
- Caches, MMUs
|
||||||
|
|
||||||
The MMU must be off.
|
The MMU must be off.
|
||||||
|
|
||||||
The instruction cache may be on or off, and must not hold any stale
|
The instruction cache may be on or off, and must not hold any stale
|
||||||
entries corresponding to the loaded kernel image.
|
entries corresponding to the loaded kernel image.
|
||||||
|
|
||||||
The address range corresponding to the loaded kernel image must be
|
The address range corresponding to the loaded kernel image must be
|
||||||
cleaned to the PoC. In the presence of a system cache or other
|
cleaned to the PoC. In the presence of a system cache or other
|
||||||
coherent masters with caches enabled, this will typically require
|
coherent masters with caches enabled, this will typically require
|
||||||
@ -239,6 +241,7 @@ Before jumping into the kernel, the following conditions must be met:
|
|||||||
- The DT or ACPI tables must describe a GICv2 interrupt controller.
|
- The DT or ACPI tables must describe a GICv2 interrupt controller.
|
||||||
|
|
||||||
For CPUs with pointer authentication functionality:
|
For CPUs with pointer authentication functionality:
|
||||||
|
|
||||||
- If EL3 is present:
|
- If EL3 is present:
|
||||||
|
|
||||||
- SCR_EL3.APK (bit 16) must be initialised to 0b1
|
- SCR_EL3.APK (bit 16) must be initialised to 0b1
|
||||||
@ -250,18 +253,22 @@ Before jumping into the kernel, the following conditions must be met:
|
|||||||
- HCR_EL2.API (bit 41) must be initialised to 0b1
|
- HCR_EL2.API (bit 41) must be initialised to 0b1
|
||||||
|
|
||||||
For CPUs with Activity Monitors Unit v1 (AMUv1) extension present:
|
For CPUs with Activity Monitors Unit v1 (AMUv1) extension present:
|
||||||
|
|
||||||
- If EL3 is present:
|
- If EL3 is present:
|
||||||
CPTR_EL3.TAM (bit 30) must be initialised to 0b0
|
|
||||||
CPTR_EL2.TAM (bit 30) must be initialised to 0b0
|
- CPTR_EL3.TAM (bit 30) must be initialised to 0b0
|
||||||
AMCNTENSET0_EL0 must be initialised to 0b1111
|
- CPTR_EL2.TAM (bit 30) must be initialised to 0b0
|
||||||
AMCNTENSET1_EL0 must be initialised to a platform specific value
|
- AMCNTENSET0_EL0 must be initialised to 0b1111
|
||||||
having 0b1 set for the corresponding bit for each of the auxiliary
|
- AMCNTENSET1_EL0 must be initialised to a platform specific value
|
||||||
counters present.
|
having 0b1 set for the corresponding bit for each of the auxiliary
|
||||||
|
counters present.
|
||||||
|
|
||||||
- If the kernel is entered at EL1:
|
- If the kernel is entered at EL1:
|
||||||
AMCNTENSET0_EL0 must be initialised to 0b1111
|
|
||||||
AMCNTENSET1_EL0 must be initialised to a platform specific value
|
- AMCNTENSET0_EL0 must be initialised to 0b1111
|
||||||
having 0b1 set for the corresponding bit for each of the auxiliary
|
- AMCNTENSET1_EL0 must be initialised to a platform specific value
|
||||||
counters present.
|
having 0b1 set for the corresponding bit for each of the auxiliary
|
||||||
|
counters present.
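
The enable values required above are plain bitmasks. The Python sketch below only illustrates the arithmetic; real firmware would program these registers in C or assembly. The function names are hypothetical, and the limit of 16 auxiliary counters is taken from the 0 to 15 index range that the AMU documentation gives for the auxiliary counters.

```python
FIXED_COUNTERS = 4        # AMUv1 defines four fixed counters
MAX_AUX_COUNTERS = 16     # auxiliary counters are indexed 0..15


def amcntenset0_value():
    """Enable mask for the four fixed counters: one bit per counter."""
    return (1 << FIXED_COUNTERS) - 1


def amcntenset1_value(aux_counters_present):
    """Enable mask with 0b1 set for each auxiliary counter present.

    aux_counters_present: iterable of auxiliary counter indices that the
    platform implements (a platform specific input, assumed here).
    """
    mask = 0
    for n in aux_counters_present:
        if not 0 <= n < MAX_AUX_COUNTERS:
            raise ValueError(f"auxiliary counter index out of range: {n}")
        mask |= 1 << n
    return mask
```

A platform that implements auxiliary counters 0, 1 and 5 would thus use ``0b100011`` for AMCNTENSET1_EL0, alongside ``0b1111`` for AMCNTENSET0_EL0.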
|
||||||
|
|
||||||
The requirements described above for CPU mode, caches, MMUs, architected
|
The requirements described above for CPU mode, caches, MMUs, architected
|
||||||
timers, coherency and system registers apply to all CPUs. All CPUs must
|
timers, coherency and system registers apply to all CPUs. All CPUs must
|
||||||
@ -305,7 +312,8 @@ following manner:
|
|||||||
Documentation/devicetree/bindings/arm/psci.yaml.
|
Documentation/devicetree/bindings/arm/psci.yaml.
|
||||||
|
|
||||||
- Secondary CPU general-purpose register settings
|
- Secondary CPU general-purpose register settings
|
||||||
x0 = 0 (reserved for future use)
|
|
||||||
x1 = 0 (reserved for future use)
|
- x0 = 0 (reserved for future use)
|
||||||
x2 = 0 (reserved for future use)
|
- x1 = 0 (reserved for future use)
|
||||||
x3 = 0 (reserved for future use)
|
- x2 = 0 (reserved for future use)
|
||||||
|
- x3 = 0 (reserved for future use)
|
||||||
|
@ -388,44 +388,6 @@ if major == 1 and minor < 6:
|
|||||||
# author, documentclass [howto, manual, or own class]).
|
# author, documentclass [howto, manual, or own class]).
|
||||||
# Sorted in alphabetical order
|
# Sorted in alphabetical order
|
||||||
latex_documents = [
|
latex_documents = [
|
||||||
('admin-guide/index', 'linux-user.tex', 'Linux Kernel User Documentation',
|
|
||||||
'The kernel development community', 'manual'),
|
|
||||||
('core-api/index', 'core-api.tex', 'The kernel core API manual',
|
|
||||||
'The kernel development community', 'manual'),
|
|
||||||
('crypto/index', 'crypto-api.tex', 'Linux Kernel Crypto API manual',
|
|
||||||
'The kernel development community', 'manual'),
|
|
||||||
('dev-tools/index', 'dev-tools.tex', 'Development tools for the Kernel',
|
|
||||||
'The kernel development community', 'manual'),
|
|
||||||
('doc-guide/index', 'kernel-doc-guide.tex', 'Linux Kernel Documentation Guide',
|
|
||||||
'The kernel development community', 'manual'),
|
|
||||||
('driver-api/index', 'driver-api.tex', 'The kernel driver API manual',
|
|
||||||
'The kernel development community', 'manual'),
|
|
||||||
('filesystems/index', 'filesystems.tex', 'Linux Filesystems API',
|
|
||||||
'The kernel development community', 'manual'),
|
|
||||||
('admin-guide/ext4', 'ext4-admin-guide.tex', 'ext4 Administration Guide',
|
|
||||||
'ext4 Community', 'manual'),
|
|
||||||
('filesystems/ext4/index', 'ext4-data-structures.tex',
|
|
||||||
'ext4 Data Structures and Algorithms', 'ext4 Community', 'manual'),
|
|
||||||
('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide',
|
|
||||||
'The kernel development community', 'manual'),
|
|
||||||
('input/index', 'linux-input.tex', 'The Linux input driver subsystem',
|
|
||||||
'The kernel development community', 'manual'),
|
|
||||||
('kernel-hacking/index', 'kernel-hacking.tex', 'Unreliable Guide To Hacking The Linux Kernel',
|
|
||||||
'The kernel development community', 'manual'),
|
|
||||||
('media/index', 'media.tex', 'Linux Media Subsystem Documentation',
|
|
||||||
'The kernel development community', 'manual'),
|
|
||||||
('networking/index', 'networking.tex', 'Linux Networking Documentation',
|
|
||||||
'The kernel development community', 'manual'),
|
|
||||||
('process/index', 'development-process.tex', 'Linux Kernel Development Documentation',
|
|
||||||
'The kernel development community', 'manual'),
|
|
||||||
('security/index', 'security.tex', 'The kernel security subsystem manual',
|
|
||||||
'The kernel development community', 'manual'),
|
|
||||||
('sh/index', 'sh.tex', 'SuperH architecture implementation manual',
|
|
||||||
'The kernel development community', 'manual'),
|
|
||||||
('sound/index', 'sound.tex', 'Linux Sound Subsystem Documentation',
|
|
||||||
'The kernel development community', 'manual'),
|
|
||||||
('userspace-api/index', 'userspace-api.tex', 'The Linux kernel user-space API guide',
|
|
||||||
'The kernel development community', 'manual'),
|
|
||||||
]
|
]
|
||||||
|
|
||||||
# Add all other index files from Documentation/ subdirectories
|
# Add all other index files from Documentation/ subdirectories
|
||||||
|
@ -18,6 +18,7 @@ it.
|
|||||||
|
|
||||||
kernel-api
|
kernel-api
|
||||||
workqueue
|
workqueue
|
||||||
|
printk-basics
|
||||||
printk-formats
|
printk-formats
|
||||||
symbol-namespaces
|
symbol-namespaces
|
||||||
|
|
||||||
@ -30,10 +31,12 @@ Library functionality that is used throughout the kernel.
|
|||||||
:maxdepth: 1
|
:maxdepth: 1
|
||||||
|
|
||||||
kobject
|
kobject
|
||||||
|
kref
|
||||||
assoc_array
|
assoc_array
|
||||||
xarray
|
xarray
|
||||||
idr
|
idr
|
||||||
circular-buffers
|
circular-buffers
|
||||||
|
rbtree
|
||||||
generic-radix-tree
|
generic-radix-tree
|
||||||
packing
|
packing
|
||||||
timekeeping
|
timekeeping
|
||||||
@ -50,6 +53,7 @@ How Linux keeps everything from happening at the same time. See
|
|||||||
|
|
||||||
atomic_ops
|
atomic_ops
|
||||||
refcount-vs-atomic
|
refcount-vs-atomic
|
||||||
|
irq/index
|
||||||
local_ops
|
local_ops
|
||||||
padata
|
padata
|
||||||
../RCU/index
|
../RCU/index
|
||||||
@ -78,6 +82,10 @@ more memory-management documentation in :doc:`/vm/index`.
|
|||||||
:maxdepth: 1
|
:maxdepth: 1
|
||||||
|
|
||||||
memory-allocation
|
memory-allocation
|
||||||
|
dma-api
|
||||||
|
dma-api-howto
|
||||||
|
dma-attributes
|
||||||
|
dma-isa-lpc
|
||||||
mm-api
|
mm-api
|
||||||
genalloc
|
genalloc
|
||||||
pin_user_pages
|
pin_user_pages
|
||||||
@ -92,6 +100,7 @@ Interfaces for kernel debugging
|
|||||||
|
|
||||||
debug-objects
|
debug-objects
|
||||||
tracepoint
|
tracepoint
|
||||||
|
debugging-via-ohci1394
|
||||||
|
|
||||||
Everything else
|
Everything else
|
||||||
===============
|
===============
|
||||||
|
11
Documentation/core-api/irq/index.rst
Normal file
@ -0,0 +1,11 @@
|
|||||||
|
====
|
||||||
|
IRQs
|
||||||
|
====
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:maxdepth: 1
|
||||||
|
|
||||||
|
concepts
|
||||||
|
irq-affinity
|
||||||
|
irq-domain
|
||||||
|
irqflags-tracing
|
@ -263,7 +263,8 @@ needs to:
|
|||||||
Hierarchy irq_domain is in no way x86 specific, and is heavily used to
|
Hierarchy irq_domain is in no way x86 specific, and is heavily used to
|
||||||
support other architectures, such as ARM, ARM64 etc.
|
support other architectures, such as ARM, ARM64 etc.
|
||||||
|
|
||||||
=== Debugging ===
|
Debugging
|
||||||
|
=========
|
||||||
|
|
||||||
Most of the internals of the IRQ subsystem are exposed in debugfs by
|
Most of the internals of the IRQ subsystem are exposed in debugfs by
|
||||||
turning CONFIG_GENERIC_IRQ_DEBUGFS on.
|
turning CONFIG_GENERIC_IRQ_DEBUGFS on.
|
@ -80,11 +80,11 @@ what is the pointer to the containing structure? You must avoid tricks
|
|||||||
(such as assuming that the kobject is at the beginning of the structure)
|
(such as assuming that the kobject is at the beginning of the structure)
|
||||||
and, instead, use the container_of() macro, found in ``<linux/kernel.h>``::
|
and, instead, use the container_of() macro, found in ``<linux/kernel.h>``::
|
||||||
|
|
||||||
container_of(pointer, type, member)
|
container_of(ptr, type, member)
|
||||||
|
|
||||||
where:
|
where:
|
||||||
|
|
||||||
* ``pointer`` is the pointer to the embedded kobject,
|
* ``ptr`` is the pointer to the embedded kobject,
|
||||||
* ``type`` is the type of the containing structure, and
|
* ``type`` is the type of the containing structure, and
|
||||||
* ``member`` is the name of the structure field to which ``pointer`` points.
|
* ``member`` is the name of the structure field to which ``ptr`` points.
|
||||||
|
|
||||||
@ -140,7 +140,7 @@ the name of the kobject, call kobject_rename()::
|
|||||||
|
|
||||||
int kobject_rename(struct kobject *kobj, const char *new_name);
|
int kobject_rename(struct kobject *kobj, const char *new_name);
|
||||||
|
|
||||||
kobject_rename does not perform any locking or have a solid notion of
|
kobject_rename() does not perform any locking or have a solid notion of
|
||||||
what names are valid so the caller must provide their own sanity checking
|
what names are valid so the caller must provide their own sanity checking
|
||||||
and serialization.
|
and serialization.
|
||||||
|
|
||||||
@ -210,7 +210,7 @@ statically and will warn the developer of this improper usage.
|
|||||||
If all that you want to use a kobject for is to provide a reference counter
|
If all that you want to use a kobject for is to provide a reference counter
|
||||||
for your structure, please use the struct kref instead; a kobject would be
|
for your structure, please use the struct kref instead; a kobject would be
|
||||||
overkill. For more information on how to use struct kref, please see the
|
overkill. For more information on how to use struct kref, please see the
|
||||||
file Documentation/kref.txt in the Linux kernel source tree.
|
file Documentation/core-api/kref.rst in the Linux kernel source tree.
|
||||||
|
|
||||||
|
|
||||||
Creating "simple" kobjects
|
Creating "simple" kobjects
|
||||||
@ -222,17 +222,17 @@ ksets, show and store functions, and other details. This is the one
|
|||||||
exception where a single kobject should be created. To create such an
|
exception where a single kobject should be created. To create such an
|
||||||
entry, use the function::
|
entry, use the function::
|
||||||
|
|
||||||
struct kobject *kobject_create_and_add(char *name, struct kobject *parent);
|
struct kobject *kobject_create_and_add(const char *name, struct kobject *parent);
|
||||||
|
|
||||||
This function will create a kobject and place it in sysfs in the location
|
This function will create a kobject and place it in sysfs in the location
|
||||||
underneath the specified parent kobject. To create simple attributes
|
underneath the specified parent kobject. To create simple attributes
|
||||||
associated with this kobject, use::
|
associated with this kobject, use::
|
||||||
|
|
||||||
int sysfs_create_file(struct kobject *kobj, struct attribute *attr);
|
int sysfs_create_file(struct kobject *kobj, const struct attribute *attr);
|
||||||
|
|
||||||
or::
|
or::
|
||||||
|
|
||||||
int sysfs_create_group(struct kobject *kobj, struct attribute_group *grp);
|
int sysfs_create_group(struct kobject *kobj, const struct attribute_group *grp);
|
||||||
|
|
||||||
Both types of attributes used here, with a kobject that has been created
|
Both types of attributes used here, with a kobject that has been created
|
||||||
with the kobject_create_and_add(), can be of type kobj_attribute, so no
|
with the kobject_create_and_add(), can be of type kobj_attribute, so no
|
||||||
@ -300,8 +300,10 @@ kobj_type::
|
|||||||
void (*release)(struct kobject *kobj);
|
void (*release)(struct kobject *kobj);
|
||||||
const struct sysfs_ops *sysfs_ops;
|
const struct sysfs_ops *sysfs_ops;
|
||||||
struct attribute **default_attrs;
|
struct attribute **default_attrs;
|
||||||
|
const struct attribute_group **default_groups;
|
||||||
const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj);
|
const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj);
|
||||||
const void *(*namespace)(struct kobject *kobj);
|
const void *(*namespace)(struct kobject *kobj);
|
||||||
|
void (*get_ownership)(struct kobject *kobj, kuid_t *uid, kgid_t *gid);
|
||||||
};
|
};
|
||||||
|
|
||||||
This structure is used to describe a particular type of kobject (or, more
|
This structure is used to describe a particular type of kobject (or, more
|
||||||
@ -352,12 +354,12 @@ created and never declared statically or on the stack. To create a new
|
|||||||
kset use::
|
kset use::
|
||||||
|
|
||||||
struct kset *kset_create_and_add(const char *name,
|
struct kset *kset_create_and_add(const char *name,
|
||||||
struct kset_uevent_ops *u,
|
const struct kset_uevent_ops *uevent_ops,
|
||||||
struct kobject *parent);
|
struct kobject *parent_kobj);
|
||||||
|
|
||||||
When you are finished with the kset, call::
|
When you are finished with the kset, call::
|
||||||
|
|
||||||
void kset_unregister(struct kset *kset);
|
void kset_unregister(struct kset *k);
|
||||||
|
|
||||||
to destroy it. This removes the kset from sysfs and decrements its reference
|
to destroy it. This removes the kset from sysfs and decrements its reference
|
||||||
count. When the reference count goes to zero, the kset will be released.
|
count. When the reference count goes to zero, the kset will be released.
|
||||||
@ -371,9 +373,9 @@ If a kset wishes to control the uevent operations of the kobjects
|
|||||||
associated with it, it can use the struct kset_uevent_ops to handle it::
|
associated with it, it can use the struct kset_uevent_ops to handle it::
|
||||||
|
|
||||||
struct kset_uevent_ops {
|
struct kset_uevent_ops {
|
||||||
int (*filter)(struct kset *kset, struct kobject *kobj);
|
int (* const filter)(struct kset *kset, struct kobject *kobj);
|
||||||
const char *(*name)(struct kset *kset, struct kobject *kobj);
|
const char *(* const name)(struct kset *kset, struct kobject *kobj);
|
||||||
int (*uevent)(struct kset *kset, struct kobject *kobj,
|
int (* const uevent)(struct kset *kset, struct kobject *kobj,
|
||||||
struct kobj_uevent_env *env);
|
struct kobj_uevent_env *env);
|
||||||
};
|
};
|
||||||
|
|
||||||
|
115
Documentation/core-api/printk-basics.rst
Normal file
@ -0,0 +1,115 @@
|
|||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
===========================
|
||||||
|
Message logging with printk
|
||||||
|
===========================
|
||||||
|
|
||||||
|
printk() is one of the most widely known functions in the Linux kernel. It's the
|
||||||
|
standard tool we have for printing messages and usually the most basic way of
|
||||||
|
tracing and debugging. If you're familiar with printf(3) you can tell printk()
|
||||||
|
is based on it, although it has some functional differences:
|
||||||
|
|
||||||
|
- printk() messages can specify a log level.
|
||||||
|
|
||||||
|
- the format string, while largely compatible with C99, doesn't follow the
|
||||||
|
exact same specification. It has some extensions and a few limitations
|
||||||
|
(no ``%n`` or floating point conversion specifiers). See :ref:`How to get
|
||||||
|
printk format specifiers right <printk-specifiers>`.
|
||||||
|
|
||||||
|
All printk() messages are printed to the kernel log buffer, which is a ring
|
||||||
|
buffer exported to userspace through /dev/kmsg. The usual way to read it is
|
||||||
|
using ``dmesg``.
|
||||||
|
|
||||||
|
printk() is typically used like this::
|
||||||
|
|
||||||
|
printk(KERN_INFO "Message: %s\n", arg);
|
||||||
|
|
||||||
|
where ``KERN_INFO`` is the log level (note that it's concatenated to the format
|
||||||
|
string; the log level is not a separate argument). The available log levels are:
|
||||||
|
|
||||||
|
+----------------+--------+-----------------------------------------------+
|
||||||
|
| Name | String | Alias function |
|
||||||
|
+================+========+===============================================+
|
||||||
|
| KERN_EMERG | "0" | pr_emerg() |
|
||||||
|
+----------------+--------+-----------------------------------------------+
|
||||||
|
| KERN_ALERT | "1" | pr_alert() |
|
||||||
|
+----------------+--------+-----------------------------------------------+
|
||||||
|
| KERN_CRIT | "2" | pr_crit() |
|
||||||
|
+----------------+--------+-----------------------------------------------+
|
||||||
|
| KERN_ERR | "3" | pr_err() |
|
||||||
|
+----------------+--------+-----------------------------------------------+
|
||||||
|
| KERN_WARNING | "4" | pr_warn() |
|
||||||
|
+----------------+--------+-----------------------------------------------+
|
||||||
|
| KERN_NOTICE | "5" | pr_notice() |
|
||||||
|
+----------------+--------+-----------------------------------------------+
|
||||||
|
| KERN_INFO | "6" | pr_info() |
|
||||||
|
+----------------+--------+-----------------------------------------------+
|
||||||
|
| KERN_DEBUG | "7" | pr_debug() and pr_devel() if DEBUG is defined |
|
||||||
|
+----------------+--------+-----------------------------------------------+
|
||||||
|
| KERN_DEFAULT | "" | |
|
||||||
|
+----------------+--------+-----------------------------------------------+
|
||||||
|
| KERN_CONT | "c" | pr_cont() |
|
||||||
|
+----------------+--------+-----------------------------------------------+
|
||||||
|
|
||||||
|
|
||||||
|
The log level specifies the importance of a message. The kernel decides whether
|
||||||
|
to show the message immediately (printing it to the current console) depending
|
||||||
|
on its log level and the current *console_loglevel* (a kernel variable). If the
|
||||||
|
message priority is higher (lower log level value) than the *console_loglevel*
|
||||||
|
the message will be printed to the console.
|
||||||
|
|
||||||
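A minimal userspace sketch of the two points above: the log level travels as a prefix of the format string, and a message reaches the console only when its level value is lower than *console_loglevel*. It assumes the layout found in ``include/linux/kern_levels.h``, where each level string is preceded by an ASCII SOH byte (that prefix is not shown in the table above). The ``demo_*`` names are hypothetical helpers, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror include/linux/kern_levels.h: SOH byte, then the level. */
#define KERN_SOH     "\001"
#define KERN_ERR     KERN_SOH "3"
#define KERN_WARNING KERN_SOH "4"
#define KERN_INFO    KERN_SOH "6"

/*
 * Hypothetical helper: return the level (0-7) embedded at the start of a
 * message, or -1 when the message carries no level prefix.
 */
static int demo_parse_level(const char *msg)
{
	assert(msg != NULL);
	if (msg[0] == '\001' && msg[1] >= '0' && msg[1] <= '7')
		return msg[1] - '0';
	return -1;
}

/*
 * Hypothetical helper: a message is shown on the console when its level
 * value is lower (more severe) than the current console_loglevel.
 */
static int demo_shown_on_console(int level, int console_loglevel)
{
	return level < console_loglevel;
}
```

Because adjacent string literals are concatenated by the compiler, ``KERN_INFO "Message: %s\n"`` is a single string whose first two bytes encode the level. With a console_loglevel of 4, ``demo_shown_on_console()`` accepts a ``KERN_ERR`` message and rejects a ``KERN_INFO`` one.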
|
If the log level is omitted, the message is printed with ``KERN_DEFAULT``
|
||||||
|
level.
|
||||||
|
|
||||||
|
You can check the current *console_loglevel* with::
|
||||||
|
|
||||||
|
$ cat /proc/sys/kernel/printk
|
||||||
|
4 4 1 7
|
||||||
|
|
||||||
|
The result shows the *current*, *default*, *minimum* and *boot-time-default* log
|
||||||
|
levels.
|
||||||
|
|
||||||
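As an illustration only, the four values can be read from a line of ``/proc/sys/kernel/printk`` with a small userspace helper. The struct and function names below are hypothetical; the field order follows the description above (*current*, *default*, *minimum*, *boot-time-default*).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical holder for the four values, in the order the file lists them. */
struct demo_printk_levels {
	int current_level;
	int default_level;
	int minimum_level;
	int boot_default_level;
};

/*
 * Parse one line such as "4\t4\t1\t7\n".
 * Returns 0 on success, -1 if the line does not hold four integers.
 */
static int demo_parse_printk_levels(const char *line,
				    struct demo_printk_levels *out)
{
	assert(line != NULL && out != NULL);
	if (sscanf(line, "%d %d %d %d",
		   &out->current_level, &out->default_level,
		   &out->minimum_level, &out->boot_default_level) != 4)
		return -1;
	return 0;
}
```

A caller would typically read the line with ``fgets()`` from the opened file and pass it to ``demo_parse_printk_levels()``; whitespace in the ``sscanf()`` format matches both the tabs and spaces that may separate the values.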
|
To change the current console_loglevel simply write the desired level to
|
||||||
|
``/proc/sys/kernel/printk``. For example, to print all messages to the console::
|
||||||
|
|
||||||
|
# echo 8 > /proc/sys/kernel/printk
|
||||||
|
|
||||||
|
Another way, using ``dmesg``::
|
||||||
|
|
||||||
|
# dmesg -n 5
|
||||||
|
|
||||||
|
sets the console_loglevel to print KERN_WARNING (4) or more severe messages to
|
||||||
|
the console. See ``dmesg(1)`` for more information.
|
||||||
|
|
||||||
|
As an alternative to printk() you can use the ``pr_*()`` aliases for
|
||||||
|
logging. This family of macros embeds the log level in the macro names. For
|
||||||
|
example::
|
||||||
|
|
||||||
|
pr_info("Info message no. %d\n", msg_num);
|
||||||
|
|
||||||
|
prints a ``KERN_INFO`` message.
|
||||||
|
|
||||||
|
Besides being more concise than the equivalent printk() calls, they can use a
|
||||||
|
common definition for the format string through the pr_fmt() macro. For
|
||||||
|
instance, defining this at the top of a source file (before any ``#include``
|
||||||
|
directive)::
|
||||||
|
|
||||||
|
#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
|
||||||
|
|
||||||
|
would prefix every pr_*() message in that file with the module and function name
|
||||||
|
that originated the message.
|
||||||
|
|
||||||
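The effect of pr_fmt() can be sketched in userspace by letting ``snprintf()`` stand in for printk(). In this sketch ``KBUILD_MODNAME`` is defined by hand (in the kernel it is supplied by the build system), and ``demo_pr_info()`` and ``report_count()`` are hypothetical names used only for illustration. The ``##__VA_ARGS__`` form is a GNU extension, accepted by GCC and Clang.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: set by hand here; the kernel build system normally defines it. */
#define KBUILD_MODNAME "demo_mod"

/* Same definition as in the text: prefix every message with module:function. */
#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__

/* Hypothetical stand-in for pr_info() that formats into a caller buffer. */
#define demo_pr_info(buf, size, fmt, ...) \
	snprintf(buf, size, pr_fmt(fmt), ##__VA_ARGS__)

/* Formats "demo_mod:report_count: Info message no. <n>\n" into buf. */
static int report_count(char *buf, size_t size, int msg_num)
{
	assert(buf != NULL);
	return demo_pr_info(buf, size, "Info message no. %d\n", msg_num);
}
```

The macro expands so that the prefix format and the caller's format become one string literal, followed by the module name, the function name and then the caller's own arguments. This is why pr_fmt() must be defined before the header that declares the ``pr_*()`` macros is included.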
|
For debugging purposes there are also two conditionally-compiled macros:
|
||||||
|
pr_debug() and pr_devel(), which are compiled-out unless ``DEBUG`` (or
|
||||||
|
also ``CONFIG_DYNAMIC_DEBUG`` in the case of pr_debug()) is defined.
|
||||||
|
|
||||||
|
|
||||||
|
Function reference
|
||||||
|
==================
|
||||||
|
|
||||||
|
.. kernel-doc:: kernel/printk/printk.c
|
||||||
|
:functions: printk
|
||||||
|
|
||||||
|
.. kernel-doc:: include/linux/printk.h
|
||||||
|
:functions: pr_emerg pr_alert pr_crit pr_err pr_warn pr_notice pr_info
|
||||||
|
pr_fmt pr_debug pr_devel pr_cont
|
@ -2,6 +2,8 @@
|
|||||||
How to get printk format specifiers right
|
How to get printk format specifiers right
|
||||||
=========================================
|
=========================================
|
||||||
|
|
||||||
|
.. _printk-specifiers:
|
||||||
|
|
||||||
:Author: Randy Dunlap <rdunlap@infradead.org>
|
:Author: Randy Dunlap <rdunlap@infradead.org>
|
||||||
:Author: Andrew Murray <amurray@mpc-data.co.uk>
|
:Author: Andrew Murray <amurray@mpc-data.co.uk>
|
||||||
|
|
||||||
|
@ -6,7 +6,7 @@ Documentation subsystem maintainer entry profile
|
|||||||
The documentation "subsystem" is the central coordinating point for the
|
The documentation "subsystem" is the central coordinating point for the
|
||||||
kernel's documentation and associated infrastructure. It covers the
|
kernel's documentation and associated infrastructure. It covers the
|
||||||
hierarchy under Documentation/ (with the exception of
|
hierarchy under Documentation/ (with the exception of
|
||||||
Documentation/device-tree), various utilities under scripts/ and, at least
|
Documentation/devicetree), various utilities under scripts/ and, at least
|
||||||
some of the time, LICENSES/.
|
some of the time, LICENSES/.
|
||||||
|
|
||||||
It's worth noting, though, that the boundaries of this subsystem are rather
|
It's worth noting, though, that the boundaries of this subsystem are rather
|
||||||
|
@ -11,7 +11,7 @@ course not limited to GPU use cases.
|
|||||||
The three main components of this are: (1) dma-buf, representing a
|
The three main components of this are: (1) dma-buf, representing a
|
||||||
sg_table and exposed to userspace as a file descriptor to allow passing
|
sg_table and exposed to userspace as a file descriptor to allow passing
|
||||||
between devices, (2) fence, which provides a mechanism to signal when
|
between devices, (2) fence, which provides a mechanism to signal when
|
||||||
one device as finished access, and (3) reservation, which manages the
|
one device has finished access, and (3) reservation, which manages the
|
||||||
shared or exclusive fence(s) associated with the buffer.
|
shared or exclusive fence(s) associated with the buffer.
|
||||||
|
|
||||||
Shared DMA Buffers
|
Shared DMA Buffers
|
||||||
@ -31,7 +31,7 @@ The exporter
|
|||||||
- implements and manages operations in :c:type:`struct dma_buf_ops
|
- implements and manages operations in :c:type:`struct dma_buf_ops
|
||||||
<dma_buf_ops>` for the buffer,
|
<dma_buf_ops>` for the buffer,
|
||||||
- allows other users to share the buffer by using dma_buf sharing APIs,
|
- allows other users to share the buffer by using dma_buf sharing APIs,
|
||||||
- manages the details of buffer allocation, wrapped int a :c:type:`struct
|
- manages the details of buffer allocation, wrapped in a :c:type:`struct
|
||||||
dma_buf <dma_buf>`,
|
dma_buf <dma_buf>`,
|
||||||
- decides about the actual backing storage where this allocation happens,
|
- decides about the actual backing storage where this allocation happens,
|
||||||
- and takes care of any migration of scatterlist - for all (shared) users of
|
- and takes care of any migration of scatterlist - for all (shared) users of
|
||||||
|
@ -50,10 +50,10 @@ Attributes
|
|||||||
|
|
||||||
Attributes of devices can be exported by a device driver through sysfs.
|
Attributes of devices can be exported by a device driver through sysfs.
|
||||||
|
|
||||||
Please see Documentation/filesystems/sysfs.txt for more information
|
Please see Documentation/filesystems/sysfs.rst for more information
|
||||||
on how sysfs works.
|
on how sysfs works.
|
||||||
|
|
||||||
As explained in Documentation/kobject.txt, device attributes must be
|
As explained in Documentation/core-api/kobject.rst, device attributes must be
|
||||||
created before the KOBJ_ADD uevent is generated. The only way to realize
|
created before the KOBJ_ADD uevent is generated. The only way to realize
|
||||||
that is by defining an attribute group.
|
that is by defining an attribute group.
|
||||||
|
|
||||||
|
@ -121,4 +121,4 @@ device-specific data or tunable interfaces.
|
|||||||
|
|
||||||
More information about the sysfs directory layout can be found in
|
More information about the sysfs directory layout can be found in
|
||||||
the other documents in this directory and in the file
|
the other documents in this directory and in the file
|
||||||
Documentation/filesystems/sysfs.txt.
|
Documentation/filesystems/sysfs.rst.
|
||||||
|
@ -39,6 +39,7 @@ available subsections can be seen below.
|
|||||||
spi
|
spi
|
||||||
i2c
|
i2c
|
||||||
ipmb
|
ipmb
|
||||||
|
ipmi
|
||||||
i3c/index
|
i3c/index
|
||||||
interconnect
|
interconnect
|
||||||
devfreq
|
devfreq
|
||||||
|
@ -278,8 +278,8 @@ by a region device with a dynamically assigned id (REGION0 - REGION5).
|
|||||||
be contiguous in DPA-space.
|
be contiguous in DPA-space.
|
||||||
|
|
||||||
This bus is provided by the kernel under the device
|
This bus is provided by the kernel under the device
|
||||||
/sys/devices/platform/nfit_test.0 when CONFIG_NFIT_TEST is enabled and
|
/sys/devices/platform/nfit_test.0 when the nfit_test.ko module from
|
||||||
the nfit_test.ko module is loaded. This not only test LIBNVDIMM but the
|
tools/testing/nvdimm is loaded. This not only tests LIBNVDIMM but the
|
||||||
acpi_nfit.ko driver as well.
|
acpi_nfit.ko driver as well.
|
||||||
|
|
||||||
|
|
||||||
|
@ -1,3 +1,6 @@
|
|||||||
|
================
|
||||||
|
CPU Idle Cooling
|
||||||
|
================
|
||||||
|
|
||||||
Situation:
|
Situation:
|
||||||
----------
|
----------
|
||||||
|
@ -8,6 +8,7 @@ Thermal
|
|||||||
:maxdepth: 1
|
:maxdepth: 1
|
||||||
|
|
||||||
cpu-cooling-api
|
cpu-cooling-api
|
||||||
|
cpu-idle-cooling
|
||||||
sysfs-api
|
sysfs-api
|
||||||
power_allocator
|
power_allocator
|
||||||
|
|
||||||
|
@ -23,7 +23,7 @@
|
|||||||
| openrisc: | TODO |
|
| openrisc: | TODO |
|
||||||
| parisc: | TODO |
|
| parisc: | TODO |
|
||||||
| powerpc: | ok |
|
| powerpc: | ok |
|
||||||
| riscv: | TODO |
|
| riscv: | ok |
|
||||||
| s390: | ok |
|
| s390: | ok |
|
||||||
| sh: | TODO |
|
| sh: | TODO |
|
||||||
| sparc: | ok |
|
| sparc: | ok |
|
||||||
|
@ -22,9 +22,9 @@
|
|||||||
| nios2: | TODO |
|
| nios2: | TODO |
|
||||||
| openrisc: | TODO |
|
| openrisc: | TODO |
|
||||||
| parisc: | TODO |
|
| parisc: | TODO |
|
||||||
| powerpc: | TODO |
|
| powerpc: | ok |
|
||||||
| riscv: | TODO |
|
| riscv: | ok |
|
||||||
| s390: | TODO |
|
| s390: | ok |
|
||||||
| sh: | TODO |
|
| sh: | TODO |
|
||||||
| sparc: | TODO |
|
| sparc: | TODO |
|
||||||
| um: | TODO |
|
| um: | TODO |
|
||||||
|
@ -11,7 +11,7 @@
|
|||||||
| arm: | ok |
|
| arm: | ok |
|
||||||
| arm64: | ok |
|
| arm64: | ok |
|
||||||
| c6x: | TODO |
|
| c6x: | TODO |
|
||||||
| csky: | TODO |
|
| csky: | ok |
|
||||||
| h8300: | TODO |
|
| h8300: | TODO |
|
||||||
| hexagon: | TODO |
|
| hexagon: | TODO |
|
||||||
| ia64: | TODO |
|
| ia64: | TODO |
|
||||||
|
@ -11,7 +11,7 @@
|
|||||||
| arm: | TODO |
|
| arm: | TODO |
|
||||||
| arm64: | TODO |
|
| arm64: | TODO |
|
||||||
| c6x: | TODO |
|
| c6x: | TODO |
|
||||||
| csky: | TODO |
|
| csky: | ok |
|
||||||
| h8300: | TODO |
|
| h8300: | TODO |
|
||||||
| hexagon: | TODO |
|
| hexagon: | TODO |
|
||||||
| ia64: | TODO |
|
| ia64: | TODO |
|
||||||
|
@ -11,7 +11,7 @@
|
|||||||
| arm: | ok |
|
| arm: | ok |
|
||||||
| arm64: | ok |
|
| arm64: | ok |
|
||||||
| c6x: | TODO |
|
| c6x: | TODO |
|
||||||
| csky: | TODO |
|
| csky: | ok |
|
||||||
| h8300: | TODO |
|
| h8300: | TODO |
|
||||||
| hexagon: | TODO |
|
| hexagon: | TODO |
|
||||||
| ia64: | ok |
|
| ia64: | ok |
|
||||||
@ -23,7 +23,7 @@
|
|||||||
| openrisc: | TODO |
|
| openrisc: | TODO |
|
||||||
| parisc: | ok |
|
| parisc: | ok |
|
||||||
| powerpc: | ok |
|
| powerpc: | ok |
|
||||||
| riscv: | ok |
|
| riscv: | TODO |
|
||||||
| s390: | ok |
|
| s390: | ok |
|
||||||
| sh: | ok |
|
| sh: | ok |
|
||||||
| sparc: | ok |
|
| sparc: | ok |
|
||||||
|
@ -11,7 +11,7 @@
|
|||||||
| arm: | ok |
|
| arm: | ok |
|
||||||
| arm64: | ok |
|
| arm64: | ok |
|
||||||
| c6x: | TODO |
|
| c6x: | TODO |
|
||||||
| csky: | TODO |
|
| csky: | ok |
|
||||||
| h8300: | TODO |
|
| h8300: | TODO |
|
||||||
| hexagon: | TODO |
|
| hexagon: | TODO |
|
||||||
| ia64: | ok |
|
| ia64: | ok |
|
||||||
|
@ -11,7 +11,7 @@
|
|||||||
| arm: | ok |
|
| arm: | ok |
|
||||||
| arm64: | ok |
|
| arm64: | ok |
|
||||||
| c6x: | TODO |
|
| c6x: | TODO |
|
||||||
| csky: | TODO |
|
| csky: | ok |
|
||||||
| h8300: | TODO |
|
| h8300: | TODO |
|
||||||
| hexagon: | TODO |
|
| hexagon: | TODO |
|
||||||
| ia64: | TODO |
|
| ia64: | TODO |
|
||||||
|
@ -11,7 +11,7 @@
|
|||||||
| arm: | ok |
|
| arm: | ok |
|
||||||
| arm64: | ok |
|
| arm64: | ok |
|
||||||
| c6x: | TODO |
|
| c6x: | TODO |
|
||||||
| csky: | TODO |
|
| csky: | ok |
|
||||||
| h8300: | TODO |
|
| h8300: | TODO |
|
||||||
| hexagon: | TODO |
|
| hexagon: | TODO |
|
||||||
| ia64: | TODO |
|
| ia64: | TODO |
|
||||||
|
@ -16,7 +16,7 @@
|
|||||||
| hexagon: | TODO |
|
| hexagon: | TODO |
|
||||||
| ia64: | TODO |
|
| ia64: | TODO |
|
||||||
| m68k: | TODO |
|
| m68k: | TODO |
|
||||||
| microblaze: | TODO |
|
| microblaze: | ok |
|
||||||
| mips: | ok |
|
| mips: | ok |
|
||||||
| nds32: | TODO |
|
| nds32: | TODO |
|
||||||
| nios2: | TODO |
|
| nios2: | TODO |
|
||||||
|
@ -11,7 +11,7 @@
|
|||||||
| arm: | ok |
|
| arm: | ok |
|
||||||
| arm64: | ok |
|
| arm64: | ok |
|
||||||
| c6x: | TODO |
|
| c6x: | TODO |
|
||||||
| csky: | TODO |
|
| csky: | ok |
|
||||||
| h8300: | TODO |
|
| h8300: | TODO |
|
||||||
| hexagon: | ok |
|
| hexagon: | ok |
|
||||||
| ia64: | TODO |
|
| ia64: | TODO |
|
||||||
|
@ -11,7 +11,7 @@
|
|||||||
| arm: | ok |
|
| arm: | ok |
|
||||||
| arm64: | ok |
|
| arm64: | ok |
|
||||||
| c6x: | TODO |
|
| c6x: | TODO |
|
||||||
| csky: | TODO |
|
| csky: | ok |
|
||||||
| h8300: | TODO |
|
| h8300: | TODO |
|
||||||
| hexagon: | ok |
|
| hexagon: | ok |
|
||||||
| ia64: | TODO |
|
| ia64: | TODO |
|
||||||
@ -21,7 +21,7 @@
|
|||||||
| nds32: | ok |
|
| nds32: | ok |
|
||||||
| nios2: | TODO |
|
| nios2: | TODO |
|
||||||
| openrisc: | TODO |
|
| openrisc: | TODO |
|
||||||
| parisc: | TODO |
|
| parisc: | ok |
|
||||||
| powerpc: | ok |
|
| powerpc: | ok |
|
||||||
| riscv: | TODO |
|
| riscv: | TODO |
|
||||||
| s390: | ok |
|
| s390: | ok |
|
||||||
|
@ -11,7 +11,7 @@
|
|||||||
| arm: | ok |
|
| arm: | ok |
|
||||||
| arm64: | ok |
|
| arm64: | ok |
|
||||||
| c6x: | TODO |
|
| c6x: | TODO |
|
||||||
| csky: | TODO |
|
| csky: | ok |
|
||||||
| h8300: | TODO |
|
| h8300: | TODO |
|
||||||
| hexagon: | TODO |
|
| hexagon: | TODO |
|
||||||
| ia64: | TODO |
|
| ia64: | TODO |
|
||||||
@ -23,7 +23,7 @@
|
|||||||
| openrisc: | TODO |
|
| openrisc: | TODO |
|
||||||
| parisc: | TODO |
|
| parisc: | TODO |
|
||||||
| powerpc: | ok |
|
| powerpc: | ok |
|
||||||
| riscv: | TODO |
|
| riscv: | ok |
|
||||||
| s390: | ok |
|
| s390: | ok |
|
||||||
| sh: | TODO |
|
| sh: | TODO |
|
||||||
| sparc: | TODO |
|
| sparc: | TODO |
|
||||||
|
@ -11,7 +11,7 @@
|
|||||||
| arm: | ok |
|
| arm: | ok |
|
||||||
| arm64: | ok |
|
| arm64: | ok |
|
||||||
| c6x: | TODO |
|
| c6x: | TODO |
|
||||||
| csky: | TODO |
|
| csky: | ok |
|
||||||
| h8300: | TODO |
|
| h8300: | TODO |
|
||||||
| hexagon: | TODO |
|
| hexagon: | TODO |
|
||||||
| ia64: | TODO |
|
| ia64: | TODO |
|
||||||
@ -23,7 +23,7 @@
|
|||||||
| openrisc: | TODO |
|
| openrisc: | TODO |
|
||||||
| parisc: | TODO |
|
| parisc: | TODO |
|
||||||
| powerpc: | ok |
|
| powerpc: | ok |
|
||||||
| riscv: | TODO |
|
| riscv: | ok |
|
||||||
| s390: | ok |
|
| s390: | ok |
|
||||||
| sh: | TODO |
|
| sh: | TODO |
|
||||||
| sparc: | TODO |
|
| sparc: | TODO |
|
||||||
|
@ -23,7 +23,7 @@
|
|||||||
| openrisc: | TODO |
|
| openrisc: | TODO |
|
||||||
| parisc: | ok |
|
| parisc: | ok |
|
||||||
| powerpc: | ok |
|
| powerpc: | ok |
|
||||||
| riscv: | TODO |
|
| riscv: | ok |
|
||||||
| s390: | ok |
|
| s390: | ok |
|
||||||
| sh: | TODO |
|
| sh: | TODO |
|
||||||
| sparc: | TODO |
|
| sparc: | TODO |
|
||||||
|
@ -22,7 +22,7 @@
|
|||||||
| nios2: | TODO |
|
| nios2: | TODO |
|
||||||
| openrisc: | TODO |
|
| openrisc: | TODO |
|
||||||
| parisc: | TODO |
|
| parisc: | TODO |
|
||||||
| powerpc: | TODO |
|
| powerpc: | ok |
|
||||||
| riscv: | TODO |
|
| riscv: | TODO |
|
||||||
| s390: | TODO |
|
| s390: | TODO |
|
||||||
| sh: | TODO |
|
| sh: | TODO |
|
||||||
|
@ -17,7 +17,7 @@
|
|||||||
| ia64: | TODO |
|
| ia64: | TODO |
|
||||||
| m68k: | TODO |
|
| m68k: | TODO |
|
||||||
| microblaze: | TODO |
|
| microblaze: | TODO |
|
||||||
| mips: | TODO |
|
| mips: | ok |
|
||||||
| nds32: | TODO |
|
| nds32: | TODO |
|
||||||
| nios2: | TODO |
|
| nios2: | TODO |
|
||||||
| openrisc: | TODO |
|
| openrisc: | TODO |
|
||||||
|
@ -192,4 +192,4 @@ For more information on the Plan 9 Operating System check out
|
|||||||
http://plan9.bell-labs.com/plan9
|
http://plan9.bell-labs.com/plan9
|
||||||
|
|
||||||
For information on Plan 9 from User Space (Plan 9 applications and libraries
|
For information on Plan 9 from User Space (Plan 9 applications and libraries
|
||||||
ported to Linux/BSD/OSX/etc) check out http://swtch.com/plan9
|
ported to Linux/BSD/OSX/etc) check out https://9fans.github.io/plan9port/
|
||||||
|
@ -1,3 +1,10 @@
|
|||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
=================
|
||||||
|
Automount Support
|
||||||
|
=================
|
||||||
|
|
||||||
|
|
||||||
Support is available for filesystems that wish to do automounting
|
Support is available for filesystems that wish to do automounting
|
||||||
support (such as kAFS which can be found in fs/afs/ and NFS in
|
support (such as kAFS which can be found in fs/afs/ and NFS in
|
||||||
fs/nfs/). This facility includes allowing in-kernel mounts to be
|
fs/nfs/). This facility includes allowing in-kernel mounts to be
|
||||||
@ -5,13 +12,12 @@ performed and mountpoint degradation to be requested. The latter can
|
|||||||
also be requested by userspace.
|
also be requested by userspace.
|
||||||
|
|
||||||
|
|
||||||
======================
|
In-Kernel Automounting
|
||||||
IN-KERNEL AUTOMOUNTING
|
|
||||||
======================
|
======================
|
||||||
|
|
||||||
See section "Mount Traps" of Documentation/filesystems/autofs.rst
|
See section "Mount Traps" of Documentation/filesystems/autofs.rst
|
||||||
|
|
||||||
Then from userspace, you can just do something like:
|
Then from userspace, you can just do something like::
|
||||||
|
|
||||||
[root@andromeda root]# mount -t afs \#root.afs. /afs
|
[root@andromeda root]# mount -t afs \#root.afs. /afs
|
||||||
[root@andromeda root]# ls /afs
|
[root@andromeda root]# ls /afs
|
||||||
@ -21,7 +27,7 @@ Then from userspace, you can just do something like:
|
|||||||
[root@andromeda root]# ls /afs/cambridge/afsdoc/
|
[root@andromeda root]# ls /afs/cambridge/afsdoc/
|
||||||
ChangeLog html LICENSE pdf RELNOTES-1.2.2
|
ChangeLog html LICENSE pdf RELNOTES-1.2.2
|
||||||
|
|
||||||
And then if you look in the mountpoint catalogue, you'll see something like:
|
And then if you look in the mountpoint catalogue, you'll see something like::
|
||||||
|
|
||||||
[root@andromeda root]# cat /proc/mounts
|
[root@andromeda root]# cat /proc/mounts
|
||||||
...
|
...
|
||||||
@ -30,8 +36,7 @@ And then if you look in the mountpoint catalogue, you'll see something like:
|
|||||||
#afsdoc. /afs/cambridge.redhat.com/afsdoc afs rw 0 0
|
#afsdoc. /afs/cambridge.redhat.com/afsdoc afs rw 0 0
|
||||||
|
|
||||||
|
|
||||||
===========================
|
Automatic Mountpoint Expiry
|
||||||
AUTOMATIC MOUNTPOINT EXPIRY
|
|
||||||
===========================
|
===========================
|
||||||
|
|
||||||
Automatic expiration of mountpoints is easy, provided you've mounted the
|
Automatic expiration of mountpoints is easy, provided you've mounted the
|
||||||
@ -43,7 +48,8 @@ To do expiration, you need to follow these steps:
|
|||||||
hung.
|
hung.
|
||||||
|
|
||||||
(2) When a new mountpoint is created in the ->d_automount method, add
|
(2) When a new mountpoint is created in the ->d_automount method, add
|
||||||
the mnt to the list using mnt_set_expiry()
|
the mnt to the list using mnt_set_expiry()::
|
||||||
|
|
||||||
mnt_set_expiry(newmnt, &afs_vfsmounts);
|
mnt_set_expiry(newmnt, &afs_vfsmounts);
|
||||||
|
|
||||||
(3) When you want mountpoints to be expired, call mark_mounts_for_expiry()
|
(3) When you want mountpoints to be expired, call mark_mounts_for_expiry()
|
||||||
@ -70,8 +76,7 @@ and the copies of those that are on an expiration list will be added to the
|
|||||||
same expiration list.
|
same expiration list.
|
||||||
|
|
||||||
|
|
||||||
=======================
|
Userspace Driven Expiry
|
||||||
USERSPACE DRIVEN EXPIRY
|
|
||||||
=======================
|
=======================
|
||||||
|
|
||||||
As an alternative, it is possible for userspace to request expiry of any
|
As an alternative, it is possible for userspace to request expiry of any
|
@ -1,6 +1,8 @@
|
|||||||
==========================
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
FS-CACHE CACHE BACKEND API
|
|
||||||
==========================
|
==========================
|
||||||
|
FS-Cache Cache backend API
|
||||||
|
==========================
|
||||||
|
|
||||||
The FS-Cache system provides an API by which actual caches can be supplied to
|
The FS-Cache system provides an API by which actual caches can be supplied to
|
||||||
FS-Cache for it to then serve out to network filesystems and other interested
|
FS-Cache for it to then serve out to network filesystems and other interested
|
||||||
@ -9,15 +11,14 @@ parties.
|
|||||||
This API is declared in <linux/fscache-cache.h>.
|
This API is declared in <linux/fscache-cache.h>.
|
||||||
|
|
||||||
|
|
||||||
====================================
|
Initialising and Registering a Cache
|
||||||
INITIALISING AND REGISTERING A CACHE
|
|
||||||
====================================
|
====================================
|
||||||
|
|
||||||
To start off, a cache definition must be initialised and registered for each
|
To start off, a cache definition must be initialised and registered for each
|
||||||
cache the backend wants to make available. For instance, CacheFS does this in
|
cache the backend wants to make available. For instance, CacheFS does this in
|
||||||
the fill_super() operation on mounting.
|
the fill_super() operation on mounting.
|
||||||
|
|
||||||
The cache definition (struct fscache_cache) should be initialised by calling:
|
The cache definition (struct fscache_cache) should be initialised by calling::
|
||||||
|
|
||||||
void fscache_init_cache(struct fscache_cache *cache,
|
void fscache_init_cache(struct fscache_cache *cache,
|
||||||
struct fscache_cache_ops *ops,
|
struct fscache_cache_ops *ops,
|
||||||
@ -26,17 +27,17 @@ The cache definition (struct fscache_cache) should be initialised by calling:
|
|||||||
|
|
||||||
Where:
|
Where:
|
||||||
|
|
||||||
(*) "cache" is a pointer to the cache definition;
|
* "cache" is a pointer to the cache definition;
|
||||||
|
|
||||||
(*) "ops" is a pointer to the table of operations that the backend supports on
|
* "ops" is a pointer to the table of operations that the backend supports on
|
||||||
this cache; and
|
this cache; and
|
||||||
|
|
||||||
(*) "idfmt" is a format and printf-style arguments for constructing a label
|
* "idfmt" is a format and printf-style arguments for constructing a label
|
||||||
for the cache.
|
for the cache.
|
||||||
|
|
||||||
|
|
||||||
The cache should then be registered with FS-Cache by passing a pointer to the
|
The cache should then be registered with FS-Cache by passing a pointer to the
|
||||||
previously initialised cache definition to:
|
previously initialised cache definition to::
|
||||||
|
|
||||||
int fscache_add_cache(struct fscache_cache *cache,
|
int fscache_add_cache(struct fscache_cache *cache,
|
||||||
struct fscache_object *fsdef,
|
struct fscache_object *fsdef,
|
||||||
@ -44,12 +45,12 @@ previously initialised cache definition to:
|
|||||||
|
|
||||||
Two extra arguments should also be supplied:
|
Two extra arguments should also be supplied:
|
||||||
|
|
||||||
(*) "fsdef" which should point to the object representation for the FS-Cache
|
* "fsdef" which should point to the object representation for the FS-Cache
|
||||||
master index in this cache. Netfs primary index entries will be created
|
master index in this cache. Netfs primary index entries will be created
|
||||||
here. FS-Cache keeps the caller's reference to the index object if
|
here. FS-Cache keeps the caller's reference to the index object if
|
||||||
successful and will release it upon withdrawal of the cache.
|
successful and will release it upon withdrawal of the cache.
|
||||||
|
|
||||||
(*) "tagname" which, if given, should be a text string naming this cache. If
|
* "tagname" which, if given, should be a text string naming this cache. If
|
||||||
this is NULL, the identifier will be used instead. For CacheFS, the
|
this is NULL, the identifier will be used instead. For CacheFS, the
|
||||||
identifier is set to name the underlying block device and the tag can be
|
identifier is set to name the underlying block device and the tag can be
|
||||||
supplied by mount.
|
supplied by mount.
|
||||||
@ -58,20 +59,18 @@ This function may return -ENOMEM if it ran out of memory or -EEXIST if the tag
|
|||||||
is already in use. 0 will be returned on success.
|
is already in use. 0 will be returned on success.
|
||||||
|
|
||||||
|
|
||||||
=====================
|
Unregistering a Cache
|
||||||
UNREGISTERING A CACHE
|
|
||||||
=====================
|
=====================
|
||||||
|
|
||||||
A cache can be withdrawn from the system by calling this function with a
|
A cache can be withdrawn from the system by calling this function with a
|
||||||
pointer to the cache definition:
|
pointer to the cache definition::
|
||||||
|
|
||||||
void fscache_withdraw_cache(struct fscache_cache *cache);
|
void fscache_withdraw_cache(struct fscache_cache *cache);
|
||||||
|
|
||||||
In CacheFS's case, this is called by put_super().
|
In CacheFS's case, this is called by put_super().
|
||||||
|
|
||||||
|
|
||||||
========
|
Security
|
||||||
SECURITY
|
|
||||||
========
|
========
|
||||||
|
|
||||||
The cache methods are executed in one of two contexts:
|
The cache methods are executed in one of two contexts:
|
||||||
@ -89,8 +88,7 @@ be masqueraded for the duration of the cache driver's access to the cache.
|
|||||||
This is left to the cache to handle; FS-Cache makes no effort in this regard.
|
This is left to the cache to handle; FS-Cache makes no effort in this regard.
|
||||||
|
|
||||||
|
|
||||||
===================================
|
Control and Statistics Presentation
|
||||||
CONTROL AND STATISTICS PRESENTATION
|
|
||||||
===================================
|
===================================
|
||||||
|
|
||||||
The cache may present data to the outside world through FS-Cache's interfaces
|
The cache may present data to the outside world through FS-Cache's interfaces
|
||||||
@ -101,11 +99,10 @@ is enabled. This is accessible through the kobject struct fscache_cache::kobj
|
|||||||
and is for use by the cache as it sees fit.
|
and is for use by the cache as it sees fit.
|
||||||
|
|
||||||
|
|
||||||
========================
|
Relevant Data Structures
|
||||||
RELEVANT DATA STRUCTURES
|
|
||||||
========================
|
========================
|
||||||
|
|
||||||
(*) Index/Data file FS-Cache representation cookie:
|
* Index/Data file FS-Cache representation cookie::
|
||||||
|
|
||||||
struct fscache_cookie {
|
struct fscache_cookie {
|
||||||
struct fscache_object_def *def;
|
struct fscache_object_def *def;
|
||||||
@ -121,7 +118,7 @@ RELEVANT DATA STRUCTURES
|
|||||||
cache operations.
|
cache operations.
|
||||||
|
|
||||||
|
|
||||||
(*) In-cache object representation:
|
* In-cache object representation::
|
||||||
|
|
||||||
struct fscache_object {
|
struct fscache_object {
|
||||||
int debug_id;
|
int debug_id;
|
||||||
@ -150,7 +147,7 @@ RELEVANT DATA STRUCTURES
|
|||||||
initialised by calling fscache_object_init(object).
|
initialised by calling fscache_object_init(object).
|
||||||
|
|
||||||
|
|
||||||
(*) FS-Cache operation record:
|
* FS-Cache operation record::
|
||||||
|
|
||||||
struct fscache_operation {
|
struct fscache_operation {
|
||||||
atomic_t usage;
|
atomic_t usage;
|
||||||
@ -173,7 +170,7 @@ RELEVANT DATA STRUCTURES
|
|||||||
an operation needs more processing time, it should be enqueued again.
|
an operation needs more processing time, it should be enqueued again.
|
||||||
|
|
||||||
|
|
||||||
(*) FS-Cache retrieval operation record:
|
* FS-Cache retrieval operation record::
|
||||||
|
|
||||||
struct fscache_retrieval {
|
struct fscache_retrieval {
|
||||||
struct fscache_operation op;
|
struct fscache_operation op;
|
||||||
@ -198,7 +195,7 @@ RELEVANT DATA STRUCTURES
|
|||||||
it sees fit.
|
it sees fit.
|
||||||
|
|
||||||
|
|
||||||
(*) FS-Cache storage operation record:
|
* FS-Cache storage operation record::
|
||||||
|
|
||||||
struct fscache_storage {
|
struct fscache_storage {
|
||||||
struct fscache_operation op;
|
struct fscache_operation op;
|
||||||
@ -212,16 +209,17 @@ RELEVANT DATA STRUCTURES
|
|||||||
storage.
|
storage.
|
||||||
|
|
||||||
|
|
||||||
================
|
Cache Operations
|
||||||
CACHE OPERATIONS
|
|
||||||
================
|
================
|
||||||
|
|
||||||
The cache backend provides FS-Cache with a table of operations that can be
|
The cache backend provides FS-Cache with a table of operations that can be
|
||||||
performed on the denizens of the cache. These are held in a structure of type:
|
performed on the denizens of the cache. These are held in a structure of type:
|
||||||
|
|
||||||
struct fscache_cache_ops
|
::
|
||||||
|
|
||||||
(*) Name of cache provider [mandatory]:
|
struct fscache_cache_ops
|
||||||
|
|
||||||
|
* Name of cache provider [mandatory]::
|
||||||
|
|
||||||
const char *name
|
const char *name
|
||||||
|
|
||||||
@ -229,7 +227,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
the backend.
|
the backend.
|
||||||
|
|
||||||
|
|
||||||
(*) Allocate a new object [mandatory]:
|
* Allocate a new object [mandatory]::
|
||||||
|
|
||||||
struct fscache_object *(*alloc_object)(struct fscache_cache *cache,
|
struct fscache_object *(*alloc_object)(struct fscache_cache *cache,
|
||||||
struct fscache_cookie *cookie)
|
struct fscache_cookie *cookie)
|
||||||
@ -244,7 +242,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
form once lookup is complete or aborted.
|
form once lookup is complete or aborted.
|
||||||
|
|
||||||
|
|
||||||
(*) Look up and create object [mandatory]:
|
* Look up and create object [mandatory]::
|
||||||
|
|
||||||
void (*lookup_object)(struct fscache_object *object)
|
void (*lookup_object)(struct fscache_object *object)
|
||||||
|
|
||||||
@ -263,7 +261,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
to abort the lookup of that object.
|
to abort the lookup of that object.
|
||||||
|
|
||||||
|
|
||||||
(*) Release lookup data [mandatory]:
|
* Release lookup data [mandatory]::
|
||||||
|
|
||||||
void (*lookup_complete)(struct fscache_object *object)
|
void (*lookup_complete)(struct fscache_object *object)
|
||||||
|
|
||||||
@ -271,7 +269,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
using to perform a lookup.
|
using to perform a lookup.
|
||||||
|
|
||||||
|
|
||||||
(*) Increment object refcount [mandatory]:
|
* Increment object refcount [mandatory]::
|
||||||
|
|
||||||
struct fscache_object *(*grab_object)(struct fscache_object *object)
|
struct fscache_object *(*grab_object)(struct fscache_object *object)
|
||||||
|
|
||||||
@ -280,7 +278,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
It should return the object pointer if successful.
|
It should return the object pointer if successful.
|
||||||
|
|
||||||
|
|
||||||
(*) Lock/Unlock object [mandatory]:
|
* Lock/Unlock object [mandatory]::
|
||||||
|
|
||||||
void (*lock_object)(struct fscache_object *object)
|
void (*lock_object)(struct fscache_object *object)
|
||||||
void (*unlock_object)(struct fscache_object *object)
|
void (*unlock_object)(struct fscache_object *object)
|
||||||
@ -289,7 +287,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
to schedule with the lock held, so a spinlock isn't sufficient.
|
to schedule with the lock held, so a spinlock isn't sufficient.
|
||||||
|
|
||||||
|
|
||||||
(*) Pin/Unpin object [optional]:
|
* Pin/Unpin object [optional]::
|
||||||
|
|
||||||
int (*pin_object)(struct fscache_object *object)
|
int (*pin_object)(struct fscache_object *object)
|
||||||
void (*unpin_object)(struct fscache_object *object)
|
void (*unpin_object)(struct fscache_object *object)
|
||||||
@ -299,7 +297,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
enough space in the cache to permit this.
|
enough space in the cache to permit this.
|
||||||
|
|
||||||
|
|
||||||
(*) Check coherency state of an object [mandatory]:
|
* Check coherency state of an object [mandatory]::
|
||||||
|
|
||||||
int (*check_consistency)(struct fscache_object *object)
|
int (*check_consistency)(struct fscache_object *object)
|
||||||
|
|
||||||
@ -308,7 +306,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
if they're consistent and -ESTALE otherwise. -ENOMEM and -ERESTARTSYS
|
if they're consistent and -ESTALE otherwise. -ENOMEM and -ERESTARTSYS
|
||||||
may also be returned.
|
may also be returned.
|
||||||
|
|
||||||
(*) Update object [mandatory]:
|
* Update object [mandatory]::
|
||||||
|
|
||||||
int (*update_object)(struct fscache_object *object)
|
int (*update_object)(struct fscache_object *object)
|
||||||
|
|
||||||
@ -317,7 +315,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
obtained by calling object->cookie->def->get_aux()/get_attr().
|
obtained by calling object->cookie->def->get_aux()/get_attr().
|
||||||
|
|
||||||
|
|
||||||
(*) Invalidate data object [mandatory]:
|
* Invalidate data object [mandatory]::
|
||||||
|
|
||||||
int (*invalidate_object)(struct fscache_operation *op)
|
int (*invalidate_object)(struct fscache_operation *op)
|
||||||
|
|
||||||
@ -329,7 +327,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
fscache_op_complete() must be called on op before returning.
|
fscache_op_complete() must be called on op before returning.
|
||||||
|
|
||||||
|
|
||||||
(*) Discard object [mandatory]:
|
* Discard object [mandatory]::
|
||||||
|
|
||||||
void (*drop_object)(struct fscache_object *object)
|
void (*drop_object)(struct fscache_object *object)
|
||||||
|
|
||||||
@ -341,7 +339,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
caller. The caller will invoke the put_object() method as appropriate.
|
caller. The caller will invoke the put_object() method as appropriate.
|
||||||
|
|
||||||
|
|
||||||
(*) Release object reference [mandatory]:
|
* Release object reference [mandatory]::
|
||||||
|
|
||||||
void (*put_object)(struct fscache_object *object)
|
void (*put_object)(struct fscache_object *object)
|
||||||
|
|
||||||
@ -349,7 +347,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
be freed when all the references to it are released.
|
be freed when all the references to it are released.
|
||||||
|
|
||||||
|
|
||||||
(*) Synchronise a cache [mandatory]:
|
* Synchronise a cache [mandatory]::
|
||||||
|
|
||||||
void (*sync)(struct fscache_cache *cache)
|
void (*sync)(struct fscache_cache *cache)
|
||||||
|
|
||||||
@ -357,7 +355,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
device.
|
device.
|
||||||
|
|
||||||
|
|
||||||
(*) Dissociate a cache [mandatory]:
|
* Dissociate a cache [mandatory]::
|
||||||
|
|
||||||
void (*dissociate_pages)(struct fscache_cache *cache)
|
void (*dissociate_pages)(struct fscache_cache *cache)
|
||||||
|
|
||||||
@ -365,7 +363,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
cache withdrawal.
|
cache withdrawal.
|
||||||
|
|
||||||
|
|
||||||
(*) Notification that the attributes on a netfs file changed [mandatory]:
|
* Notification that the attributes on a netfs file changed [mandatory]::
|
||||||
|
|
||||||
int (*attr_changed)(struct fscache_object *object);
|
int (*attr_changed)(struct fscache_object *object);
|
||||||
|
|
||||||
@ -386,7 +384,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
execution of this operation.
|
execution of this operation.
|
||||||
|
|
||||||
|
|
||||||
(*) Reserve cache space for an object's data [optional]:
|
* Reserve cache space for an object's data [optional]::
|
||||||
|
|
||||||
int (*reserve_space)(struct fscache_object *object, loff_t size);
|
int (*reserve_space)(struct fscache_object *object, loff_t size);
|
||||||
|
|
||||||
@ -404,7 +402,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
size if larger than that already.
|
size if larger than that already.
|
||||||
|
|
||||||
|
|
||||||
(*) Request page be read from cache [mandatory]:
|
* Request page be read from cache [mandatory]::
|
||||||
|
|
||||||
int (*read_or_alloc_page)(struct fscache_retrieval *op,
|
int (*read_or_alloc_page)(struct fscache_retrieval *op,
|
||||||
struct page *page,
|
struct page *page,
|
||||||
@ -446,7 +444,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
with. This will complete the operation when all pages are dealt with.
|
with. This will complete the operation when all pages are dealt with.
|
||||||
|
|
||||||
|
|
||||||
(*) Request pages be read from cache [mandatory]:
|
* Request pages be read from cache [mandatory]::
|
||||||
|
|
||||||
int (*read_or_alloc_pages)(struct fscache_retrieval *op,
|
int (*read_or_alloc_pages)(struct fscache_retrieval *op,
|
||||||
struct list_head *pages,
|
struct list_head *pages,
|
||||||
@ -457,7 +455,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
of pages instead of one page. Any pages on which a read operation is
|
of pages instead of one page. Any pages on which a read operation is
|
||||||
started must be added to the page cache for the specified mapping and also
|
started must be added to the page cache for the specified mapping and also
|
||||||
to the LRU. Such pages must also be removed from the pages list and
|
to the LRU. Such pages must also be removed from the pages list and
|
||||||
*nr_pages decremented per page.
|
``*nr_pages`` decremented per page.
|
||||||
|
|
||||||
If there was an error such as -ENOMEM, then that should be returned; else
|
If there was an error such as -ENOMEM, then that should be returned; else
|
||||||
if one or more pages couldn't be read or allocated, then -ENOBUFS should
|
if one or more pages couldn't be read or allocated, then -ENOBUFS should
|
||||||
@ -466,7 +464,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
returned.
|
returned.
|
||||||
|
|
||||||
|
|
||||||
(*) Request page be allocated in the cache [mandatory]:
|
* Request page be allocated in the cache [mandatory]::
|
||||||
|
|
||||||
int (*allocate_page)(struct fscache_retrieval *op,
|
int (*allocate_page)(struct fscache_retrieval *op,
|
||||||
struct page *page,
|
struct page *page,
|
||||||
@ -482,7 +480,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
allocated, then the netfs page should be marked and 0 returned.
|
allocated, then the netfs page should be marked and 0 returned.
|
||||||
|
|
||||||
|
|
||||||
(*) Request pages be allocated in the cache [mandatory]:
|
* Request pages be allocated in the cache [mandatory]::
|
||||||
|
|
||||||
int (*allocate_pages)(struct fscache_retrieval *op,
|
int (*allocate_pages)(struct fscache_retrieval *op,
|
||||||
struct list_head *pages,
|
struct list_head *pages,
|
||||||
@ -493,7 +491,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
nr_pages should be treated as for the read_or_alloc_pages() method.
|
nr_pages should be treated as for the read_or_alloc_pages() method.
|
||||||
|
|
||||||
|
|
||||||
(*) Request page be written to cache [mandatory]:
|
* Request page be written to cache [mandatory]::
|
||||||
|
|
||||||
int (*write_page)(struct fscache_storage *op,
|
int (*write_page)(struct fscache_storage *op,
|
||||||
struct page *page);
|
struct page *page);
|
||||||
@ -514,7 +512,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
appropriately.
|
appropriately.
|
||||||
|
|
||||||
|
|
||||||
(*) Discard retained per-page metadata [mandatory]:
|
* Discard retained per-page metadata [mandatory]::
|
||||||
|
|
||||||
void (*uncache_page)(struct fscache_object *object, struct page *page)
|
void (*uncache_page)(struct fscache_object *object, struct page *page)
|
||||||
|
|
||||||
@ -523,13 +521,12 @@ performed on the denizens of the cache. These are held in a structure of type:
|
|||||||
maintains for this page.
|
maintains for this page.
|
||||||
|
|
||||||
|
|
||||||
==================
|
FS-Cache Utilities
|
||||||
FS-CACHE UTILITIES
|
|
||||||
==================
|
==================
|
||||||
|
|
||||||
FS-Cache provides some utilities that a cache backend may make use of:
|
FS-Cache provides some utilities that a cache backend may make use of:
|
||||||
|
|
||||||
(*) Note occurrence of an I/O error in a cache:
|
* Note occurrence of an I/O error in a cache::
|
||||||
|
|
||||||
void fscache_io_error(struct fscache_cache *cache)
|
void fscache_io_error(struct fscache_cache *cache)
|
||||||
|
|
||||||
@ -541,7 +538,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
|||||||
This does not actually withdraw the cache. That must be done separately.
|
This does not actually withdraw the cache. That must be done separately.
|
||||||
|
|
||||||
|
|
||||||
(*) Invoke the retrieval I/O completion function:
|
* Invoke the retrieval I/O completion function::
|
||||||
|
|
||||||
void fscache_end_io(struct fscache_retrieval *op, struct page *page,
|
void fscache_end_io(struct fscache_retrieval *op, struct page *page,
|
||||||
int error);
|
int error);
|
||||||
@ -550,8 +547,8 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
|||||||
error value should be 0 if successful and an error otherwise.
|
error value should be 0 if successful and an error otherwise.
|
||||||
|
|
||||||
|
|
||||||
(*) Record that one or more pages being retrieved or allocated have been dealt
|
* Record that one or more pages being retrieved or allocated have been dealt
|
||||||
with:
|
with::
|
||||||
|
|
||||||
void fscache_retrieval_complete(struct fscache_retrieval *op,
|
void fscache_retrieval_complete(struct fscache_retrieval *op,
|
||||||
int n_pages);
|
int n_pages);
|
||||||
@ -562,7 +559,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
|||||||
completed.
|
completed.
|
||||||
|
|
||||||
|
|
||||||
(*) Record operation completion:
|
* Record operation completion::
|
||||||
|
|
||||||
void fscache_op_complete(struct fscache_operation *op);
|
void fscache_op_complete(struct fscache_operation *op);
|
||||||
|
|
||||||
@ -571,7 +568,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
|||||||
one or more pending operations to start running.
|
one or more pending operations to start running.
|
||||||
|
|
||||||
|
|
||||||
(*) Set highest store limit:
|
* Set highest store limit::
|
||||||
|
|
||||||
void fscache_set_store_limit(struct fscache_object *object,
|
void fscache_set_store_limit(struct fscache_object *object,
|
||||||
loff_t i_size);
|
loff_t i_size);
|
||||||
@ -581,7 +578,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
|||||||
rejected by fscache_read_alloc_page() and co with -ENOBUFS.
|
rejected by fscache_read_alloc_page() and co with -ENOBUFS.
|
||||||
|
|
||||||
|
|
||||||
(*) Mark pages as being cached:
|
* Mark pages as being cached::
|
||||||
|
|
||||||
void fscache_mark_pages_cached(struct fscache_retrieval *op,
|
void fscache_mark_pages_cached(struct fscache_retrieval *op,
|
||||||
struct pagevec *pagevec);
|
struct pagevec *pagevec);
|
||||||
@ -590,7 +587,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
|||||||
the netfs must call fscache_uncache_page() to unmark the pages.
|
the netfs must call fscache_uncache_page() to unmark the pages.
|
||||||
|
|
||||||
|
|
||||||
(*) Perform coherency check on an object:
|
* Perform coherency check on an object::
|
||||||
|
|
||||||
enum fscache_checkaux fscache_check_aux(struct fscache_object *object,
|
enum fscache_checkaux fscache_check_aux(struct fscache_object *object,
|
||||||
const void *data,
|
const void *data,
|
||||||
@ -603,29 +600,26 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
|||||||
|
|
||||||
One of three values will be returned:
|
One of three values will be returned:
|
||||||
|
|
||||||
(*) FSCACHE_CHECKAUX_OKAY
|
FSCACHE_CHECKAUX_OKAY
|
||||||
|
|
||||||
The coherency data indicates the object is valid as is.
|
The coherency data indicates the object is valid as is.
|
||||||
|
|
||||||
(*) FSCACHE_CHECKAUX_NEEDS_UPDATE
|
FSCACHE_CHECKAUX_NEEDS_UPDATE
|
||||||
|
|
||||||
The coherency data needs updating, but otherwise the object is
|
The coherency data needs updating, but otherwise the object is
|
||||||
valid.
|
valid.
|
||||||
|
|
||||||
(*) FSCACHE_CHECKAUX_OBSOLETE
|
FSCACHE_CHECKAUX_OBSOLETE
|
||||||
|
|
||||||
The coherency data indicates that the object is obsolete and should
|
The coherency data indicates that the object is obsolete and should
|
||||||
be discarded.
|
be discarded.
|
||||||
|
|
||||||
|
|
||||||
(*) Initialise a freshly allocated object:
|
* Initialise a freshly allocated object::
|
||||||
|
|
||||||
void fscache_object_init(struct fscache_object *object);
|
void fscache_object_init(struct fscache_object *object);
|
||||||
|
|
||||||
This initialises all the fields in an object representation.
|
This initialises all the fields in an object representation.
|
||||||
|
|
||||||
|
|
||||||
(*) Indicate the destruction of an object:
|
* Indicate the destruction of an object::
|
||||||
|
|
||||||
void fscache_object_destroyed(struct fscache_cache *cache);
|
void fscache_object_destroyed(struct fscache_cache *cache);
|
||||||
|
|
||||||
@ -635,7 +629,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
|||||||
all the objects.
|
all the objects.
|
||||||
|
|
||||||
|
|
||||||
(*) Indicate negative lookup on an object:
|
* Indicate negative lookup on an object::
|
||||||
|
|
||||||
void fscache_object_lookup_negative(struct fscache_object *object);
|
void fscache_object_lookup_negative(struct fscache_object *object);
|
||||||
|
|
||||||
@ -650,7 +644,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
|||||||
significant - all subsequent calls are ignored.
|
significant - all subsequent calls are ignored.
|
||||||
|
|
||||||
|
|
||||||
(*) Indicate an object has been obtained:
|
* Indicate an object has been obtained::
|
||||||
|
|
||||||
void fscache_obtained_object(struct fscache_object *object);
|
void fscache_obtained_object(struct fscache_object *object);
|
||||||
|
|
||||||
@ -667,7 +661,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
|||||||
(2) that writes may now proceed against this object.
|
(2) that writes may now proceed against this object.
|
||||||
|
|
||||||
|
|
||||||
(*) Indicate that object lookup failed:
|
* Indicate that object lookup failed::
|
||||||
|
|
||||||
void fscache_object_lookup_error(struct fscache_object *object);
|
void fscache_object_lookup_error(struct fscache_object *object);
|
||||||
|
|
||||||
@ -676,7 +670,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
|||||||
as possible.
|
as possible.
|
||||||
|
|
||||||
|
|
||||||
(*) Indicate that a stale object was found and discarded:
|
* Indicate that a stale object was found and discarded::
|
||||||
|
|
||||||
void fscache_object_retrying_stale(struct fscache_object *object);
|
void fscache_object_retrying_stale(struct fscache_object *object);
|
||||||
|
|
||||||
@ -685,7 +679,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
|||||||
discarded from the cache and the lookup will be performed again.
|
discarded from the cache and the lookup will be performed again.
|
||||||
|
|
||||||
|
|
||||||
(*) Indicate that the caching backend killed an object:
|
* Indicate that the caching backend killed an object::
|
||||||
|
|
||||||
void fscache_object_mark_killed(struct fscache_object *object,
|
void fscache_object_mark_killed(struct fscache_object *object,
|
||||||
enum fscache_why_object_killed why);
|
enum fscache_why_object_killed why);
|
||||||
@ -693,13 +687,20 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
|||||||
This is called to indicate that the cache backend preemptively killed an
|
This is called to indicate that the cache backend preemptively killed an
|
||||||
object. The why parameter should be set to indicate the reason:
|
object. The why parameter should be set to indicate the reason:
|
||||||
|
|
||||||
FSCACHE_OBJECT_IS_STALE - the object was stale and needs discarding.
|
FSCACHE_OBJECT_IS_STALE
|
||||||
FSCACHE_OBJECT_NO_SPACE - there was insufficient cache space
|
- the object was stale and needs discarding.
|
||||||
FSCACHE_OBJECT_WAS_RETIRED - the object was retired when relinquished.
|
|
||||||
FSCACHE_OBJECT_WAS_CULLED - the object was culled to make space.
|
FSCACHE_OBJECT_NO_SPACE
|
||||||
|
- there was insufficient cache space
|
||||||
|
|
||||||
|
FSCACHE_OBJECT_WAS_RETIRED
|
||||||
|
- the object was retired when relinquished.
|
||||||
|
|
||||||
|
FSCACHE_OBJECT_WAS_CULLED
|
||||||
|
- the object was culled to make space.
|
||||||
|
|
||||||
|
|
||||||
(*) Get and release references on a retrieval record:
|
* Get and release references on a retrieval record::
|
||||||
|
|
||||||
void fscache_get_retrieval(struct fscache_retrieval *op);
|
void fscache_get_retrieval(struct fscache_retrieval *op);
|
||||||
void fscache_put_retrieval(struct fscache_retrieval *op);
|
void fscache_put_retrieval(struct fscache_retrieval *op);
|
||||||
@ -708,7 +709,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
|||||||
asynchronous data retrieval and block allocation.
|
asynchronous data retrieval and block allocation.
|
||||||
|
|
||||||
|
|
||||||
(*) Enqueue a retrieval record for processing.
|
* Enqueue a retrieval record for processing::
|
||||||
|
|
||||||
void fscache_enqueue_retrieval(struct fscache_retrieval *op);
|
void fscache_enqueue_retrieval(struct fscache_retrieval *op);
|
||||||
|
|
||||||
@ -718,7 +719,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
|||||||
within the callback function.
|
within the callback function.
|
||||||
|
|
||||||
|
|
||||||
(*) List of object state names:
|
* List of object state names::
|
||||||
|
|
||||||
const char *fscache_object_states[];
|
const char *fscache_object_states[];
|
||||||
|
|
@ -1,8 +1,10 @@
|
|||||||
===============================================
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
CacheFiles: CACHE ON ALREADY MOUNTED FILESYSTEM
|
|
||||||
===============================================
|
|
||||||
|
|
||||||
Contents:
|
===============================================
|
||||||
|
CacheFiles: CACHE ON ALREADY MOUNTED FILESYSTEM
|
||||||
|
===============================================
|
||||||
|
|
||||||
|
.. Contents:
|
||||||
|
|
||||||
(*) Overview.
|
(*) Overview.
|
||||||
|
|
||||||
@ -27,8 +29,8 @@ Contents:
|
|||||||
(*) Debugging.
|
(*) Debugging.
|
||||||
|
|
||||||
|
|
||||||
========
|
|
||||||
OVERVIEW
|
Overview
|
||||||
========
|
========
|
||||||
|
|
||||||
CacheFiles is a caching backend that's meant to use as a cache a directory on
|
CacheFiles is a caching backend that's meant to use as a cache a directory on
|
||||||
@ -58,8 +60,8 @@ spare space and automatically contract when the set of data requires more
|
|||||||
space.
|
space.
|
||||||
|
|
||||||
|
|
||||||
============
|
|
||||||
REQUIREMENTS
|
Requirements
|
||||||
============
|
============
|
||||||
|
|
||||||
The use of CacheFiles and its daemon requires the following features to be
|
The use of CacheFiles and its daemon requires the following features to be
|
||||||
@ -79,84 +81,70 @@ It is strongly recommended that the "dir_index" option is enabled on Ext3
|
|||||||
filesystems being used as a cache.
|
filesystems being used as a cache.
|
||||||
|
|
||||||
|
|
||||||
=============
|
Configuration
|
||||||
CONFIGURATION
|
|
||||||
=============
|
=============
|
||||||
|
|
||||||
The cache is configured by a script in /etc/cachefilesd.conf. These commands
|
The cache is configured by a script in /etc/cachefilesd.conf. These commands
|
||||||
set up cache ready for use. The following script commands are available:
|
set up cache ready for use. The following script commands are available:
|
||||||
|
|
||||||
(*) brun <N>%
|
brun <N>%, bcull <N>%, bstop <N>%, frun <N>%, fcull <N>%, fstop <N>%
|
||||||
(*) bcull <N>%
|
|
||||||
(*) bstop <N>%
|
|
||||||
(*) frun <N>%
|
|
||||||
(*) fcull <N>%
|
|
||||||
(*) fstop <N>%
|
|
||||||
|
|
||||||
Configure the culling limits. Optional. See the section on culling
|
Configure the culling limits. Optional. See the section on culling
|
||||||
The defaults are 7% (run), 5% (cull) and 1% (stop) respectively.
|
The defaults are 7% (run), 5% (cull) and 1% (stop) respectively.
|
||||||
|
|
||||||
The commands beginning with a 'b' are file space (block) limits, those
|
The commands beginning with a 'b' are file space (block) limits, those
|
||||||
beginning with an 'f' are file count limits.
|
beginning with an 'f' are file count limits.
|
||||||
|
|
||||||
(*) dir <path>
|
dir <path>
|
||||||
|
|
||||||
Specify the directory containing the root of the cache. Mandatory.
|
Specify the directory containing the root of the cache. Mandatory.
|
||||||
|
|
||||||
(*) tag <name>
|
tag <name>
|
||||||
|
|
||||||
Specify a tag to FS-Cache to use in distinguishing multiple caches.
|
Specify a tag to FS-Cache to use in distinguishing multiple caches.
|
||||||
Optional. The default is "CacheFiles".
|
Optional. The default is "CacheFiles".
|
||||||
|
|
||||||
(*) debug <mask>
|
debug <mask>
|
||||||
|
|
||||||
Specify a numeric bitmask to control debugging in the kernel module.
|
Specify a numeric bitmask to control debugging in the kernel module.
|
||||||
Optional. The default is zero (all off). The following values can be
|
Optional. The default is zero (all off). The following values can be
|
||||||
OR'd into the mask to collect various information:
|
OR'd into the mask to collect various information:
|
||||||
|
|
||||||
|
== =================================================
|
||||||
1 Turn on trace of function entry (_enter() macros)
|
1 Turn on trace of function entry (_enter() macros)
|
||||||
2 Turn on trace of function exit (_leave() macros)
|
2 Turn on trace of function exit (_leave() macros)
|
||||||
4 Turn on trace of internal debug points (_debug())
|
4 Turn on trace of internal debug points (_debug())
|
||||||
|
== =================================================
|
||||||
|
|
||||||
This mask can also be set through sysfs, eg:
|
This mask can also be set through sysfs, eg::
|
||||||
|
|
||||||
echo 5 >/sys/modules/cachefiles/parameters/debug
|
echo 5 >/sys/modules/cachefiles/parameters/debug
|
||||||
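As a reading aid for the mask table in this hunk, here is a minimal sketch of how the three documented bit values combine into the `5` written in the sysfs example. The bit values come from the documentation text; the macro and function names are hypothetical and are not taken from the kernel source.

```c
/* Bit values from the documented table; the macro names are hypothetical. */
#define DBG_TRACE_ENTER 1u	/* trace of function entry (_enter() macros) */
#define DBG_TRACE_LEAVE 2u	/* trace of function exit (_leave() macros) */
#define DBG_TRACE_DEBUG 4u	/* trace of internal debug points (_debug()) */

/*
 * Build a debug mask by OR'ing together the bits for the requested
 * kinds of tracing. Each argument is treated as a boolean.
 */
static unsigned int debug_mask(int enter, int leave, int debug)
{
	unsigned int mask = 0;

	if (enter)
		mask |= DBG_TRACE_ENTER;
	if (leave)
		mask |= DBG_TRACE_LEAVE;
	if (debug)
		mask |= DBG_TRACE_DEBUG;
	return mask;
}
```

With entry tracing and internal debug points enabled, `debug_mask(1, 0, 1)` yields 1 | 4, which is the value 5 used in the example.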
|
|
||||||
|
|
||||||
==================
|
Starting the Cache
|
||||||
STARTING THE CACHE
|
|
||||||
==================
|
==================
|
||||||
|
|
||||||
The cache is started by running the daemon. The daemon opens the cache device,
|
The cache is started by running the daemon. The daemon opens the cache device,
|
||||||
configures the cache and tells it to begin caching. At that point the cache
|
configures the cache and tells it to begin caching. At that point the cache
|
||||||
binds to fscache and the cache becomes live.
|
binds to fscache and the cache becomes live.
|
||||||
|
|
||||||
The daemon is run as follows:
|
The daemon is run as follows::
|
||||||
|
|
||||||
/sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]
|
/sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]
|
||||||
|
|
||||||
The flags are:
|
The flags are:
|
||||||
|
|
||||||
(*) -d
|
``-d``
|
||||||
|
|
||||||
Increase the debugging level. This can be specified multiple times and
|
Increase the debugging level. This can be specified multiple times and
|
||||||
is cumulative with itself.
|
is cumulative with itself.
|
||||||
|
|
||||||
(*) -s
|
``-s``
|
||||||
|
|
||||||
Send messages to stderr instead of syslog.
|
Send messages to stderr instead of syslog.
|
||||||
|
|
||||||
(*) -n
|
``-n``
|
||||||
|
|
||||||
Don't daemonise and go into background.
|
Don't daemonise and go into background.
|
||||||
|
|
||||||
(*) -f <configfile>
|
``-f <configfile>``
|
||||||
|
|
||||||
Use an alternative configuration file rather than the default one.
|
Use an alternative configuration file rather than the default one.
|
||||||
|
|
||||||
|
|
||||||
===============
|
Things to Avoid
|
||||||
THINGS TO AVOID
|
|
||||||
===============
|
===============
|
||||||
|
|
||||||
Do not mount other things within the cache as this will cause problems. The
|
Do not mount other things within the cache as this will cause problems. The
|
||||||
@ -179,8 +167,7 @@ Do not chmod files in the cache. The module creates things with minimal
|
|||||||
permissions to prevent random users being able to access them directly.
|
permissions to prevent random users being able to access them directly.
|
||||||
|
|
||||||
|
|
||||||
=============
|
Cache Culling
|
||||||
CACHE CULLING
|
|
||||||
=============
|
=============
|
||||||
|
|
||||||
The cache may need culling occasionally to make space. This involves
|
The cache may need culling occasionally to make space. This involves
|
||||||
@ -192,27 +179,21 @@ Cache culling is done on the basis of the percentage of blocks and the
|
|||||||
percentage of files available in the underlying filesystem. There are six
|
percentage of files available in the underlying filesystem. There are six
|
||||||
"limits":
|
"limits":
|
||||||
|
|
||||||
(*) brun
|
brun, frun
|
||||||
(*) frun
|
|
||||||
|
|
||||||
If the amount of free space and the number of available files in the cache
|
If the amount of free space and the number of available files in the cache
|
||||||
rises above both these limits, then culling is turned off.
|
rises above both these limits, then culling is turned off.
|
||||||
|
|
||||||
(*) bcull
|
bcull, fcull
|
||||||
(*) fcull
|
|
||||||
|
|
||||||
If the amount of available space or the number of available files in the
|
If the amount of available space or the number of available files in the
|
||||||
cache falls below either of these limits, then culling is started.
|
cache falls below either of these limits, then culling is started.
|
||||||
|
|
||||||
(*) bstop
|
bstop, fstop
|
||||||
(*) fstop
|
|
||||||
|
|
||||||
If the amount of available space or the number of available files in the
|
If the amount of available space or the number of available files in the
|
||||||
cache falls below either of these limits, then no further allocation of
|
cache falls below either of these limits, then no further allocation of
|
||||||
disk space or files is permitted until culling has raised things above
|
disk space or files is permitted until culling has raised things above
|
||||||
these limits again.
|
these limits again.
|
||||||
|
|
||||||
These must be configured thusly:
|
These must be configured thusly::
|
||||||
|
|
||||||
0 <= bstop < bcull < brun < 100
|
0 <= bstop < bcull < brun < 100
|
||||||
0 <= fstop < fcull < frun < 100
|
0 <= fstop < fcull < frun < 100
|
||||||
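The ordering rule and the three thresholds described in this hunk can be sketched in Python. This is only an illustration: the function names, the dict layout and the sample percentages are hypothetical, and only the six limit names and the inequalities come from the text above.

```python
def limits_are_valid(limits):
    """Check 0 <= stop < cull < run < 100 for both blocks (b) and files (f)."""
    for prefix in ("b", "f"):
        stop = limits[prefix + "stop"]
        cull = limits[prefix + "cull"]
        run = limits[prefix + "run"]
        if not (0 <= stop < cull < run < 100):
            return False
    return True


def next_culling_state(culling, bfree, ffree, limits):
    """Apply the run/cull thresholds to the current free percentages."""
    # Culling stops only when BOTH values have risen above their run limits.
    if bfree > limits["brun"] and ffree > limits["frun"]:
        return False
    # Culling starts when EITHER value falls below its cull limit.
    if bfree < limits["bcull"] or ffree < limits["fcull"]:
        return True
    # In between, the previous state is kept.
    return culling


def allocation_allowed(bfree, ffree, limits):
    """No new space or files once EITHER value is below its stop limit."""
    return not (bfree < limits["bstop"] or ffree < limits["fstop"])


# Sample percentages chosen for illustration only.
example = {"brun": 10, "bcull": 7, "bstop": 3,
           "frun": 10, "fcull": 7, "fstop": 3}
print(limits_are_valid(example))
```

Keeping the previous state between the cull and run limits is what gives the scheme its hysteresis, so culling does not flap on and off around a single threshold.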
@ -226,16 +207,14 @@ started as soon as space is made in the table. Objects will be skipped if
|
|||||||
their atimes have changed or if the kernel module says it is still using them.
|
their atimes have changed or if the kernel module says it is still using them.
|
||||||
|
|
||||||
|
|
||||||
===============
|
Cache Structure
|
||||||
CACHE STRUCTURE
|
|
||||||
===============
|
===============
|
||||||
|
|
||||||
The CacheFiles module will create two directories in the directory it was
|
The CacheFiles module will create two directories in the directory it was
|
||||||
given:
|
given:
|
||||||
|
|
||||||
(*) cache/
|
* cache/
|
||||||
|
* graveyard/
|
||||||
(*) graveyard/
|
|
||||||
|
|
||||||
The active cache objects all reside in the first directory. The CacheFiles
|
The active cache objects all reside in the first directory. The CacheFiles
|
||||||
kernel module moves any retired or culled objects that it can't simply unlink
|
kernel module moves any retired or culled objects that it can't simply unlink
|
||||||
@ -261,10 +240,10 @@ If an object has children, then it will be represented as a directory.
|
|||||||
Immediately in the representative directory are a collection of directories
|
Immediately in the representative directory are a collection of directories
|
||||||
named for hash values of the child object keys with an '@' prepended. Into
|
named for hash values of the child object keys with an '@' prepended. Into
|
||||||
this directory, if possible, will be placed the representations of the child
|
this directory, if possible, will be placed the representations of the child
|
||||||
objects:
|
objects::
|
||||||
|
|
||||||
INDEX INDEX INDEX DATA FILES
|
/INDEX /INDEX /INDEX /DATA FILES
|
||||||
========= ========== ================================= ================
|
/=========/==========/=================================/================
|
||||||
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
|
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
|
||||||
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
|
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
|
||||||
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
|
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
|
||||||
@ -275,7 +254,7 @@ If the key is so long that it exceeds NAME_MAX with the decorations added on to
|
|||||||
it, then it will be cut into pieces, the first few of which will be used to
|
it, then it will be cut into pieces, the first few of which will be used to
|
||||||
make a nest of directories, and the last one of which will be the objects
|
make a nest of directories, and the last one of which will be the objects
|
||||||
inside the last directory. The names of the intermediate directories will have
|
inside the last directory. The names of the intermediate directories will have
|
||||||
'+' prepended:
|
'+' prepended::
|
||||||
|
|
||||||
J1223/@23/+xy...z/+kl...m/Epqr
|
J1223/@23/+xy...z/+kl...m/Epqr
|
||||||
|
|
||||||
@ -288,11 +267,13 @@ To handle this, CacheFiles will use a suitably printable filename directly and
|
|||||||
"base-64" encode ones that aren't directly suitable. The two versions of
|
"base-64" encode ones that aren't directly suitable. The two versions of
|
||||||
object filenames indicate the encoding:
|
object filenames indicate the encoding:
|
||||||
|
|
||||||
|
=============== =============== ===============
|
||||||
OBJECT TYPE PRINTABLE ENCODED
|
OBJECT TYPE PRINTABLE ENCODED
|
||||||
=============== =============== ===============
|
=============== =============== ===============
|
||||||
Index "I..." "J..."
|
Index "I..." "J..."
|
||||||
Data "D..." "E..."
|
Data "D..." "E..."
|
||||||
Special "S..." "T..."
|
Special "S..." "T..."
|
||||||
|
=============== =============== ===============
|
||||||
|
|
||||||
Intermediate directories are always "@" or "+" as appropriate.
|
Intermediate directories are always "@" or "+" as appropriate.
|
||||||
|
|
||||||
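The naming scheme in this hunk (a leading '@' or '+' for directories, and a type letter for objects) can be sketched as a small classifier. The function name and the returned labels are hypothetical; the prefix letters are taken from the table above, and the sample path is the one shown earlier in the document.

```python
# First character of an object name -> (object type, encoding), per the table.
PREFIXES = {
    "I": ("index", "printable"),
    "J": ("index", "encoded"),
    "D": ("data", "printable"),
    "E": ("data", "encoded"),
    "S": ("special", "printable"),
    "T": ("special", "encoded"),
}


def classify(component):
    """Classify one path component below the cache/ directory."""
    first = component[:1]
    if first == "@":
        # Directory named for a hash of the child object keys.
        return ("hash directory", None)
    if first == "+":
        # Intermediate directory holding a piece of an over-long key.
        return ("continuation directory", None)
    return PREFIXES.get(first, ("unrecognised", None))


path = "cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry"
for part in path.split("/")[1:]:
    print(part, classify(part))
```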
@ -307,8 +288,7 @@ Note that CacheFiles will erase from the cache any file it doesn't recognise or
|
|||||||
any file of an incorrect type (such as a FIFO file or a device file).
|
any file of an incorrect type (such as a FIFO file or a device file).
|
||||||
|
|
||||||
|
|
||||||
==========================
|
Security Model and SELinux
|
||||||
SECURITY MODEL AND SELINUX
|
|
||||||
==========================
|
==========================
|
||||||
|
|
||||||
CacheFiles is implemented to deal properly with the LSM security features of
|
CacheFiles is implemented to deal properly with the LSM security features of
|
||||||
@ -331,26 +311,26 @@ When the CacheFiles module is asked to bind to its cache, it:
|
|||||||
|
|
||||||
(1) Finds the security label attached to the root cache directory and uses
|
(1) Finds the security label attached to the root cache directory and uses
|
||||||
that as the security label with which it will create files. By default,
|
that as the security label with which it will create files. By default,
|
||||||
this is:
|
this is::
|
||||||
|
|
||||||
cachefiles_var_t
|
cachefiles_var_t
|
||||||
|
|
||||||
(2) Finds the security label of the process which issued the bind request
|
(2) Finds the security label of the process which issued the bind request
|
||||||
(presumed to be the cachefilesd daemon), which by default will be:
|
(presumed to be the cachefilesd daemon), which by default will be::
|
||||||
|
|
||||||
cachefilesd_t
|
cachefilesd_t
|
||||||
|
|
||||||
and asks LSM to supply a security ID as which it should act given the
|
and asks LSM to supply a security ID as which it should act given the
|
||||||
daemon's label. By default, this will be:
|
daemon's label. By default, this will be::
|
||||||
|
|
||||||
cachefiles_kernel_t
|
cachefiles_kernel_t
|
||||||
|
|
||||||
SELinux transitions the daemon's security ID to the module's security ID
|
SELinux transitions the daemon's security ID to the module's security ID
|
||||||
based on a rule of this form in the policy.
|
based on a rule of this form in the policy::
|
||||||
|
|
||||||
type_transition <daemon's-ID> kernel_t : process <module's-ID>;
|
type_transition <daemon's-ID> kernel_t : process <module's-ID>;
|
||||||
|
|
||||||
For instance:
|
For instance::
|
||||||
|
|
||||||
type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t;
|
type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t;
|
||||||
|
|
||||||
@ -370,7 +350,7 @@ There are policy source files available in:
|
|||||||
|
|
||||||
http://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2
|
http://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2
|
||||||
|
|
||||||
and later versions. In that tarball, see the files:
|
and later versions. In that tarball, see the files::
|
||||||
|
|
||||||
cachefilesd.te
|
cachefilesd.te
|
||||||
cachefilesd.fc
|
cachefilesd.fc
|
||||||
@ -379,7 +359,7 @@ and later versions. In that tarball, see the files:
|
|||||||
They are built and installed directly by the RPM.
|
They are built and installed directly by the RPM.
|
||||||
|
|
||||||
If a non-RPM based system is being used, then copy the above files to their own
|
If a non-RPM based system is being used, then copy the above files to their own
|
||||||
directory and run:
|
directory and run::
|
||||||
|
|
||||||
make -f /usr/share/selinux/devel/Makefile
|
make -f /usr/share/selinux/devel/Makefile
|
||||||
semodule -i cachefilesd.pp
|
semodule -i cachefilesd.pp
|
||||||
@ -394,7 +374,7 @@ an auxiliary policy must be installed to label the alternate location of the
|
|||||||
cache.
|
cache.
|
||||||
|
|
||||||
For instructions on how to add an auxiliary policy to enable the cache to be
|
For instructions on how to add an auxiliary policy to enable the cache to be
|
||||||
located elsewhere when SELinux is in enforcing mode, please see:
|
located elsewhere when SELinux is in enforcing mode, please see::
|
||||||
|
|
||||||
/usr/share/doc/cachefilesd-*/move-cache.txt
|
/usr/share/doc/cachefilesd-*/move-cache.txt
|
||||||
|
|
||||||
@ -402,8 +382,7 @@ When the cachefilesd rpm is installed; alternatively, the document can be found
|
|||||||
in the sources.
|
in the sources.
|
||||||
|
|
||||||
|
|
||||||
==================
|
A Note on Security
|
||||||
A NOTE ON SECURITY
|
|
||||||
==================
|
==================
|
||||||
|
|
||||||
CacheFiles makes use of the split security in the task_struct. It allocates
|
CacheFiles makes use of the split security in the task_struct. It allocates
|
||||||
@ -445,17 +424,18 @@ for CacheFiles to run in a context of a specific security label, or to create
|
|||||||
files and directories with another security label.
|
files and directories with another security label.
|
||||||
|
|
||||||
|
|
||||||
=======================
|
Statistical Information
|
||||||
STATISTICAL INFORMATION
|
|
||||||
=======================
|
=======================
|
||||||
|
|
||||||
If FS-Cache is compiled with the following option enabled:
|
If FS-Cache is compiled with the following option enabled::
|
||||||
|
|
||||||
CONFIG_CACHEFILES_HISTOGRAM=y
|
CONFIG_CACHEFILES_HISTOGRAM=y
|
||||||
|
|
||||||
then it will gather certain statistics and display them through a proc file.
|
then it will gather certain statistics and display them through a proc file.
|
||||||
|
|
||||||
(*) /proc/fs/cachefiles/histogram
|
/proc/fs/cachefiles/histogram
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
cat /proc/fs/cachefiles/histogram
|
cat /proc/fs/cachefiles/histogram
|
||||||
JIFS SECS LOOKUPS MKDIRS CREATES
|
JIFS SECS LOOKUPS MKDIRS CREATES
|
||||||
@ -465,36 +445,39 @@ then it will gather certain statistics and display them through a proc file.
|
|||||||
between 0 jiffies and HZ-1 jiffies a variety of tasks took to run. The
|
between 0 jiffies and HZ-1 jiffies a variety of tasks took to run. The
|
||||||
columns are as follows:
|
columns are as follows:
|
||||||
|
|
||||||
|
======= =======================================================
|
||||||
COLUMN TIME MEASUREMENT
|
COLUMN TIME MEASUREMENT
|
||||||
======= =======================================================
|
======= =======================================================
|
||||||
LOOKUPS Length of time to perform a lookup on the backing fs
|
LOOKUPS Length of time to perform a lookup on the backing fs
|
||||||
MKDIRS Length of time to perform a mkdir on the backing fs
|
MKDIRS Length of time to perform a mkdir on the backing fs
|
||||||
CREATES Length of time to perform a create on the backing fs
|
CREATES Length of time to perform a create on the backing fs
|
||||||
|
======= =======================================================
|
||||||
|
|
||||||
Each row shows the number of events that took a particular range of times.
|
Each row shows the number of events that took a particular range of times.
|
||||||
Each step is 1 jiffy in size. The JIFS column indicates the particular
|
Each step is 1 jiffy in size. The JIFS column indicates the particular
|
||||||
jiffy range covered, and the SECS field the equivalent number of seconds.
|
jiffy range covered, and the SECS field the equivalent number of seconds.
|
||||||
|
|
||||||
|
|
||||||
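To make the relation between the JIFS and SECS columns concrete, here is a minimal Python sketch that lists the jiffy steps and their equivalent in seconds. The value of HZ is an assumption (it is a kernel build-time setting), and the function name is hypothetical.

```python
HZ = 250  # assumption: kernel tick rate; the real value depends on the build


def histogram_buckets(hz=HZ):
    """Return (jiffies, seconds) pairs for the rows 0 .. hz-1."""
    return [(j, j / hz) for j in range(hz)]


rows = histogram_buckets()
print(len(rows), rows[0], rows[-1])
```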
=========
|
Debugging
|
||||||
DEBUGGING
|
|
||||||
=========
|
=========
|
||||||
|
|
||||||
If CONFIG_CACHEFILES_DEBUG is enabled, the CacheFiles facility can have runtime
|
If CONFIG_CACHEFILES_DEBUG is enabled, the CacheFiles facility can have runtime
|
||||||
debugging enabled by adjusting the value in:
|
debugging enabled by adjusting the value in::
|
||||||
|
|
||||||
/sys/module/cachefiles/parameters/debug
|
/sys/module/cachefiles/parameters/debug
|
||||||
|
|
||||||
This is a bitmask of debugging streams to enable:
|
This is a bitmask of debugging streams to enable:
|
||||||
|
|
||||||
|
======= ======= =============================== =======================
|
||||||
BIT VALUE STREAM POINT
|
BIT VALUE STREAM POINT
|
||||||
======= ======= =============================== =======================
|
======= ======= =============================== =======================
|
||||||
0 1 General Function entry trace
|
0 1 General Function entry trace
|
||||||
1 2 Function exit trace
|
1 2 Function exit trace
|
||||||
2 4 General
|
2 4 General
|
||||||
|
======= ======= =============================== =======================
|
||||||
|
|
||||||
The appropriate set of values should be OR'd together and the result written to
|
The appropriate set of values should be OR'd together and the result written to
|
||||||
the control file. For example:
|
the control file. For example::
|
||||||
|
|
||||||
echo $((1|4|8)) >/sys/module/cachefiles/parameters/debug
|
echo $((1|4|8)) >/sys/module/cachefiles/parameters/debug
|
||||||
|
|
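The bitmask arithmetic from the debugging table can be sketched as follows. The dictionary keys and the function name are hypothetical labels; the bit positions and values are the three listed in the table. The sketch only computes the number and does not write to the control file.

```python
# Bit values from the debugging table (bit 0, bit 1, bit 2).
DEBUG_BITS = {
    "entry": 1 << 0,    # function entry trace
    "exit": 1 << 1,     # function exit trace
    "general": 1 << 2,  # general debugging
}


def debug_mask(*streams):
    """OR together the values of the requested debugging streams."""
    mask = 0
    for name in streams:
        mask |= DEBUG_BITS[name]
    return mask


print(debug_mask("entry", "general"))  # prints 5
```

The resulting number is what would be written to /sys/module/cachefiles/parameters/debug.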
565
Documentation/filesystems/caching/fscache.rst
Normal file
@ -0,0 +1,565 @@
|
|||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
==========================
|
||||||
|
General Filesystem Caching
|
||||||
|
==========================
|
||||||
|
|
||||||
|
Overview
|
||||||
|
========
|
||||||
|
|
||||||
|
This facility is a general purpose cache for network filesystems, though it
|
||||||
|
could be used for caching other things such as ISO9660 filesystems too.
|
||||||
|
|
||||||
|
FS-Cache mediates between cache backends (such as CacheFS) and network
|
||||||
|
filesystems::
|
||||||
|
|
||||||
|
+---------+
|         |                        +--------------+
|   NFS   |--+                     |              |
|         |  |                 +-->|   CacheFS    |
+---------+  |   +----------+  |   |  /dev/hda5   |
             |   |          |  |   +--------------+
+---------+  +-->|          |  |
|         |      |          |--+
|   AFS   |----->| FS-Cache |
|         |      |          |--+
+---------+  +-->|          |  |
             |   |          |  |   +--------------+
+---------+  |   +----------+  |   |              |
|         |  |                 +-->|  CacheFiles  |
|  ISOFS  |--+                     |  /var/cache  |
|         |                        +--------------+
+---------+
|
||||||
|
|
||||||
|
Or to look at it another way, FS-Cache is a module that provides a caching
|
||||||
|
facility to a network filesystem such that the cache is transparent to the
|
||||||
|
user::
|
||||||
|
|
||||||
|
+---------+
|         |
| Server  |
|         |
+---------+
     |                  NETWORK
~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     |
     |           +----------+
     V           |          |
+---------+      |          |
|         |      |          |
|   NFS   |----->| FS-Cache |
|         |      |          |--+
+---------+      |          |  |   +--------------+   +--------------+
     |           |          |  |   |              |   |              |
     V           +----------+  +-->|  CacheFiles  |-->|  Ext3        |
+---------+                        |  /var/cache  |   |  /dev/sda6   |
|         |                        +--------------+   +--------------+
|   VFS   |                                ^                     ^
|         |                                |                     |
+---------+                                +--------------+      |
     |                  KERNEL SPACE                      |      |
~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~|~~~~
     |                  USER SPACE                        |      |
     V                                                    |      |
+---------+                                           +--------------+
|         |                                           |              |
| Process |                                           | cachefilesd  |
|         |                                           |              |
+---------+                                           +--------------+
|
||||||
|
|
||||||
|
|
||||||
|
FS-Cache does not follow the idea of completely loading every netfs file
|
||||||
|
opened in its entirety into a cache before permitting it to be accessed and
|
||||||
|
then serving the pages out of that cache rather than the netfs inode because:
|
||||||
|
|
||||||
|
(1) It must be practical to operate without a cache.
|
||||||
|
|
||||||
|
(2) The size of any accessible file must not be limited to the size of the
|
||||||
|
cache.
|
||||||
|
|
||||||
|
(3) The combined size of all opened files (this includes mapped libraries)
|
||||||
|
must not be limited to the size of the cache.
|
||||||
|
|
||||||
|
(4) The user should not be forced to download an entire file just to do a
|
||||||
|
one-off access of a small portion of it (such as might be done with the
|
||||||
|
"file" program).
|
||||||
|
|
||||||
|
It instead serves the cache out in PAGE_SIZE chunks as and when requested by
|
||||||
|
the netfs('s) using it.
|
||||||
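Since the cache is served in PAGE_SIZE chunks rather than whole files, a partial read only touches the pages that overlap the requested byte range. A minimal Python sketch of that arithmetic follows; the 4096-byte page size and the function name are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumption: the real value depends on the architecture


def pages_for_range(offset, length, page_size=PAGE_SIZE):
    """Return the page indices that overlap bytes [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return range(first, last + 1)


# A 10000-byte read starting at byte 5000 touches three pages.
print(list(pages_for_range(5000, 10000)))  # prints [1, 2, 3]
```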
|
|
||||||
|
|
||||||
|
FS-Cache provides the following facilities:
|
||||||
|
|
||||||
|
(1) More than one cache can be used at once. Caches can be selected
|
||||||
|
explicitly by use of tags.
|
||||||
|
|
||||||
|
(2) Caches can be added / removed at any time.
|
||||||
|
|
||||||
|
(3) The netfs is provided with an interface that allows either party to
|
||||||
|
withdraw caching facilities from a file (required for (2)).
|
||||||
|
|
||||||
|
(4) The interface to the netfs returns as few errors as possible, preferring
|
||||||
|
rather to let the netfs remain oblivious.
|
||||||
|
|
||||||
|
(5) Cookies are used to represent indices, files and other objects to the
|
||||||
|
netfs. The simplest cookie is just a NULL pointer - indicating nothing
|
||||||
|
cached there.
|
||||||
|
|
||||||
|
(6) The netfs is allowed to propose - dynamically - any index hierarchy it
|
||||||
|
desires, though it must be aware that the index search function is
|
||||||
|
recursive, stack space is limited, and indices can only be children of
|
||||||
|
indices.
|
||||||
|
|
||||||
|
(7) Data I/O is done direct to and from the netfs's pages. The netfs
|
||||||
|
indicates that page A is at index B of the data-file represented by cookie
|
||||||
|
C, and that it should be read or written. The cache backend may or may
|
||||||
|
not start I/O on that page, but if it does, a netfs callback will be
|
||||||
|
invoked to indicate completion. The I/O may be either synchronous or
|
||||||
|
asynchronous.
|
||||||
|
|
||||||
|
(8) Cookies can be "retired" upon release. At this point FS-Cache will mark
|
||||||
|
them as obsolete and the index hierarchy rooted at that point will get
|
||||||
|
recycled.
|
||||||
|
|
||||||
|
(9) The netfs provides a "match" function for index searches. In addition to
|
||||||
|
saying whether a match was made or not, this can also specify that an
|
||||||
|
entry should be updated or deleted.
|
||||||
|
|
||||||
|
(10) As much as possible is done asynchronously.
|
||||||
|
|
||||||
|
|
||||||
|
FS-Cache maintains a virtual indexing tree in which all indices, files, objects
|
||||||
|
and pages are kept. Bits of this tree may actually reside in one or more
|
||||||
|
caches::
|
||||||
|
|
||||||
|
                                        FSDEF
                                          |
                        +------------------------------------+
                        |                                    |
                       NFS                                  AFS
                        |                                    |
           +--------------------------+                +-----------+
           |                          |                |           |
        homedir                    mirror           afs.org   redhat.com
           |                          |                            |
     +------------+           +---------------+              +----------+
     |            |           |               |              |          |
   00001        00002       00007           00125        vol00001   vol00002
     |            |           |               |                         |
 +---+---+     +-----+      +---+      +------+------+            +-----+----+
 |   |   |     |     |      |   |      |      |      |            |     |    |
PG0 PG1 PG2   PG0  XATTR   PG0 PG1   DIRENT DIRENT DIRENT        R/W   R/O  Bak
                     |                                            |
                    PG0                                       +-------+
                                                              |       |
                                                            00001   00003
                                                              |
                                                          +---+---+
                                                          |   |   |
                                                         PG0 PG1 PG2
|
||||||
|
|
||||||
|
In the example above, you can see two netfs's being backed: NFS and AFS. These
|
||||||
|
have different index hierarchies:
|
||||||
|
|
||||||
|
* The NFS primary index contains per-server indices. Each server index is
|
||||||
|
indexed by NFS file handles to get data file objects. Each data file
|
||||||
|
object can have an array of pages, but may also have further child
|
||||||
|
objects, such as extended attributes and directory entries. Extended
|
||||||
|
attribute objects themselves have page-array contents.
|
||||||
|
|
||||||
|
* The AFS primary index contains per-cell indices. Each cell index contains
|
||||||
|
per-logical-volume indices. Each volume index contains up to three
|
||||||
|
indices for the read-write, read-only and backup mirrors of those volumes.
|
||||||
|
Each of these contains vnode data file objects, each of which contains an
|
||||||
|
array of pages.
|
||||||
|
|
||||||
|
The very top index is the FS-Cache master index in which individual netfs's
|
||||||
|
have entries.
|
||||||
|
|
||||||
|
Any index object may reside in more than one cache, provided it only has index
|
||||||
|
children. Any index with non-index object children will be assumed to only
|
||||||
|
reside in one cache.
|
||||||
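The rule about which index objects may reside in more than one cache can be sketched with a tiny tree model. The `Node` class and `may_span_caches` function are hypothetical names used only for illustration; the node names are taken from the example tree above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    is_index: bool
    children: list = field(default_factory=list)


def may_span_caches(node):
    """An index may reside in several caches only if all its children are indices."""
    return node.is_index and all(child.is_index for child in node.children)


# A fragment of the example tree: FSDEF -> NFS -> homedir -> data file 00001.
data_file = Node("00001", is_index=False)
homedir = Node("homedir", is_index=True, children=[data_file])
nfs = Node("NFS", is_index=True, children=[homedir])
fsdef = Node("FSDEF", is_index=True, children=[nfs])

for node in (fsdef, nfs, homedir):
    print(node.name, may_span_caches(node))
```

In this fragment the per-server index has a data file as a child, so it is assumed to reside in a single cache, while the indices above it have only index children.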
|
|
||||||
|
|
||||||
|
The netfs API to FS-Cache can be found in:
|
||||||
|
|
||||||
|
Documentation/filesystems/caching/netfs-api.rst
|
||||||
|
|
||||||
|
The cache backend API to FS-Cache can be found in:
|
||||||
|
|
||||||
|
Documentation/filesystems/caching/backend-api.rst
|
||||||
|
|
||||||
|
A description of the internal representations and object state machine can be
|
||||||
|
found in:
|
||||||
|
|
||||||
|
Documentation/filesystems/caching/object.rst
|
||||||
|
|
||||||
|
|
||||||
|
Statistical Information
|
||||||
|
=======================
|
||||||
|
|
||||||
|
If FS-Cache is compiled with the following options enabled::
|
||||||
|
|
||||||
|
CONFIG_FSCACHE_STATS=y
|
||||||
|
CONFIG_FSCACHE_HISTOGRAM=y
|
||||||
|
|
||||||
|
then it will gather certain statistics and display them through a number of
|
||||||
|
proc files.
|
||||||
|
|
||||||
|
/proc/fs/fscache/stats
|
||||||
|
----------------------
|
||||||
|
|
||||||
|
This shows counts of a number of events that can happen in FS-Cache:
|
||||||
|
|
||||||
|
+--------------+-------+-------------------------------------------------------+
|
||||||
|
|CLASS |EVENT |MEANING |
|
||||||
|
+==============+=======+=======================================================+
|
||||||
|
|Cookies |idx=N |Number of index cookies allocated |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |dat=N |Number of data storage cookies allocated |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |spc=N |Number of special cookies allocated |
|
||||||
|
+--------------+-------+-------------------------------------------------------+
|
||||||
|
|Objects |alc=N |Number of objects allocated |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |nal=N |Number of object allocation failures |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |avl=N |Number of objects that reached the available state |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |ded=N |Number of objects that reached the dead state |
|
||||||
|
+--------------+-------+-------------------------------------------------------+
|
||||||
|
|ChkAux |non=N |Number of objects that didn't have a coherency check |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |ok=N |Number of objects that passed a coherency check |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |upd=N |Number of objects that needed a coherency data update |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |obs=N |Number of objects that were declared obsolete |
|
||||||
|
+--------------+-------+-------------------------------------------------------+
|
||||||
|
|Pages |mrk=N |Number of pages marked as being cached |
|
||||||
|
| |unc=N |Number of uncache page requests seen |
|
||||||
|
+--------------+-------+-------------------------------------------------------+
|
||||||
|
|Acquire |n=N |Number of acquire cookie requests seen |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |nul=N |Number of acq reqs given a NULL parent |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |noc=N |Number of acq reqs rejected due to no cache available |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |ok=N |Number of acq reqs succeeded |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |nbf=N |Number of acq reqs rejected due to error |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |oom=N |Number of acq reqs failed on ENOMEM |
|
||||||
|
+--------------+-------+-------------------------------------------------------+
|
||||||
|
|Lookups |n=N |Number of lookup calls made on cache backends |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |neg=N |Number of negative lookups made |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |pos=N |Number of positive lookups made |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |crt=N |Number of objects created by lookup |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |tmo=N |Number of lookups timed out and requeued |
|
||||||
|
+--------------+-------+-------------------------------------------------------+
|
||||||
|
|Updates |n=N |Number of update cookie requests seen |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |nul=N |Number of upd reqs given a NULL parent |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |run=N |Number of upd reqs granted CPU time |
|
||||||
|
+--------------+-------+-------------------------------------------------------+
|
||||||
|
|Relinqs |n=N |Number of relinquish cookie requests seen |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |nul=N |Number of rlq reqs given a NULL parent |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |wcr=N |Number of rlq reqs waited on completion of creation |
|
||||||
|
+--------------+-------+-------------------------------------------------------+
|
||||||
|
|AttrChg |n=N |Number of attribute changed requests seen |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |ok=N |Number of attr changed requests queued |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |nbf=N |Number of attr changed rejected -ENOBUFS |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |oom=N |Number of attr changed failed -ENOMEM |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |run=N |Number of attr changed ops given CPU time |
|
||||||
|
+--------------+-------+-------------------------------------------------------+
|
||||||
|
|Allocs |n=N |Number of allocation requests seen |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |ok=N |Number of successful alloc reqs |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |wt=N |Number of alloc reqs that waited on lookup completion |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |nbf=N |Number of alloc reqs rejected -ENOBUFS |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |int=N |Number of alloc reqs aborted -ERESTARTSYS |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |ops=N |Number of alloc reqs submitted |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |owt=N |Number of alloc reqs waited for CPU time |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |abt=N |Number of alloc reqs aborted due to object death |
|
||||||
|
+--------------+-------+-------------------------------------------------------+
|
||||||
|
|Retrvls |n=N |Number of retrieval (read) requests seen |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |ok=N |Number of successful retr reqs |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |wt=N |Number of retr reqs that waited on lookup completion |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |nod=N |Number of retr reqs returned -ENODATA |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |nbf=N |Number of retr reqs rejected -ENOBUFS |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |int=N |Number of retr reqs aborted -ERESTARTSYS |
|
||||||
|
+ +-------+-------------------------------------------------------+
|
||||||
|
| |oom=N |Number of retr reqs failed -ENOMEM |
|
||||||
|
+              +-------+-------------------------------------------------------+
|              |ops=N  |Number of retr reqs submitted                          |
+              +-------+-------------------------------------------------------+
|              |owt=N  |Number of retr reqs waited for CPU time                |
+              +-------+-------------------------------------------------------+
|              |abt=N  |Number of retr reqs aborted due to object death        |
+--------------+-------+-------------------------------------------------------+
|Stores        |n=N    |Number of storage (write) requests seen                |
+              +-------+-------------------------------------------------------+
|              |ok=N   |Number of successful store reqs                        |
+              +-------+-------------------------------------------------------+
|              |agn=N  |Number of store reqs on a page already pending storage |
+              +-------+-------------------------------------------------------+
|              |nbf=N  |Number of store reqs rejected -ENOBUFS                 |
+              +-------+-------------------------------------------------------+
|              |oom=N  |Number of store reqs failed -ENOMEM                    |
+              +-------+-------------------------------------------------------+
|              |ops=N  |Number of store reqs submitted                         |
+              +-------+-------------------------------------------------------+
|              |run=N  |Number of store reqs granted CPU time                  |
+              +-------+-------------------------------------------------------+
|              |pgs=N  |Number of pages given store req processing time        |
+              +-------+-------------------------------------------------------+
|              |rxd=N  |Number of store reqs deleted from tracking tree        |
+              +-------+-------------------------------------------------------+
|              |olm=N  |Number of store reqs over store limit                  |
+--------------+-------+-------------------------------------------------------+
|VmScan        |nos=N  |Number of release reqs against pages with no           |
|              |       |pending store                                          |
+              +-------+-------------------------------------------------------+
|              |gon=N  |Number of release reqs against pages stored by         |
|              |       |time lock granted                                      |
+              +-------+-------------------------------------------------------+
|              |bsy=N  |Number of release reqs ignored due to in-progress store|
+              +-------+-------------------------------------------------------+
|              |can=N  |Number of page stores cancelled due to release req     |
+--------------+-------+-------------------------------------------------------+
|Ops           |pend=N |Number of times async ops added to pending queues      |
+              +-------+-------------------------------------------------------+
|              |run=N  |Number of times async ops given CPU time               |
+              +-------+-------------------------------------------------------+
|              |enq=N  |Number of times async ops queued for processing        |
+              +-------+-------------------------------------------------------+
|              |can=N  |Number of async ops cancelled                          |
+              +-------+-------------------------------------------------------+
|              |rej=N  |Number of async ops rejected due to object             |
|              |       |lookup/create failure                                  |
+              +-------+-------------------------------------------------------+
|              |ini=N  |Number of async ops initialised                        |
+              +-------+-------------------------------------------------------+
|              |dfr=N  |Number of async ops queued for deferred release        |
+              +-------+-------------------------------------------------------+
|              |rel=N  |Number of async ops released                           |
|              |       |(should equal ini=N when idle)                         |
+              +-------+-------------------------------------------------------+
|              |gc=N   |Number of deferred-release async ops garbage collected |
+--------------+-------+-------------------------------------------------------+
|CacheOp       |alo=N  |Number of in-progress alloc_object() cache ops         |
+              +-------+-------------------------------------------------------+
|              |luo=N  |Number of in-progress lookup_object() cache ops        |
+              +-------+-------------------------------------------------------+
|              |luc=N  |Number of in-progress lookup_complete() cache ops      |
+              +-------+-------------------------------------------------------+
|              |gro=N  |Number of in-progress grab_object() cache ops          |
+              +-------+-------------------------------------------------------+
|              |upo=N  |Number of in-progress update_object() cache ops        |
+              +-------+-------------------------------------------------------+
|              |dro=N  |Number of in-progress drop_object() cache ops          |
+              +-------+-------------------------------------------------------+
|              |pto=N  |Number of in-progress put_object() cache ops           |
+              +-------+-------------------------------------------------------+
|              |syn=N  |Number of in-progress sync_cache() cache ops           |
+              +-------+-------------------------------------------------------+
|              |atc=N  |Number of in-progress attr_changed() cache ops         |
+              +-------+-------------------------------------------------------+
|              |rap=N  |Number of in-progress read_or_alloc_page() cache ops   |
+              +-------+-------------------------------------------------------+
|              |ras=N  |Number of in-progress read_or_alloc_pages() cache ops  |
+              +-------+-------------------------------------------------------+
|              |alp=N  |Number of in-progress allocate_page() cache ops        |
+              +-------+-------------------------------------------------------+
|              |als=N  |Number of in-progress allocate_pages() cache ops       |
+              +-------+-------------------------------------------------------+
|              |wrp=N  |Number of in-progress write_page() cache ops           |
+              +-------+-------------------------------------------------------+
|              |ucp=N  |Number of in-progress uncache_page() cache ops         |
+              +-------+-------------------------------------------------------+
|              |dsp=N  |Number of in-progress dissociate_pages() cache ops     |
+--------------+-------+-------------------------------------------------------+
|CacheEv       |nsp=N  |Number of object lookups/creations rejected due to     |
|              |       |lack of space                                          |
+              +-------+-------------------------------------------------------+
|              |stl=N  |Number of stale objects deleted                        |
+              +-------+-------------------------------------------------------+
|              |rtr=N  |Number of objects retired when relinquished            |
+              +-------+-------------------------------------------------------+
|              |cul=N  |Number of objects culled                               |
+--------------+-------+-------------------------------------------------------+
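
The counters in the table above lend themselves to simple scripted checks, such as confirming that ``rel=N`` has caught up with ``ini=N`` on an idle system. The following is a minimal sketch only. It assumes each line of the stats file starts with the class name, followed by a colon and a run of ``key=N`` pairs; that layout is not spelled out in this document. The sample text is invented for illustration, and the helper names ``parse_fscache_stats()`` and ``ops_balanced()`` are hypothetical.

```python
import re
from collections import defaultdict


def parse_fscache_stats(text):
    """Group key=N counters by the class label at the start of each line.

    Assumes lines of the form "Class : key=N key=N ..."; lines without a
    colon (for instance a title line) are skipped.
    """
    stats = defaultdict(dict)
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        for key, value in re.findall(r"(\w+)=(\d+)", rest):
            stats[label.strip()][key] = int(value)
    return dict(stats)


def ops_balanced(stats):
    """True when every initialised async op has been released (ini == rel)."""
    ops = stats.get("Ops", {})
    return "ini" in ops and ops["ini"] == ops.get("rel")


# Invented sample data, laid out according to the assumption above.
sample = """\
Stores : n=12 ok=12 agn=0 nbf=0 oom=0
Stores : ops=12 run=12 pgs=12 rxd=12 olm=0
Ops    : pend=3 run=40 enq=40 can=0 rej=0
Ops    : ini=38 dfr=0 rel=38 gc=0
"""

stats = parse_fscache_stats(sample)
print(stats["Stores"]["ops"], ops_balanced(stats))
```

On a real system the text would come from reading ``/proc/fs/fscache/stats`` rather than from the inline sample.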


/proc/fs/fscache/histogram
--------------------------

::

     cat /proc/fs/fscache/histogram
     JIFS  SECS  OBJ INST  OP RUNS   OBJ RUNS  RETRV DLY RETRIEVLS
     ===== ===== ========= ========= ========= ========= =========

This shows, for a variety of tasks, how many times each took a given
amount of time to run, in the range 0 jiffies to HZ-1 jiffies.  The
columns are as follows:

=========  =======================================================
COLUMN     TIME MEASUREMENT
=========  =======================================================
OBJ INST   Length of time to instantiate an object
OP RUNS    Length of time a call to process an operation took
OBJ RUNS   Length of time a call to process an object event took
RETRV DLY  Time between requesting a read and lookup completing
RETRIEVLS  Time between beginning and end of a retrieval
=========  =======================================================

Each row shows the number of events that took a particular range of times.
Each step is 1 jiffy in size.  The JIFS column indicates the particular
jiffy range covered, and the SECS field the equivalent number of seconds.
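
The relationship between the JIFS and SECS columns can be sketched as below. This is an illustration rather than the kernel's own formatting code: ``HZ`` is a kernel build-time setting that cannot be read from this file, so the value 250 is only an assumed example, and ``histogram_rows()`` is a hypothetical helper name.

```python
def histogram_rows(hz):
    """Yield (jiffies, seconds) for each 1-jiffy step from 0 to HZ-1."""
    for jiffies in range(hz):
        yield jiffies, jiffies / hz


# Assumed example value; the real HZ depends on the kernel configuration.
HZ = 250

rows = list(histogram_rows(HZ))
for jiffies, seconds in rows[:3]:
    print(f"{jiffies:5d} {seconds:5.3f}")
```

With ``HZ`` at 250 each row covers 4ms, so the final row (249 jiffies) corresponds to 0.996 seconds.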


Object List
===========

If CONFIG_FSCACHE_OBJECT_LIST is enabled, the FS-Cache facility will maintain a
list of all the objects currently allocated and allow them to be viewed
through::

	/proc/fs/fscache/objects

This will look something like::

	[root@andromeda ~]# head /proc/fs/fscache/objects
	OBJECT   PARENT   STAT CHLDN OPS OOP IPR EX READS EM EV F S | NETFS_COOKIE_DEF TY FL NETFS_DATA       OBJECT_KEY, AUX_DATA
	======== ======== ==== ===== === === === == ===== == == = = | ================ == == ================ ================
	   17e4b        2 ACTV     0   0   0   0  0     0 7b  4 0 0 | NFS.fh           DT  0 ffff88001dd82820 010006017edcf8bbc93b43298fdfbe71e50b57b13a172c0117f38472, e567634700000000000000000000000063f2404a000000000000000000000000c9030000000000000000000063f2404a
	   1693a        2 ACTV     0   0   0   0  0     0 7b  4 0 0 | NFS.fh           DT  0 ffff88002db23380 010006017edcf8bbc93b43298fdfbe71e50b57b1e0162c01a2df0ea6, 420ebc4a000000000000000000000000420ebc4a0000000000000000000000000e1801000000000000000000420ebc4a

where the first set of columns before the '|' describe the object:

======= ===============================================================
COLUMN  DESCRIPTION
======= ===============================================================
OBJECT  Object debugging ID (appears as OBJ%x in some debug messages)
PARENT  Debugging ID of parent object
STAT    Object state
CHLDN   Number of child objects of this object
OPS     Number of outstanding operations on this object
OOP     Number of outstanding child object management operations
IPR
EX      Number of outstanding exclusive operations
READS   Number of outstanding read operations
EM      Object's event mask
EV      Events raised on this object
F       Object flags
S       Object work item busy state mask (1:pending 2:running)
======= ===============================================================

and the second set of columns describe the object's cookie, if present:

================ ======================================================
COLUMN           DESCRIPTION
================ ======================================================
NETFS_COOKIE_DEF Name of netfs cookie definition
TY               Cookie type (IX - index, DT - data, hex - special)
FL               Cookie flags
NETFS_DATA       Netfs private data stored in the cookie
OBJECT_KEY       Object key      } 1 column, with separating comma
AUX_DATA         Object aux data } presence may be configured
================ ======================================================

The data shown may be filtered by attaching a key to an appropriate keyring
before viewing the file.  Something like::

	keyctl add user fscache:objlist <restrictions> @s

where <restrictions> are a selection of the following letters:

== =========================================================
K  Show hexdump of object key (don't show if not given)
A  Show hexdump of object aux data (don't show if not given)
== =========================================================

and the following paired letters:

== =========================================================
C  Show objects that have a cookie
c  Show objects that don't have a cookie
B  Show objects that are busy
b  Show objects that aren't busy
W  Show objects that have pending writes
w  Show objects that don't have pending writes
R  Show objects that have outstanding reads
r  Show objects that don't have outstanding reads
S  Show objects that have work queued
s  Show objects that don't have work queued
== =========================================================

If neither side of a letter pair is given, then both are implied.  For example::

	keyctl add user fscache:objlist KB @s

shows objects that are busy, and lists their object keys, but does not dump
their auxiliary data.  It also implies "CcWwRrSs", but as 'B' is given, 'b' is
not implied.

By default all objects and all fields will be shown.
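
The pairing rule described above can be modelled in a few lines. The sketch below only mirrors the rule as stated in this document; it is not the kernel's parser. The function name ``effective_filter()`` and its return shape are hypothetical.

```python
# Letter pairs from the table above: (show-if-true, show-if-false).
PAIRS = ["Cc", "Bb", "Ww", "Rr", "Ss"]


def effective_filter(restrictions):
    """Expand a restriction string according to the pairing rule.

    Returns a (fields, selected) tuple: `fields` says whether the key and
    aux data hexdumps are shown, and `selected` lists the paired letters in
    effect once missing pairs have been implied.
    """
    fields = {"K": "K" in restrictions, "A": "A" in restrictions}
    selected = []
    for upper, lower in PAIRS:
        given = [c for c in (upper, lower) if c in restrictions]
        # If neither side of the pair is given, both are implied.
        selected.extend(given if given else [upper, lower])
    return fields, "".join(selected)


fields, selected = effective_filter("KB")
print(fields, selected)
```

For the ``KB`` example this yields the object key shown, the aux data hidden, and ``CcBWwRrSs`` as the set of object filters in effect, which matches the description above.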


Debugging
=========

If CONFIG_FSCACHE_DEBUG is enabled, the FS-Cache facility can have runtime
debugging enabled by adjusting the value in::

	/sys/module/fscache/parameters/debug

This is a bitmask of debugging streams to enable:

======= ======= =============================== =======================
BIT     VALUE   STREAM                          POINT
======= ======= =============================== =======================
0       1       Cache management                Function entry trace
1       2                                       Function exit trace
2       4                                       General
3       8       Cookie management               Function entry trace
4       16                                      Function exit trace
5       32                                      General
6       64      Page handling                   Function entry trace
7       128                                     Function exit trace
8       256                                     General
9       512     Operation management            Function entry trace
10      1024                                    Function exit trace
11      2048                                    General
======= ======= =============================== =======================

The appropriate set of values should be OR'd together and the result written to
the control file.  For example::

	echo $((1|8|64)) >/sys/module/fscache/parameters/debug

will turn on function entry debugging for cache management, cookie management
and page handling.
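
The bit layout in the table is regular: three points (entry, exit, general) for each of four streams. A mask can therefore be computed rather than looked up. The following sketch assumes nothing beyond the table itself; the short stream and point labels, and the helper names ``debug_bit()`` and ``debug_mask()``, are hypothetical conveniences and not kernel identifiers.

```python
# Order matches the table: each stream owns three consecutive bits.
STREAMS = ["cache", "cookie", "page", "operation"]
POINTS = ["enter", "exit", "general"]


def debug_bit(stream, point):
    """Return the bit number for a (stream, point) pair from the table."""
    return STREAMS.index(stream) * len(POINTS) + POINTS.index(point)


def debug_mask(selections):
    """OR together the values for an iterable of (stream, point) pairs."""
    mask = 0
    for stream, point in selections:
        mask |= 1 << debug_bit(stream, point)
    return mask


# Same value as the shell example, 1|8|64.
three_entry = debug_mask((s, "enter") for s in ["cache", "cookie", "page"])
# Function entry tracing for every stream in the table, including bit 9.
all_entry = debug_mask((s, "enter") for s in STREAMS)
print(three_entry, all_entry)
```

The resulting number would then be written to the control file in the same way as the ``echo`` example.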
@@ -1,448 +0,0 @@
			  ==========================
			  General Filesystem Caching
			  ==========================

========
OVERVIEW
========

This facility is a general purpose cache for network filesystems, though it
could be used for caching other things such as ISO9660 filesystems too.

FS-Cache mediates between cache backends (such as CacheFS) and network
filesystems:

	+---------+
	|         |                        +--------------+
	|   NFS   |--+                     |              |
	|         |  |                 +-->|   CacheFS    |
	+---------+  |   +----------+  |   |  /dev/hda5   |
	             |   |          |  |   +--------------+
	+---------+  +-->|          |  |
	|         |      |          |--+
	|   AFS   |----->| FS-Cache |
	|         |      |          |--+
	+---------+  +-->|          |  |
	             |   |          |  |   +--------------+
	+---------+  |   +----------+  |   |              |
	|         |  |                 +-->|  CacheFiles  |
	|  ISOFS  |--+                     |  /var/cache  |
	|         |                        +--------------+
	+---------+

Or to look at it another way, FS-Cache is a module that provides a caching
facility to a network filesystem such that the cache is transparent to the
user:

	+---------+
	|         |
	| Server  |
	|         |
	+---------+
	     |                  NETWORK
	~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	     |
	     |           +----------+
	     V           |          |
	+---------+      |          |
	|         |      |          |
	|   NFS   |----->| FS-Cache |
	|         |      |          |--+
	+---------+      |          |  |   +--------------+   +--------------+
	     |           |          |  |   |              |   |              |
	     V           +----------+  +-->|  CacheFiles  |-->|     Ext3     |
	+---------+                        |  /var/cache  |   |  /dev/sda6   |
	|         |                        +--------------+   +--------------+
	|   VFS   |                                ^                     ^
	|         |                                |                     |
	+---------+                                +--------------+      |
	     |                  KERNEL SPACE                      |      |
	~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~|~~~~
	     |                  USER SPACE                        |      |
	     V                                                    |      |
	+---------+                                           +--------------+
	|         |                                           |              |
	| Process |                                           | cachefilesd  |
	|         |                                           |              |
	+---------+                                           +--------------+


FS-Cache does not follow the idea of completely loading every netfs file
opened in its entirety into a cache before permitting it to be accessed and
then serving the pages out of that cache rather than the netfs inode because:

 (1) It must be practical to operate without a cache.

 (2) The size of any accessible file must not be limited to the size of the
     cache.

 (3) The combined size of all opened files (this includes mapped libraries)
     must not be limited to the size of the cache.

 (4) The user should not be forced to download an entire file just to do a
     one-off access of a small portion of it (such as might be done with the
     "file" program).

It instead serves the cache out in PAGE_SIZE chunks as and when requested by
the netfs('s) using it.


FS-Cache provides the following facilities:

 (1) More than one cache can be used at once.  Caches can be selected
     explicitly by use of tags.

 (2) Caches can be added / removed at any time.

 (3) The netfs is provided with an interface that allows either party to
     withdraw caching facilities from a file (required for (2)).

 (4) The interface to the netfs returns as few errors as possible, preferring
     rather to let the netfs remain oblivious.

 (5) Cookies are used to represent indices, files and other objects to the
     netfs.  The simplest cookie is just a NULL pointer - indicating nothing
     cached there.

 (6) The netfs is allowed to propose - dynamically - any index hierarchy it
     desires, though it must be aware that the index search function is
     recursive, stack space is limited, and indices can only be children of
     indices.

 (7) Data I/O is done direct to and from the netfs's pages.  The netfs
     indicates that page A is at index B of the data-file represented by cookie
     C, and that it should be read or written.  The cache backend may or may
     not start I/O on that page, but if it does, a netfs callback will be
     invoked to indicate completion.  The I/O may be either synchronous or
     asynchronous.

 (8) Cookies can be "retired" upon release.  At this point FS-Cache will mark
     them as obsolete and the index hierarchy rooted at that point will get
     recycled.

 (9) The netfs provides a "match" function for index searches.  In addition to
     saying whether a match was made or not, this can also specify that an
     entry should be updated or deleted.

(10) As much as possible is done asynchronously.


FS-Cache maintains a virtual indexing tree in which all indices, files, objects
and pages are kept.  Bits of this tree may actually reside in one or more
caches.

                                        FSDEF
                                          |
                        +------------------------------------+
                        |                                    |
                       NFS                                  AFS
                        |                                    |
           +--------------------------+                +-----------+
           |                          |                |           |
        homedir                     mirror          afs.org   redhat.com
           |                          |                            |
     +------------+           +---------------+              +----------+
     |            |           |               |              |          |
   00001        00002       00007           00125        vol00001   vol00002
     |            |           |               |                          |
 +---+---+     +-----+      +---+      +------+------+             +-----+----+
 |   |   |     |     |      |   |      |      |      |             |     |    |
PG0 PG1 PG2   PG0  XATTR   PG0 PG1   DIRENT DIRENT DIRENT         R/W   R/O  Bak
                     |                                             |
                    PG0                                        +-------+
                                                               |       |
                                                             00001   00003
                                                               |
                                                           +---+---+
                                                           |   |   |
                                                          PG0 PG1 PG2

In the example above, you can see two netfs's being backed: NFS and AFS.  These
have different index hierarchies:

 (*) The NFS primary index contains per-server indices.  Each server index is
     indexed by NFS file handles to get data file objects.  Each data file
     object can have an array of pages, but may also have further child
     objects, such as extended attributes and directory entries.  Extended
     attribute objects themselves have page-array contents.

 (*) The AFS primary index contains per-cell indices.  Each cell index contains
     per-logical-volume indices.  Each volume index contains up to three
     indices for the read-write, read-only and backup mirrors of those volumes.
     Each of these contains vnode data file objects, each of which contains an
     array of pages.

The very top index is the FS-Cache master index in which individual netfs's
have entries.

Any index object may reside in more than one cache, provided it only has index
children.  Any index with non-index object children will be assumed to only
reside in one cache.


The netfs API to FS-Cache can be found in:

	Documentation/filesystems/caching/netfs-api.txt

The cache backend API to FS-Cache can be found in:

	Documentation/filesystems/caching/backend-api.txt

A description of the internal representations and object state machine can be
found in:

	Documentation/filesystems/caching/object.txt


=======================
STATISTICAL INFORMATION
=======================

If FS-Cache is compiled with the following options enabled:

	CONFIG_FSCACHE_STATS=y
	CONFIG_FSCACHE_HISTOGRAM=y

then it will gather certain statistics and display them through a number of
proc files.

 (*) /proc/fs/fscache/stats

     This shows counts of a number of events that can happen in FS-Cache:

	CLASS   EVENT   MEANING
	======= ======= =======================================================
	Cookies idx=N   Number of index cookies allocated
	        dat=N   Number of data storage cookies allocated
	        spc=N   Number of special cookies allocated
	Objects alc=N   Number of objects allocated
	        nal=N   Number of object allocation failures
	        avl=N   Number of objects that reached the available state
	        ded=N   Number of objects that reached the dead state
	ChkAux  non=N   Number of objects that didn't have a coherency check
	        ok=N    Number of objects that passed a coherency check
	        upd=N   Number of objects that needed a coherency data update
	        obs=N   Number of objects that were declared obsolete
	Pages   mrk=N   Number of pages marked as being cached
	        unc=N   Number of uncache page requests seen
	Acquire n=N     Number of acquire cookie requests seen
	        nul=N   Number of acq reqs given a NULL parent
	        noc=N   Number of acq reqs rejected due to no cache available
	        ok=N    Number of acq reqs succeeded
	        nbf=N   Number of acq reqs rejected due to error
	        oom=N   Number of acq reqs failed on ENOMEM
	Lookups n=N     Number of lookup calls made on cache backends
	        neg=N   Number of negative lookups made
	        pos=N   Number of positive lookups made
	        crt=N   Number of objects created by lookup
	        tmo=N   Number of lookups timed out and requeued
	Updates n=N     Number of update cookie requests seen
	        nul=N   Number of upd reqs given a NULL parent
	        run=N   Number of upd reqs granted CPU time
	Relinqs n=N     Number of relinquish cookie requests seen
	        nul=N   Number of rlq reqs given a NULL parent
	        wcr=N   Number of rlq reqs waited on completion of creation
	AttrChg n=N     Number of attribute changed requests seen
	        ok=N    Number of attr changed requests queued
	        nbf=N   Number of attr changed rejected -ENOBUFS
	        oom=N   Number of attr changed failed -ENOMEM
	        run=N   Number of attr changed ops given CPU time
	Allocs  n=N     Number of allocation requests seen
	        ok=N    Number of successful alloc reqs
	        wt=N    Number of alloc reqs that waited on lookup completion
	        nbf=N   Number of alloc reqs rejected -ENOBUFS
	        int=N   Number of alloc reqs aborted -ERESTARTSYS
	        ops=N   Number of alloc reqs submitted
	        owt=N   Number of alloc reqs waited for CPU time
	        abt=N   Number of alloc reqs aborted due to object death
	Retrvls n=N     Number of retrieval (read) requests seen
	        ok=N    Number of successful retr reqs
	        wt=N    Number of retr reqs that waited on lookup completion
	        nod=N   Number of retr reqs returned -ENODATA
	        nbf=N   Number of retr reqs rejected -ENOBUFS
	        int=N   Number of retr reqs aborted -ERESTARTSYS
	        oom=N   Number of retr reqs failed -ENOMEM
	        ops=N   Number of retr reqs submitted
	        owt=N   Number of retr reqs waited for CPU time
	        abt=N   Number of retr reqs aborted due to object death
	Stores  n=N     Number of storage (write) requests seen
	        ok=N    Number of successful store reqs
	        agn=N   Number of store reqs on a page already pending storage
	        nbf=N   Number of store reqs rejected -ENOBUFS
	        oom=N   Number of store reqs failed -ENOMEM
	        ops=N   Number of store reqs submitted
	        run=N   Number of store reqs granted CPU time
	        pgs=N   Number of pages given store req processing time
	        rxd=N   Number of store reqs deleted from tracking tree
	        olm=N   Number of store reqs over store limit
	VmScan  nos=N   Number of release reqs against pages with no pending store
	        gon=N   Number of release reqs against pages stored by time lock granted
	        bsy=N   Number of release reqs ignored due to in-progress store
	        can=N   Number of page stores cancelled due to release req
	Ops     pend=N  Number of times async ops added to pending queues
	        run=N   Number of times async ops given CPU time
	        enq=N   Number of times async ops queued for processing
	        can=N   Number of async ops cancelled
	        rej=N   Number of async ops rejected due to object lookup/create failure
	        ini=N   Number of async ops initialised
	        dfr=N   Number of async ops queued for deferred release
	        rel=N   Number of async ops released (should equal ini=N when idle)
	        gc=N    Number of deferred-release async ops garbage collected
	CacheOp alo=N   Number of in-progress alloc_object() cache ops
	        luo=N   Number of in-progress lookup_object() cache ops
	        luc=N   Number of in-progress lookup_complete() cache ops
	        gro=N   Number of in-progress grab_object() cache ops
	        upo=N   Number of in-progress update_object() cache ops
	        dro=N   Number of in-progress drop_object() cache ops
	        pto=N   Number of in-progress put_object() cache ops
	        syn=N   Number of in-progress sync_cache() cache ops
	        atc=N   Number of in-progress attr_changed() cache ops
	        rap=N   Number of in-progress read_or_alloc_page() cache ops
	        ras=N   Number of in-progress read_or_alloc_pages() cache ops
	        alp=N   Number of in-progress allocate_page() cache ops
	        als=N   Number of in-progress allocate_pages() cache ops
	        wrp=N   Number of in-progress write_page() cache ops
	        ucp=N   Number of in-progress uncache_page() cache ops
	        dsp=N   Number of in-progress dissociate_pages() cache ops
	CacheEv nsp=N   Number of object lookups/creations rejected due to lack of space
	        stl=N   Number of stale objects deleted
	        rtr=N   Number of objects retired when relinquished
	        cul=N   Number of objects culled


 (*) /proc/fs/fscache/histogram

	cat /proc/fs/fscache/histogram
	JIFS  SECS  OBJ INST  OP RUNS   OBJ RUNS  RETRV DLY RETRIEVLS
	===== ===== ========= ========= ========= ========= =========

     This shows, for a variety of tasks, how many times each took a given
     amount of time to run, in the range 0 jiffies to HZ-1 jiffies.  The
     columns are as follows:

	COLUMN          TIME MEASUREMENT
	=======         =======================================================
	OBJ INST        Length of time to instantiate an object
	OP RUNS         Length of time a call to process an operation took
	OBJ RUNS        Length of time a call to process an object event took
	RETRV DLY       Time between requesting a read and lookup completing
	RETRIEVLS       Time between beginning and end of a retrieval

     Each row shows the number of events that took a particular range of times.
     Each step is 1 jiffy in size.  The JIFS column indicates the particular
     jiffy range covered, and the SECS field the equivalent number of seconds.
|
|
||||||
|
|
||||||
|
|
||||||
===========
|
|
||||||
OBJECT LIST
|
|
||||||
===========
|
|
||||||
|
|
||||||
If CONFIG_FSCACHE_OBJECT_LIST is enabled, the FS-Cache facility will maintain a
|
|
||||||
list of all the objects currently allocated and allow them to be viewed
|
|
||||||
through:
|
|
||||||
|
|
||||||
/proc/fs/fscache/objects
|
|
||||||
|
|
||||||
This will look something like:
|
|
||||||
|
|
||||||
[root@andromeda ~]# head /proc/fs/fscache/objects
|
|
||||||
OBJECT PARENT STAT CHLDN OPS OOP IPR EX READS EM EV F S | NETFS_COOKIE_DEF TY FL NETFS_DATA OBJECT_KEY, AUX_DATA
|
|
||||||
======== ======== ==== ===== === === === == ===== == == = = | ================ == == ================ ================
|
|
||||||
17e4b 2 ACTV 0 0 0 0 0 0 7b 4 0 0 | NFS.fh DT 0 ffff88001dd82820 010006017edcf8bbc93b43298fdfbe71e50b57b13a172c0117f38472, e567634700000000000000000000000063f2404a000000000000000000000000c9030000000000000000000063f2404a
|
|
||||||
1693a 2 ACTV 0 0 0 0 0 0 7b 4 0 0 | NFS.fh DT 0 ffff88002db23380 010006017edcf8bbc93b43298fdfbe71e50b57b1e0162c01a2df0ea6, 420ebc4a000000000000000000000000420ebc4a0000000000000000000000000e1801000000000000000000420ebc4a
|
|
||||||
|
|
||||||
where the first set of columns before the '|' describe the object:
|
|
||||||
|
|
||||||
COLUMN DESCRIPTION
|
|
||||||
======= ===============================================================
|
|
||||||
OBJECT Object debugging ID (appears as OBJ%x in some debug messages)
|
|
||||||
PARENT Debugging ID of parent object
|
|
||||||
STAT Object state
|
|
||||||
CHLDN Number of child objects of this object
|
|
||||||
OPS Number of outstanding operations on this object
|
|
||||||
OOP Number of outstanding child object management operations
|
|
||||||
IPR
|
|
||||||
EX Number of outstanding exclusive operations
|
|
||||||
READS Number of outstanding read operations
|
|
||||||
EM Object's event mask
|
|
||||||
EV Events raised on this object
|
|
||||||
F Object flags
|
|
||||||
S Object work item busy state mask (1:pending 2:running)
|
|
||||||
|
|
||||||
and the second set of columns describe the object's cookie, if present:
|
|
||||||
|
|
||||||
COLUMN DESCRIPTION
|
|
||||||
=============== =======================================================
|
|
||||||
NETFS_COOKIE_DEF Name of netfs cookie definition
|
|
||||||
TY Cookie type (IX - index, DT - data, hex - special)
|
|
||||||
FL Cookie flags
|
|
||||||
NETFS_DATA Netfs private data stored in the cookie
|
|
||||||
OBJECT_KEY Object key } 1 column, with separating comma
|
|
||||||
AUX_DATA Object aux data } presence may be configured
|
|
||||||
|
|
||||||
The data shown may be filtered by attaching the a key to an appropriate keyring
|
|
||||||
before viewing the file. Something like:
|
|
||||||
|
|
||||||
keyctl add user fscache:objlist <restrictions> @s
|
|
||||||
|
|
||||||
where <restrictions> are a selection of the following letters:
|
|
||||||
|
|
||||||
K Show hexdump of object key (don't show if not given)
|
|
||||||
A Show hexdump of object aux data (don't show if not given)
|
|
||||||
|
|
||||||
and the following paired letters:
|
|
||||||
|
|
||||||
C Show objects that have a cookie
|
|
||||||
c Show objects that don't have a cookie
|
|
||||||
B Show objects that are busy
|
|
||||||
b Show objects that aren't busy
|
|
||||||
W Show objects that have pending writes
|
|
||||||
w Show objects that don't have pending writes
|
|
||||||
R Show objects that have outstanding reads
|
|
||||||
r Show objects that don't have outstanding reads
|
|
||||||
S Show objects that have work queued
|
|
||||||
s Show objects that don't have work queued
|
|
||||||
|
|
||||||
If neither side of a letter pair is given, then both are implied.  For example:

        keyctl add user fscache:objlist KB @s

shows objects that are busy, and lists their object keys, but does not dump
their auxiliary data.  It also implies "CcWwRrSs", but as 'B' is given, 'b' is
not implied.

By default all objects and all fields will be shown.
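
The letter-pair rule can be sketched in user-space C as below. This is only an illustration of the semantics described above, not the kernel's parser: the struct, the function names and the two-element array layout are hypothetical. It models the case where a key is attached; with no key at all, everything is shown, as stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical result of parsing a <restrictions> string. */
struct objlist_filter {
    bool show_key;  /* 'K' was given */
    bool show_aux;  /* 'A' was given */
    /* For each pair: [0] = upper-case side, [1] = lower-case side. */
    bool cookie[2];
    bool busy[2];
    bool pending_write[2];
    bool reads[2];
    bool work[2];
};

/* Resolve one letter pair: if neither side is given, both are implied. */
static void resolve_pair(const char *s, char upper, char lower, bool out[2])
{
    bool up = strchr(s, upper) != NULL;
    bool lo = strchr(s, lower) != NULL;

    if (!up && !lo)
        up = lo = true;
    out[0] = up;
    out[1] = lo;
}

/* Parse a restrictions string such as "KB" into a filter. */
static struct objlist_filter parse_restrictions(const char *s)
{
    struct objlist_filter f;

    assert(s != NULL);
    f.show_key = strchr(s, 'K') != NULL;
    f.show_aux = strchr(s, 'A') != NULL;
    resolve_pair(s, 'C', 'c', f.cookie);
    resolve_pair(s, 'B', 'b', f.busy);
    resolve_pair(s, 'W', 'w', f.pending_write);
    resolve_pair(s, 'R', 'r', f.reads);
    resolve_pair(s, 'S', 's', f.work);
    return f;
}
```

Calling parse_restrictions("KB") under these assumptions selects the object key but not the aux data, keeps only the busy side of the B/b pair, and leaves both sides of every other pair enabled.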


=========
DEBUGGING
=========

If CONFIG_FSCACHE_DEBUG is enabled, the FS-Cache facility can have runtime
debugging enabled by adjusting the value in:

        /sys/module/fscache/parameters/debug

This is a bitmask of debugging streams to enable:

        BIT     VALUE   STREAM                          POINT
        ======= ======= =============================== =======================
        0       1       Cache management                Function entry trace
        1       2                                       Function exit trace
        2       4                                       General
        3       8       Cookie management               Function entry trace
        4       16                                      Function exit trace
        5       32                                      General
        6       64      Page handling                   Function entry trace
        7       128                                     Function exit trace
        8       256                                     General
        9       512     Operation management            Function entry trace
        10      1024                                    Function exit trace
        11      2048                                    General

The appropriate set of values should be OR'd together and the result written to
the control file.  For example:

        echo $((1|8|64)) >/sys/module/fscache/parameters/debug

will turn on function entry debugging for the cache management, cookie
management and page handling streams.
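
The bit layout in the table is regular: each stream owns three consecutive bits (entry, exit, general). A minimal C sketch of that arithmetic follows; the enum and function names are hypothetical and are not kernel identifiers. By the table's values, 1|8|64 is 73 and covers the first three streams, while also including the operation management entry bit (512) gives 585.

```c
#include <assert.h>

/* Streams and points in the order the table lists them (illustrative names). */
enum dbg_stream { DBG_CACHE = 0, DBG_COOKIE = 1, DBG_PAGE = 2, DBG_OPERATION = 3 };
enum dbg_point  { DBG_ENTRY = 0, DBG_EXIT = 1, DBG_GENERAL = 2 };

/* Value of the single bit for one stream/point combination. */
static unsigned int dbg_bit(enum dbg_stream s, enum dbg_point p)
{
    assert((unsigned int)s <= DBG_OPERATION);
    assert((unsigned int)p <= DBG_GENERAL);
    return 1u << (3u * (unsigned int)s + (unsigned int)p);
}

/* OR together the same point across all four streams. */
static unsigned int dbg_all_streams(enum dbg_point p)
{
    unsigned int mask = 0;
    int s;

    for (s = DBG_CACHE; s <= DBG_OPERATION; s++)
        mask |= dbg_bit((enum dbg_stream)s, p);
    return mask;
}
```

The resulting number is what would be written to the control file; the sketch only computes the value and does not touch /sys.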
|
|
14
Documentation/filesystems/caching/index.rst
Normal file
@@ -0,0 +1,14 @@
.. SPDX-License-Identifier: GPL-2.0

Filesystem Caching
==================

.. toctree::
   :maxdepth: 2

   fscache
   object
   backend-api
   cachefiles
   netfs-api
   operations
@ -1,6 +1,8 @@
|
|||||||
===============================
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
FS-CACHE NETWORK FILESYSTEM API
|
|
||||||
===============================
|
===============================
|
||||||
|
FS-Cache Network Filesystem API
|
||||||
|
===============================
|
||||||
|
|
||||||
There's an API by which a network filesystem can make use of the FS-Cache
|
There's an API by which a network filesystem can make use of the FS-Cache
|
||||||
facilities. This is based around a number of principles:
|
facilities. This is based around a number of principles:
|
||||||
@ -19,7 +21,7 @@ facilities. This is based around a number of principles:
|
|||||||
|
|
||||||
This API is declared in <linux/fscache.h>.
|
This API is declared in <linux/fscache.h>.
|
||||||
|
|
||||||
This document contains the following sections:
|
.. This document contains the following sections:
|
||||||
|
|
||||||
(1) Network filesystem definition
|
(1) Network filesystem definition
|
||||||
(2) Index definition
|
(2) Index definition
|
||||||
@ -41,12 +43,11 @@ This document contains the following sections:
|
|||||||
(18) FS-Cache specific page flags.
|
(18) FS-Cache specific page flags.
|
||||||
|
|
||||||
|
|
||||||
=============================
|
Network Filesystem Definition
|
||||||
NETWORK FILESYSTEM DEFINITION
|
|
||||||
=============================
|
=============================
|
||||||
|
|
||||||
FS-Cache needs a description of the network filesystem. This is specified
|
FS-Cache needs a description of the network filesystem. This is specified
|
||||||
using a record of the following structure:
|
using a record of the following structure::
|
||||||
|
|
||||||
struct fscache_netfs {
|
struct fscache_netfs {
|
||||||
uint32_t version;
|
uint32_t version;
|
||||||
@ -71,7 +72,7 @@ The fields are:
|
|||||||
another parameter passed into the registration function.
|
another parameter passed into the registration function.
|
||||||
|
|
||||||
For example, kAFS (linux/fs/afs/) uses the following definitions to describe
|
For example, kAFS (linux/fs/afs/) uses the following definitions to describe
|
||||||
itself:
|
itself::
|
||||||
|
|
||||||
struct fscache_netfs afs_cache_netfs = {
|
struct fscache_netfs afs_cache_netfs = {
|
||||||
.version = 0,
|
.version = 0,
|
||||||
@ -79,8 +80,7 @@ itself:
|
|||||||
};
|
};
|
||||||
|
|
||||||
|
|
||||||
================
|
Index Definition
|
||||||
INDEX DEFINITION
|
|
||||||
================
|
================
|
||||||
|
|
||||||
Indices are used for two purposes:
|
Indices are used for two purposes:
|
||||||
@ -114,11 +114,10 @@ There are some limits on indices:
|
|||||||
function is recursive. Too many layers will run the kernel out of stack.
|
function is recursive. Too many layers will run the kernel out of stack.
|
||||||
|
|
||||||
|
|
||||||
=================
|
Object Definition
|
||||||
OBJECT DEFINITION
|
|
||||||
=================
|
=================
|
||||||
|
|
||||||
To define an object, a structure of the following type should be filled out:
|
To define an object, a structure of the following type should be filled out::
|
||||||
|
|
||||||
struct fscache_cookie_def
|
struct fscache_cookie_def
|
||||||
{
|
{
|
||||||
@ -149,16 +148,13 @@ This has the following fields:
|
|||||||
|
|
||||||
This is one of the following values:
|
This is one of the following values:
|
||||||
|
|
||||||
(*) FSCACHE_COOKIE_TYPE_INDEX
|
FSCACHE_COOKIE_TYPE_INDEX
|
||||||
|
|
||||||
This defines an index, which is a special FS-Cache type.
|
This defines an index, which is a special FS-Cache type.
|
||||||
|
|
||||||
(*) FSCACHE_COOKIE_TYPE_DATAFILE
|
FSCACHE_COOKIE_TYPE_DATAFILE
|
||||||
|
|
||||||
This defines an ordinary data file.
|
This defines an ordinary data file.
|
||||||
|
|
||||||
(*) Any other value between 2 and 255
|
Any other value between 2 and 255
|
||||||
|
|
||||||
This defines an extraordinary object such as an XATTR.
|
This defines an extraordinary object such as an XATTR.
|
||||||
|
|
||||||
(2) The name of the object type (NUL terminated unless all 16 chars are used)
|
(2) The name of the object type (NUL terminated unless all 16 chars are used)
|
||||||
@ -192,9 +188,14 @@ This has the following fields:
|
|||||||
|
|
||||||
If present, the function should return one of the following values:
|
If present, the function should return one of the following values:
|
||||||
|
|
||||||
(*) FSCACHE_CHECKAUX_OKAY - the entry is okay as is
|
FSCACHE_CHECKAUX_OKAY
|
||||||
(*) FSCACHE_CHECKAUX_NEEDS_UPDATE - the entry requires update
|
- the entry is okay as is
|
||||||
(*) FSCACHE_CHECKAUX_OBSOLETE - the entry should be deleted
|
|
||||||
|
FSCACHE_CHECKAUX_NEEDS_UPDATE
|
||||||
|
- the entry requires update
|
||||||
|
|
||||||
|
FSCACHE_CHECKAUX_OBSOLETE
|
||||||
|
- the entry should be deleted
|
||||||
|
|
||||||
This function can also be used to extract data from the auxiliary data in
|
This function can also be used to extract data from the auxiliary data in
|
||||||
the cache and copy it into the netfs's structures.
|
the cache and copy it into the netfs's structures.
|
||||||
@ -236,32 +237,30 @@ This has the following fields:
|
|||||||
This function is not required for indices as they're not permitted data.
|
This function is not required for indices as they're not permitted data.
|
||||||
|
|
||||||
|
|
||||||
===================================
|
Network Filesystem (Un)registration
|
||||||
NETWORK FILESYSTEM (UN)REGISTRATION
|
|
||||||
===================================
|
===================================
|
||||||
|
|
||||||
The first step is to declare the network filesystem to the cache. This also
|
The first step is to declare the network filesystem to the cache. This also
|
||||||
involves specifying the layout of the primary index (for AFS, this would be the
|
involves specifying the layout of the primary index (for AFS, this would be the
|
||||||
"cell" level).
|
"cell" level).
|
||||||
|
|
||||||
The registration function is:
|
The registration function is::
|
||||||
|
|
||||||
int fscache_register_netfs(struct fscache_netfs *netfs);
|
int fscache_register_netfs(struct fscache_netfs *netfs);
|
||||||
|
|
||||||
It just takes a pointer to the netfs definition. It returns 0 or an error as
|
It just takes a pointer to the netfs definition. It returns 0 or an error as
|
||||||
appropriate.
|
appropriate.
|
||||||
|
|
||||||
For kAFS, registration is done as follows:
|
For kAFS, registration is done as follows::
|
||||||
|
|
||||||
ret = fscache_register_netfs(&afs_cache_netfs);
|
ret = fscache_register_netfs(&afs_cache_netfs);
|
||||||
|
|
||||||
The last step is, of course, unregistration:
|
The last step is, of course, unregistration::
|
||||||
|
|
||||||
void fscache_unregister_netfs(struct fscache_netfs *netfs);
|
void fscache_unregister_netfs(struct fscache_netfs *netfs);
|
||||||
|
|
||||||
|
|
||||||
================
|
Cache Tag Lookup
|
||||||
CACHE TAG LOOKUP
|
|
||||||
================
|
================
|
||||||
|
|
||||||
FS-Cache permits the use of more than one cache. To permit particular index
|
FS-Cache permits the use of more than one cache. To permit particular index
|
||||||
@ -270,7 +269,7 @@ representation tags. This step is optional; it can be left entirely up to
|
|||||||
FS-Cache as to which cache should be used. The problem with doing that is that
|
FS-Cache as to which cache should be used. The problem with doing that is that
|
||||||
FS-Cache will always pick the first cache that was registered.
|
FS-Cache will always pick the first cache that was registered.
|
||||||
|
|
||||||
To get the representation for a named tag:
|
To get the representation for a named tag::
|
||||||
|
|
||||||
struct fscache_cache_tag *fscache_lookup_cache_tag(const char *name);
|
struct fscache_cache_tag *fscache_lookup_cache_tag(const char *name);
|
||||||
|
|
||||||
@ -278,7 +277,7 @@ This takes a text string as the name and returns a representation of a tag. It
|
|||||||
will never return an error. It may return a dummy tag, however, if it runs out
|
will never return an error. It may return a dummy tag, however, if it runs out
|
||||||
of memory; this will inhibit caching with this tag.
|
of memory; this will inhibit caching with this tag.
|
||||||
|
|
||||||
Any representation so obtained must be released by passing it to this function:
|
Any representation so obtained must be released by passing it to this function::
|
||||||
|
|
||||||
void fscache_release_cache_tag(struct fscache_cache_tag *tag);
|
void fscache_release_cache_tag(struct fscache_cache_tag *tag);
|
||||||
|
|
||||||
@ -286,13 +285,12 @@ The tag will be retrieved by FS-Cache when it calls the object definition
|
|||||||
operation select_cache().
|
operation select_cache().
|
||||||
|
|
||||||
|
|
||||||
==================
|
Index Registration
|
||||||
INDEX REGISTRATION
|
|
||||||
==================
|
==================
|
||||||
|
|
||||||
The third step is to inform FS-Cache about part of an index hierarchy that can
|
The third step is to inform FS-Cache about part of an index hierarchy that can
|
||||||
be used to locate files. This is done by requesting a cookie for each index in
|
be used to locate files. This is done by requesting a cookie for each index in
|
||||||
the path to the file:
|
the path to the file::
|
||||||
|
|
||||||
struct fscache_cookie *
|
struct fscache_cookie *
|
||||||
fscache_acquire_cookie(struct fscache_cookie *parent,
|
fscache_acquire_cookie(struct fscache_cookie *parent,
|
||||||
@ -339,7 +337,7 @@ must be enabled to do anything with it. A disabled cookie can be enabled by
|
|||||||
calling fscache_enable_cookie() (see below).
|
calling fscache_enable_cookie() (see below).
|
||||||
|
|
||||||
For example, with AFS, a cell would be added to the primary index. This index
|
For example, with AFS, a cell would be added to the primary index. This index
|
||||||
entry would have a dependent inode containing volume mappings within this cell:
|
entry would have a dependent inode containing volume mappings within this cell::
|
||||||
|
|
||||||
cell->cache =
|
cell->cache =
|
||||||
fscache_acquire_cookie(afs_cache_netfs.primary_index,
|
fscache_acquire_cookie(afs_cache_netfs.primary_index,
|
||||||
@ -349,7 +347,7 @@ entry would have a dependent inode containing volume mappings within this cell:
|
|||||||
cell, 0, true);
|
cell, 0, true);
|
||||||
|
|
||||||
And then a particular volume could be added to that index by ID, creating
|
And then a particular volume could be added to that index by ID, creating
|
||||||
another index for vnodes (AFS inode equivalents):
|
another index for vnodes (AFS inode equivalents)::
|
||||||
|
|
||||||
volume->cache =
|
volume->cache =
|
||||||
fscache_acquire_cookie(volume->cell->cache,
|
fscache_acquire_cookie(volume->cell->cache,
|
||||||
@ -359,13 +357,12 @@ another index for vnodes (AFS inode equivalents):
|
|||||||
volume, 0, true);
|
volume, 0, true);
|
||||||
|
|
||||||
|
|
||||||
======================
|
Data File Registration
|
||||||
DATA FILE REGISTRATION
|
|
||||||
======================
|
======================
|
||||||
|
|
||||||
The fourth step is to request a data file be created in the cache. This is
|
The fourth step is to request a data file be created in the cache. This is
|
||||||
identical to index cookie acquisition. The only difference is that the type in
|
identical to index cookie acquisition. The only difference is that the type in
|
||||||
the object definition should be something other than index type.
|
the object definition should be something other than index type::
|
||||||
|
|
||||||
vnode->cache =
|
vnode->cache =
|
||||||
fscache_acquire_cookie(volume->cache,
|
fscache_acquire_cookie(volume->cache,
|
||||||
@ -375,15 +372,14 @@ the object definition should be something other than index type.
|
|||||||
vnode, vnode->status.size, true);
|
vnode, vnode->status.size, true);
|
||||||
|
|
||||||
|
|
||||||
=================================
|
Miscellaneous Object Registration
|
||||||
MISCELLANEOUS OBJECT REGISTRATION
|
|
||||||
=================================
|
=================================
|
||||||
|
|
||||||
An optional step is to request an object of miscellaneous type be created in
|
An optional step is to request an object of miscellaneous type be created in
|
||||||
the cache. This is almost identical to index cookie acquisition. The only
|
the cache. This is almost identical to index cookie acquisition. The only
|
||||||
difference is that the type in the object definition should be something other
|
difference is that the type in the object definition should be something other
|
||||||
than index type. While the parent object could be an index, it's more likely
|
than index type. While the parent object could be an index, it's more likely
|
||||||
it would be some other type of object such as a data file.
|
it would be some other type of object such as a data file::
|
||||||
|
|
||||||
xattr->cache =
|
xattr->cache =
|
||||||
fscache_acquire_cookie(vnode->cache,
|
fscache_acquire_cookie(vnode->cache,
|
||||||
@ -396,13 +392,12 @@ Miscellaneous objects might be used to store extended attributes or directory
|
|||||||
entries for example.
|
entries for example.
|
||||||
|
|
||||||
|
|
||||||
==========================
|
Setting the Data File Size
|
||||||
SETTING THE DATA FILE SIZE
|
|
||||||
==========================
|
==========================
|
||||||
|
|
||||||
The fifth step is to set the physical attributes of the file, such as its size.
|
The fifth step is to set the physical attributes of the file, such as its size.
|
||||||
This doesn't automatically reserve any space in the cache, but permits the
|
This doesn't automatically reserve any space in the cache, but permits the
|
||||||
cache to adjust its metadata for data tracking appropriately:
|
cache to adjust its metadata for data tracking appropriately::
|
||||||
|
|
||||||
int fscache_attr_changed(struct fscache_cookie *cookie);
|
int fscache_attr_changed(struct fscache_cookie *cookie);
|
||||||
|
|
||||||
@ -417,8 +412,7 @@ some point in the future, and as such, it may happen after the function returns
|
|||||||
to the caller. The attribute adjustment excludes read and write operations.
|
to the caller. The attribute adjustment excludes read and write operations.
|
||||||
|
|
||||||
|
|
||||||
=====================
|
Page alloc/read/write
|
||||||
PAGE ALLOC/READ/WRITE
|
|
||||||
=====================
|
=====================
|
||||||
|
|
||||||
And the sixth step is to store and retrieve pages in the cache. There are
|
And the sixth step is to store and retrieve pages in the cache. There are
|
||||||
@ -441,7 +435,7 @@ PAGE READ
|
|||||||
|
|
||||||
Firstly, the netfs should ask FS-Cache to examine the caches and read the
|
Firstly, the netfs should ask FS-Cache to examine the caches and read the
|
||||||
contents cached for a particular page of a particular file if present, or else
|
contents cached for a particular page of a particular file if present, or else
|
||||||
allocate space to store the contents if not:
|
allocate space to store the contents if not::
|
||||||
|
|
||||||
typedef
|
typedef
|
||||||
void (*fscache_rw_complete_t)(struct page *page,
|
void (*fscache_rw_complete_t)(struct page *page,
|
||||||
@ -474,14 +468,14 @@ Else if there's a copy of the page resident in the cache:
|
|||||||
|
|
||||||
(4) When the read is complete, end_io_func() will be invoked with:
|
(4) When the read is complete, end_io_func() will be invoked with:
|
||||||
|
|
||||||
(*) The netfs data supplied when the cookie was created.
|
* The netfs data supplied when the cookie was created.
|
||||||
|
|
||||||
(*) The page descriptor.
|
* The page descriptor.
|
||||||
|
|
||||||
(*) The context argument passed to the above function. This will be
|
* The context argument passed to the above function. This will be
|
||||||
maintained with the get_context/put_context functions mentioned above.
|
maintained with the get_context/put_context functions mentioned above.
|
||||||
|
|
||||||
(*) An argument that's 0 on success or negative for an error code.
|
* An argument that's 0 on success or negative for an error code.
|
||||||
|
|
||||||
If an error occurs, it should be assumed that the page contains no usable
|
If an error occurs, it should be assumed that the page contains no usable
|
||||||
data. fscache_readpages_cancel() may need to be called.
|
data. fscache_readpages_cancel() may need to be called.
|
||||||
@ -504,11 +498,11 @@ This function may also return -ENOMEM or -EINTR, in which case it won't have
|
|||||||
read any data from the cache.
|
read any data from the cache.
|
||||||
|
|
||||||
|
|
||||||
PAGE ALLOCATE
|
Page Allocate
|
||||||
-------------
|
-------------
|
||||||
|
|
||||||
Alternatively, if there's not expected to be any data in the cache for a page
|
Alternatively, if there's not expected to be any data in the cache for a page
|
||||||
because the file has been extended, a block can simply be allocated instead:
|
because the file has been extended, a block can simply be allocated instead::
|
||||||
|
|
||||||
int fscache_alloc_page(struct fscache_cookie *cookie,
|
int fscache_alloc_page(struct fscache_cookie *cookie,
|
||||||
struct page *page,
|
struct page *page,
|
||||||
@ -523,12 +517,12 @@ The mark_pages_cached() cookie operation will be called on the page if
|
|||||||
successful.
|
successful.
|
||||||
|
|
||||||
|
|
||||||
PAGE WRITE
|
Page Write
|
||||||
----------
|
----------
|
||||||
|
|
||||||
Secondly, if the netfs changes the contents of the page (either due to an
|
Secondly, if the netfs changes the contents of the page (either due to an
|
||||||
initial download or if a user performs a write), then the page should be
|
initial download or if a user performs a write), then the page should be
|
||||||
written back to the cache:
|
written back to the cache::
|
||||||
|
|
||||||
int fscache_write_page(struct fscache_cookie *cookie,
|
int fscache_write_page(struct fscache_cookie *cookie,
|
||||||
struct page *page,
|
struct page *page,
|
||||||
@ -566,11 +560,11 @@ place if unforeseen circumstances arose (such as a disk error).
|
|||||||
Writing takes place asynchronously.
|
Writing takes place asynchronously.
|
||||||
|
|
||||||
|
|
||||||
MULTIPLE PAGE READ
|
Multiple Page Read
|
||||||
------------------
|
------------------
|
||||||
|
|
||||||
A facility is provided to read several pages at once, as requested by the
|
A facility is provided to read several pages at once, as requested by the
|
||||||
readpages() address space operation:
|
readpages() address space operation::
|
||||||
|
|
||||||
int fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
|
int fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
|
||||||
struct address_space *mapping,
|
struct address_space *mapping,
|
||||||
@ -598,7 +592,7 @@ This works in a similar way to fscache_read_or_alloc_page(), except:
|
|||||||
be returned.
|
be returned.
|
||||||
|
|
||||||
Otherwise, if all pages had reads dispatched, then 0 will be returned, the
|
Otherwise, if all pages had reads dispatched, then 0 will be returned, the
|
||||||
list will be empty and *nr_pages will be 0.
|
list will be empty and ``*nr_pages`` will be 0.
|
||||||
|
|
||||||
(4) end_io_func will be called once for each page being read as the reads
|
(4) end_io_func will be called once for each page being read as the reads
|
||||||
complete. It will be called in process context if error != 0, but it may
|
complete. It will be called in process context if error != 0, but it may
|
||||||
@ -609,13 +603,13 @@ some of the pages being read and some being allocated. Those pages will have
|
|||||||
been marked appropriately and will need uncaching.
|
been marked appropriately and will need uncaching.
|
||||||
|
|
||||||
|
|
||||||
CANCELLATION OF UNREAD PAGES
|
Cancellation of Unread Pages
|
||||||
----------------------------
|
----------------------------
|
||||||
|
|
||||||
If one or more pages are passed to fscache_read_or_alloc_pages() but not then
|
If one or more pages are passed to fscache_read_or_alloc_pages() but not then
|
||||||
read from the cache and also not read from the underlying filesystem then
|
read from the cache and also not read from the underlying filesystem then
|
||||||
those pages will need to have any marks and reservations removed. This can be
|
those pages will need to have any marks and reservations removed. This can be
|
||||||
done by calling:
|
done by calling::
|
||||||
|
|
||||||
void fscache_readpages_cancel(struct fscache_cookie *cookie,
|
void fscache_readpages_cancel(struct fscache_cookie *cookie,
|
||||||
struct list_head *pages);
|
struct list_head *pages);
|
||||||
@ -625,11 +619,10 @@ fscache_read_or_alloc_pages(). Every page in the pages list will be examined
|
|||||||
and any that have PG_fscache set will be uncached.
|
and any that have PG_fscache set will be uncached.
|
||||||
|
|
||||||
|
|
||||||
==============
|
Page Uncaching
|
||||||
PAGE UNCACHING
|
|
||||||
==============
|
==============
|
||||||
|
|
||||||
To uncache a page, this function should be called:
|
To uncache a page, this function should be called::
|
||||||
|
|
||||||
void fscache_uncache_page(struct fscache_cookie *cookie,
|
void fscache_uncache_page(struct fscache_cookie *cookie,
|
||||||
struct page *page);
|
struct page *page);
|
||||||
@ -644,12 +637,12 @@ data file must be retired (see the relinquish cookie function below).
|
|||||||
|
|
||||||
Furthermore, note that this does not cancel the asynchronous read or write
|
Furthermore, note that this does not cancel the asynchronous read or write
|
||||||
operation started by the read/alloc and write functions, so the page
|
operation started by the read/alloc and write functions, so the page
|
||||||
invalidation functions must use:
|
invalidation functions must use::
|
||||||
|
|
||||||
bool fscache_check_page_write(struct fscache_cookie *cookie,
|
bool fscache_check_page_write(struct fscache_cookie *cookie,
|
||||||
struct page *page);
|
struct page *page);
|
||||||
|
|
||||||
to see if a page is being written to the cache, and:
|
to see if a page is being written to the cache, and::
|
||||||
|
|
||||||
void fscache_wait_on_page_write(struct fscache_cookie *cookie,
|
void fscache_wait_on_page_write(struct fscache_cookie *cookie,
|
||||||
struct page *page);
|
struct page *page);
|
||||||
@ -660,7 +653,7 @@ to wait for it to finish if it is.
|
|||||||
When releasepage() is being implemented, a special FS-Cache function exists to
|
When releasepage() is being implemented, a special FS-Cache function exists to
|
||||||
manage the heuristics of coping with vmscan trying to eject pages, which may
|
manage the heuristics of coping with vmscan trying to eject pages, which may
|
||||||
conflict with the cache trying to write pages to the cache (which may itself
|
conflict with the cache trying to write pages to the cache (which may itself
|
||||||
need to allocate memory):
|
need to allocate memory)::
|
||||||
|
|
||||||
bool fscache_maybe_release_page(struct fscache_cookie *cookie,
|
bool fscache_maybe_release_page(struct fscache_cookie *cookie,
|
||||||
struct page *page,
|
struct page *page,
|
||||||
@ -676,12 +669,12 @@ storage request to complete, or it may attempt to cancel the storage request -
|
|||||||
in which case the page will not be stored in the cache this time.
|
in which case the page will not be stored in the cache this time.
|
||||||
|
|
||||||
|
|
||||||
BULK INODE PAGE UNCACHE
|
Bulk Inode Page Uncache
|
||||||
-----------------------
|
-----------------------
|
||||||
|
|
||||||
A convenience routine is provided to perform an uncache on all the pages
|
A convenience routine is provided to perform an uncache on all the pages
|
||||||
attached to an inode. This assumes that the pages on the inode correspond on a
|
attached to an inode. This assumes that the pages on the inode correspond on a
|
||||||
1:1 basis with the pages in the cache.
|
1:1 basis with the pages in the cache::
|
||||||
|
|
||||||
void fscache_uncache_all_inode_pages(struct fscache_cookie *cookie,
|
void fscache_uncache_all_inode_pages(struct fscache_cookie *cookie,
|
||||||
struct inode *inode);
|
struct inode *inode);
|
||||||
@ -692,12 +685,11 @@ written to the cache and for the cache to finish with the page generally. No
|
|||||||
error is returned.
|
error is returned.
|
||||||
|
|
||||||
|
|
||||||
===============================
|
Index and Data File consistency
|
||||||
INDEX AND DATA FILE CONSISTENCY
|
|
||||||
===============================
|
===============================
|
||||||
|
|
||||||
To find out whether auxiliary data for an object is up to date within the
|
To find out whether auxiliary data for an object is up to date within the
|
||||||
cache, the following function can be called:
|
cache, the following function can be called::
|
||||||
|
|
||||||
int fscache_check_consistency(struct fscache_cookie *cookie,
|
int fscache_check_consistency(struct fscache_cookie *cookie,
|
||||||
const void *aux_data);
|
const void *aux_data);
|
||||||
@ -708,7 +700,7 @@ data buffer first. It returns 0 if it is and -ESTALE if it isn't; it may also
|
|||||||
return -ENOMEM and -ERESTARTSYS.
|
return -ENOMEM and -ERESTARTSYS.
|
||||||
|
|
||||||
To request an update of the index data for an index or other object, the
|
To request an update of the index data for an index or other object, the
|
||||||
following function should be called:
|
following function should be called::
|
||||||
|
|
||||||
void fscache_update_cookie(struct fscache_cookie *cookie,
|
void fscache_update_cookie(struct fscache_cookie *cookie,
|
||||||
const void *aux_data);
|
const void *aux_data);
|
||||||
@ -721,8 +713,7 @@ Note that partial updates may happen automatically at other times, such as when
|
|||||||
data blocks are added to a data file object.
|
data blocks are added to a data file object.
|
||||||
|
|
||||||
|
|
||||||
=================
|
Cookie Enablement
|
||||||
COOKIE ENABLEMENT
|
|
||||||
=================
|
=================
|
||||||
|
|
||||||
Cookies exist in one of two states: enabled and disabled. If a cookie is
|
Cookies exist in one of two states: enabled and disabled. If a cookie is
|
||||||
@ -731,7 +722,7 @@ invalidate its state; allocate, read or write backing pages - though it is
|
|||||||
still possible to uncache pages and relinquish the cookie.
|
still possible to uncache pages and relinquish the cookie.
|
||||||
|
|
||||||
The initial enablement state is set by fscache_acquire_cookie(), but the cookie
|
The initial enablement state is set by fscache_acquire_cookie(), but the cookie
|
||||||
can be enabled or disabled later. To disable a cookie, call:
|
can be enabled or disabled later. To disable a cookie, call::
|
||||||
|
|
||||||
void fscache_disable_cookie(struct fscache_cookie *cookie,
|
void fscache_disable_cookie(struct fscache_cookie *cookie,
|
||||||
const void *aux_data,
|
const void *aux_data,
|
||||||
@ -746,7 +737,7 @@ All possible failures are handled internally. The caller should consider
|
|||||||
calling fscache_uncache_all_inode_pages() afterwards to make sure all page
|
calling fscache_uncache_all_inode_pages() afterwards to make sure all page
|
||||||
markings are cleared up.
|
markings are cleared up.
|
||||||
|
|
||||||
Cookies can be enabled or reenabled with:
|
Cookies can be enabled or reenabled with::
|
||||||
|
|
||||||
void fscache_enable_cookie(struct fscache_cookie *cookie,
|
void fscache_enable_cookie(struct fscache_cookie *cookie,
|
||||||
const void *aux_data,
|
const void *aux_data,
|
||||||
@ -771,13 +762,12 @@ In both cases, the cookie's auxiliary data buffer is updated from aux_data if
|
|||||||
that is non-NULL inside the enablement lock before proceeding.
|
that is non-NULL inside the enablement lock before proceeding.
|
||||||
|
|
||||||
|
|
||||||
===============================
|
Miscellaneous Cookie operations
|
||||||
MISCELLANEOUS COOKIE OPERATIONS
|
|
||||||
===============================
|
===============================
|
||||||
|
|
||||||
There are a number of operations that can be used to control cookies:
|
There are a number of operations that can be used to control cookies:
|
||||||
|
|
||||||
(*) Cookie pinning:
|
* Cookie pinning::
|
||||||
|
|
||||||
int fscache_pin_cookie(struct fscache_cookie *cookie);
|
int fscache_pin_cookie(struct fscache_cookie *cookie);
|
||||||
void fscache_unpin_cookie(struct fscache_cookie *cookie);
|
void fscache_unpin_cookie(struct fscache_cookie *cookie);
|
||||||
@ -790,7 +780,7 @@ There are a number of operations that can be used to control cookies:
|
|||||||
-ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
|
-ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
|
||||||
-EIO if there's any other problem.
|
-EIO if there's any other problem.
|
||||||
|
|
||||||
(*) Data space reservation:
|
* Data space reservation::
|
||||||
|
|
||||||
int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size);
|
int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size);
|
||||||
|
|
||||||
@ -809,11 +799,10 @@ There are a number of operations that can be used to control cookies:
|
|||||||
make space if it's not in use.
|
make space if it's not in use.
|
||||||
|
|
||||||
|
|
||||||
=====================
|
Cookie Unregistration
|
||||||
COOKIE UNREGISTRATION
|
|
||||||
=====================
|
=====================
|
||||||
|
|
||||||
To get rid of a cookie, this function should be called.
|
To get rid of a cookie, this function should be called::
|
||||||
|
|
||||||
void fscache_relinquish_cookie(struct fscache_cookie *cookie,
|
void fscache_relinquish_cookie(struct fscache_cookie *cookie,
|
||||||
const void *aux_data,
|
const void *aux_data,
|
||||||
@ -835,16 +824,14 @@ the cookies for "child" indices, objects and pages have been relinquished
|
|||||||
first.
|
first.
|
||||||
|
|
||||||
|
|
||||||
==================
|
Index Invalidation
|
||||||
INDEX INVALIDATION
|
|
||||||
==================
|
==================
|
||||||
|
|
||||||
There is no direct way to invalidate an index subtree. To do this, the caller
|
There is no direct way to invalidate an index subtree. To do this, the caller
|
||||||
should relinquish and retire the cookie they have, and then acquire a new one.
|
should relinquish and retire the cookie they have, and then acquire a new one.
|
||||||
|
|
||||||
|
|
||||||
======================
|
Data File Invalidation
|
||||||
DATA FILE INVALIDATION
|
|
||||||
======================
|
======================
|
||||||
|
|
||||||
Sometimes it will be necessary to invalidate an object that contains data.
|
Sometimes it will be necessary to invalidate an object that contains data.
|
||||||
@ -853,7 +840,7 @@ change - at which point the netfs has to throw away all the state it had for an
|
|||||||
inode and reload from the server.
|
inode and reload from the server.
|
||||||
|
|
||||||
To indicate that a cache object should be invalidated, the following function
|
To indicate that a cache object should be invalidated, the following function
|
||||||
can be called:
|
can be called::
|
||||||
|
|
||||||
void fscache_invalidate(struct fscache_cookie *cookie);
|
void fscache_invalidate(struct fscache_cookie *cookie);
|
||||||
|
|
||||||
@ -868,13 +855,12 @@ auxiliary data update operation as it is very likely these will have changed.
|
|||||||
|
|
||||||
Using the following function, the netfs can wait for the invalidation operation
|
Using the following function, the netfs can wait for the invalidation operation
|
||||||
to have reached a point at which it can start submitting ordinary operations
|
to have reached a point at which it can start submitting ordinary operations
|
||||||
once again:
|
once again::
|
||||||
|
|
||||||
void fscache_wait_on_invalidate(struct fscache_cookie *cookie);
|
void fscache_wait_on_invalidate(struct fscache_cookie *cookie);
|
||||||
|
|
||||||
|
|
||||||
===========================
|
FS-cache Specific Page Flag
|
||||||
FS-CACHE SPECIFIC PAGE FLAG
|
|
||||||
===========================
|
===========================
|
||||||
|
|
||||||
FS-Cache makes use of a page flag, PG_private_2, for its own purpose. This is
|
FS-Cache makes use of a page flag, PG_private_2, for its own purpose. This is
|
||||||
@ -898,7 +884,7 @@ was given under certain circumstances.
|
|||||||
This bit does not overlap with flags such as PG_private. This means that FS-Cache
|
This bit does not overlap with flags such as PG_private. This means that FS-Cache
|
||||||
can be used with a filesystem that uses the block buffering code.
|
can be used with a filesystem that uses the block buffering code.
|
||||||
|
|
||||||
There are a number of operations defined on this flag:
|
There are a number of operations defined on this flag::
|
||||||
|
|
||||||
int PageFsCache(struct page *page);
|
int PageFsCache(struct page *page);
|
||||||
void SetPageFsCache(struct page *page);
|
void SetPageFsCache(struct page *page);
|
@ -1,10 +1,12 @@
|
|||||||
====================================================
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
IN-KERNEL CACHE OBJECT REPRESENTATION AND MANAGEMENT
|
|
||||||
====================================================
|
====================================================
|
||||||
|
In-Kernel Cache Object Representation and Management
|
||||||
|
====================================================
|
||||||
|
|
||||||
By: David Howells <dhowells@redhat.com>
|
By: David Howells <dhowells@redhat.com>
|
||||||
|
|
||||||
Contents:
|
.. Contents:
|
||||||
|
|
||||||
(*) Representation
|
(*) Representation
|
||||||
|
|
||||||
@ -18,8 +20,7 @@ Contents:
|
|||||||
(*) The set of events.
|
(*) The set of events.
|
||||||
|
|
||||||
|
|
||||||
==============
|
Representation
|
||||||
REPRESENTATION
|
|
||||||
==============
|
==============
|
||||||
|
|
||||||
FS-Cache maintains an in-kernel representation of each object that a netfs is
|
FS-Cache maintains an in-kernel representation of each object that a netfs is
|
||||||
@ -38,7 +39,7 @@ or even by no objects (it may not be cached).
|
|||||||
|
|
||||||
Furthermore, both cookies and objects are hierarchical. The two hierarchies
|
Furthermore, both cookies and objects are hierarchical. The two hierarchies
|
||||||
correspond, but the cookies tree is a superset of the union of the object trees
|
correspond, but the cookies tree is a superset of the union of the object trees
|
||||||
of multiple caches:
|
of multiple caches::
|
||||||
|
|
||||||
NETFS INDEX TREE : CACHE 1 : CACHE 2
|
NETFS INDEX TREE : CACHE 1 : CACHE 2
|
||||||
: :
|
: :
|
||||||
@ -89,8 +90,7 @@ pointers to the cookies. The cookies themselves and any objects attached to
|
|||||||
those cookies are hidden from it.
|
those cookies are hidden from it.
|
||||||
|
|
||||||
|
|
||||||
===============================
|
Object Management State Machine
|
||||||
OBJECT MANAGEMENT STATE MACHINE
|
|
||||||
===============================
|
===============================
|
||||||
|
|
||||||
Within FS-Cache, each active object is managed by its own individual state
|
Within FS-Cache, each active object is managed by its own individual state
|
||||||
@ -124,7 +124,7 @@ is not masked, the object will be queued for processing (by calling
|
|||||||
fscache_enqueue_object()).
|
fscache_enqueue_object()).
|
||||||
|
|
||||||
|
|
||||||
PROVISION OF CPU TIME
|
Provision of CPU Time
|
||||||
---------------------
|
---------------------
|
||||||
|
|
||||||
The work to be done by the various states was given CPU time by the threads of
|
The work to be done by the various states was given CPU time by the threads of
|
||||||
@ -141,7 +141,7 @@ because:
|
|||||||
workqueues don't necessarily have the right numbers of threads.
|
workqueues don't necessarily have the right numbers of threads.
|
||||||
|
|
||||||
|
|
||||||
LOCKING SIMPLIFICATION
|
Locking Simplification
|
||||||
----------------------
|
----------------------
|
||||||
|
|
||||||
Because only one worker thread may be operating on any particular object's
|
Because only one worker thread may be operating on any particular object's
|
||||||
@ -151,8 +151,7 @@ from the cache backend's representation (fscache_object) - which may be
|
|||||||
requested from either end.
|
requested from either end.
|
||||||
|
|
||||||
|
|
||||||
=================
|
The Set of States
|
||||||
THE SET OF STATES
|
|
||||||
=================
|
=================
|
||||||
|
|
||||||
The object state machine has a set of states that it can be in. There are
|
The object state machine has a set of states that it can be in. There are
|
||||||
@ -275,19 +274,17 @@ memory and potentially deletes stuff from disk:
|
|||||||
this state.
|
this state.
|
||||||
|
|
||||||
|
|
||||||
THE SET OF EVENTS
|
The Set of Events
|
||||||
-----------------
|
-----------------
|
||||||
|
|
||||||
There are a number of events that can be raised to an object state machine:
|
There are a number of events that can be raised to an object state machine:
|
||||||
|
|
||||||
(*) FSCACHE_OBJECT_EV_UPDATE
|
FSCACHE_OBJECT_EV_UPDATE
|
||||||
|
|
||||||
The netfs requested that an object be updated. The state machine will ask
|
The netfs requested that an object be updated. The state machine will ask
|
||||||
the cache backend to update the object, and the cache backend will ask the
|
the cache backend to update the object, and the cache backend will ask the
|
||||||
netfs for details of the change through its cookie definition ops.
|
netfs for details of the change through its cookie definition ops.
|
||||||
|
|
||||||
(*) FSCACHE_OBJECT_EV_CLEARED
|
FSCACHE_OBJECT_EV_CLEARED
|
||||||
|
|
||||||
This is signalled in two circumstances:
|
This is signalled in two circumstances:
|
||||||
|
|
||||||
(a) when an object's last child object is dropped and
|
(a) when an object's last child object is dropped and
|
||||||
@ -296,20 +293,16 @@ There are a number of events that can be raised to an object state machine:
|
|||||||
|
|
||||||
This is used to proceed from the dying state.
|
This is used to proceed from the dying state.
|
||||||
|
|
||||||
(*) FSCACHE_OBJECT_EV_ERROR
|
FSCACHE_OBJECT_EV_ERROR
|
||||||
|
|
||||||
This is signalled when an I/O error occurs during the processing of some
|
This is signalled when an I/O error occurs during the processing of some
|
||||||
object.
|
object.
|
||||||
|
|
||||||
(*) FSCACHE_OBJECT_EV_RELEASE
|
FSCACHE_OBJECT_EV_RELEASE, FSCACHE_OBJECT_EV_RETIRE
|
||||||
(*) FSCACHE_OBJECT_EV_RETIRE
|
|
||||||
|
|
||||||
These are signalled when the netfs relinquishes a cookie it was using.
|
These are signalled when the netfs relinquishes a cookie it was using.
|
||||||
The event selected depends on whether the netfs asks for the backing
|
The event selected depends on whether the netfs asks for the backing
|
||||||
object to be retired (deleted) or retained.
|
object to be retired (deleted) or retained.
|
||||||
|
|
||||||
(*) FSCACHE_OBJECT_EV_WITHDRAW
|
FSCACHE_OBJECT_EV_WITHDRAW
|
||||||
|
|
||||||
This is signalled when the cache backend wants to withdraw an object.
|
This is signalled when the cache backend wants to withdraw an object.
|
||||||
This means that the object will have to be detached from the netfs's
|
This means that the object will have to be detached from the netfs's
|
||||||
cookie.
|
cookie.
|
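As a rough illustration of the event handling described in this file (events are raised against an object, and an event that is not masked causes the object to be queued), the following userspace C sketch models events as bits in a mask. Only the event names come from the text; the bit numbers, `struct demo_object`, and `demo_raise_event()` are hypothetical stand-ins rather than the FS-Cache definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit numbers; only the event names come from the document. */
enum demo_event {
    DEMO_EV_UPDATE,
    DEMO_EV_CLEARED,
    DEMO_EV_ERROR,
    DEMO_EV_RELEASE,
    DEMO_EV_RETIRE,
    DEMO_EV_WITHDRAW,
};

struct demo_object {
    unsigned long events;      /* events raised and not yet handled */
    unsigned long event_mask;  /* events the current state listens for */
    int enqueue_count;         /* stands in for queueing the object */
};

/* Record the event; queue the object only if the event is not masked out. */
static bool demo_raise_event(struct demo_object *obj, enum demo_event ev)
{
    unsigned long bit = 1UL << ev;

    assert(obj != NULL);
    obj->events |= bit;
    if (obj->event_mask & bit) {
        obj->enqueue_count++;
        return true;
    }
    return false;
}
```

With a mask that enables only the update and error events, raising an update event queues the object, while raising a retire event is merely recorded until a later state enables it.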
@ -1,10 +1,12 @@
|
|||||||
================================
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
ASYNCHRONOUS OPERATIONS HANDLING
|
|
||||||
================================
|
================================
|
||||||
|
Asynchronous Operations Handling
|
||||||
|
================================
|
||||||
|
|
||||||
By: David Howells <dhowells@redhat.com>
|
By: David Howells <dhowells@redhat.com>
|
||||||
|
|
||||||
Contents:
|
.. Contents:
|
||||||
|
|
||||||
(*) Overview.
|
(*) Overview.
|
||||||
|
|
||||||
@ -17,8 +19,7 @@ Contents:
|
|||||||
(*) Asynchronous callback.
|
(*) Asynchronous callback.
|
||||||
|
|
||||||
|
|
||||||
========
|
Overview
|
||||||
OVERVIEW
|
|
||||||
========
|
========
|
||||||
|
|
||||||
FS-Cache has an asynchronous operations handling facility that it uses for its
|
FS-Cache has an asynchronous operations handling facility that it uses for its
|
||||||
@ -33,11 +34,10 @@ backend for completion.
|
|||||||
To make use of this facility, <linux/fscache-cache.h> should be #included.
|
To make use of this facility, <linux/fscache-cache.h> should be #included.
|
||||||
|
|
||||||
|
|
||||||
===============================
|
Operation Record Initialisation
|
||||||
OPERATION RECORD INITIALISATION
|
|
||||||
===============================
|
===============================
|
||||||
|
|
||||||
An operation is recorded in an fscache_operation struct:
|
An operation is recorded in an fscache_operation struct::
|
||||||
|
|
||||||
struct fscache_operation {
|
struct fscache_operation {
|
||||||
union {
|
union {
|
||||||
@ -50,7 +50,7 @@ An operation is recorded in an fscache_operation struct:
|
|||||||
};
|
};
|
||||||
|
|
||||||
Someone wanting to issue an operation should allocate something with this
|
Someone wanting to issue an operation should allocate something with this
|
||||||
struct embedded in it. They should initialise it by calling:
|
struct embedded in it. They should initialise it by calling::
|
||||||
|
|
||||||
void fscache_operation_init(struct fscache_operation *op,
|
void fscache_operation_init(struct fscache_operation *op,
|
||||||
fscache_operation_release_t release);
|
fscache_operation_release_t release);
|
||||||
@ -67,8 +67,7 @@ FSCACHE_OP_WAITING may be set in op->flags prior to each submission of the
|
|||||||
operation and waited for afterwards.
|
operation and waited for afterwards.
|
||||||
|
|
||||||
|
|
||||||
==========
|
Parameters
|
||||||
PARAMETERS
|
|
||||||
==========
|
==========
|
||||||
|
|
||||||
There are a number of parameters that can be set in the operation record's flag
|
There are a number of parameters that can be set in the operation record's flag
|
||||||
@ -87,7 +86,7 @@ operations:
|
|||||||
|
|
||||||
If this option is to be used, FSCACHE_OP_WAITING must be set in op->flags
|
If this option is to be used, FSCACHE_OP_WAITING must be set in op->flags
|
||||||
before submitting the operation, and the operating thread must wait for it
|
before submitting the operation, and the operating thread must wait for it
|
||||||
to be cleared before proceeding:
|
to be cleared before proceeding::
|
||||||
|
|
||||||
wait_on_bit(&op->flags, FSCACHE_OP_WAITING,
|
wait_on_bit(&op->flags, FSCACHE_OP_WAITING,
|
||||||
TASK_UNINTERRUPTIBLE);
|
TASK_UNINTERRUPTIBLE);
|
||||||
@ -101,7 +100,7 @@ operations:
|
|||||||
page to a netfs page after the backing fs has read the page in.
|
page to a netfs page after the backing fs has read the page in.
|
||||||
|
|
||||||
If this option is used, op->fast_work and op->processor must be
|
If this option is used, op->fast_work and op->processor must be
|
||||||
initialised before submitting the operation:
|
initialised before submitting the operation::
|
||||||
|
|
||||||
INIT_WORK(&op->fast_work, do_some_work);
|
INIT_WORK(&op->fast_work, do_some_work);
|
||||||
|
|
||||||
@ -114,7 +113,7 @@ operations:
|
|||||||
pages that have just been fetched from a remote server.
|
pages that have just been fetched from a remote server.
|
||||||
|
|
||||||
If this option is used, op->slow_work and op->processor must be
|
If this option is used, op->slow_work and op->processor must be
|
||||||
initialised before submitting the operation:
|
initialised before submitting the operation::
|
||||||
|
|
||||||
fscache_operation_init_slow(op, processor)
|
fscache_operation_init_slow(op, processor)
|
||||||
|
|
||||||
@ -132,8 +131,7 @@ Furthermore, operations may be one of two types:
|
|||||||
operations running at the same time.
|
operations running at the same time.
|
||||||
|
|
||||||
|
|
||||||
=========
|
Procedure
|
||||||
PROCEDURE
|
|
||||||
=========
|
=========
|
||||||
|
|
||||||
Operations are used through the following procedure:
|
Operations are used through the following procedure:
|
||||||
@ -143,7 +141,7 @@ Operations are used through the following procedure:
|
|||||||
generic op embedded within.
|
generic op embedded within.
|
||||||
|
|
||||||
(2) The submitting thread must then submit the operation for processing using
|
(2) The submitting thread must then submit the operation for processing using
|
||||||
one of the following two functions:
|
one of the following two functions::
|
||||||
|
|
||||||
int fscache_submit_op(struct fscache_object *object,
|
int fscache_submit_op(struct fscache_object *object,
|
||||||
struct fscache_operation *op);
|
struct fscache_operation *op);
|
||||||
@ -164,7 +162,7 @@ Operations are used through the following procedure:
|
|||||||
operation of conflicting exclusivity is in progress on the object.
|
operation of conflicting exclusivity is in progress on the object.
|
||||||
|
|
||||||
If the operation is asynchronous, the manager will retain a reference to
|
If the operation is asynchronous, the manager will retain a reference to
|
||||||
it, so the caller should put their reference to it by passing it to:
|
it, so the caller should put their reference to it by passing it to::
|
||||||
|
|
||||||
void fscache_put_operation(struct fscache_operation *op);
|
void fscache_put_operation(struct fscache_operation *op);
|
||||||
|
|
||||||
@ -179,12 +177,12 @@ Operations are used through the following procedure:
|
|||||||
(4) The operation holds an effective lock upon the object, preventing other
|
(4) The operation holds an effective lock upon the object, preventing other
|
||||||
exclusive ops conflicting until it is released. The operation can be
|
exclusive ops conflicting until it is released. The operation can be
|
||||||
enqueued for further immediate asynchronous processing by adjusting the
|
enqueued for further immediate asynchronous processing by adjusting the
|
||||||
CPU time provisioning option if necessary, eg:
|
CPU time provisioning option if necessary, eg::
|
||||||
|
|
||||||
op->flags &= ~FSCACHE_OP_TYPE;
|
op->flags &= ~FSCACHE_OP_TYPE;
|
||||||
op->flags |= FSCACHE_OP_FAST;
|
op->flags |= FSCACHE_OP_FAST;
|
||||||
|
|
||||||
and calling:
|
and calling::
|
||||||
|
|
||||||
void fscache_enqueue_operation(struct fscache_operation *op);
|
void fscache_enqueue_operation(struct fscache_operation *op);
|
||||||
|
|
||||||
@ -192,13 +190,12 @@ Operations are used through the following procedure:
|
|||||||
pools.
|
pools.
|
||||||
|
|
||||||
|
|
||||||
=====================
|
Asynchronous Callback
|
||||||
ASYNCHRONOUS CALLBACK
|
|
||||||
=====================
|
=====================
|
||||||
|
|
||||||
When used in asynchronous mode, the worker thread pool will invoke the
|
When used in asynchronous mode, the worker thread pool will invoke the
|
||||||
processor method with a pointer to the operation. This should then get at the
|
processor method with a pointer to the operation. This should then get at the
|
||||||
container struct by using container_of():
|
container struct by using container_of()::
|
||||||
|
|
||||||
static void fscache_write_op(struct fscache_operation *_op)
|
static void fscache_write_op(struct fscache_operation *_op)
|
||||||
{
|
{
|
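The two idioms this file relies on, switching an operation's type field and recovering the container struct inside the processor callback, can be sketched in userspace C as follows. This is a minimal sketch under stated assumptions: the `DEMO_OP_*` values, `struct demo_operation`, `struct demo_write_op`, and both helper functions are hypothetical and do not reproduce the definitions in <linux/fscache-cache.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag layout: the low bits select the operation type. */
#define DEMO_OP_TYPE    0x000fUL
#define DEMO_OP_ASYNC   0x0001UL
#define DEMO_OP_FAST    0x0002UL
#define DEMO_OP_WAITING 0x0100UL  /* an unrelated flag outside the type field */

/* Userspace equivalent of the kernel's container_of() helper. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_operation {
    unsigned long flags;
};

struct demo_write_op {
    int pages_to_store;
    struct demo_operation op;  /* generic op embedded in the container */
};

/* Switch the type field to "fast" without disturbing the other flags. */
static void demo_make_fast(struct demo_operation *op)
{
    assert(op != NULL);
    op->flags &= ~DEMO_OP_TYPE;  /* clear the old type */
    op->flags |= DEMO_OP_FAST;   /* OR in the flag itself, not its complement */
}

/* A processor receives the generic op and recovers its container. */
static int demo_write_processor(struct demo_operation *_op)
{
    struct demo_write_op *op;

    assert(_op != NULL);
    op = demo_container_of(_op, struct demo_write_op, op);
    return op->pages_to_store;
}
```

Clearing the type field before setting the new type keeps flags outside that field, such as the waiting flag in this sketch, untouched.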
@ -1,7 +1,11 @@
|
|||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
===========================================
|
||||||
Mounting root file system via SMB (cifs.ko)
|
Mounting root file system via SMB (cifs.ko)
|
||||||
===========================================
|
===========================================
|
||||||
|
|
||||||
Written 2019 by Paulo Alcantara <palcantara@suse.de>
|
Written 2019 by Paulo Alcantara <palcantara@suse.de>
|
||||||
|
|
||||||
Written 2019 by Aurelien Aptel <aaptel@suse.com>
|
Written 2019 by Aurelien Aptel <aaptel@suse.com>
|
||||||
|
|
||||||
The CONFIG_CIFS_ROOT option enables experimental root file system
|
The CONFIG_CIFS_ROOT option enables experimental root file system
|
||||||
@ -32,7 +36,7 @@ Server configuration
|
|||||||
====================
|
====================
|
||||||
|
|
||||||
To enable SMB1+UNIX extensions you will need to set these global
|
To enable SMB1+UNIX extensions you will need to set these global
|
||||||
settings in Samba smb.conf:
|
settings in Samba smb.conf::
|
||||||
|
|
||||||
[global]
|
[global]
|
||||||
server min protocol = NT1
|
server min protocol = NT1
|
||||||
@ -41,12 +45,16 @@ settings in Samba smb.conf:
|
|||||||
Kernel command line
|
Kernel command line
|
||||||
===================
|
===================
|
||||||
|
|
||||||
root=/dev/cifs
|
::
|
||||||
|
|
||||||
|
root=/dev/cifs
|
||||||
|
|
||||||
This is just a virtual device that basically tells the kernel to mount
|
This is just a virtual device that basically tells the kernel to mount
|
||||||
the root file system via SMB protocol.
|
the root file system via SMB protocol.
|
||||||
|
|
||||||
cifsroot=//<server-ip>/<share>[,options]
|
::
|
||||||
|
|
||||||
|
cifsroot=//<server-ip>/<share>[,options]
|
||||||
|
|
||||||
Enables the kernel to mount, via SMB, the root file system that is
|
Enables the kernel to mount, via SMB, the root file system that is
|
||||||
located on the <server-ip> and <share> specified in this option.
|
located on the <server-ip> and <share> specified in this option.
|
||||||
@ -65,33 +73,33 @@ options
|
|||||||
Examples
|
Examples
|
||||||
========
|
========
|
||||||
|
|
||||||
Export root file system as a Samba share in smb.conf file.
|
Export root file system as a Samba share in smb.conf file::
|
||||||
|
|
||||||
...
|
...
|
||||||
[linux]
|
[linux]
|
||||||
path = /path/to/rootfs
|
path = /path/to/rootfs
|
||||||
read only = no
|
read only = no
|
||||||
guest ok = yes
|
guest ok = yes
|
||||||
force user = root
|
force user = root
|
||||||
force group = root
|
force group = root
|
||||||
browseable = yes
|
browseable = yes
|
||||||
writeable = yes
|
writeable = yes
|
||||||
admin users = root
|
admin users = root
|
||||||
public = yes
|
public = yes
|
||||||
create mask = 0777
|
create mask = 0777
|
||||||
directory mask = 0777
|
directory mask = 0777
|
||||||
...
|
...
|
||||||
|
|
||||||
Restart smb service.
|
Restart smb service::
|
||||||
|
|
||||||
# systemctl restart smb
|
# systemctl restart smb
|
||||||
|
|
||||||
Test it under QEMU on a kernel built with CONFIG_CIFS_ROOT and
|
Test it under QEMU on a kernel built with CONFIG_CIFS_ROOT and
|
||||||
CONFIG_IP_PNP options enabled.
|
CONFIG_IP_PNP options enabled::
|
||||||
|
|
||||||
# qemu-system-x86_64 -enable-kvm -cpu host -m 1024 \
|
# qemu-system-x86_64 -enable-kvm -cpu host -m 1024 \
|
||||||
-kernel /path/to/linux/arch/x86/boot/bzImage -nographic \
|
-kernel /path/to/linux/arch/x86/boot/bzImage -nographic \
|
||||||
-append "root=/dev/cifs rw ip=dhcp cifsroot=//10.0.2.2/linux,username=foo,password=bar console=ttyS0 3"
|
-append "root=/dev/cifs rw ip=dhcp cifsroot=//10.0.2.2/linux,username=foo,password=bar console=ttyS0 3"
|
||||||
|
|
||||||
|
|
||||||
1: https://wiki.samba.org/index.php/UNIX_Extensions
|
1: https://wiki.samba.org/index.php/UNIX_Extensions
|
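To make the `cifsroot=//<server-ip>/<share>[,options]` syntax concrete, here is a small userspace C sketch that splits such a value into its three parts. It is not the kernel's parser; `struct demo_cifsroot`, `demo_parse_cifsroot()`, and the buffer sizes are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <string.h>

struct demo_cifsroot {
    char server[64];
    char share[64];
    char options[128];  /* empty string when no options were given */
};

/* Returns 0 on success, -1 if the value is malformed or a field is too long. */
static int demo_parse_cifsroot(const char *value, struct demo_cifsroot *out)
{
    const char *server, *slash, *share, *comma;
    size_t server_len, share_len;

    assert(value != NULL && out != NULL);

    /* The value must start with "//" followed by the server address. */
    if (strncmp(value, "//", 2) != 0)
        return -1;
    server = value + 2;

    /* The next '/' separates the server from the share name. */
    slash = strchr(server, '/');
    if (slash == NULL || slash == server)
        return -1;
    server_len = (size_t)(slash - server);

    /* An optional ',' separates the share from the mount options. */
    share = slash + 1;
    comma = strchr(share, ',');
    share_len = comma ? (size_t)(comma - share) : strlen(share);
    if (share_len == 0)
        return -1;

    if (server_len >= sizeof(out->server) || share_len >= sizeof(out->share))
        return -1;
    if (comma && strlen(comma + 1) >= sizeof(out->options))
        return -1;

    memcpy(out->server, server, server_len);
    out->server[server_len] = '\0';
    memcpy(out->share, share, share_len);
    out->share[share_len] = '\0';
    strcpy(out->options, comma ? comma + 1 : "");
    return 0;
}
```

Applied to the value used in the QEMU example, `//10.0.2.2/linux,username=foo,password=bar`, the sketch yields the server `10.0.2.2`, the share `linux`, and the option string `username=foo,password=bar`.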
1670
Documentation/filesystems/coda.rst
Normal file
File diff suppressed because it is too large
@ -1,5 +1,6 @@
|
|||||||
|
=======================================================
|
||||||
configfs - Userspace-driven kernel object configuration.
|
Configfs - Userspace-driven Kernel Object Configuration
|
||||||
|
=======================================================
|
||||||
|
|
||||||
Joel Becker <joel.becker@oracle.com>
|
Joel Becker <joel.becker@oracle.com>
|
||||||
|
|
||||||
@ -9,7 +10,8 @@ Copyright (c) 2005 Oracle Corporation,
|
|||||||
Joel Becker <joel.becker@oracle.com>
|
Joel Becker <joel.becker@oracle.com>
|
||||||
|
|
||||||
|
|
||||||
[What is configfs?]
|
What is configfs?
|
||||||
|
=================
|
||||||
|
|
||||||
configfs is a ram-based filesystem that provides the converse of
|
configfs is a ram-based filesystem that provides the converse of
|
||||||
sysfs's functionality. Where sysfs is a filesystem-based view of
|
sysfs's functionality. Where sysfs is a filesystem-based view of
|
||||||
@ -35,10 +37,11 @@ kernel modules backing the items must respond to this.
|
|||||||
Both sysfs and configfs can and should exist together on the same
|
Both sysfs and configfs can and should exist together on the same
|
||||||
system. One is not a replacement for the other.
|
system. One is not a replacement for the other.
|
||||||
|
|
||||||
[Using configfs]
|
Using configfs
|
||||||
|
==============
|
||||||
|
|
||||||
configfs can be compiled as a module or into the kernel. You can access
|
configfs can be compiled as a module or into the kernel. You can access
|
||||||
it by doing
|
it by doing::
|
||||||
|
|
||||||
mount -t configfs none /config
|
mount -t configfs none /config
|
||||||
|
|
||||||
@ -56,28 +59,29 @@ values. Don't mix more than one attribute in one attribute file.
|
|||||||
There are two types of configfs attributes:
|
There are two types of configfs attributes:
|
||||||
|
|
||||||
* Normal attributes, which, similar to sysfs attributes, are small ASCII text
|
* Normal attributes, which, similar to sysfs attributes, are small ASCII text
|
||||||
files, with a maximum size of one page (PAGE_SIZE, 4096 on i386). Preferably
|
files, with a maximum size of one page (PAGE_SIZE, 4096 on i386). Preferably
|
||||||
only one value per file should be used, and the same caveats from sysfs apply.
|
only one value per file should be used, and the same caveats from sysfs apply.
|
||||||
Configfs expects write(2) to store the entire buffer at once. When writing to
|
Configfs expects write(2) to store the entire buffer at once. When writing to
|
||||||
normal configfs attributes, userspace processes should first read the entire
|
normal configfs attributes, userspace processes should first read the entire
|
||||||
file, modify the portions they wish to change, and then write the entire
|
file, modify the portions they wish to change, and then write the entire
|
||||||
buffer back.
|
buffer back.
|
||||||
|
|
||||||
* Binary attributes, which are somewhat similar to sysfs binary attributes,
|
* Binary attributes, which are somewhat similar to sysfs binary attributes,
|
||||||
but with a few slight changes to semantics. The PAGE_SIZE limitation does not
|
but with a few slight changes to semantics. The PAGE_SIZE limitation does not
|
||||||
apply, but the whole binary item must fit in a single kernel vmalloc'ed buffer.
|
apply, but the whole binary item must fit in a single kernel vmalloc'ed buffer.
|
||||||
The write(2) calls from user space are buffered, and the attributes'
|
The write(2) calls from user space are buffered, and the attributes'
|
||||||
write_bin_attribute method will be invoked on the final close, therefore it is
|
write_bin_attribute method will be invoked on the final close, therefore it is
|
||||||
imperative for user-space to check the return code of close(2) in order to
|
imperative for user-space to check the return code of close(2) in order to
|
||||||
verify that the operation finished successfully.
|
verify that the operation finished successfully.
|
||||||
To avoid a malicious user OOMing the kernel, there's a per-binary attribute
|
To avoid a malicious user OOMing the kernel, there's a per-binary attribute
|
||||||
maximum buffer value.
|
maximum buffer value.
|
||||||
|
|
||||||
When an item needs to be destroyed, remove it with rmdir(2). An
|
When an item needs to be destroyed, remove it with rmdir(2). An
|
||||||
item cannot be destroyed if any other item has a link to it (via
|
item cannot be destroyed if any other item has a link to it (via
|
||||||
symlink(2)). Links can be removed via unlink(2).
|
symlink(2)). Links can be removed via unlink(2).
|
||||||
|
|
||||||
[Configuring FakeNBD: an Example]
|
Configuring FakeNBD: an Example
|
||||||
|
===============================
|
||||||
|
|
||||||
Imagine there's a Network Block Device (NBD) driver that allows you to
|
Imagine there's a Network Block Device (NBD) driver that allows you to
|
||||||
access remote block devices. Call it FakeNBD. FakeNBD uses configfs
|
access remote block devices. Call it FakeNBD. FakeNBD uses configfs
|
||||||
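The userspace rules stated in this hunk (store the whole buffer with one write(2), and check the return code of close(2) because binary attributes are only committed on the final close) can be sketched in C as below. This is an illustrative sketch, not part of configfs: `demo_store_attr()` and `demo_show_attr()` are hypothetical helper names, and the attribute path is supplied by the caller.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Store a whole value with a single write(2) and report failures from
 * close(2) as well. Returns 0 on success, -1 on any error.
 */
static int demo_store_attr(const char *path, const char *value)
{
    size_t len;
    ssize_t written;
    int fd;

    assert(path != NULL && value != NULL);
    len = strlen(value);

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    written = write(fd, value, len);  /* entire buffer in one call */
    if (written < 0 || (size_t)written != len) {
        close(fd);
        return -1;
    }

    /* For binary attributes the store only happens on the final close. */
    return close(fd) == 0 ? 0 : -1;
}

/* Read an attribute into buf and NUL-terminate it. Returns length or -1. */
static ssize_t demo_show_attr(const char *path, char *buf, size_t size)
{
    ssize_t n;
    int fd;

    assert(path != NULL && buf != NULL);
    if (size == 0)
        return -1;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    n = read(fd, buf, size - 1);
    close(fd);
    if (n < 0)
        return -1;

    buf[n] = '\0';
    return n;
}
```

A read-modify-write update would call `demo_show_attr()` first, edit the returned buffer in memory, and then pass the complete result to `demo_store_attr()` rather than writing only the changed portion.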
@ -86,14 +90,14 @@ sysadmins use to configure FakeNBD, but somehow that program has to tell
|
|||||||
the driver about it. Here's where configfs comes in.
|
the driver about it. Here's where configfs comes in.
|
||||||
|
|
||||||
When the FakeNBD driver is loaded, it registers itself with configfs.
|
When the FakeNBD driver is loaded, it registers itself with configfs.
|
||||||
readdir(3) sees this just fine:
|
readdir(3) sees this just fine::
|
||||||
|
|
||||||
# ls /config
|
# ls /config
|
||||||
fakenbd
|
fakenbd
|
||||||
|
|
||||||
A fakenbd connection can be created with mkdir(2). The name is
|
A fakenbd connection can be created with mkdir(2). The name is
|
||||||
arbitrary, but likely the tool will make some use of the name. Perhaps
|
arbitrary, but likely the tool will make some use of the name. Perhaps
|
||||||
it is a uuid or a disk name:
|
it is a uuid or a disk name::
|
||||||
|
|
||||||
# mkdir /config/fakenbd/disk1
|
# mkdir /config/fakenbd/disk1
|
||||||
# ls /config/fakenbd/disk1
|
# ls /config/fakenbd/disk1
|
||||||
@ -102,7 +106,7 @@ it is a uuid or a disk name:
|
|||||||
The target attribute contains the IP address of the server FakeNBD will
|
The target attribute contains the IP address of the server FakeNBD will
|
||||||
connect to. The device attribute is the device on the server.
|
connect to. The device attribute is the device on the server.
|
||||||
Predictably, the rw attribute determines whether the connection is
|
Predictably, the rw attribute determines whether the connection is
|
||||||
read-only or read-write.
|
read-only or read-write::
|
||||||
|
|
||||||
# echo 10.0.0.1 > /config/fakenbd/disk1/target
|
# echo 10.0.0.1 > /config/fakenbd/disk1/target
|
||||||
# echo /dev/sda1 > /config/fakenbd/disk1/device
|
# echo /dev/sda1 > /config/fakenbd/disk1/device
|
||||||
@ -111,7 +115,8 @@ read-only or read-write.
|
|||||||
That's it. That's all there is. Now the device is configured, via the
|
That's it. That's all there is. Now the device is configured, via the
|
||||||
shell no less.
|
shell no less.
|
||||||
|
|
||||||
[Coding With configfs]
|
Coding With configfs
|
||||||
|
====================
|
||||||
|
|
||||||
Every object in configfs is a config_item. A config_item reflects an
|
Every object in configfs is a config_item. A config_item reflects an
|
||||||
object in the subsystem. It has attributes that match values on that
|
object in the subsystem. It has attributes that match values on that
|
||||||
@ -130,7 +135,10 @@ appears as a directory at the top of the configfs filesystem. A
|
|||||||
subsystem is also a config_group, and can do everything a config_group
|
subsystem is also a config_group, and can do everything a config_group
|
||||||
can.
|
can.
|
||||||
|
|
||||||
[struct config_item]
|
struct config_item
|
||||||
|
==================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
struct config_item {
|
struct config_item {
|
||||||
char *ci_name;
|
char *ci_name;
|
||||||
@ -168,7 +176,10 @@ By itself, a config_item cannot do much more than appear in configfs.
|
|||||||
Usually a subsystem wants the item to display and/or store attributes,
|
Usually a subsystem wants the item to display and/or store attributes,
|
||||||
among other things. For that, it needs a type.
|
among other things. For that, it needs a type.
|
||||||
|
|
||||||
[struct config_item_type]
|
struct config_item_type
|
||||||
|
=======================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
struct configfs_item_operations {
|
struct configfs_item_operations {
|
||||||
void (*release)(struct config_item *);
|
void (*release)(struct config_item *);
|
||||||
@ -192,7 +203,10 @@ allocated dynamically will need to provide the ct_item_ops->release()
|
|||||||
method. This method is called when the config_item's reference count
|
method. This method is called when the config_item's reference count
|
||||||
reaches zero.
|
reaches zero.
|
||||||
|
|
||||||
[struct configfs_attribute]
|
struct configfs_attribute
|
||||||
|
=========================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
struct configfs_attribute {
|
struct configfs_attribute {
|
||||||
char *ca_name;
|
char *ca_name;
|
||||||
@ -214,7 +228,10 @@ be called whenever userspace asks for a read(2) on the attribute. If an
|
|||||||
attribute is writable and provides a ->store method, that method will be
|
attribute is writable and provides a ->store method, that method will be
|
||||||
called whenever userspace asks for a write(2) on the attribute.
|
called whenever userspace asks for a write(2) on the attribute.
|
||||||
|
|
||||||
[struct configfs_bin_attribute]
|
struct configfs_bin_attribute
|
||||||
|
=============================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
struct configfs_bin_attribute {
|
struct configfs_bin_attribute {
|
||||||
struct configfs_attribute cb_attr;
|
struct configfs_attribute cb_attr;
|
||||||
@ -240,11 +257,12 @@ will happen for write(2). The reads/writes are bufferred so only a
|
|||||||
single read/write will occur; the attributes' need not concern itself
|
single read/write will occur; the attributes' need not concern itself
|
||||||
with it.
|
with it.
|
||||||
|
|
||||||
[struct config_group]
|
struct config_group
|
||||||
|
===================
|
||||||
|
|
||||||
A config_item cannot live in a vacuum. The only way one can be created
|
A config_item cannot live in a vacuum. The only way one can be created
|
||||||
is via mkdir(2) on a config_group. This will trigger creation of a
|
is via mkdir(2) on a config_group. This will trigger creation of a
|
||||||
child item.
|
child item::
|
||||||
|
|
||||||
struct config_group {
|
struct config_group {
|
||||||
struct config_item cg_item;
|
struct config_item cg_item;
|
||||||
@ -264,7 +282,7 @@ The config_group structure contains a config_item. Properly configuring
|
|||||||
that item means that a group can behave as an item in its own right.
|
that item means that a group can behave as an item in its own right.
|
||||||
However, it can do more: it can create child items or groups. This is
|
However, it can do more: it can create child items or groups. This is
|
||||||
accomplished via the group operations specified on the group's
|
accomplished via the group operations specified on the group's
|
||||||
config_item_type.
|
config_item_type::
|
||||||
|
|
||||||
struct configfs_group_operations {
|
struct configfs_group_operations {
|
||||||
struct config_item *(*make_item)(struct config_group *group,
|
struct config_item *(*make_item)(struct config_group *group,
|
||||||
@ -279,7 +297,8 @@ config_item_type.
|
|||||||
};
|
};
|
||||||
|
|
||||||
A group creates child items by providing the
|
A group creates child items by providing the
|
||||||
ct_group_ops->make_item() method. If provided, this method is called from mkdir(2) in the group's directory. The subsystem allocates a new
|
ct_group_ops->make_item() method. If provided, this method is called from
|
||||||
|
mkdir(2) in the group's directory. The subsystem allocates a new
|
||||||
config_item (or more likely, its container structure), initializes it,
|
config_item (or more likely, its container structure), initializes it,
|
||||||
and returns it to configfs. Configfs will then populate the filesystem
|
and returns it to configfs. Configfs will then populate the filesystem
|
||||||
tree to reflect the new item.
|
tree to reflect the new item.
|
||||||
@ -296,13 +315,14 @@ upon item allocation. If a subsystem has no work to do, it may omit
|
|||||||
the ct_group_ops->drop_item() method, and configfs will call
|
the ct_group_ops->drop_item() method, and configfs will call
|
||||||
config_item_put() on the item on behalf of the subsystem.
|
config_item_put() on the item on behalf of the subsystem.
|
||||||
|
|
||||||
IMPORTANT: drop_item() is void, and as such cannot fail. When rmdir(2)
|
Important:
|
||||||
is called, configfs WILL remove the item from the filesystem tree
|
drop_item() is void, and as such cannot fail. When rmdir(2)
|
||||||
(assuming that it has no children to keep it busy). The subsystem is
|
is called, configfs WILL remove the item from the filesystem tree
|
||||||
responsible for responding to this. If the subsystem has references to
|
(assuming that it has no children to keep it busy). The subsystem is
|
||||||
the item in other threads, the memory is safe. It may take some time
|
responsible for responding to this. If the subsystem has references to
|
||||||
for the item to actually disappear from the subsystem's usage. But it
|
the item in other threads, the memory is safe. It may take some time
|
||||||
is gone from configfs.
|
for the item to actually disappear from the subsystem's usage. But it
|
||||||
|
is gone from configfs.
|
||||||
|
|
||||||
When drop_item() is called, the item's linkage has already been torn
|
When drop_item() is called, the item's linkage has already been torn
|
||||||
down. It no longer has a reference on its parent and has no place in
|
down. It no longer has a reference on its parent and has no place in
|
||||||
@ -319,10 +339,11 @@ is implemented in the configfs rmdir(2) code. ->drop_item() will not be
|
|||||||
called, as the item has not been dropped. rmdir(2) will fail, as the
|
called, as the item has not been dropped. rmdir(2) will fail, as the
|
||||||
directory is not empty.
|
directory is not empty.
|
||||||
|
|
||||||
[struct configfs_subsystem]
|
struct configfs_subsystem
|
||||||
|
=========================
|
||||||
|
|
||||||
A subsystem must register itself, usually at module_init time. This
|
A subsystem must register itself, usually at module_init time. This
|
||||||
tells configfs to make the subsystem appear in the file tree.
|
tells configfs to make the subsystem appear in the file tree::
|
||||||
|
|
||||||
struct configfs_subsystem {
|
struct configfs_subsystem {
|
||||||
struct config_group su_group;
|
struct config_group su_group;
|
||||||
@ -332,17 +353,19 @@ tells configfs to make the subsystem appear in the file tree.
|
|||||||
int configfs_register_subsystem(struct configfs_subsystem *subsys);
|
int configfs_register_subsystem(struct configfs_subsystem *subsys);
|
||||||
void configfs_unregister_subsystem(struct configfs_subsystem *subsys);
|
void configfs_unregister_subsystem(struct configfs_subsystem *subsys);
|
||||||
|
|
||||||
A subsystem consists of a toplevel config_group and a mutex.
|
A subsystem consists of a toplevel config_group and a mutex.
|
||||||
The group is where child config_items are created. For a subsystem,
|
The group is where child config_items are created. For a subsystem,
|
||||||
this group is usually defined statically. Before calling
|
this group is usually defined statically. Before calling
|
||||||
configfs_register_subsystem(), the subsystem must have initialized the
|
configfs_register_subsystem(), the subsystem must have initialized the
|
||||||
group via the usual group _init() functions, and it must also have
|
group via the usual group _init() functions, and it must also have
|
||||||
initialized the mutex.
|
initialized the mutex.
|
||||||
When the register call returns, the subsystem is live, and it
|
|
||||||
|
When the register call returns, the subsystem is live, and it
|
||||||
will be visible via configfs. At that point, mkdir(2) can be called and
|
will be visible via configfs. At that point, mkdir(2) can be called and
|
||||||
the subsystem must be ready for it.
|
the subsystem must be ready for it.
|
||||||
|
|
||||||
[An Example]
|
An Example
|
||||||
|
==========
|
||||||
|
|
||||||
The best example of these basic concepts is the simple_children
|
The best example of these basic concepts is the simple_children
|
||||||
subsystem/group and the simple_child item in
|
subsystem/group and the simple_child item in
|
||||||
@ -350,7 +373,8 @@ samples/configfs/configfs_sample.c. It shows a trivial object displaying
|
|||||||
and storing an attribute, and a simple group creating and destroying
|
and storing an attribute, and a simple group creating and destroying
|
||||||
these children.
|
these children.
|
||||||
|
|
||||||
[Hierarchy Navigation and the Subsystem Mutex]
|
Hierarchy Navigation and the Subsystem Mutex
|
||||||
|
============================================
|
||||||
|
|
||||||
There is an extra bonus that configfs provides. The config_groups and
|
There is an extra bonus that configfs provides. The config_groups and
|
||||||
config_items are arranged in a hierarchy due to the fact that they
|
config_items are arranged in a hierarchy due to the fact that they
|
||||||
@ -375,7 +399,8 @@ be in its parent's cg_children list for the same duration. This allows
|
|||||||
a subsystem to trust ci_parent and cg_children while they hold the
|
a subsystem to trust ci_parent and cg_children while they hold the
|
||||||
mutex.
|
mutex.
|
||||||
|
|
||||||
[Item Aggregation Via symlink(2)]
|
Item Aggregation Via symlink(2)
|
||||||
|
===============================
|
||||||
|
|
||||||
configfs provides a simple group via the group->item parent/child
|
configfs provides a simple group via the group->item parent/child
|
||||||
relationship. Often, however, a larger environment requires aggregation
|
relationship. Often, however, a larger environment requires aggregation
|
||||||
@ -403,7 +428,8 @@ A config_item cannot be removed while it links to any other item, nor
|
|||||||
can it be removed while an item links to it. Dangling symlinks are not
|
can it be removed while an item links to it. Dangling symlinks are not
|
||||||
allowed in configfs.
|
allowed in configfs.
|
||||||
|
|
||||||
[Automatically Created Subgroups]
|
Automatically Created Subgroups
|
||||||
|
===============================
|
||||||
|
|
||||||
A new config_group may want to have two types of child config_items.
|
A new config_group may want to have two types of child config_items.
|
||||||
While this could be codified by magic names in ->make_item(), it is much
|
While this could be codified by magic names in ->make_item(), it is much
|
||||||
@ -433,7 +459,8 @@ As a consequence of this, default groups cannot be removed directly via
|
|||||||
rmdir(2). They also are not considered when rmdir(2) on the parent
|
rmdir(2). They also are not considered when rmdir(2) on the parent
|
||||||
group is checking for children.
|
group is checking for children.
|
||||||
|
|
||||||
[Dependent Subsystems]
|
Dependent Subsystems
|
||||||
|
====================
|
||||||
|
|
||||||
Sometimes other drivers depend on particular configfs items. For
|
Sometimes other drivers depend on particular configfs items. For
|
||||||
example, ocfs2 mounts depend on a heartbeat region item. If that
|
example, ocfs2 mounts depend on a heartbeat region item. If that
|
||||||
@ -460,9 +487,11 @@ succeeds, then heartbeat knows the region is safe to give to ocfs2.
|
|||||||
If it fails, it was being torn down anyway, and heartbeat can gracefully
|
If it fails, it was being torn down anyway, and heartbeat can gracefully
|
||||||
pass up an error.
|
pass up an error.
|
||||||
|
|
||||||
[Committable Items]
|
Committable Items
|
||||||
|
=================
|
||||||
|
|
||||||
NOTE: Committable items are currently unimplemented.
|
Note:
|
||||||
|
Committable items are currently unimplemented.
|
||||||
|
|
||||||
Some config_items cannot have a valid initial state. That is, no
|
Some config_items cannot have a valid initial state. That is, no
|
||||||
default values can be specified for the item's attributes such that the
|
default values can be specified for the item's attributes such that the
|
||||||
@ -504,5 +533,3 @@ As rmdir(2) does not work in the "live" directory, an item must be
|
|||||||
shutdown, or "uncommitted". Again, this is done via rename(2), this
|
shutdown, or "uncommitted". Again, this is done via rename(2), this
|
||||||
time from the "live" directory back to the "pending" one. The subsystem
|
time from the "live" directory back to the "pending" one. The subsystem
|
||||||
is notified by the ct_group_ops->uncommit_object() method.
|
is notified by the ct_group_ops->uncommit_object() method.
|
||||||
|
|
||||||
|
|
@ -74,7 +74,7 @@ are zeroed out and converted to written extents before being returned to avoid
|
|||||||
exposure of uninitialized data through mmap.
|
exposure of uninitialized data through mmap.
|
||||||
|
|
||||||
These filesystems may be used for inspiration:
|
These filesystems may be used for inspiration:
|
||||||
- ext2: see Documentation/filesystems/ext2.txt
|
- ext2: see Documentation/filesystems/ext2.rst
|
||||||
- ext4: see Documentation/filesystems/ext4/
|
- ext4: see Documentation/filesystems/ext4/
|
||||||
- xfs: see Documentation/admin-guide/xfs.rst
|
- xfs: see Documentation/admin-guide/xfs.rst
|
||||||
|
|
||||||
|
@ -166,16 +166,17 @@ file::
|
|||||||
};
|
};
|
||||||
|
|
||||||
struct debugfs_regset32 {
|
struct debugfs_regset32 {
|
||||||
struct debugfs_reg32 *regs;
|
const struct debugfs_reg32 *regs;
|
||||||
int nregs;
|
int nregs;
|
||||||
void __iomem *base;
|
void __iomem *base;
|
||||||
|
struct device *dev; /* Optional device for Runtime PM */
|
||||||
};
|
};
|
||||||
|
|
||||||
debugfs_create_regset32(const char *name, umode_t mode,
|
debugfs_create_regset32(const char *name, umode_t mode,
|
||||||
struct dentry *parent,
|
struct dentry *parent,
|
||||||
struct debugfs_regset32 *regset);
|
struct debugfs_regset32 *regset);
|
||||||
|
|
||||||
void debugfs_print_regs32(struct seq_file *s, struct debugfs_reg32 *regs,
|
void debugfs_print_regs32(struct seq_file *s, const struct debugfs_reg32 *regs,
|
||||||
int nregs, void __iomem *base, char *prefix);
|
int nregs, void __iomem *base, char *prefix);
|
||||||
|
|
||||||
The "base" argument may be 0, but you may want to build the reg32 array
|
The "base" argument may be 0, but you may want to build the reg32 array
|
||||||
|
36
Documentation/filesystems/devpts.rst
Normal file
@ -0,0 +1,36 @@
|
|||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
=====================
|
||||||
|
The Devpts Filesystem
|
||||||
|
=====================
|
||||||
|
|
||||||
|
Each mount of the devpts filesystem is now distinct such that ptys
|
||||||
|
and their indices allocated in one mount are independent from ptys
|
||||||
|
and their indices in all other mounts.
|
||||||
|
|
||||||
|
All mounts of the devpts filesystem now create a ``/dev/pts/ptmx`` node
|
||||||
|
with permissions ``0000``.
|
||||||
|
|
||||||
|
To retain backwards compatibility a ptmx device node (aka any node
|
||||||
|
created with ``mknod name c 5 2``) when opened will look for an instance
|
||||||
|
of devpts under the name ``pts`` in the same directory as the ptmx device
|
||||||
|
node.
|
||||||
|
|
||||||
|
As an option instead of placing a ``/dev/ptmx`` device node at ``/dev/ptmx``
|
||||||
|
it is possible to place a symlink to ``/dev/pts/ptmx`` at ``/dev/ptmx`` or
|
||||||
|
to bind mount ``/dev/pts/ptmx`` to ``/dev/ptmx``. If you opt for using
|
||||||
|
the devpts filesystem in this manner devpts should be mounted with
|
||||||
|
the ``ptmxmode=0666``, or ``chmod 0666 /dev/pts/ptmx`` should be called.
|
||||||
|
|
||||||
|
Total count of pty pairs in all instances is limited by sysctls::
|
||||||
|
|
||||||
|
kernel.pty.max = 4096 - global limit
|
||||||
|
kernel.pty.reserve = 1024 - reserved for filesystems mounted from the initial mount namespace
|
||||||
|
kernel.pty.nr - current count of ptys
|
||||||
|
|
||||||
|
Per-instance limit could be set by adding mount option ``max=<count>``.
|
||||||
|
|
||||||
|
This feature was added in kernel 3.4 together with
|
||||||
|
``sysctl kernel.pty.reserve``.
|
||||||
|
|
||||||
|
In kernels older than 3.4 sysctl ``kernel.pty.max`` works as per-instance limit.
|
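Illustration for the devpts sysctls described above: a minimal C sketch that reads the three values from procfs. The helper names `read_long_file` and `print_pty_limits` are hypothetical, and the paths assume the usual mapping of `kernel.pty.*` to `/proc/sys/kernel/pty/` with procfs mounted at `/proc`.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read one decimal integer from a sysctl-style file.
 * Returns 0 on success and stores the value in *out, or -1 on failure.
 */
int read_long_file(const char *path, long *out)
{
	char buf[64];
	char *end;
	long value;
	FILE *f;

	assert(path != NULL && out != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	errno = 0;
	value = strtol(buf, &end, 10);
	if (errno != 0 || end == buf)
		return -1;
	*out = value;
	return 0;
}

/* Print the three pty sysctls; assumes procfs is mounted at /proc. */
void print_pty_limits(void)
{
	static const char *const names[] = { "max", "reserve", "nr" };
	char path[64];
	long value;
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		snprintf(path, sizeof(path), "/proc/sys/kernel/pty/%s", names[i]);
		if (read_long_file(path, &value) == 0)
			printf("kernel.pty.%s = %ld\n", names[i], value);
		else
			printf("kernel.pty.%s: unavailable\n", names[i]);
	}
}
```

Comparing `nr` against `max` is one way a tool could tell how close the system is to the global limit; the per-instance `max=<count>` mount option is not visible through these files.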
@ -1,26 +0,0 @@
|
|||||||
Each mount of the devpts filesystem is now distinct such that ptys
|
|
||||||
and their indicies allocated in one mount are independent from ptys
|
|
||||||
and their indicies in all other mounts.
|
|
||||||
|
|
||||||
All mounts of the devpts filesystem now create a /dev/pts/ptmx node
|
|
||||||
with permissions 0000.
|
|
||||||
|
|
||||||
To retain backwards compatibility the a ptmx device node (aka any node
|
|
||||||
created with "mknod name c 5 2") when opened will look for an instance
|
|
||||||
of devpts under the name "pts" in the same directory as the ptmx device
|
|
||||||
node.
|
|
||||||
|
|
||||||
As an option instead of placing a /dev/ptmx device node at /dev/ptmx
|
|
||||||
it is possible to place a symlink to /dev/pts/ptmx at /dev/ptmx or
|
|
||||||
to bind mount /dev/ptx/ptmx to /dev/ptmx. If you opt for using
|
|
||||||
the devpts filesystem in this manner devpts should be mounted with
|
|
||||||
the ptmxmode=0666, or chmod 0666 /dev/pts/ptmx should be called.
|
|
||||||
|
|
||||||
Total count of pty pairs in all instances is limited by sysctls:
|
|
||||||
kernel.pty.max = 4096 - global limit
|
|
||||||
kernel.pty.reserve = 1024 - reserved for filesystems mounted from the initial mount namespace
|
|
||||||
kernel.pty.nr - current count of ptys
|
|
||||||
|
|
||||||
Per-instance limit could be set by adding mount option "max=<count>".
|
|
||||||
This feature was added in kernel 3.4 together with sysctl kernel.pty.reserve.
|
|
||||||
In kernels older than 3.4 sysctl kernel.pty.max works as per-instance limit.
|
|
@ -1,5 +1,8 @@
|
|||||||
Linux Directory Notification
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
============================
|
|
||||||
|
============================
|
||||||
|
Linux Directory Notification
|
||||||
|
============================
|
||||||
|
|
||||||
Stephen Rothwell <sfr@canb.auug.org.au>
|
Stephen Rothwell <sfr@canb.auug.org.au>
|
||||||
|
|
||||||
@ -12,6 +15,7 @@ being delivered using signals.
|
|||||||
The application decides which "events" it wants to be notified about.
|
The application decides which "events" it wants to be notified about.
|
||||||
The currently defined events are:
|
The currently defined events are:
|
||||||
|
|
||||||
|
========= =====================================================
|
||||||
DN_ACCESS A file in the directory was accessed (read)
|
DN_ACCESS A file in the directory was accessed (read)
|
||||||
DN_MODIFY A file in the directory was modified (write,truncate)
|
DN_MODIFY A file in the directory was modified (write,truncate)
|
||||||
DN_CREATE A file was created in the directory
|
DN_CREATE A file was created in the directory
|
||||||
@ -19,6 +23,7 @@ The currently defined events are:
|
|||||||
DN_RENAME A file in the directory was renamed
|
DN_RENAME A file in the directory was renamed
|
||||||
DN_ATTRIB A file in the directory had its attributes
|
DN_ATTRIB A file in the directory had its attributes
|
||||||
changed (chmod,chown)
|
changed (chmod,chown)
|
||||||
|
========= =====================================================
|
||||||
|
|
||||||
Usually, the application must reregister after each notification, but
|
Usually, the application must reregister after each notification, but
|
||||||
if DN_MULTISHOT is or'ed with the event mask, then the registration will
|
if DN_MULTISHOT is or'ed with the event mask, then the registration will
|
||||||
@ -36,7 +41,7 @@ especially important if DN_MULTISHOT is specified. Note that SIGRTMIN
|
|||||||
is often blocked, so it is better to use (at least) SIGRTMIN + 1.
|
is often blocked, so it is better to use (at least) SIGRTMIN + 1.
|
||||||
|
|
||||||
Implementation expectations (features and bugs :-))
|
Implementation expectations (features and bugs :-))
|
||||||
---------------------------
|
---------------------------------------------------
|
||||||
|
|
||||||
The notification should work for any local access to files even if the
|
The notification should work for any local access to files even if the
|
||||||
actual file system is on a remote server. This implies that remote
|
actual file system is on a remote server. This implies that remote
|
||||||
@ -67,4 +72,4 @@ See tools/testing/selftests/filesystems/dnotify_test.c for an example.
|
|||||||
NOTE
|
NOTE
|
||||||
----
|
----
|
||||||
Beginning with Linux 2.6.13, dnotify has been replaced by inotify.
|
Beginning with Linux 2.6.13, dnotify has been replaced by inotify.
|
||||||
See Documentation/filesystems/inotify.txt for more information on it.
|
See Documentation/filesystems/inotify.rst for more information on it.
|
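Illustration for the dnotify registration described in this file: a minimal C sketch using `fcntl(2)` with `F_SETSIG` and `F_NOTIFY`. The helper name `dnotify_watch` is hypothetical; the in-tree selftest named above remains the reference example. With glibc, `_GNU_SOURCE` must be defined before the first include for these constants to be visible.

```c
#define _GNU_SOURCE	/* F_NOTIFY, F_SETSIG and DN_* need this with glibc */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask for dnotify events on a directory.
 * Returns the open directory fd (keep it open while watching) or -1.
 */
int dnotify_watch(const char *dir, int signo, unsigned long events)
{
	int fd;

	assert(dir != NULL);
	fd = open(dir, O_RDONLY | O_DIRECTORY);
	if (fd < 0)
		return -1;
	/* Deliver the chosen signal instead of the default SIGIO. */
	if (fcntl(fd, F_SETSIG, signo) < 0 ||
	    fcntl(fd, F_NOTIFY, events) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A caller would typically install a handler for `SIGRTMIN + 1` first and then call `dnotify_watch(".", SIGRTMIN + 1, DN_MODIFY | DN_CREATE | DN_MULTISHOT)`. Without `DN_MULTISHOT` the registration has to be repeated after each notification.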
@ -24,3 +24,20 @@ files that are not well-known standardized variables are created
|
|||||||
as immutable files. This doesn't prevent removal - "chattr -i" will work -
|
as immutable files. This doesn't prevent removal - "chattr -i" will work -
|
||||||
but it does prevent this kind of failure from being accomplished
|
but it does prevent this kind of failure from being accomplished
|
||||||
accidentally.
|
accidentally.
|
||||||
|
|
||||||
|
.. warning ::
|
||||||
|
When a content of an UEFI variable in /sys/firmware/efi/efivars is
|
||||||
|
displayed, for example using "hexdump", pay attention that the first
|
||||||
|
4 bytes of the output represent the UEFI variable attributes,
|
||||||
|
in little-endian format.
|
||||||
|
|
||||||
|
Practically the output of each efivar is composed of:
|
||||||
|
|
||||||
|
+-----------------------------------+
|
||||||
|
|4_bytes_of_attributes + efivar_data|
|
||||||
|
+-----------------------------------+
|
||||||
|
|
||||||
|
*See also:*
|
||||||
|
|
||||||
|
- Documentation/admin-guide/acpi/ssdt-overlays.rst
|
||||||
|
- Documentation/ABI/stable/sysfs-firmware-efi-vars
|
||||||
|
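Illustration for the efivarfs warning added above: a minimal C sketch that splits the bytes read from an efivarfs file into the 4-byte little-endian attribute word and the variable data. The helper name `efivar_split` is hypothetical, and reading the file into `buf` is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: split the raw bytes read from an efivarfs file into
 * the 4-byte little-endian attribute word and the variable data after it.
 * Returns 0 on success, -1 if the buffer is too short to hold attributes.
 */
int efivar_split(const uint8_t *buf, size_t len, uint32_t *attrs,
		 const uint8_t **data, size_t *data_len)
{
	assert(buf && attrs && data && data_len);
	if (len < 4)
		return -1;
	/* Assemble the attribute word byte by byte, least significant first. */
	*attrs = (uint32_t)buf[0] | (uint32_t)buf[1] << 8 |
		 (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;
	*data = buf + 4;
	*data_len = len - 4;
	return 0;
}
```

Decoding byte by byte keeps the result independent of the host's endianness, which a plain cast of the buffer to `uint32_t *` would not.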
@ -1,3 +1,5 @@
|
|||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
============
|
============
|
||||||
Fiemap Ioctl
|
Fiemap Ioctl
|
||||||
============
|
============
|
||||||
@ -10,9 +12,9 @@ returns a list of extents.
|
|||||||
Request Basics
|
Request Basics
|
||||||
--------------
|
--------------
|
||||||
|
|
||||||
A fiemap request is encoded within struct fiemap:
|
A fiemap request is encoded within struct fiemap::
|
||||||
|
|
||||||
struct fiemap {
|
struct fiemap {
|
||||||
__u64 fm_start; /* logical offset (inclusive) at
|
__u64 fm_start; /* logical offset (inclusive) at
|
||||||
* which to start mapping (in) */
|
* which to start mapping (in) */
|
||||||
__u64 fm_length; /* logical length of mapping which
|
__u64 fm_length; /* logical length of mapping which
|
||||||
@ -23,7 +25,7 @@ struct fiemap {
|
|||||||
__u32 fm_extent_count; /* size of fm_extents array (in) */
|
__u32 fm_extent_count; /* size of fm_extents array (in) */
|
||||||
__u32 fm_reserved;
|
__u32 fm_reserved;
|
||||||
struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */
|
struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|
||||||
fm_start, and fm_length specify the logical range within the file
|
fm_start, and fm_length specify the logical range within the file
|
||||||
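Illustration for the struct fiemap request described in this hunk: a minimal userspace C sketch that allocates a request with room for a given number of extents and issues the ioctl. The helper names `fiemap_request_alloc` and `fiemap_request_run` are hypothetical; the structure and `FS_IOC_FIEMAP` come from the uapi headers `<linux/fiemap.h>` and `<linux/fs.h>`.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fiemap.h>	/* struct fiemap, struct fiemap_extent */
#include <linux/fs.h>		/* FS_IOC_FIEMAP */

/*
 * Hypothetical helper: allocate a zeroed fiemap request with room for
 * extent_count extents. The caller frees it with free().
 */
struct fiemap *fiemap_request_alloc(__u64 start, __u64 length,
				    __u32 extent_count)
{
	struct fiemap *fm;

	/* fm_extents[] trails the header, so size the allocation for both. */
	fm = calloc(1, sizeof(*fm) +
		       (size_t)extent_count * sizeof(struct fiemap_extent));
	if (!fm)
		return NULL;
	fm->fm_start = start;
	fm->fm_length = length;
	fm->fm_extent_count = extent_count;
	return fm;
}

/* Issue the ioctl; returns 0 on success or -1 with errno set. */
int fiemap_request_run(int fd, struct fiemap *fm)
{
	assert(fm != NULL);
	return ioctl(fd, FS_IOC_FIEMAP, fm);
}
```

Because the request is zeroed, `fm_flags` starts out empty; a caller that wants the file synced before mapping would set `FIEMAP_FLAG_SYNC` in it before calling `fiemap_request_run()`. Not every filesystem implements the ioctl, so a failure return should be expected and handled.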
@ -51,12 +53,12 @@ nothing to prevent the file from changing between calls to FIEMAP.
|
|||||||
|
|
||||||
The following flags can be set in fm_flags:
|
The following flags can be set in fm_flags:
|
||||||
|
|
||||||
* FIEMAP_FLAG_SYNC
|
FIEMAP_FLAG_SYNC
|
||||||
If this flag is set, the kernel will sync the file before mapping extents.
|
If this flag is set, the kernel will sync the file before mapping extents.
|
||||||
|
|
||||||
* FIEMAP_FLAG_XATTR
|
FIEMAP_FLAG_XATTR
|
||||||
If this flag is set, the extents returned will describe the inodes
|
If this flag is set, the extents returned will describe the inodes
|
||||||
extended attribute lookup tree, instead of its data tree.
|
extended attribute lookup tree, instead of its data tree.
|
||||||
|
|
||||||
|
|
||||||
Extent Mapping
|
Extent Mapping
|
||||||
@ -75,18 +77,18 @@ complete the requested range and will not have the FIEMAP_EXTENT_LAST
|
|||||||
flag set (see the next section on extent flags).
|
flag set (see the next section on extent flags).
|
||||||
|
|
||||||
Each extent is described by a single fiemap_extent structure as
|
Each extent is described by a single fiemap_extent structure as
|
||||||
returned in fm_extents.
|
returned in fm_extents::
|
||||||
|
|
||||||
struct fiemap_extent {
|
struct fiemap_extent {
|
||||||
__u64 fe_logical; /* logical offset in bytes for the start of
|
__u64 fe_logical; /* logical offset in bytes for the start of
|
||||||
* the extent */
|
* the extent */
|
||||||
__u64 fe_physical; /* physical offset in bytes for the start
|
__u64 fe_physical; /* physical offset in bytes for the start
|
||||||
* of the extent */
|
* of the extent */
|
||||||
__u64 fe_length; /* length in bytes for the extent */
|
__u64 fe_length; /* length in bytes for the extent */
|
||||||
__u64 fe_reserved64[2];
|
__u64 fe_reserved64[2];
|
||||||
__u32 fe_flags; /* FIEMAP_EXTENT_* flags for this extent */
|
__u32 fe_flags; /* FIEMAP_EXTENT_* flags for this extent */
|
||||||
__u32 fe_reserved[3];
|
__u32 fe_reserved[3];
|
||||||
};
|
};
|
||||||
|
|
||||||
All offsets and lengths are in bytes and mirror those on disk. It is valid
|
All offsets and lengths are in bytes and mirror those on disk. It is valid
|
||||||
for an extents logical offset to start before the request or its logical
|
for an extents logical offset to start before the request or its logical
|
||||||
@ -114,26 +116,27 @@ worry about all present and future flags which might imply unaligned
|
|||||||
data. Note that the opposite is not true - it would be valid for
|
data. Note that the opposite is not true - it would be valid for
|
||||||
FIEMAP_EXTENT_NOT_ALIGNED to appear alone.
|
FIEMAP_EXTENT_NOT_ALIGNED to appear alone.
|
||||||
|
|
||||||
* FIEMAP_EXTENT_LAST
|
FIEMAP_EXTENT_LAST
|
||||||
This is generally the last extent in the file. A mapping attempt past
|
This is generally the last extent in the file. A mapping attempt past
|
||||||
this extent may return nothing. Some implementations set this flag to
|
this extent may return nothing. Some implementations set this flag to
|
||||||
indicate this extent is the last one in the range queried by the user
|
indicate this extent is the last one in the range queried by the user
|
||||||
(via fiemap->fm_length).
|
(via fiemap->fm_length).
|
||||||
|
|
||||||
* FIEMAP_EXTENT_UNKNOWN
|
FIEMAP_EXTENT_UNKNOWN
|
||||||
The location of this extent is currently unknown. This may indicate
|
The location of this extent is currently unknown. This may indicate
|
||||||
the data is stored on an inaccessible volume or that no storage has
|
the data is stored on an inaccessible volume or that no storage has
|
||||||
been allocated for the file yet.
|
been allocated for the file yet.
|
||||||
|
|
||||||
* FIEMAP_EXTENT_DELALLOC
|
FIEMAP_EXTENT_DELALLOC
|
||||||
- This will also set FIEMAP_EXTENT_UNKNOWN.
|
This will also set FIEMAP_EXTENT_UNKNOWN.
|
||||||
Delayed allocation - while there is data for this extent, its
|
|
||||||
physical location has not been allocated yet.
|
|
||||||
|
|
||||||
* FIEMAP_EXTENT_ENCODED
|
Delayed allocation - while there is data for this extent, its
|
||||||
This extent does not consist of plain filesystem blocks but is
|
physical location has not been allocated yet.
|
||||||
encoded (e.g. encrypted or compressed). Reading the data in this
|
|
||||||
extent via I/O to the block device will have undefined results.
|
FIEMAP_EXTENT_ENCODED
|
||||||
|
This extent does not consist of plain filesystem blocks but is
|
||||||
|
encoded (e.g. encrypted or compressed). Reading the data in this
|
||||||
|
extent via I/O to the block device will have undefined results.
|
||||||
|
|
||||||
Note that it is *always* undefined to try to update the data
|
Note that it is *always* undefined to try to update the data
|
||||||
in-place by writing to the indicated location without the
|
in-place by writing to the indicated location without the
|
||||||
@ -145,32 +148,32 @@ unmounted, and then only if the FIEMAP_EXTENT_ENCODED flag is
|
|||||||
clear; user applications must not try reading or writing to the
|
clear; user applications must not try reading or writing to the
|
||||||
filesystem via the block device under any other circumstances.
|
filesystem via the block device under any other circumstances.
|
||||||
|
|
||||||
* FIEMAP_EXTENT_DATA_ENCRYPTED
|
FIEMAP_EXTENT_DATA_ENCRYPTED
|
||||||
- This will also set FIEMAP_EXTENT_ENCODED
|
This will also set FIEMAP_EXTENT_ENCODED
|
||||||
The data in this extent has been encrypted by the file system.
|
The data in this extent has been encrypted by the file system.
|
||||||
|
|
||||||
* FIEMAP_EXTENT_NOT_ALIGNED
|
FIEMAP_EXTENT_NOT_ALIGNED
|
||||||
Extent offsets and length are not guaranteed to be block aligned.
|
Extent offsets and length are not guaranteed to be block aligned.
|
||||||
|
|
||||||
* FIEMAP_EXTENT_DATA_INLINE
|
FIEMAP_EXTENT_DATA_INLINE
|
||||||
This will also set FIEMAP_EXTENT_NOT_ALIGNED
|
This will also set FIEMAP_EXTENT_NOT_ALIGNED
|
||||||
Data is located within a meta data block.
|
Data is located within a meta data block.
|
||||||
|
|
||||||
* FIEMAP_EXTENT_DATA_TAIL
|
FIEMAP_EXTENT_DATA_TAIL
|
||||||
This will also set FIEMAP_EXTENT_NOT_ALIGNED
|
This will also set FIEMAP_EXTENT_NOT_ALIGNED
|
||||||
Data is packed into a block with data from other files.
|
Data is packed into a block with data from other files.
|
||||||
|
|
||||||
* FIEMAP_EXTENT_UNWRITTEN
|
FIEMAP_EXTENT_UNWRITTEN
|
||||||
Unwritten extent - the extent is allocated but its data has not been
|
Unwritten extent - the extent is allocated but its data has not been
|
||||||
initialized. This indicates the extent's data will be all zero if read
|
initialized. This indicates the extent's data will be all zero if read
|
||||||
through the filesystem but the contents are undefined if read directly from
|
through the filesystem but the contents are undefined if read directly from
|
||||||
the device.
|
the device.
|
||||||
|
|
||||||
* FIEMAP_EXTENT_MERGED
|
FIEMAP_EXTENT_MERGED
|
||||||
This will be set when a file does not support extents, i.e., it uses a block
|
This will be set when a file does not support extents, i.e., it uses a block
|
||||||
based addressing scheme. Since returning an extent for each block back to
|
based addressing scheme. Since returning an extent for each block back to
|
||||||
userspace would be highly inefficient, the kernel will try to merge most
|
userspace would be highly inefficient, the kernel will try to merge most
|
||||||
adjacent blocks into 'extents'.
|
adjacent blocks into 'extents'.
|
||||||
|
|
||||||
|
|
||||||
VFS -> File System Implementation
|
VFS -> File System Implementation
|
||||||
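Illustration for the extent flags listed in this hunk: a minimal C sketch of the "will also set" relationships between them. Both helper names are hypothetical; the flag constants come from `<linux/fiemap.h>`.

```c
#include <assert.h>
#include <linux/fiemap.h>	/* FIEMAP_EXTENT_* */

/*
 * Hypothetical helper: return flags plus every flag the text says is set
 * alongside it.
 */
__u32 fiemap_extent_flags_with_implied(__u32 flags)
{
	if (flags & FIEMAP_EXTENT_DELALLOC)
		flags |= FIEMAP_EXTENT_UNKNOWN;
	if (flags & FIEMAP_EXTENT_DATA_ENCRYPTED)
		flags |= FIEMAP_EXTENT_ENCODED;
	if (flags & (FIEMAP_EXTENT_DATA_INLINE | FIEMAP_EXTENT_DATA_TAIL))
		flags |= FIEMAP_EXTENT_NOT_ALIGNED;
	return flags;
}

/*
 * Hypothetical check a tool might run on an extent returned by FIEMAP:
 * nonzero when the flags are consistent with the rules above.
 */
int fiemap_extent_flags_consistent(__u32 flags)
{
	return fiemap_extent_flags_with_implied(flags) == flags;
}
```

The implication only runs one way: `FIEMAP_EXTENT_NOT_ALIGNED` on its own is a consistent flag set, while `FIEMAP_EXTENT_DATA_TAIL` without it is not.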
@ -179,23 +182,23 @@ VFS -> File System Implementation
|
|||||||
File systems wishing to support fiemap must implement a ->fiemap callback on
|
File systems wishing to support fiemap must implement a ->fiemap callback on
|
||||||
their inode_operations structure. The fs ->fiemap call is responsible for
|
their inode_operations structure. The fs ->fiemap call is responsible for
|
||||||
defining its set of supported fiemap flags, and calling a helper function on
|
defining its set of supported fiemap flags, and calling a helper function on
|
||||||
each discovered extent:
|
each discovered extent::
|
||||||
|
|
||||||
struct inode_operations {
|
struct inode_operations {
|
||||||
...
|
...
|
||||||
|
|
||||||
int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
|
int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
|
||||||
u64 len);
|
u64 len);
|
||||||
|
|
||||||
->fiemap is passed struct fiemap_extent_info which describes the
|
->fiemap is passed struct fiemap_extent_info which describes the
|
||||||
fiemap request:
|
fiemap request::
|
||||||
|
|
||||||
struct fiemap_extent_info {
|
struct fiemap_extent_info {
|
||||||
unsigned int fi_flags; /* Flags as passed from user */
|
unsigned int fi_flags; /* Flags as passed from user */
|
||||||
unsigned int fi_extents_mapped; /* Number of mapped extents */
|
unsigned int fi_extents_mapped; /* Number of mapped extents */
|
||||||
unsigned int fi_extents_max; /* Size of fiemap_extent array */
|
unsigned int fi_extents_max; /* Size of fiemap_extent array */
|
||||||
struct fiemap_extent *fi_extents_start; /* Start of fiemap_extent array */
|
struct fiemap_extent *fi_extents_start; /* Start of fiemap_extent array */
|
||||||
};
|
};
|
||||||
|
|
||||||
It is intended that the file system should not need to access any of this
|
It is intended that the file system should not need to access any of this
|
||||||
structure directly. Filesystem handlers should be tolerant to signals and return
|
structure directly. Filesystem handlers should be tolerant to signals and return
|
||||||
@ -203,9 +206,9 @@ EINTR once fatal signal received.
|
|||||||
|
|
||||||
|
|
||||||
Flag checking should be done at the beginning of the ->fiemap callback via the
|
Flag checking should be done at the beginning of the ->fiemap callback via the
|
||||||
fiemap_check_flags() helper:
|
fiemap_check_flags() helper::
|
||||||
|
|
||||||
int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
|
int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
|
||||||
|
|
||||||
The struct fieinfo should be passed in as received from ioctl_fiemap(). The
|
The struct fieinfo should be passed in as received from ioctl_fiemap(). The
|
||||||
set of fiemap flags which the fs understands should be passed via fs_flags. If
|
set of fiemap flags which the fs understands should be passed via fs_flags. If
|
||||||
@ -216,10 +219,10 @@ ioctl_fiemap().
|
|||||||
|
|
||||||
|
|
||||||
For each extent in the request range, the file system should call
|
For each extent in the request range, the file system should call
|
||||||
the helper function, fiemap_fill_next_extent():
|
the helper function, fiemap_fill_next_extent()::
|
||||||
|
|
||||||
int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
|
int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
|
||||||
u64 phys, u64 len, u32 flags, u32 dev);
|
u64 phys, u64 len, u32 flags, u32 dev);
|
||||||
|
|
||||||
fiemap_fill_next_extent() will use the passed values to populate the
|
fiemap_fill_next_extent() will use the passed values to populate the
|
||||||
next free extent in the fm_extents array. 'General' extent flags will
|
next free extent in the fm_extents array. 'General' extent flags will
|
@ -1,5 +1,8 @@
|
|||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
===================================
|
||||||
File management in the Linux kernel
|
File management in the Linux kernel
|
||||||
-----------------------------------
|
===================================
|
||||||
|
|
||||||
This document describes how locking for files (struct file)
|
This document describes how locking for files (struct file)
|
||||||
and file descriptor table (struct files) works.
|
and file descriptor table (struct files) works.
|
||||||
@ -34,7 +37,7 @@ appear atomic. Here are the locking rules for
|
|||||||
the fdtable structure -
|
the fdtable structure -
|
||||||
|
|
||||||
1. All references to the fdtable must be done through
|
1. All references to the fdtable must be done through
|
||||||
the files_fdtable() macro :
|
the files_fdtable() macro::
|
||||||
|
|
||||||
struct fdtable *fdt;
|
struct fdtable *fdt;
|
||||||
|
|
||||||
@ -61,7 +64,8 @@ the fdtable structure -
|
|||||||
4. To look up the file structure given an fd, a reader
|
4. To look up the file structure given an fd, a reader
|
||||||
must use either fcheck() or fcheck_files() APIs. These
|
must use either fcheck() or fcheck_files() APIs. These
|
||||||
take care of barrier requirements due to lock-free lookup.
|
take care of barrier requirements due to lock-free lookup.
|
||||||
An example :
|
|
||||||
|
An example::
|
||||||
|
|
||||||
struct file *file;
|
struct file *file;
|
||||||
|
|
||||||
@ -77,7 +81,7 @@ the fdtable structure -
|
|||||||
of the fd (fget()/fget_light()) are lock-free, it is possible
|
of the fd (fget()/fget_light()) are lock-free, it is possible
|
||||||
that look-up may race with the last put() operation on the
|
that look-up may race with the last put() operation on the
|
||||||
file structure. This is avoided using atomic_long_inc_not_zero()
|
file structure. This is avoided using atomic_long_inc_not_zero()
|
||||||
on ->f_count :
|
on ->f_count::
|
||||||
|
|
||||||
rcu_read_lock();
|
rcu_read_lock();
|
||||||
file = fcheck_files(files, fd);
|
file = fcheck_files(files, fd);
|
||||||
@ -106,7 +110,8 @@ the fdtable structure -
|
|||||||
holding files->file_lock. If ->file_lock is dropped, then
|
holding files->file_lock. If ->file_lock is dropped, then
|
||||||
another thread may expand the files, thereby creating a new
|
another thread may expand the files, thereby creating a new
|
||||||
fdtable and making the earlier fdtable pointer stale.
|
fdtable and making the earlier fdtable pointer stale.
|
||||||
For example :
|
|
||||||
|
For example::
|
||||||
|
|
||||||
spin_lock(&files->file_lock);
|
spin_lock(&files->file_lock);
|
||||||
fd = locate_fd(files, file, start);
|
fd = locate_fd(files, file, start);
|
@ -1,3 +1,9 @@
|
|||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
==============
|
||||||
|
Fuse I/O Modes
|
||||||
|
==============
|
||||||
|
|
||||||
Fuse supports the following I/O modes:
|
Fuse supports the following I/O modes:
|
||||||
|
|
||||||
- direct-io
|
- direct-io
|
@ -24,6 +24,22 @@ algorithms work.
|
|||||||
splice
|
splice
|
||||||
locking
|
locking
|
||||||
directory-locking
|
directory-locking
|
||||||
|
devpts
|
||||||
|
dnotify
|
||||||
|
fiemap
|
||||||
|
files
|
||||||
|
locks
|
||||||
|
mandatory-locking
|
||||||
|
mount_api
|
||||||
|
quota
|
||||||
|
seq_file
|
||||||
|
sharedsubtree
|
||||||
|
sysfs-pci
|
||||||
|
sysfs-tagging
|
||||||
|
|
||||||
|
automount-support
|
||||||
|
|
||||||
|
caching/index
|
||||||
|
|
||||||
porting
|
porting
|
||||||
|
|
||||||
@ -57,7 +73,10 @@ Documentation for filesystem implementations.
|
|||||||
befs
|
befs
|
||||||
bfs
|
bfs
|
||||||
btrfs
|
btrfs
|
||||||
|
cifs/cifsroot
|
||||||
ceph
|
ceph
|
||||||
|
coda
|
||||||
|
configfs
|
||||||
cramfs
|
cramfs
|
||||||
debugfs
|
debugfs
|
||||||
dlmfs
|
dlmfs
|
||||||
@ -73,6 +92,7 @@ Documentation for filesystem implementations.
|
|||||||
hfsplus
|
hfsplus
|
||||||
hpfs
|
hpfs
|
||||||
fuse
|
fuse
|
||||||
|
fuse-io
|
||||||
inotify
|
inotify
|
||||||
isofs
|
isofs
|
||||||
nilfs2
|
nilfs2
|
||||||
@ -88,6 +108,7 @@ Documentation for filesystem implementations.
|
|||||||
ramfs-rootfs-initramfs
|
ramfs-rootfs-initramfs
|
||||||
relay
|
relay
|
||||||
romfs
|
romfs
|
||||||
|
spufs/index
|
||||||
squashfs
|
squashfs
|
||||||
sysfs
|
sysfs
|
||||||
sysv-fs
|
sysv-fs
|
||||||
@ -97,4 +118,6 @@ Documentation for filesystem implementations.
|
|||||||
udf
|
udf
|
||||||
virtiofs
|
virtiofs
|
||||||
vfat
|
vfat
|
||||||
|
xfs-delayed-logging-design
|
||||||
|
xfs-self-describing-metadata
|
||||||
zonefs
|
zonefs
|
||||||
|
@ -1,4 +1,8 @@
|
|||||||
File Locking Release Notes
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
==========================
|
||||||
|
File Locking Release Notes
|
||||||
|
==========================
|
||||||
|
|
||||||
Andy Walker <andy@lysaker.kvaerner.no>
|
Andy Walker <andy@lysaker.kvaerner.no>
|
||||||
|
|
||||||
@ -6,7 +10,7 @@
|
|||||||
|
|
||||||
|
|
||||||
1. What's New?
|
1. What's New?
|
||||||
--------------
|
==============
|
||||||
|
|
||||||
1.1 Broken Flock Emulation
|
1.1 Broken Flock Emulation
|
||||||
--------------------------
|
--------------------------
|
||||||
@ -25,7 +29,7 @@ anyway (see the file "Documentation/process/changes.rst".)
|
|||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
1.2.1 Typical Problems - Sendmail
|
1.2.1 Typical Problems - Sendmail
|
||||||
---------------------------------
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
Because sendmail was unable to use the old flock() emulation, many sendmail
|
Because sendmail was unable to use the old flock() emulation, many sendmail
|
||||||
installations use fcntl() instead of flock(). This is true of Slackware 3.0
|
installations use fcntl() instead of flock(). This is true of Slackware 3.0
|
||||||
for example. This gave rise to some other subtle problems if sendmail was
|
for example. This gave rise to some other subtle problems if sendmail was
|
||||||
@ -37,7 +41,7 @@ to lock solid with deadlocked processes.
|
|||||||
|
|
||||||
|
|
||||||
1.2.2 The Solution
|
1.2.2 The Solution
|
||||||
------------------
|
^^^^^^^^^^^^^^^^^^
|
||||||
The solution I have chosen, after much experimentation and discussion,
|
The solution I have chosen, after much experimentation and discussion,
|
||||||
is to make flock() and fcntl() locks oblivious to each other. Both can
|
is to make flock() and fcntl() locks oblivious to each other. Both can
|
||||||
exist, and neither will have any effect on the other.
|
exist, and neither will have any effect on the other.
|
||||||
@ -54,7 +58,7 @@ fcntl(), with all the problems that implies.
|
|||||||
---------------------------------------
|
---------------------------------------
|
||||||
|
|
||||||
Mandatory locking, as described in
|
Mandatory locking, as described in
|
||||||
'Documentation/filesystems/mandatory-locking.txt' was prior to this release a
|
'Documentation/filesystems/mandatory-locking.rst' was prior to this release a
|
||||||
general configuration option that was valid for all mounted filesystems. This
|
general configuration option that was valid for all mounted filesystems. This
|
||||||
had a number of inherent dangers, not the least of which was the ability to
|
had a number of inherent dangers, not the least of which was the ability to
|
||||||
freeze an NFS server by asking it to read a file for which a mandatory lock
|
freeze an NFS server by asking it to read a file for which a mandatory lock
|
@ -1,8 +1,13 @@
|
|||||||
Mandatory File Locking For The Linux Operating System
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
=====================================================
|
||||||
|
Mandatory File Locking For The Linux Operating System
|
||||||
|
=====================================================
|
||||||
|
|
||||||
Andy Walker <andy@lysaker.kvaerner.no>
|
Andy Walker <andy@lysaker.kvaerner.no>
|
||||||
|
|
||||||
15 April 1996
|
15 April 1996
|
||||||
|
|
||||||
(Updated September 2007)
|
(Updated September 2007)
|
||||||
|
|
||||||
0. Why you should avoid mandatory locking
|
0. Why you should avoid mandatory locking
|
||||||
@ -53,15 +58,17 @@ possible on existing user code. The scheme is based on marking individual files
|
|||||||
as candidates for mandatory locking, and using the existing fcntl()/lockf()
|
as candidates for mandatory locking, and using the existing fcntl()/lockf()
|
||||||
interface for applying locks just as if they were normal, advisory locks.
|
interface for applying locks just as if they were normal, advisory locks.
|
||||||
|
|
||||||
Note 1: In saying "file" in the paragraphs above I am actually not telling
|
.. Note::
|
||||||
the whole truth. System V locking is based on fcntl(). The granularity of
|
|
||||||
fcntl() is such that it allows the locking of byte ranges in files, in addition
|
|
||||||
to entire files, so the mandatory locking rules also have byte level
|
|
||||||
granularity.
|
|
||||||
|
|
||||||
Note 2: POSIX.1 does not specify any scheme for mandatory locking, despite
|
1. In saying "file" in the paragraphs above I am actually not telling
|
||||||
borrowing the fcntl() locking scheme from System V. The mandatory locking
|
the whole truth. System V locking is based on fcntl(). The granularity of
|
||||||
scheme is defined by the System V Interface Definition (SVID) Version 3.
|
fcntl() is such that it allows the locking of byte ranges in files, in
|
||||||
|
addition to entire files, so the mandatory locking rules also have byte
|
||||||
|
level granularity.
|
||||||
|
|
||||||
|
2. POSIX.1 does not specify any scheme for mandatory locking, despite
|
||||||
|
borrowing the fcntl() locking scheme from System V. The mandatory locking
|
||||||
|
scheme is defined by the System V Interface Definition (SVID) Version 3.
|
||||||
|
|
||||||
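The byte-level granularity mentioned in note 1 can be seen with an ordinary fcntl() record lock. The following is a minimal user-space sketch, not taken from the kernel tree: the helper names lock_byte_range() and unlock_byte_range() are hypothetical, and the lock shown is advisory. Mandatory locking, where enabled, is applied through this same interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: fill in a struct flock for bytes [start, start + len). */
static int demo_set_range(int fd, short type, off_t start, off_t len)
{
	struct flock fl = {0};

	assert(fd >= 0);
	fl.l_type = type;		/* F_WRLCK to lock, F_UNLCK to release */
	fl.l_whence = SEEK_SET;		/* l_start is an absolute file offset */
	fl.l_start = start;
	fl.l_len = len;			/* number of bytes covered by the lock */

	/* F_SETLK does not block; it fails if a conflicting lock exists. */
	return fcntl(fd, F_SETLK, &fl);
}

/* Take a write lock on part of the file only. Returns 0 or -1 (errno set). */
static int lock_byte_range(int fd, off_t start, off_t len)
{
	return demo_set_range(fd, F_WRLCK, start, len);
}

/* Release the lock on the same range. Returns 0 or -1 (errno set). */
static int unlock_byte_range(int fd, off_t start, off_t len)
{
	return demo_set_range(fd, F_UNLCK, start, len);
}
```

A caller would typically lock only the record it is about to update, for example lock_byte_range(fd, 10, 10) for bytes 10 to 19, and leave the rest of the file available to other processes.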
2. Marking a file for mandatory locking
|
2. Marking a file for mandatory locking
|
||||||
---------------------------------------
|
---------------------------------------
|
@ -1,8 +1,10 @@
|
|||||||
====================
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
FILESYSTEM MOUNT API
|
|
||||||
====================
|
|
||||||
|
|
||||||
CONTENTS
|
====================
|
||||||
|
Filesystem Mount API
|
||||||
|
====================
|
||||||
|
|
||||||
|
.. CONTENTS
|
||||||
|
|
||||||
(1) Overview.
|
(1) Overview.
|
||||||
|
|
||||||
@ -21,8 +23,7 @@ CONTENTS
|
|||||||
(8) Parameter helper functions.
|
(8) Parameter helper functions.
|
||||||
|
|
||||||
|
|
||||||
========
|
Overview
|
||||||
OVERVIEW
|
|
||||||
========
|
========
|
||||||
|
|
||||||
The creation of new mounts is now to be done in a multistep process:
|
The creation of new mounts is now to be done in a multistep process:
|
||||||
@ -43,7 +44,7 @@ The creation of new mounts is now to be done in a multistep process:
|
|||||||
|
|
||||||
(7) Destroy the context.
|
(7) Destroy the context.
|
||||||
|
|
||||||
To support this, the file_system_type struct gains two new fields:
|
To support this, the file_system_type struct gains two new fields::
|
||||||
|
|
||||||
int (*init_fs_context)(struct fs_context *fc);
|
int (*init_fs_context)(struct fs_context *fc);
|
||||||
const struct fs_parameter_description *parameters;
|
const struct fs_parameter_description *parameters;
|
||||||
@ -57,12 +58,11 @@ Note that security initialisation is done *after* the filesystem is called so
|
|||||||
that the namespaces may be adjusted first.
|
that the namespaces may be adjusted first.
|
||||||
|
|
||||||
|
|
||||||
======================
|
The Filesystem context
|
||||||
THE FILESYSTEM CONTEXT
|
|
||||||
======================
|
======================
|
||||||
|
|
||||||
The creation and reconfiguration of a superblock is governed by a filesystem
|
The creation and reconfiguration of a superblock is governed by a filesystem
|
||||||
context. This is represented by the fs_context structure:
|
context. This is represented by the fs_context structure::
|
||||||
|
|
||||||
struct fs_context {
|
struct fs_context {
|
||||||
const struct fs_context_operations *ops;
|
const struct fs_context_operations *ops;
|
||||||
@ -86,78 +86,106 @@ context. This is represented by the fs_context structure:
|
|||||||
|
|
||||||
The fs_context fields are as follows:
|
The fs_context fields are as follows:
|
||||||
|
|
||||||
(*) const struct fs_context_operations *ops
|
* ::
|
||||||
|
|
||||||
|
const struct fs_context_operations *ops
|
||||||
|
|
||||||
These are operations that can be done on a filesystem context (see
|
These are operations that can be done on a filesystem context (see
|
||||||
below). This must be set by the ->init_fs_context() file_system_type
|
below). This must be set by the ->init_fs_context() file_system_type
|
||||||
operation.
|
operation.
|
||||||
|
|
||||||
(*) struct file_system_type *fs_type
|
* ::
|
||||||
|
|
||||||
|
struct file_system_type *fs_type
|
||||||
|
|
||||||
A pointer to the file_system_type of the filesystem that is being
|
A pointer to the file_system_type of the filesystem that is being
|
||||||
constructed or reconfigured. This retains a reference on the type owner.
|
constructed or reconfigured. This retains a reference on the type owner.
|
||||||
|
|
||||||
(*) void *fs_private
|
* ::
|
||||||
|
|
||||||
|
void *fs_private
|
||||||
|
|
||||||
A pointer to the file system's private data. This is where the filesystem
|
A pointer to the file system's private data. This is where the filesystem
|
||||||
will need to store any options it parses.
|
will need to store any options it parses.
|
||||||
|
|
||||||
(*) struct dentry *root
|
* ::
|
||||||
|
|
||||||
|
struct dentry *root
|
||||||
|
|
||||||
A pointer to the root of the mountable tree (and indirectly, the
|
A pointer to the root of the mountable tree (and indirectly, the
|
||||||
superblock thereof). This is filled in by the ->get_tree() op. If this
|
superblock thereof). This is filled in by the ->get_tree() op. If this
|
||||||
is set, an active reference on root->d_sb must also be held.
|
is set, an active reference on root->d_sb must also be held.
|
||||||
|
|
||||||
(*) struct user_namespace *user_ns
|
* ::
|
||||||
(*) struct net *net_ns
|
|
||||||
|
struct user_namespace *user_ns
|
||||||
|
struct net *net_ns
|
||||||
|
|
||||||
These are a subset of the namespaces in use by the invoking process. They
|
These are a subset of the namespaces in use by the invoking process. They
|
||||||
retain references on each namespace. The subscribed namespaces may be
|
retain references on each namespace. The subscribed namespaces may be
|
||||||
replaced by the filesystem to reflect other sources, such as the parent
|
replaced by the filesystem to reflect other sources, such as the parent
|
||||||
mount superblock on an automount.
|
mount superblock on an automount.
|
||||||
|
|
||||||
(*) const struct cred *cred
|
* ::
|
||||||
|
|
||||||
|
const struct cred *cred
|
||||||
|
|
||||||
The mounter's credentials. This retains a reference on the credentials.
|
The mounter's credentials. This retains a reference on the credentials.
|
||||||
|
|
||||||
(*) char *source
|
* ::
|
||||||
|
|
||||||
|
char *source
|
||||||
|
|
||||||
This specifies the source. It may be a block device (e.g. /dev/sda1) or
|
This specifies the source. It may be a block device (e.g. /dev/sda1) or
|
||||||
something more exotic, such as the "host:/path" that NFS desires.
|
something more exotic, such as the "host:/path" that NFS desires.
|
||||||
|
|
||||||
(*) char *subtype
|
* ::
|
||||||
|
|
||||||
|
char *subtype
|
||||||
|
|
||||||
This is a string to be added to the type displayed in /proc/mounts to
|
This is a string to be added to the type displayed in /proc/mounts to
|
||||||
qualify it (used by FUSE). This is available for the filesystem to set if
|
qualify it (used by FUSE). This is available for the filesystem to set if
|
||||||
desired.
|
desired.
|
||||||
|
|
||||||
(*) void *security
|
* ::
|
||||||
|
|
||||||
|
void *security
|
||||||
|
|
||||||
A place for the LSMs to hang their security data for the superblock. The
|
A place for the LSMs to hang their security data for the superblock. The
|
||||||
relevant security operations are described below.
|
relevant security operations are described below.
|
||||||
|
|
||||||
(*) void *s_fs_info
|
* ::
|
||||||
|
|
||||||
|
void *s_fs_info
|
||||||
|
|
||||||
The proposed s_fs_info for a new superblock, set in the superblock by
|
The proposed s_fs_info for a new superblock, set in the superblock by
|
||||||
sget_fc(). This can be used to distinguish superblocks.
|
sget_fc(). This can be used to distinguish superblocks.
|
||||||
|
|
||||||
(*) unsigned int sb_flags
|
* ::
|
||||||
(*) unsigned int sb_flags_mask
|
|
||||||
|
unsigned int sb_flags
|
||||||
|
unsigned int sb_flags_mask
|
||||||
|
|
||||||
Which SB_* flag bits are to be set/cleared in super_block::s_flags.
|
Which SB_* flag bits are to be set/cleared in super_block::s_flags.
|
||||||
|
|
||||||
(*) unsigned int s_iflags
|
* ::
|
||||||
|
|
||||||
|
unsigned int s_iflags
|
||||||
|
|
||||||
These will be bitwise-OR'd with s->s_iflags when a superblock is created.
|
These will be bitwise-OR'd with s->s_iflags when a superblock is created.
|
||||||
|
|
||||||
(*) enum fs_context_purpose
|
* ::
|
||||||
|
|
||||||
|
enum fs_context_purpose
|
||||||
|
|
||||||
This indicates the purpose for which the context is intended. The
|
This indicates the purpose for which the context is intended. The
|
||||||
available values are:
|
available values are:
|
||||||
|
|
||||||
FS_CONTEXT_FOR_MOUNT -- New superblock for explicit mount
|
========================== ======================================
|
||||||
FS_CONTEXT_FOR_SUBMOUNT -- New automatic submount of extant mount
|
FS_CONTEXT_FOR_MOUNT New superblock for explicit mount
|
||||||
FS_CONTEXT_FOR_RECONFIGURE -- Change an existing mount
|
FS_CONTEXT_FOR_SUBMOUNT New automatic submount of extant mount
|
||||||
|
FS_CONTEXT_FOR_RECONFIGURE Change an existing mount
|
||||||
|
========================== ======================================
|
||||||
|
|
||||||
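The sb_flags and sb_flags_mask pair in the field list above follows the common mask-and-merge idiom: the mask selects which bits may change, and sb_flags supplies their new values. The sketch below is a user-space illustration under that assumption. The DEMO_SB_* constants and both function names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for SB_* bits; real values come from kernel headers. */
#define DEMO_SB_RDONLY	0x1u
#define DEMO_SB_SYNC	0x2u
#define DEMO_SB_NOEXEC	0x4u

/*
 * Merge the requested flags into the current ones.  Bits outside
 * sb_flags_mask keep their old value; bits inside it take the value
 * given in sb_flags.
 */
static unsigned int demo_apply_sb_flags(unsigned int s_flags,
					unsigned int sb_flags,
					unsigned int sb_flags_mask)
{
	return (s_flags & ~sb_flags_mask) | (sb_flags & sb_flags_mask);
}

/* Example: request read-only without disturbing any other bit. */
static unsigned int demo_remount_readonly(unsigned int s_flags)
{
	unsigned int result;

	result = demo_apply_sb_flags(s_flags, DEMO_SB_RDONLY, DEMO_SB_RDONLY);
	assert(result & DEMO_SB_RDONLY);
	return result;
}
```

Keeping the mask separate from the values is what lets a caller clear a bit: passing sb_flags of 0 with DEMO_SB_RDONLY in the mask turns read-only off, whereas a mask of 0 changes nothing regardless of sb_flags.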
The mount context is created by calling vfs_new_fs_context() or
|
The mount context is created by calling vfs_new_fs_context() or
|
||||||
vfs_dup_fs_context() and is destroyed with put_fs_context(). Note that the
|
vfs_dup_fs_context() and is destroyed with put_fs_context(). Note that the
|
||||||
@ -176,11 +204,10 @@ mount context. For instance, NFS might pin the appropriate protocol version
|
|||||||
module.
|
module.
|
||||||
|
|
||||||
|
|
||||||
=================================
|
The Filesystem Context Operations
|
||||||
THE FILESYSTEM CONTEXT OPERATIONS
|
|
||||||
=================================
|
=================================
|
||||||
|
|
||||||
The filesystem context points to a table of operations:
|
The filesystem context points to a table of operations::
|
||||||
|
|
||||||
struct fs_context_operations {
|
struct fs_context_operations {
|
||||||
void (*free)(struct fs_context *fc);
|
void (*free)(struct fs_context *fc);
|
||||||
@ -195,24 +222,32 @@ The filesystem context points to a table of operations:
|
|||||||
These operations are invoked by the various stages of the mount procedure to
|
These operations are invoked by the various stages of the mount procedure to
|
||||||
manage the filesystem context. They are as follows:
|
manage the filesystem context. They are as follows:
|
||||||
|
|
||||||
(*) void (*free)(struct fs_context *fc);
|
* ::
|
||||||
|
|
||||||
|
void (*free)(struct fs_context *fc);
|
||||||
|
|
||||||
Called to clean up the filesystem-specific part of the filesystem context
|
Called to clean up the filesystem-specific part of the filesystem context
|
||||||
when the context is destroyed. It should be aware that parts of the
|
when the context is destroyed. It should be aware that parts of the
|
||||||
context may have been removed and NULL'd out by ->get_tree().
|
context may have been removed and NULL'd out by ->get_tree().
|
||||||
|
|
||||||
(*) int (*dup)(struct fs_context *fc, struct fs_context *src_fc);
|
* ::
|
||||||
|
|
||||||
|
int (*dup)(struct fs_context *fc, struct fs_context *src_fc);
|
||||||
|
|
||||||
Called when a filesystem context has been duplicated to duplicate the
|
Called when a filesystem context has been duplicated to duplicate the
|
||||||
filesystem-private data. An error may be returned to indicate failure to
|
filesystem-private data. An error may be returned to indicate failure to
|
||||||
do this.
|
do this.
|
||||||
|
|
||||||
[!] Note that even if this fails, put_fs_context() will be called
|
.. Warning::
|
||||||
|
|
||||||
|
Note that even if this fails, put_fs_context() will be called
|
||||||
immediately thereafter, so ->dup() *must* make the
|
immediately thereafter, so ->dup() *must* make the
|
||||||
filesystem-private data safe for ->free().
|
filesystem-private data safe for ->free().
|
||||||
|
|
||||||
(*) int (*parse_param)(struct fs_context *fc,
|
* ::
|
||||||
struct fs_parameter *param);
|
|
||||||
|
int (*parse_param)(struct fs_context *fc,
|
||||||
|
struct fs_parameter *param);
|
||||||
|
|
||||||
Called when a parameter is being added to the filesystem context. param
|
Called when a parameter is being added to the filesystem context. param
|
||||||
points to the key name and maybe a value object. VFS-specific options
|
points to the key name and maybe a value object. VFS-specific options
|
||||||
@ -224,7 +259,9 @@ manage the filesystem context. They are as follows:
|
|||||||
|
|
||||||
If successful, 0 should be returned or a negative error code otherwise.
|
If successful, 0 should be returned or a negative error code otherwise.
|
||||||
|
|
||||||
(*) int (*parse_monolithic)(struct fs_context *fc, void *data);
|
* ::
|
||||||
|
|
||||||
|
int (*parse_monolithic)(struct fs_context *fc, void *data);
|
||||||
|
|
||||||
Called when the mount(2) system call is invoked to pass the entire data
|
Called when the mount(2) system call is invoked to pass the entire data
|
||||||
page in one go. If this is expected to be just a list of "key[=val]"
|
page in one go. If this is expected to be just a list of "key[=val]"
|
||||||
@ -236,7 +273,9 @@ manage the filesystem context. They are as follows:
|
|||||||
finds it's the standard key-val list then it may pass it off to
|
finds it's the standard key-val list then it may pass it off to
|
||||||
generic_parse_monolithic().
|
generic_parse_monolithic().
|
||||||
|
|
||||||
(*) int (*get_tree)(struct fs_context *fc);
|
* ::
|
||||||
|
|
||||||
|
int (*get_tree)(struct fs_context *fc);
|
||||||
|
|
||||||
Called to get or create the mountable root and superblock, using the
|
Called to get or create the mountable root and superblock, using the
|
||||||
information stored in the filesystem context (reconfiguration goes via a
|
information stored in the filesystem context (reconfiguration goes via a
|
||||||
@ -249,7 +288,9 @@ manage the filesystem context. They are as follows:
|
|||||||
The phase on a userspace-driven context will be set to only allow this to
|
The phase on a userspace-driven context will be set to only allow this to
|
||||||
be called once on any particular context.
|
be called once on any particular context.
|
||||||
|
|
||||||
(*) int (*reconfigure)(struct fs_context *fc);
|
* ::
|
||||||
|
|
||||||
|
int (*reconfigure)(struct fs_context *fc);
|
||||||
|
|
||||||
Called to effect reconfiguration of a superblock using information stored
|
Called to effect reconfiguration of a superblock using information stored
|
||||||
in the filesystem context. It may detach any resources it desires from
|
in the filesystem context. It may detach any resources it desires from
|
||||||
@ -259,19 +300,20 @@ manage the filesystem context. They are as follows:
|
|||||||
On success it should return 0. In the case of an error, it should return
|
On success it should return 0. In the case of an error, it should return
|
||||||
a negative error code.
|
a negative error code.
|
||||||
|
|
||||||
[NOTE] reconfigure is intended as a replacement for remount_fs.
|
.. Note:: reconfigure is intended as a replacement for remount_fs.
|
||||||
|
|
||||||
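The warning attached to ->dup() above is easiest to satisfy by keeping the private data in a state that ->free() can always handle, even when duplication stops halfway. The following user-space sketch models that pattern. Every name in it (demo_fs_context, demo_fs_private, demo_dup(), demo_free(), and the fail_midway knob used to simulate an allocation failure) is hypothetical and stands in for the real structures only loosely.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical filesystem-private data, as might hang off fs_private. */
struct demo_fs_private {
	char *server;
	char *export_path;
};

/* Minimal stand-in for the part of the context this sketch needs. */
struct demo_fs_context {
	void *fs_private;
};

static char *demo_strdup(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

/* ->free() analogue: copes with a missing or partly filled structure. */
static void demo_free(struct demo_fs_context *fc)
{
	struct demo_fs_private *priv = fc->fs_private;

	if (!priv)
		return;
	free(priv->server);		/* free(NULL) is a no-op */
	free(priv->export_path);
	free(priv);
	fc->fs_private = NULL;
}

/* ->dup() analogue: never leaves fc pointing at the source's data. */
static int demo_dup(struct demo_fs_context *fc,
		    const struct demo_fs_context *src_fc, int fail_midway)
{
	const struct demo_fs_private *src = src_fc->fs_private;
	struct demo_fs_private *priv;

	assert(src != NULL);
	fc->fs_private = NULL;
	priv = calloc(1, sizeof(*priv));	/* members start out NULL */
	if (!priv)
		return -ENOMEM;
	fc->fs_private = priv;	/* attach early so demo_free() sees it */

	priv->server = demo_strdup(src->server);
	if (!priv->server || fail_midway)
		return -ENOMEM;
	priv->export_path = demo_strdup(src->export_path);
	if (!priv->export_path)
		return -ENOMEM;
	return 0;
}
```

The design choice is to zero-allocate and attach the new structure before copying any member, so an early return needs no unwinding of its own: whatever was copied is owned by the new context, and the later call to the free routine releases it.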
|
|
||||||
===========================
|
Filesystem context Security
|
||||||
FILESYSTEM CONTEXT SECURITY
|
|
||||||
===========================
|
===========================
|
||||||
|
|
||||||
The filesystem context contains a security pointer that the LSMs can use for
|
The filesystem context contains a security pointer that the LSMs can use for
|
||||||
building up a security context for the superblock to be mounted. There are a
|
building up a security context for the superblock to be mounted. There are a
|
||||||
number of operations used by the new mount code for this purpose:
|
number of operations used by the new mount code for this purpose:
|
||||||
|
|
||||||
(*) int security_fs_context_alloc(struct fs_context *fc,
|
* ::
|
||||||
struct dentry *reference);
|
|
||||||
|
int security_fs_context_alloc(struct fs_context *fc,
|
||||||
|
struct dentry *reference);
|
||||||
|
|
||||||
Called to initialise fc->security (which is preset to NULL) and allocate
|
Called to initialise fc->security (which is preset to NULL) and allocate
|
||||||
any resources needed. It should return 0 on success or a negative error
|
any resources needed. It should return 0 on success or a negative error
|
||||||
@ -283,22 +325,28 @@ number of operations used by the new mount code for this purpose:
|
|||||||
non-NULL in the case of a submount (FS_CONTEXT_FOR_SUBMOUNT) in which case
|
non-NULL in the case of a submount (FS_CONTEXT_FOR_SUBMOUNT) in which case
|
||||||
it indicates the automount point.
|
it indicates the automount point.
|
||||||
|
|
||||||
(*) int security_fs_context_dup(struct fs_context *fc,
|
* ::
|
||||||
struct fs_context *src_fc);
|
|
||||||
|
int security_fs_context_dup(struct fs_context *fc,
|
||||||
|
struct fs_context *src_fc);
|
||||||
|
|
||||||
Called to initialise fc->security (which is preset to NULL) and allocate
|
Called to initialise fc->security (which is preset to NULL) and allocate
|
||||||
any resources needed. The original filesystem context is pointed to by
|
any resources needed. The original filesystem context is pointed to by
|
||||||
src_fc and may be used for reference. It should return 0 on success or a
|
src_fc and may be used for reference. It should return 0 on success or a
|
||||||
negative error code on failure.
|
negative error code on failure.
|
||||||
|
|
||||||
(*) void security_fs_context_free(struct fs_context *fc);
|
* ::
|
||||||
|
|
||||||
|
void security_fs_context_free(struct fs_context *fc);
|
||||||
|
|
||||||
Called to clean up anything attached to fc->security. Note that the
|
Called to clean up anything attached to fc->security. Note that the
|
||||||
contents may have been transferred to a superblock and the pointer cleared
|
contents may have been transferred to a superblock and the pointer cleared
|
||||||
during get_tree.
|
during get_tree.
|
||||||
|
|
||||||
(*) int security_fs_context_parse_param(struct fs_context *fc,
|
* ::
|
||||||
struct fs_parameter *param);
|
|
||||||
|
int security_fs_context_parse_param(struct fs_context *fc,
|
||||||
|
struct fs_parameter *param);
|
||||||
|
|
||||||
Called for each mount parameter, including the source. The arguments are
|
Called for each mount parameter, including the source. The arguments are
|
||||||
as for the ->parse_param() method. It should return 0 to indicate that
|
as for the ->parse_param() method. It should return 0 to indicate that
|
||||||
@ -310,7 +358,9 @@ number of operations used by the new mount code for this purpose:
|
|||||||
(provided the value pointer is NULL'd out). If it is stolen, 1 must be
|
(provided the value pointer is NULL'd out). If it is stolen, 1 must be
|
||||||
returned to prevent it being passed to the filesystem.
|
returned to prevent it being passed to the filesystem.
|
||||||
|
|
||||||
(*) int security_fs_context_validate(struct fs_context *fc);
|
* ::
|
||||||
|
|
||||||
|
int security_fs_context_validate(struct fs_context *fc);
|
||||||
|
|
||||||
Called after all the options have been parsed to validate the collection
|
Called after all the options have been parsed to validate the collection
|
||||||
as a whole and to do any necessary allocation so that
|
as a whole and to do any necessary allocation so that
|
||||||
@ -320,36 +370,43 @@ number of operations used by the new mount code for this purpose:
|
|||||||
In the case of reconfiguration, the target superblock will be accessible
|
In the case of reconfiguration, the target superblock will be accessible
|
||||||
via fc->root.
|
via fc->root.
|
||||||
|
|
||||||
(*) int security_sb_get_tree(struct fs_context *fc);
|
* ::
|
||||||
|
|
||||||
|
int security_sb_get_tree(struct fs_context *fc);
|
||||||
|
|
||||||
Called during the mount procedure to verify that the specified superblock
|
Called during the mount procedure to verify that the specified superblock
|
||||||
is allowed to be mounted and to transfer the security data there. It
|
is allowed to be mounted and to transfer the security data there. It
|
||||||
should return 0 or a negative error code.
|
should return 0 or a negative error code.
|
||||||
|
|
||||||
(*) void security_sb_reconfigure(struct fs_context *fc);
|
* ::
|
||||||
|
|
||||||
|
void security_sb_reconfigure(struct fs_context *fc);
|
||||||
|
|
||||||
Called to apply any reconfiguration to an LSM's context. It must not
|
Called to apply any reconfiguration to an LSM's context. It must not
|
||||||
fail. Error checking and resource allocation must be done in advance by
|
fail. Error checking and resource allocation must be done in advance by
|
||||||
the parameter parsing and validation hooks.
|
the parameter parsing and validation hooks.
|
||||||
|
|
||||||
(*) int security_sb_mountpoint(struct fs_context *fc, struct path *mountpoint,
|
* ::
|
||||||
unsigned int mnt_flags);
|
|
||||||
|
int security_sb_mountpoint(struct fs_context *fc,
|
||||||
|
struct path *mountpoint,
|
||||||
|
unsigned int mnt_flags);
|
||||||
|
|
||||||
Called during the mount procedure to verify that the root dentry attached
|
Called during the mount procedure to verify that the root dentry attached
|
||||||
to the context is permitted to be attached to the specified mountpoint.
|
to the context is permitted to be attached to the specified mountpoint.
|
||||||
It should return 0 on success or a negative error code on failure.
|
It should return 0 on success or a negative error code on failure.
|
||||||
|
|
||||||
|
|
||||||
==========================
|
VFS Filesystem context API
|
||||||
VFS FILESYSTEM CONTEXT API
|
|
||||||
==========================
|
==========================
|
||||||
|
|
||||||
There are four operations for creating a filesystem context and one for
|
There are four operations for creating a filesystem context and one for
|
||||||
destroying a context:
|
destroying a context:
|
||||||
|
|
||||||
(*) struct fs_context *fs_context_for_mount(
|
* ::
|
||||||
struct file_system_type *fs_type,
|
|
||||||
unsigned int sb_flags);
|
struct fs_context *fs_context_for_mount(struct file_system_type *fs_type,
|
||||||
|
unsigned int sb_flags);
|
||||||
|
|
||||||
Allocate a filesystem context for the purpose of setting up a new mount,
|
Allocate a filesystem context for the purpose of setting up a new mount,
|
||||||
whether that be with a new superblock or sharing an existing one. This
|
whether that be with a new superblock or sharing an existing one. This
|
||||||
@ -359,7 +416,9 @@ destroying a context:
|
|||||||
fs_type specifies the filesystem type that will manage the context and
|
fs_type specifies the filesystem type that will manage the context and
|
||||||
sb_flags presets the superblock flags stored therein.
|
sb_flags presets the superblock flags stored therein.
|
||||||
|
|
||||||
(*) struct fs_context *fs_context_for_reconfigure(
|
* ::
|
||||||
|
|
||||||
|
struct fs_context *fs_context_for_reconfigure(
|
||||||
struct dentry *dentry,
|
struct dentry *dentry,
|
||||||
unsigned int sb_flags,
|
unsigned int sb_flags,
|
||||||
unsigned int sb_flags_mask);
|
unsigned int sb_flags_mask);
|
||||||
@ -369,7 +428,9 @@ destroying a context:
|
|||||||
configured. sb_flags and sb_flags_mask indicate which superblock flags
|
configured. sb_flags and sb_flags_mask indicate which superblock flags
|
||||||
need changing and to what.
|
need changing and to what.
|
||||||
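One way to read "which superblock flags need changing and to what" is as a mask-and-merge: bits selected by sb_flags_mask take their value from sb_flags, and every other bit keeps its current value. The following is a minimal userspace sketch under that assumption; apply_sb_flags() and the DEMO_* flag values are hypothetical names made up for illustration, not kernel symbols.

```c
/* Hypothetical flag bits, for illustration only. */
#define DEMO_RDONLY	0x01u
#define DEMO_NOSUID	0x02u
#define DEMO_NOEXEC	0x08u

/*
 * Merge requested flags into the current ones: bits set in mask are
 * taken from flags, bits clear in mask keep their current value.
 */
static unsigned int apply_sb_flags(unsigned int current,
				   unsigned int flags,
				   unsigned int mask)
{
	return (current & ~mask) | (flags & mask);
}
```

Passing DEMO_RDONLY as both flags and mask would set read-only without touching anything else; passing 0 as flags with the same mask would clear it.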
|
|
||||||
(*) struct fs_context *fs_context_for_submount(
|
* ::
|
||||||
|
|
||||||
|
struct fs_context *fs_context_for_submount(
|
||||||
struct file_system_type *fs_type,
|
struct file_system_type *fs_type,
|
||||||
struct dentry *reference);
|
struct dentry *reference);
|
||||||
|
|
||||||
@ -382,7 +443,9 @@ destroying a context:
|
|||||||
Note that it's not a requirement that the reference dentry be of the same
|
Note that it's not a requirement that the reference dentry be of the same
|
||||||
filesystem type as fs_type.
|
filesystem type as fs_type.
|
||||||
|
|
||||||
(*) struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc);
|
* ::
|
||||||
|
|
||||||
|
struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc);
|
||||||
|
|
||||||
Duplicate a filesystem context, copying any options noted and duplicating
|
Duplicate a filesystem context, copying any options noted and duplicating
|
||||||
or additionally referencing any resources held therein. This is available
|
or additionally referencing any resources held therein. This is available
|
||||||
@ -392,14 +455,18 @@ destroying a context:
|
|||||||
|
|
||||||
The purpose in the new context is inherited from the old one.
|
The purpose in the new context is inherited from the old one.
|
||||||
|
|
||||||
(*) void put_fs_context(struct fs_context *fc);
|
* ::
|
||||||
|
|
||||||
|
void put_fs_context(struct fs_context *fc);
|
||||||
|
|
||||||
Destroy a filesystem context, releasing any resources it holds. This
|
Destroy a filesystem context, releasing any resources it holds. This
|
||||||
calls the ->free() operation. This is intended to be called by anyone who
|
calls the ->free() operation. This is intended to be called by anyone who
|
||||||
created a filesystem context.
|
created a filesystem context.
|
||||||
|
|
||||||
[!] filesystem contexts are not refcounted, so this causes unconditional
|
.. Warning::
|
||||||
destruction.
|
|
||||||
|
filesystem contexts are not refcounted, so this causes unconditional
|
||||||
|
destruction.
|
||||||
|
|
||||||
In all the above operations, apart from the put op, the return is a mount
|
In all the above operations, apart from the put op, the return is a mount
|
||||||
context pointer or a negative error code.
|
context pointer or a negative error code.
|
||||||
@ -407,8 +474,10 @@ context pointer or a negative error code.
|
|||||||
For the remaining operations, if an error occurs, a negative error code will be
|
For the remaining operations, if an error occurs, a negative error code will be
|
||||||
returned.
|
returned.
|
||||||
|
|
||||||
(*) int vfs_parse_fs_param(struct fs_context *fc,
|
* ::
|
||||||
struct fs_parameter *param);
|
|
||||||
|
int vfs_parse_fs_param(struct fs_context *fc,
|
||||||
|
struct fs_parameter *param);
|
||||||
|
|
||||||
Supply a single mount parameter to the filesystem context. This includes
|
Supply a single mount parameter to the filesystem context. This includes
|
||||||
the specification of the source/device which is specified as the "source"
|
the specification of the source/device which is specified as the "source"
|
||||||
@ -423,53 +492,64 @@ returned.
|
|||||||
|
|
||||||
The parameter value is typed and can be one of:
|
The parameter value is typed and can be one of:
|
||||||
|
|
||||||
fs_value_is_flag, Parameter not given a value.
|
==================== =============================
|
||||||
fs_value_is_string, Value is a string
|
fs_value_is_flag Parameter not given a value
|
||||||
fs_value_is_blob, Value is a binary blob
|
fs_value_is_string Value is a string
|
||||||
fs_value_is_filename, Value is a filename* + dirfd
|
fs_value_is_blob Value is a binary blob
|
||||||
fs_value_is_file, Value is an open file (file*)
|
fs_value_is_filename Value is a filename* + dirfd
|
||||||
|
fs_value_is_file Value is an open file (file*)
|
||||||
|
==================== =============================
|
||||||
|
|
||||||
If there is a value, that value is stored in a union in the struct in one
|
If there is a value, that value is stored in a union in the struct in one
|
||||||
of param->{string,blob,name,file}. Note that the function may steal and
|
of param->{string,blob,name,file}. Note that the function may steal and
|
||||||
clear the pointer, but then becomes responsible for disposing of the
|
clear the pointer, but then becomes responsible for disposing of the
|
||||||
object.
|
object.
|
||||||
|
|
||||||
(*) int vfs_parse_fs_string(struct fs_context *fc, const char *key,
|
* ::
|
||||||
const char *value, size_t v_size);
|
|
||||||
|
int vfs_parse_fs_string(struct fs_context *fc, const char *key,
|
||||||
|
const char *value, size_t v_size);
|
||||||
|
|
||||||
A wrapper around vfs_parse_fs_param() that copies the value string it is
|
A wrapper around vfs_parse_fs_param() that copies the value string it is
|
||||||
passed.
|
passed.
|
||||||
|
|
||||||
(*) int generic_parse_monolithic(struct fs_context *fc, void *data);
|
* ::
|
||||||
|
|
||||||
|
int generic_parse_monolithic(struct fs_context *fc, void *data);
|
||||||
|
|
||||||
Parse a sys_mount() data page, assuming the form to be a text list
|
Parse a sys_mount() data page, assuming the form to be a text list
|
||||||
consisting of key[=val] options separated by commas. Each item in the
|
consisting of key[=val] options separated by commas. Each item in the
|
||||||
list is passed to vfs_mount_option(). This is the default when the
|
list is passed to vfs_mount_option(). This is the default when the
|
||||||
->parse_monolithic() method is NULL.
|
->parse_monolithic() method is NULL.
|
||||||
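The comma-separated key[=val] form described above can be illustrated with a small userspace splitter. This is only a sketch of the text format, not the kernel implementation: demo_parse_monolithic(), demo_option_cb, struct demo_seen and demo_record_option() are hypothetical names, the buffer is modified in place, and empty items are simply skipped (an assumption).

```c
#include <string.h>

/* Called once per option; value is NULL when no "=val" was given. */
typedef int (*demo_option_cb)(const char *key, const char *value, void *ctx);

/* Split "key[=val],key[=val],..." in place and hand each item to cb. */
static int demo_parse_monolithic(char *data, demo_option_cb cb, void *ctx)
{
	char *item = data;

	while (item && *item) {
		char *next = strchr(item, ',');
		char *value;
		int ret;

		if (next)
			*next++ = '\0';		/* terminate this item */

		if (*item) {			/* skip empty items such as ",," */
			value = strchr(item, '=');
			if (value)
				*value++ = '\0';	/* split key from value */
			ret = cb(item, value, ctx);
			if (ret < 0)
				return ret;	/* stop at the first error */
		}
		item = next;
	}
	return 0;
}

/* Example callback: count the options and remember the last one seen. */
struct demo_seen {
	int count;
	const char *last_key;
	const char *last_value;
};

static int demo_record_option(const char *key, const char *value, void *ctx)
{
	struct demo_seen *seen = ctx;

	seen->count++;
	seen->last_key = key;
	seen->last_value = value;
	return 0;
}
```

The callback stands in for the per-option entry point; a real filesystem would instead receive each item through its parameter parsing hook.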
|
|
||||||
(*) int vfs_get_tree(struct fs_context *fc);
|
* ::
|
||||||
|
|
||||||
|
int vfs_get_tree(struct fs_context *fc);
|
||||||
|
|
||||||
Get or create the mountable root and superblock, using the parameters in
|
Get or create the mountable root and superblock, using the parameters in
|
||||||
the filesystem context to select/configure the superblock. This invokes
|
the filesystem context to select/configure the superblock. This invokes
|
||||||
the ->get_tree() method.
|
the ->get_tree() method.
|
||||||
|
|
||||||
(*) struct vfsmount *vfs_create_mount(struct fs_context *fc);
|
* ::
|
||||||
|
|
||||||
|
struct vfsmount *vfs_create_mount(struct fs_context *fc);
|
||||||
|
|
||||||
Create a mount given the parameters in the specified filesystem context.
|
Create a mount given the parameters in the specified filesystem context.
|
||||||
Note that this does not attach the mount to anything.
|
Note that this does not attach the mount to anything.
|
||||||
|
|
||||||
|
|
||||||
===========================
|
Superblock Creation Helpers
|
||||||
SUPERBLOCK CREATION HELPERS
|
|
||||||
===========================
|
===========================
|
||||||
|
|
||||||
A number of VFS helpers are available for use by filesystems for the creation
|
A number of VFS helpers are available for use by filesystems for the creation
|
||||||
or looking up of superblocks.
|
or looking up of superblocks.
|
||||||
|
|
||||||
(*) struct super_block *
|
* ::
|
||||||
sget_fc(struct fs_context *fc,
|
|
||||||
int (*test)(struct super_block *sb, struct fs_context *fc),
|
struct super_block *
|
||||||
int (*set)(struct super_block *sb, struct fs_context *fc));
|
sget_fc(struct fs_context *fc,
|
||||||
|
int (*test)(struct super_block *sb, struct fs_context *fc),
|
||||||
|
int (*set)(struct super_block *sb, struct fs_context *fc));
|
||||||
|
|
||||||
This is the core routine. If test is non-NULL, it searches for an
|
This is the core routine. If test is non-NULL, it searches for an
|
||||||
existing superblock matching the criteria held in the fs_context, using
|
existing superblock matching the criteria held in the fs_context, using
|
||||||
@ -482,10 +562,12 @@ or looking up of superblocks.
|
|||||||
|
|
||||||
The following helpers all wrap sget_fc():
|
The following helpers all wrap sget_fc():
|
||||||
|
|
||||||
(*) int vfs_get_super(struct fs_context *fc,
|
* ::
|
||||||
enum vfs_get_super_keying keying,
|
|
||||||
int (*fill_super)(struct super_block *sb,
|
int vfs_get_super(struct fs_context *fc,
|
||||||
struct fs_context *fc))
|
enum vfs_get_super_keying keying,
|
||||||
|
int (*fill_super)(struct super_block *sb,
|
||||||
|
struct fs_context *fc))
|
||||||
|
|
||||||
This creates/looks up a deviceless superblock. The keying indicates how
|
This creates/looks up a deviceless superblock. The keying indicates how
|
||||||
many superblocks of this type may exist and in what manner they may be
|
many superblocks of this type may exist and in what manner they may be
|
||||||
@ -515,14 +597,14 @@ PARAMETER DESCRIPTION
|
|||||||
=====================
|
=====================
|
||||||
|
|
||||||
Parameters are described using structures defined in linux/fs_parser.h.
|
Parameters are described using structures defined in linux/fs_parser.h.
|
||||||
There's a core description struct that links everything together:
|
There's a core description struct that links everything together::
|
||||||
|
|
||||||
struct fs_parameter_description {
|
struct fs_parameter_description {
|
||||||
const struct fs_parameter_spec *specs;
|
const struct fs_parameter_spec *specs;
|
||||||
const struct fs_parameter_enum *enums;
|
const struct fs_parameter_enum *enums;
|
||||||
};
|
};
|
||||||
|
|
||||||
For example:
|
For example::
|
||||||
|
|
||||||
enum {
|
enum {
|
||||||
Opt_autocell,
|
Opt_autocell,
|
||||||
@ -539,10 +621,12 @@ For example:
|
|||||||
|
|
||||||
The members are as follows:
|
The members are as follows:
|
||||||
|
|
||||||
(1) const struct fs_parameter_specification *specs;
|
(1) ::
|
||||||
|
|
||||||
|
const struct fs_parameter_specification *specs;
|
||||||
|
|
||||||
Table of parameter specifications, terminated with a null entry, where the
|
Table of parameter specifications, terminated with a null entry, where the
|
||||||
entries are of type:
|
entries are of type::
|
||||||
|
|
||||||
struct fs_parameter_spec {
|
struct fs_parameter_spec {
|
||||||
const char *name;
|
const char *name;
|
||||||
@ -558,6 +642,7 @@ The members are as follows:
|
|||||||
|
|
||||||
The 'type' field indicates the desired value type and must be one of:
|
The 'type' field indicates the desired value type and must be one of:
|
||||||
|
|
||||||
|
======================= ======================= =====================
|
||||||
TYPE NAME EXPECTED VALUE RESULT IN
|
TYPE NAME EXPECTED VALUE RESULT IN
|
||||||
======================= ======================= =====================
|
======================= ======================= =====================
|
||||||
fs_param_is_flag No value n/a
|
fs_param_is_flag No value n/a
|
||||||
@ -573,19 +658,23 @@ The members are as follows:
|
|||||||
fs_param_is_blockdev Blockdev path * Needs lookup
|
fs_param_is_blockdev Blockdev path * Needs lookup
|
||||||
fs_param_is_path Path * Needs lookup
|
fs_param_is_path Path * Needs lookup
|
||||||
fs_param_is_fd File descriptor result->int_32
|
fs_param_is_fd File descriptor result->int_32
|
||||||
|
======================= ======================= =====================
|
||||||
|
|
||||||
Note that if the value is of fs_param_is_bool type, fs_parse() will try
|
Note that if the value is of fs_param_is_bool type, fs_parse() will try
|
||||||
to match any string value against "0", "1", "no", "yes", "false", "true".
|
to match any string value against "0", "1", "no", "yes", "false", "true".
|
||||||
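The boolean matching described above amounts to a small string table. A minimal userspace sketch follows; demo_match_bool() is a hypothetical name, and the exact-case comparison and the -1 "no match" return are assumptions rather than documented fs_parse() behaviour.

```c
#include <string.h>

/* Returns 1 for a true spelling, 0 for a false spelling, -1 if no match. */
static int demo_match_bool(const char *s)
{
	static const struct {
		const char *name;
		int value;
	} table[] = {
		{ "0", 0 },	{ "1", 1 },
		{ "no", 0 },	{ "yes", 1 },
		{ "false", 0 },	{ "true", 1 },
	};
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(s, table[i].name) == 0)
			return table[i].value;
	return -1;
}
```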
|
|
||||||
Each parameter can also be qualified with 'flags':
|
Each parameter can also be qualified with 'flags':
|
||||||
|
|
||||||
|
======================= ================================================
|
||||||
fs_param_v_optional The value is optional
|
fs_param_v_optional The value is optional
|
||||||
fs_param_neg_with_no result->negated set if key is prefixed with "no"
|
fs_param_neg_with_no result->negated set if key is prefixed with "no"
|
||||||
fs_param_neg_with_empty result->negated set if value is ""
|
fs_param_neg_with_empty result->negated set if value is ""
|
||||||
fs_param_deprecated The parameter is deprecated.
|
fs_param_deprecated The parameter is deprecated.
|
||||||
|
======================= ================================================
|
||||||
|
|
||||||
These are wrapped with a number of convenience wrappers:
|
These are wrapped with a number of convenience wrappers:
|
||||||
|
|
||||||
|
======================= ===============================================
|
||||||
MACRO SPECIFIES
|
MACRO SPECIFIES
|
||||||
======================= ===============================================
|
======================= ===============================================
|
||||||
fsparam_flag() fs_param_is_flag
|
fsparam_flag() fs_param_is_flag
|
||||||
@ -602,9 +691,10 @@ The members are as follows:
|
|||||||
fsparam_bdev() fs_param_is_blockdev
|
fsparam_bdev() fs_param_is_blockdev
|
||||||
fsparam_path() fs_param_is_path
|
fsparam_path() fs_param_is_path
|
||||||
fsparam_fd() fs_param_is_fd
|
fsparam_fd() fs_param_is_fd
|
||||||
|
======================= ===============================================
|
||||||
|
|
||||||
all of which take two arguments, name string and option number - for
|
all of which take two arguments, name string and option number - for
|
||||||
example:
|
example::
|
||||||
|
|
||||||
static const struct fs_parameter_spec afs_param_specs[] = {
|
static const struct fs_parameter_spec afs_param_specs[] = {
|
||||||
fsparam_flag ("autocell", Opt_autocell),
|
fsparam_flag ("autocell", Opt_autocell),
|
||||||
@ -618,10 +708,12 @@ The members are as follows:
|
|||||||
of arguments to specify the type and the flags for anything that doesn't
|
of arguments to specify the type and the flags for anything that doesn't
|
||||||
match one of the above macros.
|
match one of the above macros.
|
||||||
|
|
||||||
(2) const struct fs_parameter_enum *enums;
|
(2) ::
|
||||||
|
|
||||||
|
const struct fs_parameter_enum *enums;
|
||||||
|
|
||||||
Table of enum value names to integer mappings, terminated with a null
|
Table of enum value names to integer mappings, terminated with a null
|
||||||
entry. This is of type:
|
entry. This is of type::
|
||||||
|
|
||||||
struct fs_parameter_enum {
|
struct fs_parameter_enum {
|
||||||
u8 opt;
|
u8 opt;
|
||||||
@ -630,7 +722,7 @@ The members are as follows:
|
|||||||
};
|
};
|
||||||
|
|
||||||
Where the array is an unsorted list of { parameter ID, name }-keyed
|
Where the array is an unsorted list of { parameter ID, name }-keyed
|
||||||
elements that indicate the value to map to, e.g.:
|
elements that indicate the value to map to, e.g.::
|
||||||
|
|
||||||
static const struct fs_parameter_enum afs_param_enums[] = {
|
static const struct fs_parameter_enum afs_param_enums[] = {
|
||||||
{ Opt_bar, "x", 1},
|
{ Opt_bar, "x", 1},
|
||||||
@ -648,18 +740,19 @@ CONFIG_VALIDATE_FS_PARSER=y) and will allow the description to be queried from
|
|||||||
userspace using the fsinfo() syscall.
|
userspace using the fsinfo() syscall.
|
||||||
|
|
||||||
|
|
||||||
==========================
|
Parameter Helper Functions
|
||||||
PARAMETER HELPER FUNCTIONS
|
|
||||||
==========================
|
==========================
|
||||||
|
|
||||||
A number of helper functions are provided to help a filesystem or an LSM
|
A number of helper functions are provided to help a filesystem or an LSM
|
||||||
process the parameters it is given.
|
process the parameters it is given.
|
||||||
|
|
||||||
(*) int lookup_constant(const struct constant_table tbl[],
|
* ::
|
||||||
const char *name, int not_found);
|
|
||||||
|
int lookup_constant(const struct constant_table tbl[],
|
||||||
|
const char *name, int not_found);
|
||||||
|
|
||||||
Look up a constant by name in a table of name -> integer mappings. The
|
Look up a constant by name in a table of name -> integer mappings. The
|
||||||
table is an array of elements of the following type:
|
table is an array of elements of the following type::
|
||||||
|
|
||||||
struct constant_table {
|
struct constant_table {
|
||||||
const char *name;
|
const char *name;
|
||||||
@ -669,9 +762,11 @@ process the parameters it is given.
|
|||||||
If a match is found, the corresponding value is returned. If a match
|
If a match is found, the corresponding value is returned. If a match
|
||||||
isn't found, the not_found value is returned instead.
|
isn't found, the not_found value is returned instead.
|
||||||
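The lookup behaviour described above (return the mapped value on a match, otherwise not_found) can be sketched in userspace as follows. struct demo_constant, demo_lookup_constant() and demo_switch_names are hypothetical stand-ins; the sketch assumes a table terminated by a NULL name and uses a plain linear scan, which may differ from how the kernel helper walks its table.

```c
#include <string.h>

/* Stand-in for a name -> integer mapping entry. */
struct demo_constant {
	const char *name;
	int value;
};

/* Return the value mapped to name, or not_found if name is absent. */
static int demo_lookup_constant(const struct demo_constant *tbl,
				const char *name, int not_found)
{
	for (; tbl->name; tbl++)
		if (strcmp(name, tbl->name) == 0)
			return tbl->value;
	return not_found;
}

/* Example table, terminated with a null entry. */
static const struct demo_constant demo_switch_names[] = {
	{ "off", 0 },
	{ "on",  1 },
	{ NULL,  0 },
};
```

Choosing a not_found value outside the range of valid values (for example -1 here) lets the caller tell a miss apart from a legitimate mapping.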
|
|
||||||
(*) bool validate_constant_table(const struct constant_table *tbl,
|
* ::
|
||||||
size_t tbl_size,
|
|
||||||
int low, int high, int special);
|
bool validate_constant_table(const struct constant_table *tbl,
|
||||||
|
size_t tbl_size,
|
||||||
|
int low, int high, int special);
|
||||||
|
|
||||||
Validate a constant table. Checks that all the elements are appropriately
|
Validate a constant table. Checks that all the elements are appropriately
|
||||||
ordered, that there are no duplicates and that the values are between low
|
ordered, that there are no duplicates and that the values are between low
|
||||||
@ -682,16 +777,20 @@ process the parameters it is given.
|
|||||||
If all is good, true is returned. If the table is invalid, errors are
|
If all is good, true is returned. If the table is invalid, errors are
|
||||||
logged to dmesg and false is returned.
|
logged to dmesg and false is returned.
|
||||||
|
|
||||||
(*) bool fs_validate_description(const struct fs_parameter_description *desc);
|
* ::
|
||||||
|
|
||||||
|
bool fs_validate_description(const struct fs_parameter_description *desc);
|
||||||
|
|
||||||
This performs some validation checks on a parameter description. It
|
This performs some validation checks on a parameter description. It
|
||||||
returns true if the description is good and false if it is not. It will
|
returns true if the description is good and false if it is not. It will
|
||||||
log errors to dmesg if validation fails.
|
log errors to dmesg if validation fails.
|
||||||
|
|
||||||
(*) int fs_parse(struct fs_context *fc,
|
* ::
|
||||||
const struct fs_parameter_description *desc,
|
|
||||||
struct fs_parameter *param,
|
int fs_parse(struct fs_context *fc,
|
||||||
struct fs_parse_result *result);
|
const struct fs_parameter_description *desc,
|
||||||
|
struct fs_parameter *param,
|
||||||
|
struct fs_parse_result *result);
|
||||||
|
|
||||||
This is the main interpreter of parameters. It uses the parameter
|
This is the main interpreter of parameters. It uses the parameter
|
||||||
description to look up a parameter by key name and to convert that to an
|
description to look up a parameter by key name and to convert that to an
|
||||||
@ -711,14 +810,16 @@ process the parameters it is given.
|
|||||||
parameter is matched, but the value is erroneous, -EINVAL will be
|
parameter is matched, but the value is erroneous, -EINVAL will be
|
||||||
returned; otherwise the parameter's option number will be returned.
|
returned; otherwise the parameter's option number will be returned.
|
||||||
|
|
||||||
(*) int fs_lookup_param(struct fs_context *fc,
|
* ::
|
||||||
struct fs_parameter *value,
|
|
||||||
bool want_bdev,
|
int fs_lookup_param(struct fs_context *fc,
|
||||||
struct path *_path);
|
struct fs_parameter *value,
|
||||||
|
bool want_bdev,
|
||||||
|
struct path *_path);
|
||||||
|
|
||||||
This takes a parameter that carries a string or filename type and attempts
|
This takes a parameter that carries a string or filename type and attempts
|
||||||
to do a path lookup on it. If the parameter expects a blockdev, a check
|
to do a path lookup on it. If the parameter expects a blockdev, a check
|
||||||
is made that the inode actually represents one.
|
is made that the inode actually represents one.
|
||||||
|
|
||||||
Returns 0 if successful and *_path will be set; returns a negative error
|
Returns 0 if successful and ``*_path`` will be set; returns a negative
|
||||||
code if not.
|
error code if not.
|
@ -119,9 +119,7 @@ it comes to that question::
|
|||||||
|
|
||||||
/opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf
|
/opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf
|
||||||
|
|
||||||
Create an /etc/pvfs2tab file::
|
Create an /etc/pvfs2tab file (localhost is fine)::
|
||||||
|
|
||||||
Localhost is fine for your pvfs2tab file:
|
|
||||||
|
|
||||||
echo tcp://localhost:3334/orangefs /pvfsmnt pvfs2 defaults,noauto 0 0 > \
|
echo tcp://localhost:3334/orangefs /pvfsmnt pvfs2 defaults,noauto 0 0 > \
|
||||||
/etc/pvfs2tab
|
/etc/pvfs2tab
|
||||||
|
@ -1871,7 +1871,7 @@ unbindable mount is unbindable
|
|||||||
|
|
||||||
For more information on mount propagation see:
|
For more information on mount propagation see:
|
||||||
|
|
||||||
Documentation/filesystems/sharedsubtree.txt
|
Documentation/filesystems/sharedsubtree.rst
|
||||||
|
|
||||||
|
|
||||||
3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm
|
3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm
|
||||||
|
@ -1,4 +1,6 @@
|
|||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
===============
|
||||||
Quota subsystem
|
Quota subsystem
|
||||||
===============
|
===============
|
||||||
|
|
||||||
@ -39,6 +41,7 @@ Currently, the interface supports only one message type QUOTA_NL_C_WARNING.
|
|||||||
This command is used to send a notification about any of the above mentioned
|
This command is used to send a notification about any of the above mentioned
|
||||||
events. Each message has six attributes. These are (type of the argument is
|
events. Each message has six attributes. These are (type of the argument is
|
||||||
in parentheses):
|
in parentheses):
|
||||||
|
|
||||||
QUOTA_NL_A_QTYPE (u32)
|
QUOTA_NL_A_QTYPE (u32)
|
||||||
- type of quota being exceeded (one of USRQUOTA, GRPQUOTA)
|
- type of quota being exceeded (one of USRQUOTA, GRPQUOTA)
|
||||||
QUOTA_NL_A_EXCESS_ID (u64)
|
QUOTA_NL_A_EXCESS_ID (u64)
|
||||||
@ -48,20 +51,34 @@ in parentheses):
|
|||||||
- UID of a user who caused the event
|
- UID of a user who caused the event
|
||||||
QUOTA_NL_A_WARNING (u32)
|
QUOTA_NL_A_WARNING (u32)
|
||||||
- what kind of limit is exceeded:
|
- what kind of limit is exceeded:
|
||||||
QUOTA_NL_IHARDWARN - inode hardlimit
|
|
||||||
QUOTA_NL_ISOFTLONGWARN - inode softlimit is exceeded longer
|
QUOTA_NL_IHARDWARN
|
||||||
than given grace period
|
inode hardlimit
|
||||||
QUOTA_NL_ISOFTWARN - inode softlimit
|
QUOTA_NL_ISOFTLONGWARN
|
||||||
QUOTA_NL_BHARDWARN - space (block) hardlimit
|
inode softlimit is exceeded longer
|
||||||
QUOTA_NL_BSOFTLONGWARN - space (block) softlimit is exceeded
|
than given grace period
|
||||||
longer than given grace period.
|
QUOTA_NL_ISOFTWARN
|
||||||
QUOTA_NL_BSOFTWARN - space (block) softlimit
|
inode softlimit
|
||||||
|
QUOTA_NL_BHARDWARN
|
||||||
|
space (block) hardlimit
|
||||||
|
QUOTA_NL_BSOFTLONGWARN
|
||||||
|
space (block) softlimit is exceeded
|
||||||
|
longer than given grace period.
|
||||||
|
QUOTA_NL_BSOFTWARN
|
||||||
|
space (block) softlimit
|
||||||
|
|
||||||
- four warnings are also defined for the event when user stops
|
- four warnings are also defined for the event when user stops
|
||||||
exceeding some limit:
|
exceeding some limit:
|
||||||
QUOTA_NL_IHARDBELOW - inode hardlimit
|
|
||||||
QUOTA_NL_ISOFTBELOW - inode softlimit
|
QUOTA_NL_IHARDBELOW
|
||||||
QUOTA_NL_BHARDBELOW - space (block) hardlimit
|
inode hardlimit
|
||||||
QUOTA_NL_BSOFTBELOW - space (block) softlimit
|
QUOTA_NL_ISOFTBELOW
|
||||||
|
inode softlimit
|
||||||
|
QUOTA_NL_BHARDBELOW
|
||||||
|
space (block) hardlimit
|
||||||
|
QUOTA_NL_BSOFTBELOW
|
||||||
|
space (block) softlimit
|
||||||
|
|
||||||
QUOTA_NL_A_DEV_MAJOR (u32)
|
QUOTA_NL_A_DEV_MAJOR (u32)
|
||||||
- major number of a device with the affected filesystem
|
- major number of a device with the affected filesystem
|
||||||
QUOTA_NL_A_DEV_MINOR (u32)
|
QUOTA_NL_A_DEV_MINOR (u32)
|
@ -71,7 +71,7 @@ be allowed write access to a ramfs mount.
|
|||||||
|
|
||||||
A ramfs derivative called tmpfs was created to add size limits, and the ability
|
A ramfs derivative called tmpfs was created to add size limits, and the ability
|
||||||
to write the data to swap space. Normal users can be allowed write access to
|
to write the data to swap space. Normal users can be allowed write access to
|
||||||
tmpfs mounts. See Documentation/filesystems/tmpfs.txt for more information.
|
tmpfs mounts. See Documentation/filesystems/tmpfs.rst for more information.
|
||||||
|
|
||||||
What is rootfs?
|
What is rootfs?
|
||||||
---------------
|
---------------
|
||||||
|
@ -1,6 +1,11 @@
|
|||||||
The seq_file interface
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
======================
|
||||||
|
The seq_file Interface
|
||||||
|
======================
|
||||||
|
|
||||||
Copyright 2003 Jonathan Corbet <corbet@lwn.net>
|
Copyright 2003 Jonathan Corbet <corbet@lwn.net>
|
||||||
|
|
||||||
This file is originally from the LWN.net Driver Porting series at
|
This file is originally from the LWN.net Driver Porting series at
|
||||||
http://lwn.net/Articles/driver-porting/
|
http://lwn.net/Articles/driver-porting/
|
||||||
|
|
||||||
@ -43,7 +48,7 @@ loadable module which creates a file called /proc/sequence. The file, when
|
|||||||
read, simply produces a set of increasing integer values, one per line. The
|
read, simply produces a set of increasing integer values, one per line. The
|
||||||
sequence will continue until the user loses patience and finds something
|
sequence will continue until the user loses patience and finds something
|
||||||
better to do. The file is seekable, in that one can do something like the
|
better to do. The file is seekable, in that one can do something like the
|
||||||
following:
|
following::
|
||||||
|
|
||||||
dd if=/proc/sequence of=out1 count=1
|
dd if=/proc/sequence of=out1 count=1
|
||||||
dd if=/proc/sequence skip=1 of=out2 count=1
|
dd if=/proc/sequence skip=1 of=out2 count=1
|
||||||
@ -55,16 +60,18 @@ wanting to see the full source for this module can find it at
|
|||||||
http://lwn.net/Articles/22359/).
|
http://lwn.net/Articles/22359/).
|
||||||
|
|
||||||
Deprecated create_proc_entry
|
Deprecated create_proc_entry
|
||||||
|
============================
|
||||||
|
|
||||||
Note that the above article uses create_proc_entry which was removed in
|
Note that the above article uses create_proc_entry which was removed in
|
||||||
kernel 3.10. Current versions require the following update
|
kernel 3.10. Current versions require the following update::
|
||||||
|
|
||||||
- entry = create_proc_entry("sequence", 0, NULL);
|
- entry = create_proc_entry("sequence", 0, NULL);
|
||||||
- if (entry)
|
- if (entry)
|
||||||
- entry->proc_fops = &ct_file_ops;
|
- entry->proc_fops = &ct_file_ops;
|
||||||
+ entry = proc_create("sequence", 0, NULL, &ct_file_ops);
|
+ entry = proc_create("sequence", 0, NULL, &ct_file_ops);
|
||||||
|
|
||||||
The iterator interface
|
The iterator interface
|
||||||
|
======================
|
||||||
|
|
||||||
Modules implementing a virtual file with seq_file must implement an
|
Modules implementing a virtual file with seq_file must implement an
|
||||||
iterator object that allows stepping through the data of interest
|
iterator object that allows stepping through the data of interest
|
||||||
@ -99,7 +106,7 @@ position. The pos passed to start() will always be either zero, or
|
|||||||
the most recent pos used in the previous session.
|
the most recent pos used in the previous session.
|
||||||
|
|
||||||
For our simple sequence example,
|
For our simple sequence example,
|
||||||
the start() function looks like:
|
the start() function looks like::
|
||||||
|
|
||||||
static void *ct_seq_start(struct seq_file *s, loff_t *pos)
|
static void *ct_seq_start(struct seq_file *s, loff_t *pos)
|
||||||
{
|
{
|
||||||
@ -129,7 +136,7 @@ move the iterator forward to the next position in the sequence. The
|
|||||||
example module can simply increment the position by one; more useful
|
example module can simply increment the position by one; more useful
|
||||||
modules will do what is needed to step through some data structure. The
|
modules will do what is needed to step through some data structure. The
|
||||||
next() function returns a new iterator, or NULL if the sequence is
|
next() function returns a new iterator, or NULL if the sequence is
|
||||||
complete. Here's the example version:
|
complete. Here's the example version::
|
||||||
|
|
||||||
static void *ct_seq_next(struct seq_file *s, void *v, loff_t *pos)
|
static void *ct_seq_next(struct seq_file *s, void *v, loff_t *pos)
|
||||||
{
|
{
|
||||||
@ -141,10 +148,10 @@ complete. Here's the example version:
|
|||||||
The stop() function closes a session; its job, of course, is to clean
|
The stop() function closes a session; its job, of course, is to clean
|
||||||
up. If dynamic memory is allocated for the iterator, stop() is the
|
up. If dynamic memory is allocated for the iterator, stop() is the
|
||||||
place to free it; if a lock was taken by start(), stop() must release
|
place to free it; if a lock was taken by start(), stop() must release
|
||||||
that lock. The value that *pos was set to by the last next() call
|
that lock. The value that ``*pos`` was set to by the last next() call
|
||||||
before stop() is remembered, and used for the first start() call of
|
before stop() is remembered, and used for the first start() call of
|
||||||
the next session unless lseek() has been called on the file; in that
|
the next session unless lseek() has been called on the file; in that
|
||||||
case the next start() will be asked to start at position zero.
|
case the next start() will be asked to start at position zero::
|
||||||
|
|
||||||
static void ct_seq_stop(struct seq_file *s, void *v)
|
static void ct_seq_stop(struct seq_file *s, void *v)
|
||||||
{
|
{
|
||||||
@ -152,7 +159,7 @@ case next start() will be asked to start at position zero.
|
|||||||
}
|
}
|
||||||
|
|
||||||
Finally, the show() function should format the object currently pointed to
|
Finally, the show() function should format the object currently pointed to
|
||||||
by the iterator for output. The example module's show() function is:
|
by the iterator for output. The example module's show() function is::
|
||||||
|
|
||||||
static int ct_seq_show(struct seq_file *s, void *v)
|
static int ct_seq_show(struct seq_file *s, void *v)
|
||||||
{
|
{
|
||||||
@ -169,7 +176,7 @@ generated output before returning SEQ_SKIP, that output will be dropped.
|
|||||||
|
|
||||||
We will look at seq_printf() in a moment. But first, the definition of the
|
We will look at seq_printf() in a moment. But first, the definition of the
|
||||||
seq_file iterator is finished by creating a seq_operations structure with
|
seq_file iterator is finished by creating a seq_operations structure with
|
||||||
the four functions we have just defined:
|
the four functions we have just defined::
|
||||||
|
|
||||||
static const struct seq_operations ct_seq_ops = {
|
static const struct seq_operations ct_seq_ops = {
|
||||||
.start = ct_seq_start,
|
.start = ct_seq_start,
|
||||||
@@ -194,6 +201,7 @@ other locks while the iterator is active.
|
|||||||
|
|
||||||
|
|
||||||
Formatted output
|
Formatted output
|
||||||
|
================
|
||||||
|
|
||||||
The seq_file code manages positioning within the output created by the
|
The seq_file code manages positioning within the output created by the
|
||||||
iterator and getting it into the user's buffer. But, for that to work, that
|
iterator and getting it into the user's buffer. But, for that to work, that
|
||||||
@@ -203,7 +211,7 @@ been defined which make this task easy.
|
|||||||
Most code will simply use seq_printf(), which works pretty much like
|
Most code will simply use seq_printf(), which works pretty much like
|
||||||
printk(), but which requires the seq_file pointer as an argument.
|
printk(), but which requires the seq_file pointer as an argument.
|
||||||
|
|
||||||
For straight character output, the following functions may be used:
|
For straight character output, the following functions may be used::
|
||||||
|
|
||||||
seq_putc(struct seq_file *m, char c);
|
seq_putc(struct seq_file *m, char c);
|
||||||
seq_puts(struct seq_file *m, const char *s);
|
seq_puts(struct seq_file *m, const char *s);
|
||||||
@@ -213,7 +221,7 @@ The first two output a single character and a string, just like one would
|
|||||||
expect. seq_escape() is like seq_puts(), except that any character in s
|
expect. seq_escape() is like seq_puts(), except that any character in s
|
||||||
which is in the string esc will be represented in octal form in the output.
|
which is in the string esc will be represented in octal form in the output.
|
||||||
|
|
||||||
There are also a pair of functions for printing filenames:
|
There are also a pair of functions for printing filenames::
|
||||||
|
|
||||||
int seq_path(struct seq_file *m, const struct path *path,
|
int seq_path(struct seq_file *m, const struct path *path,
|
||||||
const char *esc);
|
const char *esc);
|
||||||
@@ -226,8 +234,10 @@ the path relative to the current process's filesystem root. If a different
|
|||||||
root is desired, it can be used with seq_path_root(). If it turns out that
|
root is desired, it can be used with seq_path_root(). If it turns out that
|
||||||
path cannot be reached from root, seq_path_root() returns SEQ_SKIP.
|
path cannot be reached from root, seq_path_root() returns SEQ_SKIP.
|
||||||
|
|
||||||
A function producing complicated output may want to check
|
A function producing complicated output may want to check::
|
||||||
|
|
||||||
bool seq_has_overflowed(struct seq_file *m);
|
bool seq_has_overflowed(struct seq_file *m);
|
||||||
|
|
||||||
and avoid further seq_<output> calls if true is returned.
|
and avoid further seq_<output> calls if true is returned.
|
||||||
|
|
||||||
A true return from seq_has_overflowed means that the seq_file buffer will
|
A true return from seq_has_overflowed means that the seq_file buffer will
|
||||||
@@ -236,6 +246,7 @@ buffer and retry printing.
|
|||||||
|
|
||||||
|
|
||||||
Making it all work
|
Making it all work
|
||||||
|
==================
|
||||||
|
|
||||||
So far, we have a nice set of functions which can produce output within the
|
So far, we have a nice set of functions which can produce output within the
|
||||||
seq_file system, but we have not yet turned them into a file that a user
|
seq_file system, but we have not yet turned them into a file that a user
|
||||||
@@ -244,7 +255,7 @@ creation of a set of file_operations which implement the operations on that
|
|||||||
file. The seq_file interface provides a set of canned operations which do
|
file. The seq_file interface provides a set of canned operations which do
|
||||||
most of the work. The virtual file author still must implement the open()
|
most of the work. The virtual file author still must implement the open()
|
||||||
method, however, to hook everything up. The open function is often a single
|
method, however, to hook everything up. The open function is often a single
|
||||||
line, as in the example module:
|
line, as in the example module::
|
||||||
|
|
||||||
static int ct_open(struct inode *inode, struct file *file)
|
static int ct_open(struct inode *inode, struct file *file)
|
||||||
{
|
{
|
||||||
@@ -263,7 +274,7 @@ by the iterator functions.
|
|||||||
There is also a wrapper function to seq_open() called seq_open_private(). It
|
There is also a wrapper function to seq_open() called seq_open_private(). It
|
||||||
kmallocs a zero filled block of memory and stores a pointer to it in the
|
kmallocs a zero filled block of memory and stores a pointer to it in the
|
||||||
private field of the seq_file structure, returning 0 on success. The
|
private field of the seq_file structure, returning 0 on success. The
|
||||||
block size is specified in a third parameter to the function, e.g.:
|
block size is specified in a third parameter to the function, e.g.::
|
||||||
|
|
||||||
static int ct_open(struct inode *inode, struct file *file)
|
static int ct_open(struct inode *inode, struct file *file)
|
||||||
{
|
{
|
||||||
@@ -273,7 +284,7 @@ block size is specified in a third parameter to the function, e.g.:
|
|||||||
|
|
||||||
There is also a variant function, __seq_open_private(), which is functionally
|
There is also a variant function, __seq_open_private(), which is functionally
|
||||||
identical except that, if successful, it returns the pointer to the allocated
|
identical except that, if successful, it returns the pointer to the allocated
|
||||||
memory block, allowing further initialisation e.g.:
|
memory block, allowing further initialisation e.g.::
|
||||||
|
|
||||||
static int ct_open(struct inode *inode, struct file *file)
|
static int ct_open(struct inode *inode, struct file *file)
|
||||||
{
|
{
|
||||||
@@ -295,7 +306,7 @@ frees the memory allocated in the corresponding open.
|
|||||||
|
|
||||||
The other operations of interest - read(), llseek(), and release() - are
|
The other operations of interest - read(), llseek(), and release() - are
|
||||||
all implemented by the seq_file code itself. So a virtual file's
|
all implemented by the seq_file code itself. So a virtual file's
|
||||||
file_operations structure will look like:
|
file_operations structure will look like::
|
||||||
|
|
||||||
static const struct file_operations ct_file_ops = {
|
static const struct file_operations ct_file_ops = {
|
||||||
.owner = THIS_MODULE,
|
.owner = THIS_MODULE,
|
||||||
@@ -309,7 +320,7 @@ There is also a seq_release_private() which passes the contents of the
|
|||||||
seq_file private field to kfree() before releasing the structure.
|
seq_file private field to kfree() before releasing the structure.
|
||||||
|
|
||||||
The final step is the creation of the /proc file itself. In the example
|
The final step is the creation of the /proc file itself. In the example
|
||||||
code, that is done in the initialization code in the usual way:
|
code, that is done in the initialization code in the usual way::
|
||||||
|
|
||||||
static int ct_init(void)
|
static int ct_init(void)
|
||||||
{
|
{
|
||||||
@@ -325,9 +336,10 @@ And that is pretty much it.
|
|||||||
|
|
||||||
|
|
||||||
seq_list
|
seq_list
|
||||||
|
========
|
||||||
|
|
||||||
If your file will be iterating through a linked list, you may find these
|
If your file will be iterating through a linked list, you may find these
|
||||||
routines useful:
|
routines useful::
|
||||||
|
|
||||||
struct list_head *seq_list_start(struct list_head *head,
|
struct list_head *seq_list_start(struct list_head *head,
|
||||||
loff_t pos);
|
loff_t pos);
|
||||||
@@ -338,15 +350,16 @@ routines useful:
|
|||||||
|
|
||||||
These helpers will interpret pos as a position within the list and iterate
|
These helpers will interpret pos as a position within the list and iterate
|
||||||
accordingly. Your start() and next() functions need only invoke the
|
accordingly. Your start() and next() functions need only invoke the
|
||||||
seq_list_* helpers with a pointer to the appropriate list_head structure.
|
``seq_list_*`` helpers with a pointer to the appropriate list_head structure.
|
||||||
|
|
||||||
|
|
||||||
The extra-simple version
|
The extra-simple version
|
||||||
|
========================
|
||||||
|
|
||||||
For extremely simple virtual files, there is an even easier interface. A
|
For extremely simple virtual files, there is an even easier interface. A
|
||||||
module can define only the show() function, which should create all the
|
module can define only the show() function, which should create all the
|
||||||
output that the virtual file will contain. The file's open() method then
|
output that the virtual file will contain. The file's open() method then
|
||||||
calls:
|
calls::
|
||||||
|
|
||||||
int single_open(struct file *file,
|
int single_open(struct file *file,
|
||||||
int (*show)(struct seq_file *m, void *p),
|
int (*show)(struct seq_file *m, void *p),
|
@@ -1,7 +1,10 @@
|
|||||||
Shared Subtrees
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
---------------
|
|
||||||
|
|
||||||
Contents:
|
===============
|
||||||
|
Shared Subtrees
|
||||||
|
===============
|
||||||
|
|
||||||
|
.. Contents:
|
||||||
1) Overview
|
1) Overview
|
||||||
2) Features
|
2) Features
|
||||||
3) Setting mount states
|
3) Setting mount states
|
||||||
@@ -41,31 +44,38 @@ replicas continue to be exactly same.
|
|||||||
|
|
||||||
Here is an example:
|
Here is an example:
|
||||||
|
|
||||||
Let's say /mnt has a mount that is shared.
|
Let's say /mnt has a mount that is shared::
|
||||||
mount --make-shared /mnt
|
|
||||||
|
mount --make-shared /mnt
|
||||||
|
|
||||||
Note: mount(8) command now supports the --make-shared flag,
|
Note: mount(8) command now supports the --make-shared flag,
|
||||||
so the sample 'smount' program is no longer needed and has been
|
so the sample 'smount' program is no longer needed and has been
|
||||||
removed.
|
removed.
|
||||||
|
|
||||||
# mount --bind /mnt /tmp
|
::
|
||||||
|
|
||||||
|
# mount --bind /mnt /tmp
|
||||||
|
|
||||||
The above command replicates the mount at /mnt to the mountpoint /tmp
|
The above command replicates the mount at /mnt to the mountpoint /tmp
|
||||||
and the contents of both the mounts remain identical.
|
and the contents of both the mounts remain identical.
|
||||||
|
|
||||||
#ls /mnt
|
::
|
||||||
a b c
|
|
||||||
|
|
||||||
#ls /tmp
|
#ls /mnt
|
||||||
a b c
|
a b c
|
||||||
|
|
||||||
Now let's say we mount a device at /tmp/a
|
#ls /tmp
|
||||||
# mount /dev/sd0 /tmp/a
|
a b c
|
||||||
|
|
||||||
#ls /tmp/a
|
Now let's say we mount a device at /tmp/a::
|
||||||
t1 t2 t3
|
|
||||||
|
|
||||||
#ls /mnt/a
|
# mount /dev/sd0 /tmp/a
|
||||||
t1 t2 t3
|
|
||||||
|
#ls /tmp/a
|
||||||
|
t1 t2 t3
|
||||||
|
|
||||||
|
#ls /mnt/a
|
||||||
|
t1 t2 t3
|
||||||
|
|
||||||
Note that the mount has propagated to the mount at /mnt as well.
|
Note that the mount has propagated to the mount at /mnt as well.
|
||||||
|
|
||||||
@@ -123,14 +133,15 @@ replicas continue to be exactly same.
|
|||||||
|
|
||||||
2d) A unbindable mount is a unbindable private mount
|
2d) A unbindable mount is a unbindable private mount
|
||||||
|
|
||||||
let's say we have a mount at /mnt and we make it unbindable
|
let's say we have a mount at /mnt and we make it unbindable::
|
||||||
|
|
||||||
# mount --make-unbindable /mnt
|
# mount --make-unbindable /mnt
|
||||||
|
|
||||||
Let's try to bind mount this mount somewhere else.
|
Let's try to bind mount this mount somewhere else::
|
||||||
# mount --bind /mnt /tmp
|
|
||||||
mount: wrong fs type, bad option, bad superblock on /mnt,
|
# mount --bind /mnt /tmp
|
||||||
or too many mounted file systems
|
mount: wrong fs type, bad option, bad superblock on /mnt,
|
||||||
|
or too many mounted file systems
|
||||||
|
|
||||||
Binding a unbindable mount is a invalid operation.
|
Binding a unbindable mount is a invalid operation.
|
||||||
|
|
||||||
@@ -138,12 +149,12 @@ replicas continue to be exactly same.
|
|||||||
3) Setting mount states
|
3) Setting mount states
|
||||||
|
|
||||||
The mount command (util-linux package) can be used to set mount
|
The mount command (util-linux package) can be used to set mount
|
||||||
states:
|
states::
|
||||||
|
|
||||||
mount --make-shared mountpoint
|
mount --make-shared mountpoint
|
||||||
mount --make-slave mountpoint
|
mount --make-slave mountpoint
|
||||||
mount --make-private mountpoint
|
mount --make-private mountpoint
|
||||||
mount --make-unbindable mountpoint
|
mount --make-unbindable mountpoint
|
||||||
|
|
||||||
|
|
||||||
4) Use cases
|
4) Use cases
|
||||||
@@ -154,9 +165,10 @@ replicas continue to be exactly same.
|
|||||||
|
|
||||||
Solution:
|
Solution:
|
||||||
|
|
||||||
The system administrator can make the mount at /cdrom shared
|
The system administrator can make the mount at /cdrom shared::
|
||||||
mount --bind /cdrom /cdrom
|
|
||||||
mount --make-shared /cdrom
|
mount --bind /cdrom /cdrom
|
||||||
|
mount --make-shared /cdrom
|
||||||
|
|
||||||
Now any process that clones off a new namespace will have a
|
Now any process that clones off a new namespace will have a
|
||||||
mount at /cdrom which is a replica of the same mount in the
|
mount at /cdrom which is a replica of the same mount in the
|
||||||
@@ -172,14 +184,14 @@ replicas continue to be exactly same.
|
|||||||
Solution:
|
Solution:
|
||||||
|
|
||||||
To begin with, the administrator can mark the entire mount tree
|
To begin with, the administrator can mark the entire mount tree
|
||||||
as shareable.
|
as shareable::
|
||||||
|
|
||||||
mount --make-rshared /
|
mount --make-rshared /
|
||||||
|
|
||||||
A new process can clone off a new namespace. And mark some part
|
A new process can clone off a new namespace. And mark some part
|
||||||
of its namespace as slave
|
of its namespace as slave::
|
||||||
|
|
||||||
mount --make-rslave /myprivatetree
|
mount --make-rslave /myprivatetree
|
||||||
|
|
||||||
Hence forth any mounts within the /myprivatetree done by the
|
Hence forth any mounts within the /myprivatetree done by the
|
||||||
process will not show up in any other namespace. However mounts
|
process will not show up in any other namespace. However mounts
|
||||||
@@ -206,13 +218,13 @@ replicas continue to be exactly same.
|
|||||||
versions of the file depending on the path used to access that
|
versions of the file depending on the path used to access that
|
||||||
file.
|
file.
|
||||||
|
|
||||||
An example is:
|
An example is::
|
||||||
|
|
||||||
mount --make-shared /
|
mount --make-shared /
|
||||||
mount --rbind / /view/v1
|
mount --rbind / /view/v1
|
||||||
mount --rbind / /view/v2
|
mount --rbind / /view/v2
|
||||||
mount --rbind / /view/v3
|
mount --rbind / /view/v3
|
||||||
mount --rbind / /view/v4
|
mount --rbind / /view/v4
|
||||||
|
|
||||||
and if /usr has a versioning filesystem mounted, then that
|
and if /usr has a versioning filesystem mounted, then that
|
||||||
mount appears at /view/v1/usr, /view/v2/usr, /view/v3/usr and
|
mount appears at /view/v1/usr, /view/v2/usr, /view/v3/usr and
|
||||||
@@ -224,8 +236,8 @@ replicas continue to be exactly same.
|
|||||||
filesystem is being requested and return the corresponding
|
filesystem is being requested and return the corresponding
|
||||||
inode.
|
inode.
|
||||||
|
|
||||||
5) Detailed semantics:
|
5) Detailed semantics
|
||||||
-------------------
|
---------------------
|
||||||
The section below explains the detailed semantics of
|
The section below explains the detailed semantics of
|
||||||
bind, rbind, move, mount, umount and clone-namespace operations.
|
bind, rbind, move, mount, umount and clone-namespace operations.
|
||||||
|
|
||||||
@@ -235,6 +247,7 @@ replicas continue to be exactly same.
|
|||||||
5a) Mount states
|
5a) Mount states
|
||||||
|
|
||||||
A given mount can be in one of the following states
|
A given mount can be in one of the following states
|
||||||
|
|
||||||
1) shared
|
1) shared
|
||||||
2) slave
|
2) slave
|
||||||
3) shared and slave
|
3) shared and slave
|
||||||
@@ -252,7 +265,8 @@ replicas continue to be exactly same.
|
|||||||
A 'shared mount' is defined as a vfsmount that belongs to a
|
A 'shared mount' is defined as a vfsmount that belongs to a
|
||||||
'peer group'.
|
'peer group'.
|
||||||
|
|
||||||
For example:
|
For example::
|
||||||
|
|
||||||
mount --make-shared /mnt
|
mount --make-shared /mnt
|
||||||
mount --bind /mnt /tmp
|
mount --bind /mnt /tmp
|
||||||
|
|
||||||
@@ -270,7 +284,7 @@ replicas continue to be exactly same.
|
|||||||
A slave mount as the name implies has a master mount from which
|
A slave mount as the name implies has a master mount from which
|
||||||
mount/unmount events are received. Events do not propagate from
|
mount/unmount events are received. Events do not propagate from
|
||||||
the slave mount to the master. Only a shared mount can be made
|
the slave mount to the master. Only a shared mount can be made
|
||||||
a slave by executing the following command
|
a slave by executing the following command::
|
||||||
|
|
||||||
mount --make-slave mount
|
mount --make-slave mount
|
||||||
|
|
||||||
@@ -290,8 +304,10 @@ replicas continue to be exactly same.
|
|||||||
peer group.
|
peer group.
|
||||||
|
|
||||||
Only a slave vfsmount can be made as 'shared and slave' by
|
Only a slave vfsmount can be made as 'shared and slave' by
|
||||||
either executing the following command
|
either executing the following command::
|
||||||
|
|
||||||
mount --make-shared mount
|
mount --make-shared mount
|
||||||
|
|
||||||
or by moving the slave vfsmount under a shared vfsmount.
|
or by moving the slave vfsmount under a shared vfsmount.
|
||||||
|
|
||||||
(4) Private mount
|
(4) Private mount
|
||||||
@@ -307,30 +323,32 @@ replicas continue to be exactly same.
|
|||||||
|
|
||||||
|
|
||||||
State diagram:
|
State diagram:
|
||||||
|
|
||||||
The state diagram below explains the state transition of a mount,
|
The state diagram below explains the state transition of a mount,
|
||||||
in response to various commands.
|
in response to various commands::
|
||||||
------------------------------------------------------------------------
|
|
||||||
| |make-shared | make-slave | make-private |make-unbindab|
|
|
||||||
--------------|------------|--------------|--------------|-------------|
|
|
||||||
|shared |shared |*slave/private| private | unbindable |
|
|
||||||
| | | | | |
|
|
||||||
|-------------|------------|--------------|--------------|-------------|
|
|
||||||
|slave |shared | **slave | private | unbindable |
|
|
||||||
| |and slave | | | |
|
|
||||||
|-------------|------------|--------------|--------------|-------------|
|
|
||||||
|shared |shared | slave | private | unbindable |
|
|
||||||
|and slave |and slave | | | |
|
|
||||||
|-------------|------------|--------------|--------------|-------------|
|
|
||||||
|private |shared | **private | private | unbindable |
|
|
||||||
|-------------|------------|--------------|--------------|-------------|
|
|
||||||
|unbindable |shared |**unbindable | private | unbindable |
|
|
||||||
------------------------------------------------------------------------
|
|
||||||
|
|
||||||
* if the shared mount is the only mount in its peer group, making it
|
-----------------------------------------------------------------------
|
||||||
slave, makes it private automatically. Note that there is no master to
|
| |make-shared | make-slave | make-private |make-unbindab|
|
||||||
which it can be slaved to.
|
--------------|------------|--------------|--------------|-------------|
|
||||||
|
|shared |shared |*slave/private| private | unbindable |
|
||||||
|
| | | | | |
|
||||||
|
|-------------|------------|--------------|--------------|-------------|
|
||||||
|
|slave |shared | **slave | private | unbindable |
|
||||||
|
| |and slave | | | |
|
||||||
|
|-------------|------------|--------------|--------------|-------------|
|
||||||
|
|shared |shared | slave | private | unbindable |
|
||||||
|
|and slave |and slave | | | |
|
||||||
|
|-------------|------------|--------------|--------------|-------------|
|
||||||
|
|private |shared | **private | private | unbindable |
|
||||||
|
|-------------|------------|--------------|--------------|-------------|
|
||||||
|
|unbindable |shared |**unbindable | private | unbindable |
|
||||||
|
------------------------------------------------------------------------
|
||||||
|
|
||||||
** slaving a non-shared mount has no effect on the mount.
|
* if the shared mount is the only mount in its peer group, making it
|
||||||
|
slave, makes it private automatically. Note that there is no master to
|
||||||
|
which it can be slaved to.
|
||||||
|
|
||||||
|
** slaving a non-shared mount has no effect on the mount.
|
||||||
|
|
||||||
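The state table above can be read as a small lookup function. The following is a minimal sketch in C, not kernel code: the names ``mnt_state``, ``mnt_cmd``, ``next_state()`` and the ``has_peers`` flag are hypothetical and exist only for illustration. It encodes each row of the table, including the two footnotes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; one value per row of the state table. */
enum mnt_state {
	ST_SHARED, ST_SLAVE, ST_SHARED_SLAVE, ST_PRIVATE, ST_UNBINDABLE
};

/* Hypothetical names; one value per column of the state table. */
enum mnt_cmd {
	CMD_MAKE_SHARED, CMD_MAKE_SLAVE, CMD_MAKE_PRIVATE, CMD_MAKE_UNBINDABLE
};

/*
 * Return the state a mount ends up in after 'cmd' is applied.
 * 'has_peers' says whether a shared mount has other mounts in its
 * peer group (footnote "*" in the table).
 */
enum mnt_state next_state(enum mnt_state cur, enum mnt_cmd cmd, bool has_peers)
{
	switch (cmd) {
	case CMD_MAKE_SHARED:
		/* rows "slave" and "shared and slave" give "shared and slave" */
		if (cur == ST_SLAVE || cur == ST_SHARED_SLAVE)
			return ST_SHARED_SLAVE;
		return ST_SHARED;
	case CMD_MAKE_SLAVE:
		/* "*": a lone shared mount has no master, so it becomes private */
		if (cur == ST_SHARED)
			return has_peers ? ST_SLAVE : ST_PRIVATE;
		if (cur == ST_SHARED_SLAVE)
			return ST_SLAVE;
		/* "**": slaving a non-shared mount has no effect */
		return cur;
	case CMD_MAKE_PRIVATE:
		return ST_PRIVATE;
	case CMD_MAKE_UNBINDABLE:
		return ST_UNBINDABLE;
	}
	assert(0 && "unknown command");
	return cur;
}
```

For example, ``next_state(ST_SHARED, CMD_MAKE_SLAVE, false)`` yields ``ST_PRIVATE``, matching the first footnote: a shared mount that is alone in its peer group has no master to be slaved to.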
Apart from the commands listed below, the 'move' operation also changes
|
Apart from the commands listed below, the 'move' operation also changes
|
||||||
the state of a mount depending on type of the destination mount. Its
|
the state of a mount depending on type of the destination mount. Its
|
||||||
@@ -338,31 +356,32 @@ replicas continue to be exactly same.
|
|||||||
|
|
||||||
5b) Bind semantics
|
5b) Bind semantics
|
||||||
|
|
||||||
Consider the following command
|
Consider the following command::
|
||||||
|
|
||||||
mount --bind A/a B/b
|
mount --bind A/a B/b
|
||||||
|
|
||||||
where 'A' is the source mount, 'a' is the dentry in the mount 'A', 'B'
|
where 'A' is the source mount, 'a' is the dentry in the mount 'A', 'B'
|
||||||
is the destination mount and 'b' is the dentry in the destination mount.
|
is the destination mount and 'b' is the dentry in the destination mount.
|
||||||
|
|
||||||
The outcome depends on the type of mount of 'A' and 'B'. The table
|
The outcome depends on the type of mount of 'A' and 'B'. The table
|
||||||
below contains quick reference.
|
below contains quick reference::
|
||||||
---------------------------------------------------------------------------
|
|
||||||
| BIND MOUNT OPERATION |
|
--------------------------------------------------------------------------
|
||||||
|**************************************************************************
|
| BIND MOUNT OPERATION |
|
||||||
|source(A)->| shared | private | slave | unbindable |
|
|************************************************************************|
|
||||||
| dest(B) | | | | |
|
|source(A)->| shared | private | slave | unbindable |
|
||||||
| | | | | | |
|
| dest(B) | | | | |
|
||||||
| v | | | | |
|
| | | | | | |
|
||||||
|**************************************************************************
|
| v | | | | |
|
||||||
| shared | shared | shared | shared & slave | invalid |
|
|************************************************************************|
|
||||||
| | | | | |
|
| shared | shared | shared | shared & slave | invalid |
|
||||||
|non-shared| shared | private | slave | invalid |
|
| | | | | |
|
||||||
***************************************************************************
|
|non-shared| shared | private | slave | invalid |
|
||||||
|
**************************************************************************
|
||||||
|
|
||||||
Details:
|
Details:
|
||||||
|
|
||||||
1. 'A' is a shared mount and 'B' is a shared mount. A new mount 'C'
|
1. 'A' is a shared mount and 'B' is a shared mount. A new mount 'C'
|
||||||
which is clone of 'A', is created. Its root dentry is 'a' . 'C' is
|
which is clone of 'A', is created. Its root dentry is 'a' . 'C' is
|
||||||
mounted on mount 'B' at dentry 'b'. Also new mount 'C1', 'C2', 'C3' ...
|
mounted on mount 'B' at dentry 'b'. Also new mount 'C1', 'C2', 'C3' ...
|
||||||
are created and mounted at the dentry 'b' on all mounts where 'B'
|
are created and mounted at the dentry 'b' on all mounts where 'B'
|
||||||
@@ -371,7 +390,7 @@ replicas continue to be exactly same.
|
|||||||
'B'. And finally the peer-group of 'C' is merged with the peer group
|
'B'. And finally the peer-group of 'C' is merged with the peer group
|
||||||
of 'A'.
|
of 'A'.
|
||||||
|
|
||||||
2. 'A' is a private mount and 'B' is a shared mount. A new mount 'C'
|
2. 'A' is a private mount and 'B' is a shared mount. A new mount 'C'
|
||||||
which is clone of 'A', is created. Its root dentry is 'a'. 'C' is
|
which is clone of 'A', is created. Its root dentry is 'a'. 'C' is
|
||||||
mounted on mount 'B' at dentry 'b'. Also new mount 'C1', 'C2', 'C3' ...
|
mounted on mount 'B' at dentry 'b'. Also new mount 'C1', 'C2', 'C3' ...
|
||||||
are created and mounted at the dentry 'b' on all mounts where 'B'
|
are created and mounted at the dentry 'b' on all mounts where 'B'
|
||||||
@@ -379,7 +398,7 @@ replicas continue to be exactly same.
|
|||||||
'C', 'C1', .., 'Cn' with exactly the same configuration as the
|
'C', 'C1', .., 'Cn' with exactly the same configuration as the
|
||||||
propagation tree for 'B'.
|
propagation tree for 'B'.
|
||||||
|
|
||||||
3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. A new
|
3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. A new
|
||||||
mount 'C' which is clone of 'A', is created. Its root dentry is 'a' .
|
mount 'C' which is clone of 'A', is created. Its root dentry is 'a' .
|
||||||
'C' is mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2',
|
'C' is mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2',
|
||||||
'C3' ... are created and mounted at the dentry 'b' on all mounts where
|
'C3' ... are created and mounted at the dentry 'b' on all mounts where
|
||||||
@@ -389,19 +408,19 @@ replicas continue to be exactly same.
|
|||||||
is made the slave of mount 'Z'. In other words, mount 'C' is in the
|
is made the slave of mount 'Z'. In other words, mount 'C' is in the
|
||||||
state 'slave and shared'.
|
state 'slave and shared'.
|
||||||
|
|
||||||
4. 'A' is a unbindable mount and 'B' is a shared mount. This is a
|
4. 'A' is a unbindable mount and 'B' is a shared mount. This is a
|
||||||
invalid operation.
|
invalid operation.
|
||||||
|
|
||||||
5. 'A' is a private mount and 'B' is a non-shared(private or slave or
|
5. 'A' is a private mount and 'B' is a non-shared(private or slave or
|
||||||
unbindable) mount. A new mount 'C' which is clone of 'A', is created.
|
unbindable) mount. A new mount 'C' which is clone of 'A', is created.
|
||||||
Its root dentry is 'a'. 'C' is mounted on mount 'B' at dentry 'b'.
|
Its root dentry is 'a'. 'C' is mounted on mount 'B' at dentry 'b'.
|
||||||
|
|
||||||
6. 'A' is a shared mount and 'B' is a non-shared mount. A new mount 'C'
|
6. 'A' is a shared mount and 'B' is a non-shared mount. A new mount 'C'
|
||||||
which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
|
which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
|
||||||
mounted on mount 'B' at dentry 'b'. 'C' is made a member of the
|
mounted on mount 'B' at dentry 'b'. 'C' is made a member of the
|
||||||
peer-group of 'A'.
|
peer-group of 'A'.
|
||||||
|
|
||||||
7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount. A
|
7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount. A
|
||||||
new mount 'C' which is a clone of 'A' is created. Its root dentry is
|
new mount 'C' which is a clone of 'A' is created. Its root dentry is
|
||||||
'a'. 'C' is mounted on mount 'B' at dentry 'b'. Also 'C' is set as a
|
'a'. 'C' is mounted on mount 'B' at dentry 'b'. Also 'C' is set as a
|
||||||
slave mount of 'Z'. In other words 'A' and 'C' are both slave mounts of
|
slave mount of 'Z'. In other words 'A' and 'C' are both slave mounts of
|
||||||
@@ -409,7 +428,7 @@ replicas continue to be exactly same.
|
|||||||
mount/unmount on 'A' do not propagate anywhere else. Similarly
|
mount/unmount on 'A' do not propagate anywhere else. Similarly
|
||||||
mount/unmount on 'C' do not propagate anywhere else.
|
mount/unmount on 'C' do not propagate anywhere else.
|
||||||
|
|
||||||
8. 'A' is a unbindable mount and 'B' is a non-shared mount. This is a
|
8. 'A' is a unbindable mount and 'B' is a non-shared mount. This is a
|
||||||
invalid operation. A unbindable mount cannot be bind mounted.
|
invalid operation. A unbindable mount cannot be bind mounted.
|
||||||
|
|
||||||
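The quick-reference table for ``mount --bind A/a B/b`` in this section can be sketched as a lookup function. This is an illustration only, assuming the hypothetical names ``src_state``, ``bind_result`` and ``bind_outcome()``; none of them are kernel identifiers. The function returns the state of the new mount 'C' for a given source state and destination type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; state of the source mount 'A' (table columns). */
enum src_state { SRC_SHARED, SRC_PRIVATE, SRC_SLAVE, SRC_UNBINDABLE };

/* Hypothetical names; state of the new mount 'C' (table cells). */
enum bind_result {
	RES_SHARED, RES_PRIVATE, RES_SLAVE, RES_SHARED_SLAVE, RES_INVALID
};

/*
 * 'dest_shared' is true when the destination mount 'B' is shared and
 * false when it is non-shared (private, slave or unbindable).
 */
enum bind_result bind_outcome(enum src_state src, bool dest_shared)
{
	switch (src) {
	case SRC_SHARED:
		/* cases 1 and 6: 'C' joins the peer group of 'A' */
		return RES_SHARED;
	case SRC_PRIVATE:
		/* cases 2 and 5 */
		return dest_shared ? RES_SHARED : RES_PRIVATE;
	case SRC_SLAVE:
		/* cases 3 and 7: 'C' stays a slave of 'Z' */
		return dest_shared ? RES_SHARED_SLAVE : RES_SLAVE;
	case SRC_UNBINDABLE:
		/* cases 4 and 8: an unbindable mount cannot be bind mounted */
		return RES_INVALID;
	}
	assert(0 && "unknown source state");
	return RES_INVALID;
}
```

Only the source state and whether the destination is shared affect the result, which is why the destination is reduced to a single flag here.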
5c) Rbind semantics
|
5c) Rbind semantics
|
||||||
@@ -422,7 +441,9 @@ replicas continue to be exactly same.
|
|||||||
then the subtree under the unbindable mount is pruned in the new
|
then the subtree under the unbindable mount is pruned in the new
|
||||||
location.
|
location.
|
||||||
|
|
||||||
eg: let's say we have the following mount tree.
|
eg:
|
||||||
|
|
||||||
|
let's say we have the following mount tree::
|
||||||
|
|
||||||
A
|
A
|
||||||
/ \
|
/ \
|
||||||
@@ -430,12 +451,12 @@ replicas continue to be exactly same.
|
|||||||
/ \ / \
|
/ \ / \
|
||||||
D E F G
|
D E F G
|
||||||
|
|
||||||
Let's say all the mount except the mount C in the tree are
|
Let's say all the mount except the mount C in the tree are
|
||||||
of a type other than unbindable.
|
of a type other than unbindable.
|
||||||
|
|
||||||
If this tree is rbound to say Z
|
If this tree is rbound to say Z
|
||||||
|
|
||||||
We will have the following tree at the new location.
|
We will have the following tree at the new location::
|
||||||
|
|
||||||
Z
|
Z
|
||||||
|
|
|
|
||||||
@@ -457,24 +478,26 @@ replicas continue to be exactly same.
|
|||||||
the dentry in the destination mount.
|
the dentry in the destination mount.
|
||||||
|
|
||||||
The outcome depends on the type of the mount of 'A' and 'B'. The table
|
The outcome depends on the type of the mount of 'A' and 'B'. The table
|
||||||
below is a quick reference.
|
below is a quick reference::
|
||||||
---------------------------------------------------------------------------
|
|
||||||
| MOVE MOUNT OPERATION |
|
---------------------------------------------------------------------------
|
||||||
|**************************************************************************
|
| MOVE MOUNT OPERATION |
|
||||||
| source(A)->| shared | private | slave | unbindable |
|
|**************************************************************************
|
||||||
| dest(B) | | | | |
|
| source(A)->| shared | private | slave | unbindable |
|
||||||
| | | | | | |
|
| dest(B) | | | | |
|
||||||
| v | | | | |
|
| | | | | | |
|
||||||
|**************************************************************************
|
| v | | | | |
|
||||||
| shared | shared | shared |shared and slave| invalid |
|
|**************************************************************************
|
||||||
| | | | | |
|
| shared | shared | shared |shared and slave| invalid |
|
||||||
|non-shared| shared | private | slave | unbindable |
|
| | | | | |
|
||||||
***************************************************************************
|
|non-shared| shared | private | slave | unbindable |
|
||||||
NOTE: moving a mount residing under a shared mount is invalid.
|
***************************************************************************
|
||||||
|
|
||||||
|
.. Note:: moving a mount residing under a shared mount is invalid.
|
||||||
|
|
||||||
Details follow:
|
Details follow:
|
||||||
|
|
||||||
1. 'A' is a shared mount and 'B' is a shared mount. The mount 'A' is
|
1. 'A' is a shared mount and 'B' is a shared mount. The mount 'A' is
|
||||||
mounted on mount 'B' at dentry 'b'. Also new mounts 'A1', 'A2'...'An'
|
mounted on mount 'B' at dentry 'b'. Also new mounts 'A1', 'A2'...'An'
|
||||||
are created and mounted at dentry 'b' on all mounts that receive
|
are created and mounted at dentry 'b' on all mounts that receive
|
||||||
propagation from mount 'B'. A new propagation tree is created in the
|
propagation from mount 'B'. A new propagation tree is created in the
|
||||||
@ -483,7 +506,7 @@ replicas continue to be exactly same.
|
|||||||
propagation tree is appended to the already existing propagation tree
|
propagation tree is appended to the already existing propagation tree
|
||||||
of 'A'.
|
of 'A'.
|
||||||
|
|
||||||
2. 'A' is a private mount and 'B' is a shared mount. The mount 'A' is
|
2. 'A' is a private mount and 'B' is a shared mount. The mount 'A' is
|
||||||
mounted on mount 'B' at dentry 'b'. Also new mounts 'A1', 'A2'... 'An'
|
mounted on mount 'B' at dentry 'b'. Also new mounts 'A1', 'A2'... 'An'
|
||||||
are created and mounted at dentry 'b' on all mounts that receive
|
are created and mounted at dentry 'b' on all mounts that receive
|
||||||
propagation from mount 'B'. The mount 'A' becomes a shared mount and a
|
propagation from mount 'B'. The mount 'A' becomes a shared mount and a
|
||||||
@ -491,7 +514,7 @@ replicas continue to be exactly same.
|
|||||||
'B'. This new propagation tree contains all the new mounts 'A1',
|
'B'. This new propagation tree contains all the new mounts 'A1',
|
||||||
'A2'... 'An'.
|
'A2'... 'An'.
|
||||||
|
|
||||||
3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. The
|
3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. The
|
||||||
mount 'A' is mounted on mount 'B' at dentry 'b'. Also new mounts 'A1',
|
mount 'A' is mounted on mount 'B' at dentry 'b'. Also new mounts 'A1',
|
||||||
'A2'... 'An' are created and mounted at dentry 'b' on all mounts that
|
'A2'... 'An' are created and mounted at dentry 'b' on all mounts that
|
||||||
receive propagation from mount 'B'. A new propagation tree is created
|
receive propagation from mount 'B'. A new propagation tree is created
|
||||||
@ -501,32 +524,32 @@ replicas continue to be exactly same.
|
|||||||
'A'. Mount 'A' continues to be the slave mount of 'Z' but it also
|
'A'. Mount 'A' continues to be the slave mount of 'Z' but it also
|
||||||
becomes 'shared'.
|
becomes 'shared'.
|
||||||
|
|
||||||
4. 'A' is an unbindable mount and 'B' is a shared mount. The operation
|
4. 'A' is an unbindable mount and 'B' is a shared mount. The operation
|
||||||
is invalid. Because mounting anything on the shared mount 'B' can
|
is invalid. Because mounting anything on the shared mount 'B' can
|
||||||
create new mounts that get mounted on the mounts that receive
|
create new mounts that get mounted on the mounts that receive
|
||||||
propagation from 'B'. And since the mount 'A' is unbindable, cloning
|
propagation from 'B'. And since the mount 'A' is unbindable, cloning
|
||||||
it to mount at other mountpoints is not possible.
|
it to mount at other mountpoints is not possible.
|
||||||
|
|
||||||
5. 'A' is a private mount and 'B' is a non-shared(private or slave or
|
5. 'A' is a private mount and 'B' is a non-shared(private or slave or
|
||||||
unbindable) mount. The mount 'A' is mounted on mount 'B' at dentry 'b'.
|
unbindable) mount. The mount 'A' is mounted on mount 'B' at dentry 'b'.
|
||||||
|
|
||||||
6. 'A' is a shared mount and 'B' is a non-shared mount. The mount 'A'
|
6. 'A' is a shared mount and 'B' is a non-shared mount. The mount 'A'
|
||||||
is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be a
|
is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be a
|
||||||
shared mount.
|
shared mount.
|
||||||
|
|
||||||
7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount.
|
7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount.
|
||||||
The mount 'A' is mounted on mount 'B' at dentry 'b'. Mount 'A'
|
The mount 'A' is mounted on mount 'B' at dentry 'b'. Mount 'A'
|
||||||
continues to be a slave mount of mount 'Z'.
|
continues to be a slave mount of mount 'Z'.
|
||||||
|
|
||||||
8. 'A' is an unbindable mount and 'B' is a non-shared mount. The mount
|
8. 'A' is an unbindable mount and 'B' is a non-shared mount. The mount
|
||||||
'A' is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be a
|
'A' is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be a
|
||||||
unbindable mount.
|
unbindable mount.
|
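The move-mount table and cases 1 to 8 above can be condensed into a small lookup. The following is only a sketch in C: the enum, the function name, and the use of NULL for the "invalid" cell are illustrative choices, not kernel interfaces. It does not model the separate rule that moving a mount residing under a shared mount is invalid.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Propagation type of the source mount 'A' (names are illustrative only). */
enum src_type { SRC_SHARED, SRC_PRIVATE, SRC_SLAVE, SRC_UNBINDABLE };

/*
 * Resulting type of 'A' after "mount --move A B/b" according to the
 * quick-reference table, or NULL where the table says "invalid".
 */
static const char *move_result(enum src_type src, int dest_is_shared)
{
	static const char *const table[2][4] = {
		/* 'B' is non-shared: 'A' keeps its type */
		{ "shared", "private", "slave", "unbindable" },
		/* 'B' is shared: 'A' becomes shared, unbindable is rejected */
		{ "shared", "shared", "shared and slave", NULL },
	};

	return table[dest_is_shared ? 1 : 0][src];
}
```

A caller would treat a NULL result as "refuse the move" and otherwise read the string as the new state of 'A'.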
||||||
|
|
||||||
5e) Mount semantics
|
5e) Mount semantics
|
||||||
|
|
||||||
Consider the following command
|
Consider the following command::
|
||||||
|
|
||||||
mount device B/b
|
mount device B/b
|
||||||
|
|
||||||
'B' is the destination mount and 'b' is the dentry in the destination
|
'B' is the destination mount and 'b' is the dentry in the destination
|
||||||
mount.
|
mount.
|
||||||
@ -537,9 +560,9 @@ replicas continue to be exactly same.
|
|||||||
|
|
||||||
5f) Unmount semantics
|
5f) Unmount semantics
|
||||||
|
|
||||||
Consider the following command
|
Consider the following command::
|
||||||
|
|
||||||
umount A
|
umount A
|
||||||
|
|
||||||
where 'A' is a mount mounted on mount 'B' at dentry 'b'.
|
where 'A' is a mount mounted on mount 'B' at dentry 'b'.
|
||||||
|
|
||||||
@ -592,10 +615,12 @@ replicas continue to be exactly same.
|
|||||||
|
|
||||||
A. What is the result of the following command sequence?
|
A. What is the result of the following command sequence?
|
||||||
|
|
||||||
mount --bind /mnt /mnt
|
::
|
||||||
mount --make-shared /mnt
|
|
||||||
mount --bind /mnt /tmp
|
mount --bind /mnt /mnt
|
||||||
mount --move /tmp /mnt/1
|
mount --make-shared /mnt
|
||||||
|
mount --bind /mnt /tmp
|
||||||
|
mount --move /tmp /mnt/1
|
||||||
|
|
||||||
what should the contents of /mnt, /mnt/1 and /mnt/1/1 be?
|
what should the contents of /mnt, /mnt/1 and /mnt/1/1 be?
|
||||||
Should they all be identical? or should /mnt and /mnt/1 be
|
Should they all be identical? or should /mnt and /mnt/1 be
|
||||||
@ -604,23 +629,27 @@ replicas continue to be exactly same.
|
|||||||
|
|
||||||
B. What is the result of the following command sequence?
|
B. What is the result of the following command sequence?
|
||||||
|
|
||||||
mount --make-rshared /
|
::
|
||||||
mkdir -p /v/1
|
|
||||||
mount --rbind / /v/1
|
mount --make-rshared /
|
||||||
|
mkdir -p /v/1
|
||||||
|
mount --rbind / /v/1
|
||||||
|
|
||||||
what should be the content of /v/1/v/1 be?
|
what should be the content of /v/1/v/1 be?
|
||||||
|
|
||||||
|
|
||||||
C. What is the result of the following command sequence?
|
C. What is the result of the following command sequence?
|
||||||
|
|
||||||
mount --bind /mnt /mnt
|
::
|
||||||
mount --make-shared /mnt
|
|
||||||
mkdir -p /mnt/1/2/3 /mnt/1/test
|
mount --bind /mnt /mnt
|
||||||
mount --bind /mnt/1 /tmp
|
mount --make-shared /mnt
|
||||||
mount --make-slave /mnt
|
mkdir -p /mnt/1/2/3 /mnt/1/test
|
||||||
mount --make-shared /mnt
|
mount --bind /mnt/1 /tmp
|
||||||
mount --bind /mnt/1/2 /tmp1
|
mount --make-slave /mnt
|
||||||
mount --make-slave /mnt
|
mount --make-shared /mnt
|
||||||
|
mount --bind /mnt/1/2 /tmp1
|
||||||
|
mount --make-slave /mnt
|
||||||
|
|
||||||
At this point we have the first mount at /tmp and
|
At this point we have the first mount at /tmp and
|
||||||
its root dentry is 1. Let's call this mount 'A'
|
its root dentry is 1. Let's call this mount 'A'
|
||||||
@ -668,7 +697,8 @@ replicas continue to be exactly same.
|
|||||||
|
|
||||||
step 1:
|
step 1:
|
||||||
let's say the root tree has just two directories with
|
let's say the root tree has just two directories with
|
||||||
one vfsmount.
|
one vfsmount::
|
||||||
|
|
||||||
root
|
root
|
||||||
/ \
|
/ \
|
||||||
tmp usr
|
tmp usr
|
||||||
@ -676,14 +706,17 @@ replicas continue to be exactly same.
|
|||||||
And we want to replicate the tree at multiple
|
And we want to replicate the tree at multiple
|
||||||
mountpoints under /root/tmp
|
mountpoints under /root/tmp
|
||||||
|
|
||||||
step2:
|
step 2:
|
||||||
mount --make-shared /root
|
::
|
||||||
|
|
||||||
mkdir -p /tmp/m1
|
|
||||||
|
|
||||||
mount --rbind /root /tmp/m1
|
mount --make-shared /root
|
||||||
|
|
||||||
the new tree now looks like this:
|
mkdir -p /tmp/m1
|
||||||
|
|
||||||
|
mount --rbind /root /tmp/m1
|
||||||
|
|
||||||
|
the new tree now looks like this::
|
||||||
|
|
||||||
root
|
root
|
||||||
/ \
|
/ \
|
||||||
@ -697,11 +730,13 @@ replicas continue to be exactly same.
|
|||||||
|
|
||||||
it has two vfsmounts
|
it has two vfsmounts
|
||||||
|
|
||||||
step3:
|
step 3:
|
||||||
|
::
|
||||||
|
|
||||||
mkdir -p /tmp/m2
|
mkdir -p /tmp/m2
|
||||||
mount --rbind /root /tmp/m2
|
mount --rbind /root /tmp/m2
|
||||||
|
|
||||||
the new tree now looks like this:
|
the new tree now looks like this::
|
||||||
|
|
||||||
root
|
root
|
||||||
/ \
|
/ \
|
||||||
@ -724,6 +759,7 @@ replicas continue to be exactly same.
|
|||||||
it has 6 vfsmounts
|
it has 6 vfsmounts
|
||||||
|
|
||||||
step 4:
|
step 4:
|
||||||
|
::
|
||||||
mkdir -p /tmp/m3
|
mkdir -p /tmp/m3
|
||||||
mount --rbind /root /tmp/m3
|
mount --rbind /root /tmp/m3
|
||||||
|
|
||||||
@ -740,7 +776,8 @@ replicas continue to be exactly same.
|
|||||||
|
|
||||||
step 1:
|
step 1:
|
||||||
let's say the root tree has just two directories with
|
let's say the root tree has just two directories with
|
||||||
one vfsmount.
|
one vfsmount::
|
||||||
|
|
||||||
root
|
root
|
||||||
/ \
|
/ \
|
||||||
tmp usr
|
tmp usr
|
||||||
@ -748,17 +785,20 @@ replicas continue to be exactly same.
|
|||||||
How do we set up the same tree at multiple locations under
|
How do we set up the same tree at multiple locations under
|
||||||
/root/tmp
|
/root/tmp
|
||||||
|
|
||||||
step2:
|
step 2:
|
||||||
mount --bind /root/tmp /root/tmp
|
::
|
||||||
|
|
||||||
mount --make-rshared /root
|
|
||||||
mount --make-unbindable /root/tmp
|
|
||||||
|
|
||||||
mkdir -p /tmp/m1
|
mount --bind /root/tmp /root/tmp
|
||||||
|
|
||||||
mount --rbind /root /tmp/m1
|
mount --make-rshared /root
|
||||||
|
mount --make-unbindable /root/tmp
|
||||||
|
|
||||||
the new tree now looks like this:
|
mkdir -p /tmp/m1
|
||||||
|
|
||||||
|
mount --rbind /root /tmp/m1
|
||||||
|
|
||||||
|
the new tree now looks like this::
|
||||||
|
|
||||||
root
|
root
|
||||||
/ \
|
/ \
|
||||||
@ -768,11 +808,13 @@ replicas continue to be exactly same.
|
|||||||
/ \
|
/ \
|
||||||
tmp usr
|
tmp usr
|
||||||
|
|
||||||
step3:
|
step 3:
|
||||||
|
::
|
||||||
|
|
||||||
mkdir -p /tmp/m2
|
mkdir -p /tmp/m2
|
||||||
mount --rbind /root /tmp/m2
|
mount --rbind /root /tmp/m2
|
||||||
|
|
||||||
the new tree now looks like this:
|
the new tree now looks like this::
|
||||||
|
|
||||||
root
|
root
|
||||||
/ \
|
/ \
|
||||||
@ -782,12 +824,13 @@ replicas continue to be exactly same.
|
|||||||
/ \ / \
|
/ \ / \
|
||||||
tmp usr tmp usr
|
tmp usr tmp usr
|
||||||
|
|
||||||
step4:
|
step 4:
|
||||||
|
::
|
||||||
|
|
||||||
mkdir -p /tmp/m3
|
mkdir -p /tmp/m3
|
||||||
mount --rbind /root /tmp/m3
|
mount --rbind /root /tmp/m3
|
||||||
|
|
||||||
the new tree now looks like this:
|
the new tree now looks like this::
|
||||||
|
|
||||||
root
|
root
|
||||||
/ \
|
/ \
|
||||||
@ -801,25 +844,31 @@ replicas continue to be exactly same.
|
|||||||
|
|
||||||
8A) Datastructure
|
8A) Datastructure
|
||||||
|
|
||||||
4 new fields are introduced to struct vfsmount
|
4 new fields are introduced to struct vfsmount:
|
||||||
->mnt_share
|
|
||||||
->mnt_slave_list
|
|
||||||
->mnt_slave
|
|
||||||
->mnt_master
|
|
||||||
|
|
||||||
->mnt_share links together all the mount to/from which this vfsmount
|
* ->mnt_share
|
||||||
|
* ->mnt_slave_list
|
||||||
|
* ->mnt_slave
|
||||||
|
* ->mnt_master
|
||||||
|
|
||||||
|
->mnt_share
|
||||||
|
links together all the mount to/from which this vfsmount
|
||||||
send/receives propagation events.
|
send/receives propagation events.
|
||||||
|
|
||||||
->mnt_slave_list links all the mounts to which this vfsmount propagates
|
->mnt_slave_list
|
||||||
|
links all the mounts to which this vfsmount propagates
|
||||||
to.
|
to.
|
||||||
|
|
||||||
->mnt_slave links together all the slaves that its master vfsmount
|
->mnt_slave
|
||||||
|
links together all the slaves that its master vfsmount
|
||||||
propagates to.
|
propagates to.
|
||||||
|
|
||||||
->mnt_master points to the master vfsmount from which this vfsmount
|
->mnt_master
|
||||||
|
points to the master vfsmount from which this vfsmount
|
||||||
receives propagation.
|
receives propagation.
|
||||||
|
|
||||||
->mnt_flags takes two more flags to indicate the propagation status of
|
->mnt_flags
|
||||||
|
takes two more flags to indicate the propagation status of
|
||||||
the vfsmount. MNT_SHARE indicates that the vfsmount is a shared
|
the vfsmount. MNT_SHARE indicates that the vfsmount is a shared
|
||||||
vfsmount. MNT_UNCLONABLE indicates that the vfsmount cannot be
|
vfsmount. MNT_UNCLONABLE indicates that the vfsmount cannot be
|
||||||
replicated.
|
replicated.
|
||||||
@ -842,7 +891,7 @@ replicas continue to be exactly same.
|
|||||||
|
|
||||||
An example propagation tree looks as shown in the figure below.
|
An example propagation tree looks as shown in the figure below.
|
||||||
[ NOTE: Though it looks like a forest, if we consider all the shared
|
[ NOTE: Though it looks like a forest, if we consider all the shared
|
||||||
mounts as a conceptual entity called 'pnode', it becomes a tree]
|
mounts as a conceptual entity called 'pnode', it becomes a tree]::
|
||||||
|
|
||||||
|
|
||||||
A <--> B <--> C <---> D
|
A <--> B <--> C <---> D
|
||||||
@ -864,14 +913,19 @@ replicas continue to be exactly same.
|
|||||||
A's ->mnt_slave_list links with ->mnt_slave of 'E', 'K', 'F' and 'G'
|
A's ->mnt_slave_list links with ->mnt_slave of 'E', 'K', 'F' and 'G'
|
||||||
|
|
||||||
E's ->mnt_share links with ->mnt_share of K
|
E's ->mnt_share links with ->mnt_share of K
|
||||||
'E', 'K', 'F', 'G' have their ->mnt_master point to struct
|
|
||||||
vfsmount of 'A'
|
'E', 'K', 'F', 'G' have their ->mnt_master point to struct vfsmount of 'A'
|
||||||
|
|
||||||
'M', 'L', 'N' have their ->mnt_master point to struct vfsmount of 'K'
|
'M', 'L', 'N' have their ->mnt_master point to struct vfsmount of 'K'
|
||||||
|
|
||||||
K's ->mnt_slave_list links with ->mnt_slave of 'M', 'L' and 'N'
|
K's ->mnt_slave_list links with ->mnt_slave of 'M', 'L' and 'N'
|
||||||
|
|
||||||
C's ->mnt_slave_list links with ->mnt_slave of 'J' and 'K'
|
C's ->mnt_slave_list links with ->mnt_slave of 'J' and 'K'
|
||||||
|
|
||||||
J and K's ->mnt_master points to struct vfsmount of C
|
J and K's ->mnt_master points to struct vfsmount of C
|
||||||
|
|
||||||
and finally D's ->mnt_slave_list links with ->mnt_slave of 'H' and 'I'
|
and finally D's ->mnt_slave_list links with ->mnt_slave of 'H' and 'I'
|
||||||
|
|
||||||
'H' and 'I' have their ->mnt_master pointing to struct vfsmount of 'D'.
|
'H' and 'I' have their ->mnt_master pointing to struct vfsmount of 'D'.
|
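The linkage described above (peers joined through ->mnt_share, slaves chained through ->mnt_slave from the master's ->mnt_slave_list, and each slave pointing back through ->mnt_master) can be illustrated with a simplified model. This is a sketch under loud assumptions: `struct mnt_sketch` and its helpers are hypothetical, and plain pointers stand in for the list heads used in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct vfsmount; not the kernel definition. */
struct mnt_sketch {
	const char *name;
	struct mnt_sketch *mnt_share;      /* next peer in a circular ring */
	struct mnt_sketch *mnt_slave_list; /* first slave of this mount */
	struct mnt_sketch *mnt_slave;      /* next slave of the same master */
	struct mnt_sketch *mnt_master;     /* mount we receive propagation from */
};

static void init_mount(struct mnt_sketch *m, const char *name)
{
	m->name = name;
	m->mnt_share = m;          /* a peer ring containing only itself */
	m->mnt_slave_list = NULL;
	m->mnt_slave = NULL;
	m->mnt_master = NULL;
}

/* Insert 'm' (currently a ring of one) into the peer ring of 'peer'. */
static void add_peer(struct mnt_sketch *peer, struct mnt_sketch *m)
{
	m->mnt_share = peer->mnt_share;
	peer->mnt_share = m;
}

/* Make 'slave' receive propagation from 'master'. */
static void add_slave(struct mnt_sketch *master, struct mnt_sketch *slave)
{
	slave->mnt_master = master;
	slave->mnt_slave = master->mnt_slave_list;
	master->mnt_slave_list = slave;
}

static int count_peers(const struct mnt_sketch *m)
{
	int n = 1;

	for (const struct mnt_sketch *p = m->mnt_share; p != m; p = p->mnt_share)
		n++;
	return n;
}

static int count_slaves(const struct mnt_sketch *master)
{
	int n = 0;

	for (const struct mnt_sketch *s = master->mnt_slave_list; s; s = s->mnt_slave)
		n++;
	return n;
}
```

With mounts 'A' and 'B' as peers and 'E' and 'F' added as slaves of 'A', walking A's slave list visits both slaves, and each slave's ->mnt_master points back at 'A'.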
||||||
|
|
||||||
|
|
||||||
@ -903,6 +957,7 @@ replicas continue to be exactly same.
|
|||||||
Prepare phase:
|
Prepare phase:
|
||||||
|
|
||||||
for each mount in the source tree:
|
for each mount in the source tree:
|
||||||
|
|
||||||
a) Create the necessary number of mount trees to
|
a) Create the necessary number of mount trees to
|
||||||
be attached to each of the mounts that receive
|
be attached to each of the mounts that receive
|
||||||
propagation from the destination mount.
|
propagation from the destination mount.
|
||||||
@ -929,11 +984,12 @@ replicas continue to be exactly same.
|
|||||||
Abort phase
|
Abort phase
|
||||||
delete all the newly created trees.
|
delete all the newly created trees.
|
||||||
|
|
||||||
NOTE: all the propagation related functionality resides in the file
|
.. Note::
|
||||||
pnode.c
|
all the propagation related functionality resides in the file pnode.c
|
||||||
|
|
||||||
|
|
||||||
------------------------------------------------------------------------
|
------------------------------------------------------------------------
|
||||||
|
|
||||||
version 0.1 (created the initial document, Ram Pai linuxram@us.ibm.com)
|
version 0.1 (created the initial document, Ram Pai linuxram@us.ibm.com)
|
||||||
|
|
||||||
version 0.2 (Incorporated comments from Al Viro)
|
version 0.2 (Incorporated comments from Al Viro)
|
13
Documentation/filesystems/spufs/index.rst
Normal file
@ -0,0 +1,13 @@
|
|||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
==============
|
||||||
|
SPU Filesystem
|
||||||
|
==============
|
||||||
|
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:maxdepth: 1
|
||||||
|
|
||||||
|
spufs
|
||||||
|
spu_create
|
||||||
|
spu_run
|
131
Documentation/filesystems/spufs/spu_create.rst
Normal file
@ -0,0 +1,131 @@
|
|||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
==========
|
||||||
|
spu_create
|
||||||
|
==========
|
||||||
|
|
||||||
|
Name
|
||||||
|
====
|
||||||
|
spu_create - create a new spu context
|
||||||
|
|
||||||
|
|
||||||
|
Synopsis
|
||||||
|
========
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
#include <sys/types.h>
|
||||||
|
#include <sys/spu.h>
|
||||||
|
|
||||||
|
int spu_create(const char *pathname, int flags, mode_t mode);
|
||||||
|
|
||||||
|
Description
|
||||||
|
===========
|
||||||
|
The spu_create system call is used on PowerPC machines that implement
|
||||||
|
the Cell Broadband Engine Architecture in order to access Synergistic
|
||||||
|
Processor Units (SPUs). It creates a new logical context for an SPU in
|
||||||
|
pathname and returns a handle associated with it. pathname must
|
||||||
|
point to a non-existing directory in the mount point of the SPU file
|
||||||
|
system (spufs). When spu_create is successful, a directory gets cre-
|
||||||
|
ated on pathname and it is populated with files.
|
||||||
|
|
||||||
|
The returned file handle can only be passed to spu_run(2) or closed,
|
||||||
|
other operations are not defined on it. When it is closed, all associ-
|
||||||
|
ated directory entries in spufs are removed. When the last file handle
|
||||||
|
pointing either inside of the context directory or to this file
|
||||||
|
descriptor is closed, the logical SPU context is destroyed.
|
||||||
|
|
||||||
|
The parameter flags can be zero or any bitwise or'd combination of the
|
||||||
|
following constants:
|
||||||
|
|
||||||
|
SPU_RAWIO
|
||||||
|
Allow mapping of some of the hardware registers of the SPU into
|
||||||
|
user space. This flag requires the CAP_SYS_RAWIO capability, see
|
||||||
|
capabilities(7).
|
||||||
|
|
||||||
|
The mode parameter specifies the permissions used for creating the new
|
||||||
|
directory in spufs. mode is modified with the user's umask(2) value
|
||||||
|
and then used for both the directory and the files contained in it. The
|
||||||
|
file permissions mask out some more bits of mode because they typically
|
||||||
|
support only read or write access. See stat(2) for a full list of the
|
||||||
|
possible mode values.
|
||||||
|
|
||||||
|
|
||||||
|
Return Value
|
||||||
|
============
|
||||||
|
spu_create returns a new file descriptor. It may return -1 to indicate
|
||||||
|
an error condition and set errno to one of the error codes listed
|
||||||
|
below.
|
||||||
|
|
||||||
|
|
||||||
|
Errors
|
||||||
|
======
|
||||||
|
EACCES
|
||||||
|
The current user does not have write access on the spufs mount
|
||||||
|
point.
|
||||||
|
|
||||||
|
EEXIST An SPU context already exists at the given path name.
|
||||||
|
|
||||||
|
EFAULT pathname is not a valid string pointer in the current address
|
||||||
|
space.
|
||||||
|
|
||||||
|
EINVAL pathname is not a directory in the spufs mount point.
|
||||||
|
|
||||||
|
ELOOP Too many symlinks were found while resolving pathname.
|
||||||
|
|
||||||
|
EMFILE The process has reached its maximum open file limit.
|
||||||
|
|
||||||
|
ENAMETOOLONG
|
||||||
|
pathname was too long.
|
||||||
|
|
||||||
|
ENFILE The system has reached the global open file limit.
|
||||||
|
|
||||||
|
ENOENT Part of pathname could not be resolved.
|
||||||
|
|
||||||
|
ENOMEM The kernel could not allocate all resources required.
|
||||||
|
|
||||||
|
ENOSPC There are not enough SPU resources available to create a new
|
||||||
|
context or the user specific limit for the number of SPU con-
|
||||||
|
texts has been reached.
|
||||||
|
|
||||||
|
ENOSYS the functionality is not provided by the current system, because
|
||||||
|
either the hardware does not provide SPUs or the spufs module is
|
||||||
|
not loaded.
|
||||||
|
|
||||||
|
ENOTDIR
|
||||||
|
A part of pathname is not a directory.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Notes
|
||||||
|
=====
|
||||||
|
spu_create is meant to be used from libraries that implement a more
|
||||||
|
abstract interface to SPUs, not to be used from regular applications.
|
||||||
|
See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the rec-
|
||||||
|
ommended libraries.
|
||||||
|
|
||||||
|
|
||||||
|
Files
|
||||||
|
=====
|
||||||
|
pathname must point to a location beneath the mount point of spufs. By
|
||||||
|
convention, it gets mounted in /spu.
|
||||||
|
|
||||||
|
|
||||||
|
Conforming to
|
||||||
|
=============
|
||||||
|
This call is Linux specific and only implemented by the ppc64 architec-
|
||||||
|
ture. Programs using this system call are not portable.
|
||||||
|
|
||||||
|
|
||||||
|
Bugs
|
||||||
|
====
|
||||||
|
The code does not yet fully implement all features outlined here.
|
||||||
|
|
||||||
|
|
||||||
|
Author
|
||||||
|
======
|
||||||
|
Arnd Bergmann <arndb@de.ibm.com>
|
||||||
|
|
||||||
|
See Also
|
||||||
|
========
|
||||||
|
capabilities(7), close(2), spu_run(2), spufs(7)
|
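The description above says that mode is modified with the caller's umask(2) value before it is used for the new directory. A minimal sketch of that step, assuming the conventional meaning (bits set in the umask are cleared from the requested mode), follows. The helper name is hypothetical, and the further per-file masking mentioned in the text is not modeled.

```c
#include <assert.h>
#include <sys/types.h>

/*
 * Hypothetical helper: permission bits that remain after a umask is
 * applied to the mode passed to spu_create. Assumes the usual
 * "mode & ~umask" convention.
 */
static mode_t apply_umask(mode_t requested, mode_t mask)
{
	return requested & ~mask;
}
```

For example, a requested mode of 0775 combined with a umask of 022 leaves 0755 for the context directory under this assumption.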
138
Documentation/filesystems/spufs/spu_run.rst
Normal file
@ -0,0 +1,138 @@
|
|||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
=======
|
||||||
|
spu_run
|
||||||
|
=======
|
||||||
|
|
||||||
|
|
||||||
|
Name
|
||||||
|
====
|
||||||
|
spu_run - execute an spu context
|
||||||
|
|
||||||
|
|
||||||
|
Synopsis
|
||||||
|
========
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
#include <sys/spu.h>
|
||||||
|
|
||||||
|
int spu_run(int fd, unsigned int *npc, unsigned int *event);
|
||||||
|
|
||||||
|
Description
|
||||||
|
===========
|
||||||
|
The spu_run system call is used on PowerPC machines that implement the
|
||||||
|
Cell Broadband Engine Architecture in order to access Synergistic Pro-
|
||||||
|
cessor Units (SPUs). It uses the fd that was returned from spu_cre-
|
||||||
|
ate(2) to address a specific SPU context. When the context gets sched-
|
||||||
|
uled to a physical SPU, it starts execution at the instruction pointer
|
||||||
|
passed in npc.
|
||||||
|
|
||||||
|
Execution of SPU code happens synchronously, meaning that spu_run does
|
||||||
|
not return while the SPU is still running. If there is a need to exe-
|
||||||
|
cute SPU code in parallel with other code on either the main CPU or
|
||||||
|
other SPUs, you need to create a new thread of execution first, e.g.
|
||||||
|
using the pthread_create(3) call.
|
||||||
|
|
||||||
|
When spu_run returns, the current value of the SPU instruction pointer
|
||||||
|
is written back to npc, so you can call spu_run again without updating
|
||||||
|
the pointers.
|
||||||
|
|
||||||
|
event can be a NULL pointer or point to an extended status code that
|
||||||
|
gets filled when spu_run returns. It can be one of the following con-
|
||||||
|
stants:
|
||||||
|
|
||||||
|
SPE_EVENT_DMA_ALIGNMENT
|
||||||
|
A DMA alignment error
|
||||||
|
|
||||||
|
SPE_EVENT_SPE_DATA_SEGMENT
|
||||||
|
A DMA segmentation error
|
||||||
|
|
||||||
|
SPE_EVENT_SPE_DATA_STORAGE
|
||||||
|
A DMA storage error
|
||||||
|
|
||||||
|
If NULL is passed as the event argument, these errors will result in a
|
||||||
|
signal delivered to the calling process.
|
||||||
|
|
||||||
|
Return Value
|
||||||
|
============
|
||||||
|
spu_run returns the value of the spu_status register or -1 to indicate
|
||||||
|
an error and set errno to one of the error codes listed below. The
|
||||||
|
spu_status register value contains a bit mask of status codes and
|
||||||
|
optionally a 14 bit code returned from the stop-and-signal instruction
|
||||||
|
on the SPU. The bit masks for the status codes are:
|
||||||
|
|
||||||
|
0x02
|
||||||
|
SPU was stopped by stop-and-signal.
|
||||||
|
|
||||||
|
0x04
|
||||||
|
SPU was stopped by halt.
|
||||||
|
|
||||||
|
0x08
|
||||||
|
SPU is waiting for a channel.
|
||||||
|
|
||||||
|
0x10
|
||||||
|
SPU is in single-step mode.
|
||||||
|
|
||||||
|
0x20
|
||||||
|
SPU has tried to execute an invalid instruction.
|
||||||
|
|
||||||
|
0x40
|
||||||
|
SPU has tried to access an invalid channel.
|
||||||
|
|
||||||
|
0x3fff0000
|
||||||
|
The bits masked with this value contain the code returned from
|
||||||
|
stop-and-signal.
|
||||||
|
|
||||||
|
There are always one or more of the lower eight bits set or an error
|
||||||
|
code is returned from spu_run.
|
||||||
|
|
||||||
|
Errors
|
||||||
|
======
|
||||||
|
EAGAIN or EWOULDBLOCK
|
||||||
|
fd is in non-blocking mode and spu_run would block.
|
||||||
|
|
||||||
|
EBADF fd is not a valid file descriptor.
|
||||||
|
|
||||||
|
EFAULT npc is not a valid pointer or event is neither NULL nor a valid
|
||||||
|
pointer.
|
||||||
|
|
||||||
|
EINTR A signal occurred while spu_run was in progress. The npc value
|
||||||
|
has been updated to the new program counter value if necessary.
|
||||||
|
|
||||||
|
EINVAL fd is not a file descriptor returned from spu_create(2).
|
||||||
|
|
||||||
|
ENOMEM Insufficient memory was available to handle a page fault result-
|
||||||
|
ing from an MFC direct memory access.
|
||||||
|
|
||||||
|
ENOSYS the functionality is not provided by the current system, because
|
||||||
|
either the hardware does not provide SPUs or the spufs module is
|
||||||
|
not loaded.
|
||||||
|
|
||||||
|
|
||||||
|
Notes
|
||||||
|
=====
|
||||||
|
spu_run is meant to be used from libraries that implement a more
|
||||||
|
abstract interface to SPUs, not to be used from regular applications.
|
||||||
|
See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the rec-
|
||||||
|
ommended libraries.
|
||||||
|
|
||||||
|
|
||||||
|
Conforming to
|
||||||
|
=============
|
||||||
|
This call is Linux specific and only implemented by the ppc64 architec-
|
||||||
|
ture. Programs using this system call are not portable.
|
||||||
|
|
||||||
|
|
||||||
|
Bugs
|
||||||
|
====
|
||||||
|
The code does not yet fully implement all features outlined here.
|
||||||
|
|
||||||
|
|
||||||
|
Author
|
||||||
|
======
|
||||||
|
Arnd Bergmann <arndb@de.ibm.com>
|
||||||
|
|
||||||
|
See Also
|
||||||
|
========
|
||||||
|
capabilities(7), close(2), spu_create(2), spufs(7)
|
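The return value of spu_run described above is a bit mask plus an optional stop-and-signal code. The sketch below shows one way a caller might decode it in C. Only the numeric masks come from the text; the macro and function names are made up for illustration, and shifting the masked value right by 16 bits to obtain the 14-bit code is an assumption based on the position of the 0x3fff0000 mask.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names; only the mask values are taken from the text. */
#define SKETCH_STOPPED_BY_STOP  0x02u
#define SKETCH_STOPPED_BY_HALT  0x04u
#define SKETCH_WAITING_CHANNEL  0x08u
#define SKETCH_SINGLE_STEP      0x10u
#define SKETCH_INVALID_INSTR    0x20u
#define SKETCH_INVALID_CHANNEL  0x40u
#define SKETCH_STOP_CODE_MASK   0x3fff0000u

/* Non-zero if the SPU was stopped by a stop-and-signal instruction. */
static int stopped_by_stop_and_signal(uint32_t status)
{
	return (status & SKETCH_STOPPED_BY_STOP) != 0;
}

/* Non-zero if the SPU was stopped by halt. */
static int stopped_by_halt(uint32_t status)
{
	return (status & SKETCH_STOPPED_BY_HALT) != 0;
}

/*
 * Extract the stop-and-signal code. Assumes the code occupies the
 * bits covered by the mask and is meant to be read right-aligned.
 */
static unsigned int stop_code(uint32_t status)
{
	return (status & SKETCH_STOP_CODE_MASK) >> 16;
}
```

A library wrapping spu_run would typically check for -1 first, then test the low status bits, and only look at the stop code when the stop-and-signal bit is set.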
@ -1,12 +1,18 @@
|
|||||||
SPUFS(2) Linux Programmer's Manual SPUFS(2)
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
=====
|
||||||
|
spufs
|
||||||
|
=====
|
||||||
|
|
||||||
|
Name
|
||||||
|
====
|
||||||
|
|
||||||
NAME
|
|
||||||
spufs - the SPU file system
|
spufs - the SPU file system
|
||||||
|
|
||||||
|
|
||||||
DESCRIPTION
|
Description
|
||||||
|
===========
|
||||||
|
|
||||||
The SPU file system is used on PowerPC machines that implement the Cell
|
The SPU file system is used on PowerPC machines that implement the Cell
|
||||||
Broadband Engine Architecture in order to access Synergistic Processor
|
Broadband Engine Architecture in order to access Synergistic Processor
|
||||||
Units (SPUs).
|
Units (SPUs).
|
||||||
@ -21,7 +27,9 @@ DESCRIPTION
|
|||||||
ally add or remove files.
|
ally add or remove files.
|
||||||
|
|
||||||
|
|
||||||
MOUNT OPTIONS
|
Mount Options
|
||||||
|
=============
|
||||||
|
|
||||||
uid=<uid>
|
uid=<uid>
|
||||||
set the user owning the mount point, the default is 0 (root).
|
set the user owning the mount point, the default is 0 (root).
|
||||||
|
|
||||||
@ -29,7 +37,9 @@ MOUNT OPTIONS
|
|||||||
set the group owning the mount point, the default is 0 (root).
|
set the group owning the mount point, the default is 0 (root).
|
||||||
|
|
||||||
|
|
||||||
FILES
|
Files
|
||||||
|
=====
|
||||||
|
|
||||||
The files in spufs mostly follow the standard behavior for regular sys-
|
The files in spufs mostly follow the standard behavior for regular sys-
|
||||||
tem calls like read(2) or write(2), but often support only a subset of
|
tem calls like read(2) or write(2), but often support only a subset of
|
||||||
the operations supported on regular file systems. This list details the
|
the operations supported on regular file systems. This list details the
|
||||||
@ -125,14 +135,12 @@ FILES
|
|||||||
space is available for writing.
|
space is available for writing.
|
||||||
|
|
||||||
|
|
||||||
/mbox_stat
|
/mbox_stat, /ibox_stat, /wbox_stat
|
||||||
/ibox_stat
|
|
||||||
/wbox_stat
|
|
||||||
Read-only files that contain the length of the current queue, i.e. how
|
Read-only files that contain the length of the current queue, i.e. how
|
||||||
many words can be read from mbox or ibox or how many words can be
|
many words can be read from mbox or ibox or how many words can be
|
||||||
written to wbox without blocking. The files can be read only in 4-byte
|
written to wbox without blocking. The files can be read only in 4-byte
|
||||||
units and return a big-endian binary integer number. The possible
|
units and return a big-endian binary integer number. The possible
|
||||||
operations on an open *box_stat file are:
|
operations on an open ``*box_stat`` file are:
|
||||||
|
|
||||||
read(2)
|
read(2)
|
||||||
If a count smaller than four is requested, read returns -1 and
|
If a count smaller than four is requested, read returns -1 and
|
||||||
@ -143,12 +151,7 @@ FILES
|
|||||||
in EAGAIN.
|
in EAGAIN.
|
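Because the ``*box_stat`` files return a 4-byte big-endian binary integer, a reader on any host byte order has to assemble the value byte by byte. A minimal sketch in C is shown here; the function name is hypothetical, and the buffer is assumed to hold exactly the four bytes returned by a successful read(2).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert four bytes in big-endian order (most significant byte first)
 * into a host integer, independent of the host's own endianness.
 */
static uint32_t be32_decode(const unsigned char buf[4])
{
	return ((uint32_t)buf[0] << 24) |
	       ((uint32_t)buf[1] << 16) |
	       ((uint32_t)buf[2] << 8)  |
	        (uint32_t)buf[3];
}
```

The decoded value is the queue length, that is, the number of words that can be transferred without blocking.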
||||||
|
|
||||||
|
|
||||||
/npc
|
/npc, /decr, /decr_status, /spu_tag_mask, /event_mask, /srr0
|
||||||
/decr
|
|
||||||
/decr_status
|
|
||||||
/spu_tag_mask
|
|
||||||
/event_mask
|
|
||||||
/srr0
|
|
||||||
Internal registers of the SPU. The representation is an ASCII string
|
Internal registers of the SPU. The representation is an ASCII string
|
||||||
with the numeric value of the next instruction to be executed. These
|
with the numeric value of the next instruction to be executed. These
|
||||||
can be used in read/write mode for debugging, but normal operation of
|
can be used in read/write mode for debugging, but normal operation of
|
||||||
@ -157,17 +160,14 @@ FILES
|
|||||||
|
|
||||||
The contents of these files are:
|
The contents of these files are:
|
||||||
|
|
||||||
|
=================== ===================================
|
||||||
npc Next Program Counter
|
npc Next Program Counter
|
||||||
|
|
||||||
decr SPU Decrementer
|
decr SPU Decrementer
|
||||||
|
|
||||||
decr_status Decrementer Status
|
decr_status Decrementer Status
|
||||||
|
|
||||||
spu_tag_mask MFC tag mask for SPU DMA
|
spu_tag_mask MFC tag mask for SPU DMA
|
||||||
|
|
||||||
event_mask Event mask for SPU interrupts
|
event_mask Event mask for SPU interrupts
|
||||||
|
|
||||||
srr0 Interrupt Return address register
|
srr0 Interrupt Return address register
|
||||||
|
=================== ===================================
|
||||||
|
|
||||||
|
|
||||||
The possible operations on an open npc, decr, decr_status,
|
The possible operations on an open npc, decr, decr_status,
|
||||||
@ -206,8 +206,7 @@ FILES
|
|||||||
from the data buffer, updating the value of the fpcr register.
|
from the data buffer, updating the value of the fpcr register.
|
||||||
|
|
||||||
|
|
||||||
/signal1
|
/signal1, /signal2
|
||||||
/signal2
|
|
||||||
The two signal notification channels of an SPU. These are read-write
|
The two signal notification channels of an SPU. These are read-write
|
||||||
files that operate on a 32 bit word. Writing to one of these files
|
files that operate on a 32 bit word. Writing to one of these files
|
||||||
triggers an interrupt on the SPU. The value written to the signal
|
triggers an interrupt on the SPU. The value written to the signal
|
||||||
@ -233,8 +232,7 @@ FILES
|
|||||||
file.
|
file.
|
||||||
|
|
||||||
|
|
||||||
/signal1_type
|
/signal1_type, /signal2_type
|
||||||
/signal2_type
|
|
||||||
These two files change the behavior of the signal1 and signal2 notifi-
|
These two files change the behavior of the signal1 and signal2 notifi-
|
||||||
cation files. The contain a numerical ASCII string which is read as
|
cation files. The contain a numerical ASCII string which is read as
|
||||||
either "1" or "0". In mode 0 (overwrite), the hardware replaces the
|
either "1" or "0". In mode 0 (overwrite), the hardware replaces the
|
||||||
@ -259,263 +257,17 @@ FILES
|
|||||||
the previous setting.
|
the previous setting.
|
||||||
|
|
||||||
|
|
||||||
EXAMPLES
|
Examples
|
||||||
|
========
|
||||||
/etc/fstab entry
|
/etc/fstab entry
|
||||||
none /spu spufs gid=spu 0 0
|
none /spu spufs gid=spu 0 0
|
||||||
|
|
||||||
|
|
||||||
AUTHORS
|
Authors
|
||||||
|
=======
|
||||||
Arnd Bergmann <arndb@de.ibm.com>, Mark Nutter <mnutter@us.ibm.com>,
|
Arnd Bergmann <arndb@de.ibm.com>, Mark Nutter <mnutter@us.ibm.com>,
|
||||||
Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
|
Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
|
||||||
|
|
||||||
SEE ALSO
|
See Also
|
||||||
|
========
|
||||||
capabilities(7), close(2), spu_create(2), spu_run(2), spufs(7)
|
capabilities(7), close(2), spu_create(2), spu_run(2), spufs(7)
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Linux 2005-09-28 SPUFS(2)
|
|
||||||
|
|
||||||
------------------------------------------------------------------------------
|
|
||||||
|
|
||||||
SPU_RUN(2) Linux Programmer's Manual SPU_RUN(2)
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
NAME
|
|
||||||
spu_run - execute an spu context
|
|
||||||
|
|
||||||
|
|
||||||
SYNOPSIS
|
|
||||||
#include <sys/spu.h>
|
|
||||||
|
|
||||||
int spu_run(int fd, unsigned int *npc, unsigned int *event);
|
|
||||||
|
|
||||||
DESCRIPTION
|
|
||||||
The spu_run system call is used on PowerPC machines that implement the
|
|
||||||
Cell Broadband Engine Architecture in order to access Synergistic Pro-
|
|
||||||
cessor Units (SPUs). It uses the fd that was returned from spu_cre-
|
|
||||||
ate(2) to address a specific SPU context. When the context gets sched-
|
|
||||||
uled to a physical SPU, it starts execution at the instruction pointer
|
|
||||||
passed in npc.
|
|
||||||
|
|
||||||
Execution of SPU code happens synchronously, meaning that spu_run does
|
|
||||||
not return while the SPU is still running. If there is a need to exe-
|
|
||||||
cute SPU code in parallel with other code on either the main CPU or
|
|
||||||
other SPUs, you need to create a new thread of execution first, e.g.
|
|
||||||
using the pthread_create(3) call.
|
|
||||||
|
|
||||||
When spu_run returns, the current value of the SPU instruction pointer
|
|
||||||
is written back to npc, so you can call spu_run again without updating
|
|
||||||
the pointers.
|
|
||||||
|
|
||||||
event can be a NULL pointer or point to an extended status code that
|
|
||||||
gets filled when spu_run returns. It can be one of the following con-
|
|
||||||
stants:
|
|
||||||
|
|
||||||
SPE_EVENT_DMA_ALIGNMENT
|
|
||||||
A DMA alignment error
|
|
||||||
|
|
||||||
SPE_EVENT_SPE_DATA_SEGMENT
|
|
||||||
A DMA segmentation error
|
|
||||||
|
|
||||||
SPE_EVENT_SPE_DATA_STORAGE
|
|
||||||
A DMA storage error
|
|
||||||
|
|
||||||
If NULL is passed as the event argument, these errors will result in a
|
|
||||||
signal delivered to the calling process.
|
|
||||||
|
|
||||||
RETURN VALUE
|
|
||||||
spu_run returns the value of the spu_status register or -1 to indicate
|
|
||||||
an error and set errno to one of the error codes listed below. The
|
|
||||||
spu_status register value contains a bit mask of status codes and
|
|
||||||
optionally a 14 bit code returned from the stop-and-signal instruction
|
|
||||||
on the SPU. The bit masks for the status codes are:
|
|
||||||
|
|
||||||
0x02 SPU was stopped by stop-and-signal.
|
|
||||||
|
|
||||||
0x04 SPU was stopped by halt.
|
|
||||||
|
|
||||||
0x08 SPU is waiting for a channel.
|
|
||||||
|
|
||||||
0x10 SPU is in single-step mode.
|
|
||||||
|
|
||||||
0x20 SPU has tried to execute an invalid instruction.
|
|
||||||
|
|
||||||
0x40 SPU has tried to access an invalid channel.
|
|
||||||
|
|
||||||
0x3fff0000
|
|
||||||
The bits masked with this value contain the code returned from
|
|
||||||
stop-and-signal.
|
|
||||||
|
|
||||||
There are always one or more of the lower eight bits set or an error
|
|
||||||
code is returned from spu_run.
|
|
||||||
|
|
||||||
ERRORS
|
|
||||||
EAGAIN or EWOULDBLOCK
|
|
||||||
fd is in non-blocking mode and spu_run would block.
|
|
||||||
|
|
||||||
EBADF fd is not a valid file descriptor.
|
|
||||||
|
|
||||||
EFAULT npc is not a valid pointer or status is neither NULL nor a valid
|
|
||||||
pointer.
|
|
||||||
|
|
||||||
EINTR A signal occurred while spu_run was in progress. The npc value
|
|
||||||
has been updated to the new program counter value if necessary.
|
|
||||||
|
|
||||||
EINVAL fd is not a file descriptor returned from spu_create(2).
|
|
||||||
|
|
||||||
ENOMEM Insufficient memory was available to handle a page fault result-
|
|
||||||
ing from an MFC direct memory access.
|
|
||||||
|
|
||||||
ENOSYS the functionality is not provided by the current system, because
|
|
||||||
either the hardware does not provide SPUs or the spufs module is
|
|
||||||
not loaded.
|
|
||||||
|
|
||||||
|
|
||||||
NOTES
|
|
||||||
spu_run is meant to be used from libraries that implement a more
|
|
||||||
abstract interface to SPUs, not to be used from regular applications.
|
|
||||||
See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the rec-
|
|
||||||
ommended libraries.
|
|
||||||
|
|
||||||
|
|
||||||
CONFORMING TO
|
|
||||||
This call is Linux specific and only implemented by the ppc64 architec-
|
|
||||||
ture. Programs using this system call are not portable.
|
|
||||||
|
|
||||||
|
|
||||||
BUGS
|
|
||||||
The code does not yet fully implement all features lined out here.
|
|
||||||
|
|
||||||
|
|
||||||
AUTHOR
|
|
||||||
Arnd Bergmann <arndb@de.ibm.com>
|
|
||||||
|
|
||||||
SEE ALSO
|
|
||||||
capabilities(7), close(2), spu_create(2), spufs(7)
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Linux 2005-09-28 SPU_RUN(2)
|
|
||||||
|
|
||||||
------------------------------------------------------------------------------
|
|
||||||
|
|
||||||
SPU_CREATE(2) Linux Programmer's Manual SPU_CREATE(2)
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
NAME
|
|
||||||
spu_create - create a new spu context
|
|
||||||
|
|
||||||
|
|
||||||
SYNOPSIS
|
|
||||||
#include <sys/types.h>
|
|
||||||
#include <sys/spu.h>
|
|
||||||
|
|
||||||
int spu_create(const char *pathname, int flags, mode_t mode);
|
|
||||||
|
|
||||||
DESCRIPTION
|
|
||||||
The spu_create system call is used on PowerPC machines that implement
|
|
||||||
the Cell Broadband Engine Architecture in order to access Synergistic
|
|
||||||
Processor Units (SPUs). It creates a new logical context for an SPU in
|
|
||||||
pathname and returns a handle to associated with it. pathname must
|
|
||||||
point to a non-existing directory in the mount point of the SPU file
|
|
||||||
system (spufs). When spu_create is successful, a directory gets cre-
|
|
||||||
ated on pathname and it is populated with files.
|
|
||||||
|
|
||||||
The returned file handle can only be passed to spu_run(2) or closed,
|
|
||||||
other operations are not defined on it. When it is closed, all associ-
|
|
||||||
ated directory entries in spufs are removed. When the last file handle
|
|
||||||
pointing either inside of the context directory or to this file
|
|
||||||
descriptor is closed, the logical SPU context is destroyed.
|
|
||||||
|
|
||||||
The parameter flags can be zero or any bitwise or'd combination of the
|
|
||||||
following constants:
|
|
||||||
|
|
||||||
SPU_RAWIO
|
|
||||||
Allow mapping of some of the hardware registers of the SPU into
|
|
||||||
user space. This flag requires the CAP_SYS_RAWIO capability, see
|
|
||||||
capabilities(7).
|
|
||||||
|
|
||||||
The mode parameter specifies the permissions used for creating the new
|
|
||||||
directory in spufs. mode is modified with the user's umask(2) value
|
|
||||||
and then used for both the directory and the files contained in it. The
|
|
||||||
file permissions mask out some more bits of mode because they typically
|
|
||||||
support only read or write access. See stat(2) for a full list of the
|
|
||||||
possible mode values.
|
|
||||||
|
|
||||||
|
|
||||||
RETURN VALUE
|
|
||||||
spu_create returns a new file descriptor. It may return -1 to indicate
|
|
||||||
an error condition and set errno to one of the error codes listed
|
|
||||||
below.
|
|
||||||
|
|
||||||
|
|
||||||
ERRORS
|
|
||||||
EACCES
|
|
||||||
The current user does not have write access on the spufs mount
|
|
||||||
point.
|
|
||||||
|
|
||||||
EEXIST An SPU context already exists at the given path name.
|
|
||||||
|
|
||||||
EFAULT pathname is not a valid string pointer in the current address
|
|
||||||
space.
|
|
||||||
|
|
||||||
EINVAL pathname is not a directory in the spufs mount point.
|
|
||||||
|
|
||||||
ELOOP Too many symlinks were found while resolving pathname.
|
|
||||||
|
|
||||||
EMFILE The process has reached its maximum open file limit.
|
|
||||||
|
|
||||||
ENAMETOOLONG
|
|
||||||
pathname was too long.
|
|
||||||
|
|
||||||
ENFILE The system has reached the global open file limit.
|
|
||||||
|
|
||||||
ENOENT Part of pathname could not be resolved.
|
|
||||||
|
|
||||||
ENOMEM The kernel could not allocate all resources required.
|
|
||||||
|
|
||||||
ENOSPC There are not enough SPU resources available to create a new
|
|
||||||
context or the user specific limit for the number of SPU con-
|
|
||||||
texts has been reached.
|
|
||||||
|
|
||||||
ENOSYS the functionality is not provided by the current system, because
|
|
||||||
either the hardware does not provide SPUs or the spufs module is
|
|
||||||
not loaded.
|
|
||||||
|
|
||||||
ENOTDIR
|
|
||||||
A part of pathname is not a directory.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
NOTES
|
|
||||||
spu_create is meant to be used from libraries that implement a more
|
|
||||||
abstract interface to SPUs, not to be used from regular applications.
|
|
||||||
See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the rec-
|
|
||||||
ommended libraries.
|
|
||||||
|
|
||||||
|
|
||||||
FILES
|
|
||||||
pathname must point to a location beneath the mount point of spufs. By
|
|
||||||
convention, it gets mounted in /spu.
|
|
||||||
|
|
||||||
|
|
||||||
CONFORMING TO
|
|
||||||
This call is Linux specific and only implemented by the ppc64 architec-
|
|
||||||
ture. Programs using this system call are not portable.
|
|
||||||
|
|
||||||
|
|
||||||
BUGS
|
|
||||||
The code does not yet fully implement all features lined out here.
|
|
||||||
|
|
||||||
|
|
||||||
AUTHOR
|
|
||||||
Arnd Bergmann <arndb@de.ibm.com>
|
|
||||||
|
|
||||||
SEE ALSO
|
|
||||||
capabilities(7), close(2), spu_run(2), spufs(7)
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Linux 2005-09-28 SPU_CREATE(2)
|
|
@ -1,8 +1,11 @@
|
|||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
============================================
|
||||||
Accessing PCI device resources through sysfs
|
Accessing PCI device resources through sysfs
|
||||||
--------------------------------------------
|
============================================
|
||||||
|
|
||||||
sysfs, usually mounted at /sys, provides access to PCI resources on platforms
|
sysfs, usually mounted at /sys, provides access to PCI resources on platforms
|
||||||
that support it. For example, a given bus might look like this:
|
that support it. For example, a given bus might look like this::
|
||||||
|
|
||||||
/sys/devices/pci0000:17
|
/sys/devices/pci0000:17
|
||||||
|-- 0000:17:00.0
|
|-- 0000:17:00.0
|
||||||
@ -30,8 +33,9 @@ This bus contains a single function device in slot 0. The domain and bus
|
|||||||
numbers are reproduced for convenience. Under the device directory are several
|
numbers are reproduced for convenience. Under the device directory are several
|
||||||
files, each with their own function.
|
files, each with their own function.
|
||||||
|
|
||||||
|
=================== =====================================================
|
||||||
file function
|
file function
|
||||||
---- --------
|
=================== =====================================================
|
||||||
class PCI class (ascii, ro)
|
class PCI class (ascii, ro)
|
||||||
config PCI config space (binary, rw)
|
config PCI config space (binary, rw)
|
||||||
device PCI device (ascii, ro)
|
device PCI device (ascii, ro)
|
||||||
@ -40,13 +44,16 @@ files, each with their own function.
|
|||||||
local_cpus nearby CPU mask (cpumask, ro)
|
local_cpus nearby CPU mask (cpumask, ro)
|
||||||
remove remove device from kernel's list (ascii, wo)
|
remove remove device from kernel's list (ascii, wo)
|
||||||
resource PCI resource host addresses (ascii, ro)
|
resource PCI resource host addresses (ascii, ro)
|
||||||
resource0..N PCI resource N, if present (binary, mmap, rw[1])
|
resource0..N PCI resource N, if present (binary, mmap, rw\ [1]_)
|
||||||
resource0_wc..N_wc PCI WC map resource N, if prefetchable (binary, mmap)
|
resource0_wc..N_wc PCI WC map resource N, if prefetchable (binary, mmap)
|
||||||
revision PCI revision (ascii, ro)
|
revision PCI revision (ascii, ro)
|
||||||
rom PCI ROM resource, if present (binary, ro)
|
rom PCI ROM resource, if present (binary, ro)
|
||||||
subsystem_device PCI subsystem device (ascii, ro)
|
subsystem_device PCI subsystem device (ascii, ro)
|
||||||
subsystem_vendor PCI subsystem vendor (ascii, ro)
|
subsystem_vendor PCI subsystem vendor (ascii, ro)
|
||||||
vendor PCI vendor (ascii, ro)
|
vendor PCI vendor (ascii, ro)
|
||||||
|
=================== =====================================================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
ro - read only file
|
ro - read only file
|
||||||
rw - file is readable and writable
|
rw - file is readable and writable
|
||||||
@ -56,7 +63,7 @@ files, each with their own function.
|
|||||||
binary - file contains binary data
|
binary - file contains binary data
|
||||||
cpumask - file contains a cpumask type
|
cpumask - file contains a cpumask type
|
||||||
|
|
||||||
[1] rw for RESOURCE_IO (I/O port) regions only
|
.. [1] rw for RESOURCE_IO (I/O port) regions only
|
||||||
|
|
||||||
The read only files are informational, writes to them will be ignored, with
|
The read only files are informational, writes to them will be ignored, with
|
||||||
the exception of the 'rom' file. Writable files can be used to perform
|
the exception of the 'rom' file. Writable files can be used to perform
|
||||||
@ -67,11 +74,11 @@ don't support mmapping of certain resources, so be sure to check the return
|
|||||||
value from any attempted mmap. The most notable of these are I/O port
|
value from any attempted mmap. The most notable of these are I/O port
|
||||||
resources, which also provide read/write access.
|
resources, which also provide read/write access.
|
||||||
|
|
||||||
The 'enable' file provides a counter that indicates how many times the device
|
The 'enable' file provides a counter that indicates how many times the device
|
||||||
has been enabled. If the 'enable' file currently returns '4', and a '1' is
|
has been enabled. If the 'enable' file currently returns '4', and a '1' is
|
||||||
echoed into it, it will then return '5'. Echoing a '0' into it will decrease
|
echoed into it, it will then return '5'. Echoing a '0' into it will decrease
|
||||||
the count. Even when it returns to 0, though, some of the initialisation
|
the count. Even when it returns to 0, though, some of the initialisation
|
||||||
may not be reversed.
|
may not be reversed.
|
||||||
|
|
||||||
The 'rom' file is special in that it provides read-only access to the device's
|
The 'rom' file is special in that it provides read-only access to the device's
|
||||||
ROM file, if available. It's disabled by default, however, so applications
|
ROM file, if available. It's disabled by default, however, so applications
|
||||||
@ -93,7 +100,7 @@ Accessing legacy resources through sysfs
|
|||||||
|
|
||||||
Legacy I/O port and ISA memory resources are also provided in sysfs if the
|
Legacy I/O port and ISA memory resources are also provided in sysfs if the
|
||||||
underlying platform supports them. They're located in the PCI class hierarchy,
|
underlying platform supports them. They're located in the PCI class hierarchy,
|
||||||
e.g.
|
e.g.::
|
||||||
|
|
||||||
/sys/class/pci_bus/0000:17/
|
/sys/class/pci_bus/0000:17/
|
||||||
|-- bridge -> ../../../devices/pci0000:17
|
|-- bridge -> ../../../devices/pci0000:17
|
Some files were not shown because too many files have changed in this diff.