acpi= [HW,ACPI,X86,ARM64]
Advanced Configuration and Power Interface
Format: { force | on | off | strict | noirq | rsdt |
copy_dsdt }
force -- enable ACPI if default was off
on -- enable ACPI but allow fallback to DT [arm64]
off -- disable ACPI if default was on
noirq -- do not use ACPI for IRQ routing
strict -- Be less tolerant of platforms that are not
strictly ACPI specification compliant.
rsdt -- prefer RSDT over (default) XSDT
copy_dsdt -- copy DSDT to memory
For ARM64, ONLY "acpi=off", "acpi=on" or "acpi=force"
are available
See also Documentation/power/runtime_pm.rst, pci=noacpi
acpi_apic_instance= [ACPI, IOAPIC]
Format: <int>
2: use 2nd APIC table, if available
1,0: use 1st APIC table
default: 0
acpi_backlight= [HW,ACPI]
{ vendor | video | native | none }
If set to vendor, prefer vendor-specific driver
(e.g. thinkpad_acpi, sony_acpi, etc.) instead
of the ACPI video.ko driver.
If set to video, use the ACPI video.ko driver.
If set to native, use the device's native backlight mode.
If set to none, disable the ACPI backlight interface.
acpi_force_32bit_fadt_addr
Force the FADT to use 32-bit addresses rather than the
64-bit X_* addresses. Some firmware has broken 64-bit
addresses; this option forces ACPI to ignore them and use
the older legacy 32-bit addresses.
acpica_no_return_repair [HW, ACPI]
Disable AML predefined validation mechanism
This mechanism can repair the evaluation result to make
the return objects more ACPI specification compliant.
This option is useful for developers to identify the
root cause of an AML interpreter issue when the issue
has something to do with the repair mechanism.
acpi.debug_layer= [HW,ACPI,ACPI_DEBUG]
acpi.debug_level= [HW,ACPI,ACPI_DEBUG]
Format: <int>
CONFIG_ACPI_DEBUG must be enabled to produce any ACPI
debug output. Bits in debug_layer correspond to a
_COMPONENT in an ACPI source file, e.g.,
#define _COMPONENT ACPI_PCI_COMPONENT
Bits in debug_level correspond to a level in
ACPI_DEBUG_PRINT statements, e.g.,
ACPI_DEBUG_PRINT((ACPI_DB_INFO, ...
The debug_level mask defaults to "info". See
Documentation/firmware-guide/acpi/debug.rst for more information about
debug layers and levels.
Enable processor driver info messages:
acpi.debug_layer=0x20000000
Enable PCI/PCI interrupt routing info messages:
acpi.debug_layer=0x400000
Enable AML "Debug" output, i.e., stores to the Debug
object while interpreting AML:
acpi.debug_layer=0xffffffff acpi.debug_level=0x2
Enable all messages related to ACPI hardware:
acpi.debug_layer=0x2 acpi.debug_level=0xffffffff
Some values produce so much output that the system is
unusable. The "log_buf_len" parameter may be useful
if you need to capture more output.
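The mask arithmetic behind the examples above can be sketched in
Python. This is an illustrative helper, not part of the kernel: the
name acpi_debug_args is hypothetical, and the only layer and level
values used are the ones quoted in the examples above. It assumes
that several layers are combined by OR-ing their bits, as is usual
for a bit mask.

```python
def acpi_debug_args(layers, levels=None):
    """Build acpi.debug_layer / acpi.debug_level arguments from bit masks.

    'layers' and 'levels' are iterables of integer masks; each group is
    OR-ed together into a single mask (hypothetical helper).
    """
    layer_mask = 0
    for bits in layers:
        layer_mask |= bits
    args = ["acpi.debug_layer=0x%x" % layer_mask]
    if levels is not None:
        level_mask = 0
        for bits in levels:
            level_mask |= bits
        args.append("acpi.debug_level=0x%x" % level_mask)
    return " ".join(args)


# Layer values copied from the examples in the text.
PROCESSOR_LAYER = 0x20000000
PCI_LAYER = 0x400000
```

Calling acpi_debug_args([PROCESSOR_LAYER, PCI_LAYER]) yields a single
acpi.debug_layer= argument covering both components.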
acpi_enforce_resources= [ACPI]
{ strict | lax | no }
Check for resource conflicts between native drivers
and ACPI OperationRegions (SystemIO and SystemMemory
only). IO ports and memory declared in ACPI might be
used by the ACPI subsystem in arbitrary AML code and
can interfere with legacy drivers.
strict (default): access to resources claimed by ACPI
is denied; legacy drivers trying to access reserved
resources will fail to bind to device using them.
lax: access to resources claimed by ACPI is allowed;
legacy drivers trying to access reserved resources
will bind successfully but a warning message is logged.
no: ACPI OperationRegions are not marked as reserved,
no further checks are performed.
acpi_force_table_verification [HW,ACPI]
Enable table checksum verification during early stage.
By default, this is disabled due to x86 early mapping
size limitation.
acpi_irq_balance [HW,ACPI]
ACPI will balance active IRQs
default in APIC mode
acpi_irq_nobalance [HW,ACPI]
ACPI will not move active IRQs (default)
default in PIC mode
acpi_irq_isa= [HW,ACPI] If irq_balance, mark listed IRQs used by ISA
Format: <irq>,<irq>...
acpi_irq_pci= [HW,ACPI] If irq_balance, clear listed IRQs for
use by PCI
Format: <irq>,<irq>...
acpi_mask_gpe= [HW,ACPI]
Due to the existence of _Lxx/_Exx, some GPEs triggered
by unsupported hardware/firmware features can result in
GPE floodings that cannot be automatically disabled by
the GPE dispatcher.
This facility can be used to prevent such uncontrolled
GPE floodings.
Format: <byte>
acpi_no_auto_serialize [HW,ACPI]
Disable auto-serialization of AML methods
AML control methods that contain the opcodes to create
named objects will be marked as "Serialized" by the
auto-serialization feature.
This feature is enabled by default.
This option turns the feature off.
acpi_no_memhotplug [ACPI] Disable memory hotplug. Useful for kdump
kernels.
acpi_no_static_ssdt [HW,ACPI]
Disable installation of static SSDTs at early boot time
By default, SSDTs contained in the RSDT/XSDT will be
installed automatically and they will appear under
/sys/firmware/acpi/tables.
This option turns off this feature.
Note that specifying this option does not affect
dynamic table installation which will install SSDT
tables to /sys/firmware/acpi/tables/dynamic.
acpi_no_watchdog [HW,ACPI,WDT]
Ignore the ACPI-based watchdog interface (WDAT) and let
a native driver control the watchdog device instead.
acpi_rsdp= [ACPI,EFI,KEXEC]
Pass the RSDP address to the kernel, mostly used
on machines running EFI runtime service to boot the
second kernel for kdump.
acpi_os_name= [HW,ACPI] Tell ACPI BIOS the name of the OS
Format: To spoof as Windows 98: ="Microsoft Windows"
acpi_rev_override [ACPI] Override the _REV object to return 5 (instead
of 2 which is mandated by ACPI 6) as the supported ACPI
specification revision (when using this switch, it may
be necessary to carry out a cold reboot _twice_ in a
row to make it take effect on the platform firmware).
acpi_osi= [HW,ACPI] Modify list of supported OS interface strings
acpi_osi="string1" # add string1
acpi_osi="!string2" # remove string2
acpi_osi=!* # remove all strings
acpi_osi=! # disable all built-in OS vendor
strings
acpi_osi=!! # enable all built-in OS vendor
strings
acpi_osi= # disable all strings
'acpi_osi=!' can be used in combination with single or
multiple 'acpi_osi="string1"' to support specific OS
vendor string(s). Note that such command can only
affect the default state of the OS vendor strings, thus
it cannot affect the default state of the feature group
strings and the current state of the OS vendor strings,
specifying it multiple times through kernel command line
is meaningless. This command is useful when one does not
care about the state of the feature group strings which
should be controlled by the OSPM.
Examples:
1. 'acpi_osi=! acpi_osi="Windows 2000"' is equivalent
to 'acpi_osi="Windows 2000" acpi_osi=!', they all
can make '_OSI("Windows 2000")' TRUE.
'acpi_osi=' cannot be used in combination with other
'acpi_osi=' command lines, the _OSI method will not
exist in the ACPI namespace. NOTE that such command can
only affect the _OSI support state, thus specifying it
multiple times through kernel command line is also
meaningless.
Examples:
1. 'acpi_osi=' can make 'CondRefOf(_OSI, Local1)'
FALSE.
'acpi_osi=!*' can be used in combination with single or
multiple 'acpi_osi="string1"' to support specific
string(s). Note that such command can affect the
current state of both the OS vendor strings and the
feature group strings, thus specifying it multiple times
through kernel command line is meaningful. But it may
still not be able to affect the final state of a string
if there are quirks related to this string. This command
is useful when one wants to control the state of the
feature group strings to debug BIOS issues related to
the OSPM features.
Examples:
1. 'acpi_osi="Module Device" acpi_osi=!*' can make
'_OSI("Module Device")' FALSE.
2. 'acpi_osi=!* acpi_osi="Module Device"' can make
'_OSI("Module Device")' TRUE.
3. 'acpi_osi=! acpi_osi=!* acpi_osi="Windows 2000"' is
equivalent to
'acpi_osi=!* acpi_osi=! acpi_osi="Windows 2000"'
and
'acpi_osi=!* acpi_osi="Windows 2000" acpi_osi=!',
they all will make '_OSI("Windows 2000")' TRUE.
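The ordering rules described for acpi_osi= can be checked against a
toy model in Python. This is only a sketch that reproduces the
examples listed above; it is not the kernel's implementation. The
names osi_state, VENDOR_STRINGS and FEATURE_STRINGS are hypothetical,
the two sets are small illustrative subsets rather than the real
built-in lists, and the empty 'acpi_osi=' form (which removes _OSI
altogether) is deliberately not modelled.

```python
# Small illustrative subsets, not the kernel's real built-in lists.
VENDOR_STRINGS = {"Windows 2000", "Windows 2001"}
FEATURE_STRINGS = {"Module Device", "Processor Device"}


def osi_state(args):
    """Apply acpi_osi= values in command-line order; return an _OSI lookup."""
    vendor_default = True  # default state of the OS vendor strings
    removed_all = False    # becomes True once '!*' has been seen
    explicit = {}          # per-string overrides, applied in order
    for arg in args:
        if arg == "!":
            # Only the default state of vendor strings changes.
            vendor_default = False
        elif arg == "!!":
            vendor_default = True
        elif arg == "!*":
            # Affects the current state, so earlier additions are lost.
            removed_all = True
            explicit.clear()
        elif arg.startswith("!"):
            explicit[arg[1:]] = False
        else:
            explicit[arg] = True

    def osi(name):
        if name in explicit:
            return explicit[name]
        if removed_all:
            return False
        if name in VENDOR_STRINGS:
            return vendor_default
        return name in FEATURE_STRINGS

    return osi
```

In this model '!' commutes with string additions because it only
touches the default state, while '!*' does not commute because it
clears the overrides accumulated so far.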
acpi_pm_good [X86]
Override the pmtimer bug detection: force the kernel
to assume that this machine's pmtimer latches its value
and always returns good values.
acpi_sci= [HW,ACPI] ACPI System Control Interrupt trigger mode
Format: { level | edge | high | low }
acpi_skip_timer_override [HW,ACPI]
Recognize and ignore IRQ0/pin2 Interrupt Override.
For broken nForce2 BIOS resulting in XT-PIC timer.
acpi_sleep= [HW,ACPI] Sleep options
Format: { s3_bios, s3_mode, s3_beep, s4_nohwsig,
old_ordering, nonvs, sci_force_enable, nobl }
See Documentation/power/video.rst for information on
s3_bios and s3_mode.
s3_beep is for debugging; it makes the PC's speaker beep
as soon as the kernel's real-mode entry point is called.
s4_nohwsig prevents ACPI hardware signature from being
used during resume from hibernation.
old_ordering causes the ACPI 1.0 ordering of the _PTS
control method, with respect to putting devices into
low power states, to be enforced (the ACPI 2.0 ordering
of _PTS is used by default).
nonvs prevents the kernel from saving/restoring the
ACPI NVS memory during suspend/hibernation and resume.
sci_force_enable causes the kernel to set SCI_EN directly
on resume from S1/S3 (which is against the ACPI spec,
but some broken systems don't work without it).
nobl causes the internal blacklist of systems known to
behave incorrectly in some ways with respect to system
suspend and resume to be ignored (use wisely).
acpi_use_timer_override [HW,ACPI]
Use timer override. For some broken Nvidia NF5 boards
that require a timer override, but don't have HPET
add_efi_memmap [EFI; X86] Include EFI memory map in
kernel's map of available physical RAM.
agp= [AGP]
{ off | try_unsupported }
off: disable AGP support
try_unsupported: try to drive unsupported chipsets
(may crash computer or cause data corruption)
ALSA [HW,ALSA]
See Documentation/sound/alsa-configuration.rst
alignment= [KNL,ARM]
Allow the default userspace alignment fault handler
behaviour to be specified. Bit 0 enables warnings,
bit 1 enables fixups, and bit 2 sends a segfault.
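The bit layout of alignment= can be decoded with a few lines of
Python. This is a sketch for illustration only: decode_alignment and
the action labels are hypothetical names, and the bit meanings are
taken directly from the description above.

```python
# (bit number, label) pairs as described for alignment=.
ALIGNMENT_BITS = ((0, "warn"), (1, "fixup"), (2, "segfault"))


def decode_alignment(value):
    """Return the actions selected by an alignment= value."""
    return [name for bit, name in ALIGNMENT_BITS if value & (1 << bit)]
```

For instance, alignment=3 sets bits 0 and 1, so faults would be both
reported and fixed up.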
align_va_addr= [X86-64]
Align virtual addresses by clearing slice [14:12] when
allocating a VMA at process creation time. This option
gives you up to 3% performance improvement on AMD F15h
machines (where it is enabled by default) for a
CPU-intensive style benchmark, and it can vary highly in
a microbenchmark depending on workload and compiler.
32: only for 32-bit processes
64: only for 64-bit processes
on: enable for both 32- and 64-bit processes
off: disable for both 32- and 64-bit processes
alloc_snapshot [FTRACE]
Allocate the ftrace snapshot buffer on boot up when the
main buffer is allocated. This is handy if debugging
and you need to use tracing_snapshot() on boot up, and
do not want to use tracing_snapshot_alloc() as it needs
to be done where GFP_KERNEL allocations are allowed.
amd_iommu= [HW,X86-64]
Pass parameters to the AMD IOMMU driver in the system.
Possible values are:
fullflush - enable flushing of IO/TLB entries when
they are unmapped. Otherwise they are
flushed before they will be reused, which
is a lot faster
off - do not initialize any AMD IOMMU found in
the system
force_isolation - Force device isolation for all
devices. The IOMMU driver is not
allowed anymore to lift isolation
requirements as needed. This option
does not override iommu=pt
amd_iommu_dump= [HW,X86-64]
Enable AMD IOMMU driver option to dump the ACPI table
for AMD IOMMU. With this option enabled, AMD IOMMU
driver will print ACPI tables for AMD IOMMU during
IOMMU initialization.
amd_iommu_intr= [HW,X86-64]
Specifies one of the following AMD IOMMU interrupt
remapping modes:
legacy - Use legacy interrupt remapping mode.
vapic - Use virtual APIC mode, which allows IOMMU
to inject interrupts directly into guest.
This mode requires kvm-amd.avic=1.
(Default when IOMMU HW support is present.)
amijoy.map= [HW,JOY] Amiga joystick support
Map of devices attached to JOY0DAT and JOY1DAT
Format: <a>,<b>
See also Documentation/input/joydev/joystick.rst
analog.map= [HW,JOY] Analog joystick and gamepad support
Specifies type or capabilities of an analog joystick
connected to one of 16 gameports
Format: <type1>,<type2>,..<type16>
apc= [HW,SPARC]
Power management functions (SPARCstation-4/5 + deriv.)
Format: noidle
Disable APC CPU standby support. SPARCstation-Fox does
not play well with APC CPU idle - disable it if you have
APC and your system crashes randomly.
apic= [APIC,X86] Advanced Programmable Interrupt Controller
Change the output verbosity while booting
Format: { quiet (default) | verbose | debug }
Change the amount of debugging information output
when initialising the APIC and IO-APIC components.
For X86-32, this can also be used to specify an APIC
driver name.
Format: apic=driver_name
Examples: apic=bigsmp
apic_extnmi= [APIC,X86] External NMI delivery setting
Format: { bsp (default) | all | none }
bsp: External NMI is delivered only to CPU 0
all: External NMIs are broadcast to all CPUs as a
backup of CPU 0
none: External NMI is masked for all CPUs. This is
useful so that a dump capture kernel won't be
shot down by NMI
autoconf= [IPV6]
See Documentation/networking/ipv6.rst.
show_lapic= [APIC,X86] Advanced Programmable Interrupt Controller
Limit APIC dumping. The parameter defines the maximum
number of local APICs to dump. It can also be set to
"all", meaning no limit.
Format: { 1 (default) | 2 | ... | all }.
The parameter is valid only if apic=debug or
apic=verbose is specified.
Example: apic=debug show_lapic=all
apm= [APM] Advanced Power Management
See header of arch/x86/kernel/apm_32.c.
arcrimi= [HW,NET] ARCnet - "RIM I" (entirely mem-mapped) cards
Format: <io>,<irq>,<nodeID>
ataflop= [HW,M68k]
atarimouse= [HW,MOUSE] Atari Mouse
atkbd.extra= [HW] Enable extra LEDs and keys on IBM RapidAccess,
EzKey and similar keyboards
atkbd.reset= [HW] Reset keyboard during initialization
atkbd.set= [HW] Select keyboard code set
Format: <int> (2 = AT (default), 3 = PS/2)
atkbd.scroll= [HW] Enable scroll wheel on MS Office and similar
keyboards
atkbd.softraw= [HW] Choose between synthetic and real raw mode
Format: <bool> (0 = real, 1 = synthetic (default))
atkbd.softrepeat= [HW]
Use software keyboard repeat
audit= [KNL] Enable the audit sub-system
Format: { "0" | "1" | "off" | "on" }
0 | off - kernel audit is disabled and can not be
enabled until the next reboot
unset - kernel audit is initialized but disabled and
will be fully enabled by the userspace auditd.
1 | on - kernel audit is initialized and partially
enabled, storing at most audit_backlog_limit
messages in RAM until it is fully enabled by the
userspace auditd.
Default: unset
audit_backlog_limit= [KNL] Set the audit queue size limit.
Format: <int> (must be >=0)
Default: 64
bau= [X86_UV] Enable the BAU on SGI UV. The default
behavior is to disable the BAU (i.e. bau=0).
Format: { "0" | "1" }
0 - Disable the BAU.
1 - Enable the BAU.
unset - Disable the BAU.
baycom_epp= [HW,AX25]
Format: <io>,<mode>
baycom_par= [HW,AX25] BayCom Parallel Port AX.25 Modem
Format: <io>,<mode>
See header of drivers/net/hamradio/baycom_par.c.
baycom_ser_fdx= [HW,AX25]
BayCom Serial Port AX.25 Modem (Full Duplex Mode)
Format: <io>,<irq>,<mode>[,<baud>]
See header of drivers/net/hamradio/baycom_ser_fdx.c.
baycom_ser_hdx= [HW,AX25]
BayCom Serial Port AX.25 Modem (Half Duplex Mode)
Format: <io>,<irq>,<mode>
See header of drivers/net/hamradio/baycom_ser_hdx.c.
blkdevparts= Manual partition parsing of block device(s) for
embedded devices based on command line input.
See Documentation/block/cmdline-partition.rst
boot_delay= Milliseconds to delay each printk during boot.
Values larger than 10 seconds (10000) are changed to
no delay (0).
Format: integer
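The clamping rule for boot_delay= can be written out as a one-line
Python sketch. The function name effective_boot_delay is
hypothetical; the only behaviour modelled is the rule stated above
for values larger than 10000 milliseconds.

```python
def effective_boot_delay(ms):
    """Return the per-printk delay in ms, per the rule described above."""
    # Values larger than 10 seconds (10000 ms) are changed to no delay.
    return 0 if ms > 10000 else ms
```

Note that an overly large value therefore disables the delay rather
than being capped at the maximum.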
bootconfig [KNL]
Extended command line options can be added to an initrd
and this will cause the kernel to look for it.
See Documentation/admin-guide/bootconfig.rst
bert_disable [ACPI]
Disable BERT OS support on buggy BIOSes.
bgrt_disable [ACPI][X86]
Disable BGRT to avoid flickering OEM logo.
bttv.card= [HW,V4L] bttv (bt848 + bt878 based grabber cards)
bttv.radio= Most important insmod options are available as
kernel args too.
bttv.pll= See Documentation/admin-guide/media/bttv.rst
bttv.tuner=
bulk_remove=off [PPC] This parameter disables the use of the pSeries
firmware feature for flushing multiple hpte entries
at a time.
c101= [NET] Moxa C101 synchronous serial card
cachesize= [BUGS=X86-32] Override level 2 CPU cache size detection.
Sometimes CPU hardware bugs make them report the cache
size incorrectly. The kernel will attempt workarounds
to fix known problems, but for some CPUs it is not
possible to determine what the correct size should be.
This option provides an override for these situations.
carrier_timeout=
[NET] Specifies amount of time (in seconds) that
the kernel should wait for a network carrier. By default
it waits 120 seconds.
ca_keys= [KEYS] This parameter identifies a specific key(s) on
the system trusted keyring to be used for certificate
trust validation.
format: { id:<keyid> | builtin }
cca= [MIPS] Override the kernel pages' cache coherency
algorithm. Accepted values range from 0 to 7
inclusive. See arch/mips/include/asm/pgtable-bits.h
for platform specific values (SB1, Loongson3 and
others).
ccw_timeout_log [S390]
See Documentation/s390/common_io.rst for details.
cgroup_disable= [KNL] Disable a particular controller
Format: {name of the controller(s) to disable}
The effects of cgroup_disable=foo are:
- foo isn't auto-mounted if you mount all cgroups in
a single hierarchy
- foo isn't visible as an individually mountable
subsystem
{Currently only the "memory" controller deals with this
and cuts the overhead; others just disable the usage. So
only cgroup_disable=memory is actually worthwhile}
cgroup_no_v1= [KNL] Disable cgroup controllers and named hierarchies in v1
Format: { { controller | "all" | "named" }
[,{ controller | "all" | "named" }...] }
Like cgroup_disable, but only applies to cgroup v1;
the blacklisted controllers remain available in cgroup2.
"all" blacklists all controllers and "named" disables
named mounts. Specifying both "all" and "named" disables
all v1 hierarchies.
cgroup.memory= [KNL] Pass options to the cgroup memory controller.
Format: <string>
nosocket -- Disable socket memory accounting.
nokmem -- Disable kernel memory accounting.
checkreqprot [SELINUX] Set initial checkreqprot flag value.
Format: { "0" | "1" }
See security/selinux/Kconfig help text.
0 -- check protection applied by kernel (includes
any implied execute protection).
1 -- check protection requested by application.
Default value is set via a kernel config option.
Value can be changed at runtime via
/sys/fs/selinux/checkreqprot.
Setting checkreqprot to 1 is deprecated.
cio_ignore= [S390]
See Documentation/s390/common_io.rst for details.
clk_ignore_unused
[CLK]
Prevents the clock framework from automatically gating
clocks that have not been explicitly enabled by a Linux
device driver but are enabled in hardware at reset or
by the bootloader/firmware. Note that this does not
force such clocks to be always-on nor does it reserve
those clocks in any way. This parameter is useful for
debug and development, but should not be needed on a
platform with proper driver support. For more
information, see Documentation/driver-api/clk.rst.
clock= [BUGS=X86-32, HW] gettimeofday clocksource override.
[Deprecated]
Forces specified clocksource (if available) to be used
when calculating gettimeofday(). If specified
clocksource is not available, it defaults to PIT.
Format: { pit | tsc | cyclone | pmtmr }
clocksource= Override the default clocksource
Format: <string>
Override the default clocksource and use the clocksource
with the name specified.
Some clocksource names to choose from, depending on
the platform:
[all] jiffies (this is the base, fallback clocksource)
[ACPI] acpi_pm
[ARM] imx_timer1,OSTS,netx_timer,mpu_timer2,
pxa_timer,timer3,32k_counter,timer0_1
2010-08-23 14:49:11 -07:00
[X86-32] pit,hpet,tsc;
2007-05-23 13:58:16 -07:00
scx200_hrt on Geode; cyclone on IBM x440
[MIPS] MIPS
[PARISC] cr16
[S390] tod
[SH] SuperH
[SPARC64] tick
[X86-64] hpet,tsc
clocksource.arm_arch_timer.evtstrm=
[ARM,ARM64]
Format: <bool>
Enable/disable the eventstream feature of the ARM
architected timer so that code using WFE-based polling
loops can be debugged more effectively on production
systems.
clearcpuid=BITNUM [X86]
Disable CPUID feature X for the kernel. See
arch/x86/include/asm/cpufeatures.h for the valid bit
numbers. Note the Linux specific bits are not necessarily
stable over kernel options, but the vendor specific
ones should be.
Also note that user programs calling CPUID directly
or using the feature without checking anything
will still see it. This just prevents it from
being used by the kernel or shown in /proc/cpuinfo.
Also note the kernel might malfunction if you disable
some critical bits.
cma=nn[MG]@[start[MG][-end[MG]]]
[ARM,X86,KNL]
Sets the size of kernel global memory area for
contiguous memory allocations and optionally the
placement constraint by the physical address range of
memory allocations. A value of 0 disables CMA
altogether. For more information, see
include/linux/dma-contiguous.h
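The cma= value format can be illustrated with a small Python parser.
This is a sketch under stated assumptions, not the kernel's parser:
the name parse_cma is hypothetical, only decimal numbers are
accepted, and the M and G suffixes are assumed to be binary
multipliers (2^20 and 2^30).

```python
import re

# Assumed binary multipliers for the documented [MG] suffixes.
_UNITS = {"": 1, "M": 1 << 20, "G": 1 << 30}
_SIZE = r"(\d+)([MG]?)"
# nn[MG]@[start[MG][-end[MG]]], with everything after the size optional.
_CMA_RE = re.compile(r"^%s(?:@(?:%s(?:-%s)?)?)?$" % (_SIZE, _SIZE, _SIZE))


def parse_cma(value):
    """Split a cma= value into (size, start, end) in bytes.

    Fields that are not given are returned as None.
    """
    match = _CMA_RE.match(value)
    if match is None:
        raise ValueError("unrecognised cma= value: %r" % value)
    groups = match.groups()

    def to_bytes(number, unit):
        return None if number is None else int(number) * _UNITS[unit]

    size = to_bytes(groups[0], groups[1])
    start = to_bytes(groups[2], groups[3])
    end = to_bytes(groups[4], groups[5])
    return size, start, end
```

A size of 0, which the text says disables CMA altogether, parses to
(0, None, None) in this sketch.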
cmo_free_hint= [PPC] Format: { yes | no }
Specify whether pages are marked as being inactive
when they are freed. This is used in CMO environments
to determine OS memory pressure for page stealing by
a hypervisor.
Default: yes
coherent_pool=nn[KMG] [ARM,KNL]
Sets the size of memory pool for coherent, atomic dma
allocations, by default set to 256K.
com20020= [HW,NET] ARCnet - COM20020 chipset
Format:
<io>[,<irq>[,<nodeID>[,<backplane>[,<ckp>[,<timeout>]]]]]
com90io= [HW,NET] ARCnet - COM90xx chipset (IO-mapped buffers)
Format: <io>[,<irq>]
com90xx= [HW,NET]
ARCnet - COM90xx chipset (memory-mapped buffers)
Format: <io>[,<irq>[,<memstart>]]
condev= [HW,S390] console device
conmode=
console= [KNL] Output console device and options.
tty<n> Use the virtual console device <n>.
ttyS<n>[,options]
ttyUSB0[,options]
Use the specified serial port. The options are of
the form "bbbbpnf", where "bbbb" is the baud rate,
"p" is parity ("n", "o", or "e"), "n" is number of
bits, and "f" is flow control ("r" for RTS or
omit it). Default is "9600n8".
See Documentation/admin-guide/serial-console.rst for more
information. See
Documentation/networking/netconsole.rst for an
alternative.
uart[8250],io,<addr>[,options]
uart[8250],mmio,<addr>[,options]
uart[8250],mmio16,<addr>[,options]
uart[8250],mmio32,<addr>[,options]
uart[8250],0x<addr>[,options]
2005-04-16 15:20:36 -07:00
Start an early, polled-mode console on the 8250/16550
UART at the specified I/O port or MMIO address,
2015-04-06 10:52:39 -04:00
switching to the matching ttyS device later.
MMIO inter-register address stride is either 8-bit
2015-10-28 12:46:05 +09:00
(mmio), 16-bit (mmio16), or 32-bit (mmio32).
If none of [io|mmio|mmio16|mmio32] is given, <addr> is
treated as 'mmio'. 'options' are specified in
the same format described for ttyS above; if unspecified,
2015-04-06 10:52:39 -04:00
the h/w is not re-initialized.
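The uart[8250] forms above can be split mechanically. The following is a minimal sketch, not the kernel's parser: parse_uart_console is a hypothetical helper name, and it only illustrates the documented rule that a missing I/O type is treated as 'mmio'.

```python
# Hypothetical helper for illustration only; the kernel parses this in C.
IO_TYPES = ("io", "mmio", "mmio16", "mmio32")


def parse_uart_console(value):
    """Split 'uart[8250],[<iotype>,]<addr>[,options]' into its parts."""
    fields = value.split(",")
    if fields[0] not in ("uart", "uart8250"):
        raise ValueError("not a uart[8250] console spec: %r" % value)
    rest = fields[1:]
    if rest and rest[0] in IO_TYPES:
        iotype, rest = rest[0], rest[1:]
    else:
        iotype = "mmio"  # documented default when no type is given
    if not rest:
        raise ValueError("missing <addr> in %r" % value)
    addr = int(rest[0], 0)  # base 0 accepts the 0x prefix
    options = rest[1] if len(rest) > 1 else None
    return {"iotype": iotype, "addr": addr, "options": options}
```

For example, "uart8250,io,0x3f8,9600n8" yields the I/O type "io", the address 0x3f8 and the options string "9600n8"; with no options the sketch returns None, mirroring the case where the h/w is not re-initialized.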
2013-02-25 15:54:09 -05:00
hvc<n> Use the hypervisor console device <n>. This is for
both Xen and PowerPC hypervisors.
2005-04-16 15:20:36 -07:00
2018-04-18 20:51:39 +02:00
If the device connected to the port is not a TTY but a braille
device, prepend "brl," before the device type, for instance
2008-04-30 00:54:51 -07:00
console=brl,ttyS0
For now, only VisioBraille is supported.
2017-12-21 14:41:49 +09:00
console_msg_format=
[KNL] Change console messages format
default
By default we print messages on consoles in
"[time stamp] text\n" format (time stamp may not be
printed, depending on CONFIG_PRINTK_TIME or
`printk_time' param).
syslog
Switch to syslog format: "<%u>[time stamp] text\n"
IOW, each message will have a facility and loglevel
prefix. The format is similar to one used by syslog()
syscall, or to executing "dmesg -S --raw" or to reading
from /proc/kmsg.
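A log consumer can split the "<%u>" prefix of the syslog format back into facility and loglevel. This is a minimal sketch under the usual syslog convention that the number is (facility << 3) | level; split_syslog_prefix and the LEVELS table are illustrative names, not part of any kernel or dmesg API.

```python
import re

# Matches the "<%u>" prefix; DOTALL keeps a trailing newline in the text.
_PREFIX = re.compile(r"^<(\d+)>(.*)$", re.DOTALL)

# Loglevel names indexed by the low three bits of the prefix value.
LEVELS = ("emerg", "alert", "crit", "err",
          "warning", "notice", "info", "debug")


def split_syslog_prefix(line):
    """Return (facility, level_name, rest), or None if there is no prefix."""
    match = _PREFIX.match(line)
    if match is None:
        return None  # line is in the default "[time stamp] text" format
    value = int(match.group(1))
    return value >> 3, LEVELS[value & 7], match.group(2)
```

Lines without a prefix return None, so the same function can be used to tell whether a captured console log was produced with console_msg_format=syslog.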
2009-06-16 15:33:52 -07:00
consoleblank= [KNL] The console blank (screen saver) timeout in
2017-09-18 22:21:25 -07:00
seconds. A value of 0 disables the blank timer.
2018-04-18 20:51:39 +02:00
Defaults to 0.
2009-06-16 15:33:52 -07:00
2009-01-06 14:42:47 -08:00
coredump_filter=
[KNL] Change the default value for
/proc/<pid>/coredump_filter.
2020-04-02 19:26:14 +02:00
See also Documentation/filesystems/proc.rst.
2009-01-06 14:42:47 -08:00
2017-06-05 14:15:12 -06:00
coresight_cpu_debug.enable
[ARM,ARM64]
Format: <bool>
Enable/disable the CPU sampling based debugging.
0: default value, disable debugging
1: enable debugging at boot time
2011-04-01 18:13:10 -04:00
cpuidle.off=1 [CPU_IDLE]
disable the cpuidle sub-system
2018-12-05 23:45:34 +01:00
cpuidle.governor=
[CPU_IDLE] Name of the cpuidle governor to use.
2017-02-28 16:44:16 -05:00
cpufreq.off=1 [CPU_FREQ]
disable the cpufreq sub-system
2015-05-11 17:27:09 -04:00
cpu_init_udelay=N
[X86] Delay for N microsec between assert and de-assert
of APIC INIT to start processors. This delay occurs
on every CPU online, such as boot, and resume from suspend.
Default: 10000
2005-04-16 15:20:36 -07:00
cpcihp_generic= [HW,PCI] Generic port I/O CompactPCI driver
2005-10-23 12:57:11 -07:00
Format:
<first_slot>,<last_slot>,<port>,<enum_bit>[,<debug>]
2005-04-16 15:20:36 -07:00
2011-02-20 20:08:35 -08:00
crashkernel=size[KMG][@offset[KMG]]
[KNL] Using kexec, Linux can switch to a 'crash kernel'
upon panic. This parameter reserves the physical
memory region [offset, offset + size] for that kernel
image. If '@offset' is omitted, then a suitable offset
2019-04-22 11:19:05 +08:00
is selected automatically.
[KNL, x86_64] select a region under 4G first, and
fall back to reserve region above 4G when '@offset'
hasn't been specified.
2019-06-13 15:21:39 -03:00
See Documentation/admin-guide/kdump/kdump.rst for further details.
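The size[KMG][@offset[KMG]] form can be converted to bytes as follows. This is a sketch only: parse_crashkernel_simple is a hypothetical name, and it accepts just decimal numbers with the upper-case suffixes shown in the format line, which may be narrower than what the kernel itself accepts.

```python
import re

# Binary multipliers for the K, M and G suffixes.
UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

_SIMPLE = re.compile(r"^(\d+)([KMG]?)(?:@(\d+)([KMG]?))?$")


def parse_crashkernel_simple(value):
    """Return (size_bytes, offset_bytes or None) for 'size[KMG][@offset[KMG]]'."""
    match = _SIMPLE.match(value)
    if match is None:
        raise ValueError("unsupported crashkernel value: %r" % value)
    size = int(match.group(1)) * UNITS[match.group(2)]
    if match.group(3) is None:
        return size, None  # offset omitted: chosen automatically
    return size, int(match.group(3)) * UNITS[match.group(4)]
```

An offset of None corresponds to the case where '@offset' is omitted and a suitable offset is selected automatically.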
2005-06-25 14:57:52 -07:00
2007-10-18 23:41:02 -07:00
crashkernel=range1:size1[,range2:size2,...][@offset]
[KNL] Same as above, but depends on the memory
in the running system. The syntax of range is
start-[end] where start and end are both
a memory unit (amount[KMG]). See also
2019-06-13 15:21:39 -03:00
Documentation/admin-guide/kdump/kdump.rst for an example.
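The range form picks a reservation size from the amount of RAM in the running system. The sketch below assumes that a range matches when start <= RAM < end and that an omitted end means "no upper limit"; crashkernel_size_for and parse_mem are hypothetical helpers, and the "@offset" part is parsed off but otherwise ignored.

```python
def parse_mem(text):
    """Convert 'amount[KMG]' to bytes (decimal, upper-case suffix only)."""
    units = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def crashkernel_size_for(value, system_ram):
    """Return the size selected by 'range1:size1[,range2:size2,...][@offset]'."""
    spec, _, _offset = value.partition("@")
    for entry in spec.split(","):
        rng, colon, size = entry.partition(":")
        start, dash, end = rng.partition("-")
        if not colon or not dash:
            raise ValueError("malformed range entry: %r" % entry)
        low = parse_mem(start)
        high = parse_mem(end) if end else None  # open-ended range
        if system_ram >= low and (high is None or system_ram < high):
            return parse_mem(size)
    return 0  # no range matched: nothing is reserved in this sketch
```

With the illustrative value "512M-2G:64M,2G-:128M", a system with 1G of RAM would get 64M and a system with 4G would get 128M.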
2007-10-18 23:41:02 -07:00
2013-04-15 22:23:48 -07:00
crashkernel=size[KMG],high
2013-04-15 22:23:47 -07:00
[KNL, x86_64] range could be above 4G. Allows the kernel
to allocate the physical memory region from the top, so it
could be above 4G if the system has more than 4G of RAM
installed. Otherwise the memory region will be allocated
below 4G, if available.
It will be ignored if crashkernel=X is specified.
2013-04-15 22:23:48 -07:00
crashkernel=size[KMG],low
[KNL, x86_64] range under 4G. When crashkernel=X,high
is passed, the kernel may allocate the physical memory
region above 4G, which causes the second kernel to crash
on systems that require some amount of low memory, e.g.
swiotlb requires at least 64M+32K of low memory. Enough
extra low memory is also needed to make sure DMA buffers
for 32-bit devices won't run out. The kernel will try to
allocate at least 256M below 4G automatically.
This option lets the user specify their own low range
under 4G for the second kernel instead.
0: to disable low allocation.
2013-04-15 22:23:48 -07:00
It will be ignored when crashkernel=X,high is not used
2013-04-15 22:23:47 -07:00
or memory reserved is below 4G.
2013-04-15 22:23:45 -07:00
2016-05-03 10:00:17 +01:00
cryptomgr.notests
2018-04-18 20:51:39 +02:00
[KNL] Disable crypto self-tests
2016-05-03 10:00:17 +01:00
2005-04-16 15:20:36 -07:00
cs89x0_dma= [HW,NET]
Format: <dma>
cs89x0_media= [HW,NET]
Format: { rj45 | aui | bnc }
2005-10-23 12:57:11 -07:00
dasd= [HW,NET]
2005-04-16 15:20:36 -07:00
See header of drivers/s390/block/dasd_devmap.c.
db9.dev[2|3]= [HW,JOY] Multisystem joystick support via parallel port
(one device per port)
Format: <port#>,<type>
2017-10-10 12:36:23 -05:00
See also Documentation/input/devices/joystick-parport.rst
2005-04-16 15:20:36 -07:00
2018-04-18 20:51:39 +02:00
ddebug_query= [KNL,DYNAMIC_DEBUG] Enable debug messages at early boot
2017-06-14 12:24:12 +02:00
time. See
Documentation/admin-guide/dynamic-debug-howto.rst for
2012-04-27 14:30:41 -06:00
details. Deprecated, see dyndbg.
2010-08-06 16:11:02 +02:00
2005-04-16 15:20:36 -07:00
debug [KNL] Enable kernel debugging (events log level).
2018-06-22 09:15:34 +10:00
debug_boot_weak_hash
[KNL] Enable printing [hashed] pointers early in the
boot sequence. If enabled, we use a weak hash instead
of siphash to hash pointers. Use this option if you are
seeing instances of '(___ptrval___)' and need to see a
value (hashed pointer) instead. Cryptographically
insecure, please do not use on production kernels.
2006-07-03 00:24:48 -07:00
debug_locks_verbose=
[KNL] verbose self-tests
Format: <0|1>
Print debugging info while doing the locking API
self-tests.
We default to 0 (no extra messages), setting it to
1 will print _a lot_ more information - normally
only useful to kernel developers.
2008-04-30 00:55:01 -07:00
debug_objects [KNL] Enable object debugging
2009-03-01 20:41:41 -05:00
no_debug_objects
[KNL] Disable object debugging
2012-01-10 15:07:28 -08:00
debug_guardpage_minorder=
[KNL] When CONFIG_DEBUG_PAGEALLOC is set, this
parameter allows control of the order of pages that will
be intentionally kept free (and hence protected) by the
buddy allocator. Bigger values increase the probability
of catching random memory corruption, but reduce the
amount of memory for normal system use. The maximum
possible value is MAX_ORDER/2. Setting this parameter
to 1 or 2 should be enough to identify most random
memory corruption problems caused by bugs in kernel or
driver code when a CPU writes to (or reads from) a
random memory location. Note that there exists a class
of memory corruption problems caused by buggy H/W or
F/W or by drivers badly programming DMA (basically when
memory is written at bus level and the CPU MMU is
bypassed) which are not detectable by
CONFIG_DEBUG_PAGEALLOC, hence this option will not help
tracking down these problems.
2014-12-12 16:55:52 -08:00
debug_pagealloc=
2019-07-11 20:55:13 -07:00
[KNL] When CONFIG_DEBUG_PAGEALLOC is set, this parameter
enables the feature at boot time. By default, it is
disabled and the system will work mostly the same as a
kernel built without CONFIG_DEBUG_PAGEALLOC.
2019-09-23 15:34:42 -07:00
Note: to get the most out of debug_pagealloc error reports,
it's useful to also enable the page_owner functionality.
2014-12-12 16:55:52 -08:00
on: enable the feature
2008-07-15 15:04:56 +02:00
debugpat [X86] Enable PAT debugging
2008-02-03 15:18:45 +02:00
decnet.addr= [HW,NET]
2005-04-16 15:20:36 -07:00
Format: <area>[,<node>]
2020-04-28 00:01:30 +02:00
See also Documentation/networking/decnet.rst.
2005-04-16 15:20:36 -07:00
2009-04-05 15:55:22 -07:00
default_hugepagesz=
2020-06-03 16:00:46 -07:00
[HW] The size of the default HugeTLB page. This is
the size represented by the legacy /proc/ hugepages
APIs. In addition, this is the default hugetlb size
used for shmget(), mmap() and mounting hugetlbfs
filesystems. If not specified, defaults to the
architecture's default huge page size. Huge page
sizes are architecture dependent. See also
Documentation/admin-guide/mm/hugetlbpage.rst.
Format: size[KMG]
2007-05-08 00:38:53 -07:00
2018-07-09 09:41:48 -06:00
deferred_probe_timeout=
[KNL] Debugging option to set a timeout in seconds for
deferred probe to give up waiting on dependencies to
probe. Only specific dependencies (subsystems or
drivers) that have opted in will be ignored. A timeout of 0
will timeout at the end of initcalls. This option will also
dump out devices still on the deferred probe list after
retrying.
2020-01-30 22:16:27 -08:00
dfltcc= [HW,S390]
Format: { on | off | def_only | inf_only | always }
on: s390 zlib hardware support for compression on
level 1 and decompression (default)
off: No s390 zlib hardware support
def_only: s390 zlib hardware support for deflate
only (compression on level 1)
inf_only: s390 zlib hardware support for inflate
only (decompression)
always: Same as 'on' but ignores the selected compression
level always using hardware support (used for debugging)
2005-04-16 15:20:36 -07:00
dhash_entries= [KNL]
Set number of hash buckets for dentry cache.
2005-10-23 12:57:11 -07:00
2016-07-05 11:43:21 +10:00
disable_1tb_segments [PPC]
Disables the use of 1TB hash page table segments. This
causes the kernel to fall back to 256MB segments which
can be useful when debugging issues that require an SLB
miss to occur.
2020-05-11 22:58:24 +10:00
stress_slb [PPC]
Limits the number of kernel SLB entries, and flushes
them frequently to increase the rate of SLB faults
on kernel addresses.
2010-02-04 13:36:50 -08:00
disable= [IPV6]
2020-04-28 00:01:50 +02:00
See Documentation/networking/ipv6.rst.
2010-02-04 13:36:50 -08:00
2018-07-03 15:43:08 -04:00
hardened_usercopy=
[KNL] Under CONFIG_HARDENED_USERCOPY, whether
hardening is enabled for this boot. Hardened
usercopy checking is used to protect the kernel
from reading or writing beyond known memory
allocation boundaries as a proactive defense
against bounds-checking flaws in the kernel's
copy_to_user()/copy_from_user() interface.
on Perform hardened usercopy checks (default).
off Disable hardened usercopy checks.
2016-07-13 15:05:31 +05:30
disable_radix [PPC]
Disable RADIX MMU mode on POWER9
2019-09-03 01:29:31 +10:00
disable_tlbie [PPC]
Disable TLBIE instruction. Currently does not work
with KVM, with HASH MMU, or with coherent accelerators.
2014-01-15 15:44:58 +09:00
disable_cpu_apicid= [X86,APIC,SMP]
Format: <int>
The initial APIC ID of the CPU to be disabled
at boot. Mostly used by the kdump 2nd kernel to
disable the BSP, so that multiple CPUs can be
woken up without causing a system reset or hang
due to sending INIT from an AP to the BSP.
2018-11-20 18:08:42 +01:00
perf_v4_pmi= [X86,INTEL]
Format: <bool>
2018-08-08 00:12:07 -07:00
Disable Intel PMU counter freezing feature.
The feature only exists starting from
Arch Perfmon v4 (Skylake and newer).
2018-04-18 20:51:39 +02:00
disable_ddw [PPC/PSERIES]
2011-02-10 09:10:47 +00:00
Disable Dynamic DMA Window support. Use this
to work around buggy firmware.
2010-02-04 13:36:50 -08:00
disable_ipv6= [IPV6]
2020-04-28 00:01:50 +02:00
See Documentation/networking/ipv6.rst.
2010-02-04 13:36:50 -08:00
2008-04-29 03:52:33 -07:00
disable_mtrr_cleanup [X86]
The kernel tries to adjust MTRR layout from continuous
to discrete, so that the X server driver can add a WB
entry later. This parameter disables that.
2008-04-29 03:52:33 -07:00
2008-01-30 13:33:32 +01:00
disable_mtrr_trim [X86, Intel and AMD only]
2008-01-30 13:33:18 +01:00
By default the kernel will trim any uncacheable
memory out of your available memory pool based on
MTRR settings. This parameter disables that behavior,
possibly causing your machine to run very slowly.
2009-04-14 14:03:43 +05:30
disable_timer_pin_1 [X86]
2009-04-05 15:55:22 -07:00
Disable PIN 1 of APIC timer
Can be useful to work around chipset bugs.
2015-08-25 13:34:53 -04:00
dis_ucode_ldr [X86] Disable the microcode loader.
2009-04-05 15:55:22 -07:00
dma_debug=off If the kernel is compiled with DMA_API_DEBUG support,
this option disables the debugging code at boot.
dma_debug_entries=<number>
This option allows tuning the number of preallocated
entries for DMA-API debugging code. One entry is
required per DMA-API allocation. Use this if the
DMA-API debugging code disables itself because the
architectural default is too low.
2009-05-22 21:49:51 +02:00
dma_debug_driver=<driver_name>
With this option the DMA-API debugging driver
filter feature can be enabled at boot time. Just
pass the driver to filter for as the parameter.
The filter can be disabled or changed to another
driver later using sysfs.
2019-02-13 15:47:36 +08:00
driver_async_probe= [KNL]
List of driver names to be probed asynchronously.
Format: <driver_name1>,<driver_name2>...
2017-09-12 11:19:26 +03:00
drm.edid_firmware=[<connector>:]<file>[,[<connector>:]<file>]
2015-08-27 10:04:13 -07:00
Broken monitors, graphic adapters, KVMs and EDIDless
panels may send no or incorrect EDID data sets.
This parameter allows specifying EDID data sets
in the /lib/firmware directory that are used instead.
2012-03-18 22:37:33 +01:00
Generic built-in EDID data sets are used, if one of
edid/1024x768.bin, edid/1280x1024.bin,
edid/1680x1050.bin, or edid/1920x1080.bin is given
and no file with the same name exists. Details and
instructions how to build your own EDID data are
2020-04-02 19:26:14 +02:00
available in Documentation/admin-guide/edid.rst. An EDID
2012-03-18 22:37:33 +01:00
data set will only be used for a particular connector,
if its name and a colon are prepended to the EDID
2015-08-27 10:04:13 -07:00
name. Each connector may use a unique EDID data
set by separating the files with a comma. An EDID
data set with no connector name will be used for
any connectors not explicitly specified.
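The connector matching rule described above can be sketched as a small lookup. This is an illustration rather than the DRM core's code: parse_edid_firmware and edid_for are hypothetical names, and the connector names in the usage note are assumed examples.

```python
def parse_edid_firmware(value):
    """Split '[<connector>:]<file>[,[<connector>:]<file>]' into a mapping
    of connector name to file, plus the file given without a connector."""
    per_connector = {}
    default = None
    for item in value.split(","):
        connector, colon, path = item.partition(":")
        if colon:
            per_connector[connector] = path
        else:
            default = item  # applies to connectors not listed explicitly
    return per_connector, default


def edid_for(value, connector):
    """Return the EDID file that would apply to 'connector', or None."""
    per_connector, default = parse_edid_firmware(value)
    return per_connector.get(connector, default)
```

With the value "DP-1:edid/1280x1024.bin,edid/1920x1080.bin", the sketch selects edid/1280x1024.bin for DP-1 and edid/1920x1080.bin for every other connector.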
2012-03-18 22:37:33 +01:00
2005-04-16 15:20:36 -07:00
dscc4.setup= [NET]
2017-05-11 21:24:41 +10:00
dt_cpu_ftrs= [PPC]
Format: {"off" | "known"}
Control how the dt_cpu_ftrs device-tree binding is
used for CPU feature discovery and setup (if it
exists).
off: Do not use it, fall back to legacy cpu table.
known: Do not pass through unknown features to guests
or userspace, only those that the kernel is aware of.
2016-11-12 21:32:36 +00:00
dump_apple_properties [X86]
Dump name and content of EFI device properties on
x86 Macs. Useful for driver authors to determine
what data is available or for reverse-engineering.
2012-04-27 14:30:41 -06:00
dyndbg[="val"] [KNL,DYNAMIC_DEBUG]
module.dyndbg[="val"]
Enable debug messages at boot time. See
2017-06-14 12:24:12 +02:00
Documentation/admin-guide/dynamic-debug-howto.rst
for details.
2012-04-27 14:30:41 -06:00
2016-02-12 13:02:29 -08:00
nopku [X86] Disable Memory Protection Keys CPU feature found
in some Intel CPUs.
2015-03-30 16:20:05 -07:00
module.async_probe [KNL]
Enable asynchronous probe on this module.
2014-04-07 15:39:53 -07:00
early_ioremap_debug [KNL]
Enable debug messages in early_ioremap support. This
is useful for tracking down temporary early mappings
which are not unmapped.
2009-04-05 15:55:22 -07:00
earlycon= [KNL] Output early console device and options.
2014-04-18 17:19:57 -05:00
2019-09-17 09:15:23 +02:00
When used with no options, the early console is
determined by stdout-path property in device tree's
chosen node or the ACPI SPCR table if supported by
the platform.
2015-09-14 19:54:07 -05:00
2016-09-22 16:58:16 +01:00
cdns,<addr>[,options]
Start an early, polled-mode console on a Cadence
(xuartps) serial port at the specified address. Only
supported option is baud rate. If baud rate is not
specified, the serial port must already be setup and
configured.
2014-09-10 12:43:02 +02:00
2009-04-05 15:55:22 -07:00
uart[8250],io,<addr>[,options]
uart[8250],mmio,<addr>[,options]
2010-07-20 15:26:51 -07:00
uart[8250],mmio32,<addr>[,options]
2015-05-25 06:54:28 +03:00
uart[8250],mmio32be,<addr>[,options]
2015-04-06 10:52:39 -04:00
uart[8250],0x<addr>[,options]
2009-04-05 15:55:22 -07:00
Start an early, polled-mode console on the 8250/16550
UART at the specified I/O port or MMIO address.
2011-08-13 12:34:52 -07:00
MMIO inter-register address stride is either 8-bit
2015-05-25 06:54:28 +03:00
(mmio) or 32-bit (mmio32 or mmio32be).
If none of [io|mmio|mmio32|mmio32be] is given, <addr> is
assumed to be equivalent to 'mmio'. 'options' are specified
in the same format described for "console=ttyS<n>"; if
2015-04-06 10:52:39 -04:00
unspecified, the h/w is not initialized.
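
The uart[8250] forms above can be illustrated with a small parser. This is only a sketch of the documented syntax, not the kernel's own parsing code; the function name parse_uart_earlycon and the IO_TYPES tuple are hypothetical names chosen for this example.

```python
# Hypothetical helper, not part of the kernel: splits the value of an
# "earlycon=uart..." parameter into the fields documented above.
IO_TYPES = ("io", "mmio", "mmio32", "mmio32be")


def parse_uart_earlycon(value):
    fields = value.split(",")
    if fields[0] not in ("uart", "uart8250"):
        raise ValueError("not a uart[8250] earlycon: %r" % value)
    rest = fields[1:]
    # With no explicit access type, <addr> is treated like 'mmio'.
    if rest and rest[0] in IO_TYPES:
        io_type, rest = rest[0], rest[1:]
    else:
        io_type = "mmio"
    if not rest:
        raise ValueError("missing <addr> in %r" % value)
    addr = int(rest[0], 16)
    # Options are optional; None stands for "unspecified", in which
    # case the hardware is not initialized.
    options = rest[1] if len(rest) > 1 else None
    return io_type, addr, options
```

For instance, parse_uart_earlycon("uart,0xfe001000") falls back to the 'mmio' access type because no type is named.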
2009-04-05 15:55:22 -07:00
2014-04-18 17:19:57 -05:00
pl011,<addr>
2016-01-04 15:37:42 -06:00
pl011,mmio32,<addr>
2014-04-18 17:19:57 -05:00
Start an early, polled-mode console on a pl011 serial
port at the specified address. The pl011 serial port
must already be setup and configured. Options are not
2016-01-04 15:37:42 -06:00
yet supported. If 'mmio32' is specified, then the
driver will use only 32-bit accessors to read/write
the device registers.
2014-04-18 17:19:57 -05:00
2016-03-06 12:21:24 +01:00
meson,<addr>
Start an early, polled-mode console on a meson serial
port at the specified address. The serial port must
already be setup and configured. Options are not yet
supported.
2014-09-15 17:22:51 -07:00
msm_serial,<addr>
Start an early, polled-mode console on an msm serial
port at the specified address. The serial port
must already be setup and configured. Options are not
yet supported.
msm_serial_dm,<addr>
Start an early, polled-mode console on an msm serial
dm port at the specified address. The serial port
must already be setup and configured. Options are not
yet supported.
2017-06-19 03:46:40 +02:00
owl,<addr>
Start an early, polled-mode console on a serial port
of an Actions Semi SoC, such as S500 or S900, at the
specified address. The serial port must already be
setup and configured. Options are not yet supported.
2018-12-18 20:32:37 +05:30
rda,<addr>
Start an early, polled-mode console on a serial port
of an RDA Micro SoC, such as RDA8810PL, at the
specified address. The serial port must already be
2017-06-19 03:46:40 +02:00
setup and configured. Options are not yet supported.
2019-09-13 13:38:43 -07:00
sbi
Use RISC-V SBI (Supervisor Binary Interface) for early
console.
2014-04-18 17:19:58 -05:00
smh Use ARM semihosting calls for early console.
2015-01-23 14:47:41 +01:00
s3c2410,<addr>
s3c2412,<addr>
s3c2440,<addr>
s3c6400,<addr>
s5pv210,<addr>
exynos4210,<addr>
Use early console provided by serial driver available
on Samsung SoCs, requires selecting proper type and
a correct base address of the selected UART port. The
serial port must already be setup and configured.
Options are not yet supported.
2016-12-11 21:42:23 +01:00
lantiq,<addr>
Start an early, polled-mode console on a lantiq serial
(lqasc) port at the specified address. The serial port
must already be setup and configured. Options are not
yet supported.
2015-10-17 00:45:55 -07:00
lpuart,<addr>
lpuart32,<addr>
Use early console provided by Freescale LP UART driver
found on Freescale Vybrid and QorIQ LS1021A processors.
A valid base address must be provided, and the serial
port must already be setup and configured.
2020-02-29 14:27:48 +01:00
ec_imx21,<addr>
ec_imx6q,<addr>
Start an early, polled-mode, output-only console on the
Freescale i.MX UART at the specified address. The UART
must already be setup and configured.
2017-05-04 00:49:36 +01:00
ar3700_uart,<addr>
2016-02-16 19:14:53 +01:00
Start an early, polled-mode console on the
Armada 3700 serial port at the specified
address. The serial port must already be setup
and configured. Options are not yet supported.
2018-05-03 14:14:40 -06:00
qcom_geni,<addr>
Start an early, polled-mode console on a Qualcomm
Generic Interface (GENI) based serial port at the
specified address. The serial port must already be
setup and configured. Options are not yet supported.
2019-02-02 10:41:18 +01:00
efifb,[options]
Start an early, unaccelerated console on the EFI
memory mapped framebuffer (if available). On cache
coherent non-x86 systems that use system memory for
the framebuffer, pass the 'ram' option so that it is
mapped with the correct attributes.
2019-08-09 11:29:16 +00:00
linflex,<addr>
2019-10-16 15:48:25 +03:00
Use early console provided by Freescale LINFlexD UART
2019-08-09 11:29:16 +00:00
serial driver for NXP S32V234 SoCs. A valid base
address must be provided, and the serial port must
already be setup and configured.
2018-03-07 22:23:24 +01:00
earlyprintk= [X86,SH,ARM,M68k,S390]
2005-04-16 15:20:36 -07:00
earlyprintk=vga
2017-01-11 09:14:52 +01:00
earlyprintk=sclp
2013-02-25 15:54:08 -05:00
earlyprintk=xen
2005-04-16 15:20:36 -07:00
earlyprintk=serial[,ttySn[,baudrate]]
2013-04-10 14:03:38 -07:00
earlyprintk=serial[,0x...[,baudrate]]
2009-09-24 09:08:30 -05:00
earlyprintk=ttySn[,baudrate]
2009-08-20 15:39:57 -05:00
earlyprintk=dbgp[debugController#]
2018-10-03 00:49:21 +08:00
earlyprintk=pciserial[,force],bus:device.function[,baudrate]
2017-03-21 16:01:31 +08:00
earlyprintk=xdbc[xhciController#]
2005-04-16 15:20:36 -07:00
2013-04-10 14:03:38 -07:00
earlyprintk is useful when the kernel crashes before
the normal console is initialized. It is not enabled by
default because it has some cosmetic problems.
2005-10-23 12:57:11 -07:00
Append ",keep" to not disable it when the real console
2005-04-16 15:20:36 -07:00
takes over.
2013-10-04 09:36:56 +01:00
Only one of vga, efi, serial, or usb debug port can
be used at a time.
2005-04-16 15:20:36 -07:00
2013-04-10 14:03:38 -07:00
Currently only ttyS0 and ttyS1 may be specified by
name. Other I/O ports may be explicitly specified
on some architectures (x86 and arm at least) by
replacing ttySn with an I/O port address, like this:
earlyprintk=serial,0x1008,115200
You can find the port for a given device in
/proc/tty/driver/serial:
2: uart:ST16650V2 port:00001008 irq:18 ...
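
As a rough illustration of the lookup described above, the following sketch turns one line of /proc/tty/driver/serial into an earlyprintk= argument. The line layout is assumed to match the sample shown above, and the function name and the default baud rate are hypothetical choices for this example.

```python
import re


def earlyprintk_from_serial_line(line, baud=115200):
    # Assumed line shape, taken from the sample above:
    #   "2: uart:ST16650V2 port:00001008 irq:18 ..."
    match = re.search(r"\bport:([0-9A-Fa-f]+)", line)
    if match is None:
        raise ValueError("no port field in %r" % line)
    port = int(match.group(1), 16)
    # The port is emitted as a 0x-prefixed I/O address in place of ttySn.
    return "earlyprintk=serial,0x%x,%d" % (port, baud)
```

Applied to the sample line, this yields the earlyprintk=serial,0x1008,115200 string used in the example above.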
2005-04-16 15:20:36 -07:00
Interaction with the standard serial driver is not
very good.
2013-10-04 09:36:56 +01:00
The VGA and EFI output is eventually overwritten by
the real console.
2005-04-16 15:20:36 -07:00
2013-02-25 15:54:08 -05:00
The xen output can only be used by Xen PV guests.
2017-01-11 09:14:52 +01:00
The sclp output can only be used on s390.
2018-10-03 00:49:21 +08:00
The optional "force" to "pciserial" enables use of a
PCI device even when its classcode is not of the
UART class.
2013-12-06 01:17:08 -05:00
edac_report= [HW,EDAC] Control how to report EDAC event
Format: {"on" | "off" | "force"}
on: enable EDAC to report H/W event. May be overridden
by other higher priority error reporting module.
off: disable H/W event reporting through EDAC.
force: enforce the use of EDAC to report H/W event.
default: on.
2010-05-20 21:04:30 -05:00
ekgdboc= [X86,KGDB] Allow early kernel console debugging
ekgdboc=kbd
2011-03-30 22:57:33 -03:00
This is designed to be used in conjunction with
2010-05-20 21:04:30 -05:00
the boot argument: earlyprintk=vga
2020-05-07 13:08:47 -07:00
This parameter works in place of the kgdboc parameter
but can only be used if the backing tty is available
very early in the boot process. For early debugging
via a serial port see kgdboc_earlycon instead.
2005-04-16 15:20:36 -07:00
edd= [EDD]
2008-04-29 01:02:45 -07:00
Format: {"off" | "on" | "skip[mbr]"}
2005-04-16 15:20:36 -07:00
2013-10-31 17:25:08 +01:00
efi= [EFI]
2019-11-06 17:43:11 -08:00
Format: { "old_map", "nochunk", "noruntime", "debug",
2020-01-03 12:39:50 +01:00
"nosoftreserve", "disable_early_pci_dma",
"no_disable_early_pci_dma" }
2013-10-31 17:25:08 +01:00
old_map [X86-64]: switch to the old ioremap-based EFI
2020-01-13 18:22:39 +01:00
runtime services mapping. [Needs CONFIG_X86_UV=y]
2014-08-05 11:52:11 +01:00
nochunk: disable reading files in "chunks" in the EFI
boot stub, as chunking can cause problems with some
firmware implementations.
2014-08-14 17:15:28 +08:00
noruntime : disable EFI runtime services support
2015-02-05 11:44:41 +01:00
debug: enable misc debug output
2019-11-06 17:43:11 -08:00
nosoftreserve: The EFI_MEMORY_SP (Specific Purpose)
attribute may cause the kernel to reserve the
memory range for a memory mapping driver to
claim. Specify efi=nosoftreserve to disable this
reservation and treat the memory by its base type
(i.e. EFI_CONVENTIONAL_MEMORY / "System RAM").
2020-01-03 12:39:50 +01:00
disable_early_pci_dma: Disable the busmaster bit on all
PCI bridges while in the EFI boot stub
no_disable_early_pci_dma: Leave the busmaster bit set
on all PCI bridges while in the EFI boot stub
2013-10-31 17:25:08 +01:00
2013-04-17 01:00:53 +02:00
efi_no_storage_paranoia [EFI; X86]
Using this parameter you can use more than 50% of
your efi variable storage. Use this parameter only if
you are really sure that your UEFI does sane gc and
fulfills the spec; otherwise your board may brick.
2015-09-30 23:01:56 +09:00
efi_fake_mem= nn[KMG]@ss[KMG]:aa[,nn[KMG]@ss[KMG]:aa,..] [EFI; X86]
Add arbitrary attribute to specific memory range by
updating original EFI memory map.
The attribute aa is added to the memory region
from ss to ss+nn.
2019-11-06 17:43:26 -08:00
2015-09-30 23:01:56 +09:00
If efi_fake_mem=2G@4G:0x10000,2G@0x10a0000000:0x10000
is specified, EFI_MEMORY_MORE_RELIABLE(0x10000)
attribute is added to range 0x100000000-0x180000000 and
0x10a0000000-0x1120000000.
2019-11-06 17:43:26 -08:00
If efi_fake_mem=8G@9G:0x40000 is specified, the
EFI_MEMORY_SP(0x40000) attribute is added to
range 0x240000000-0x43fffffff.
2015-09-30 23:01:56 +09:00
Using this parameter you can do debugging of EFI memmap
2019-11-06 17:43:26 -08:00
related features. For example, you can do debugging of
2015-09-30 23:01:56 +09:00
Address Range Mirroring feature even if your box
2019-11-06 17:43:26 -08:00
doesn't support it, or mark specific memory as
"soft reserved".
2015-09-30 23:01:56 +09:00
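
The address arithmetic behind the efi_fake_mem= examples can be checked with a short sketch. This is not the kernel's parser; the helper names parse_size and fake_mem_ranges are hypothetical, and the returned end address is exclusive (ss+nn), so it is one byte past the inclusive end used in the EFI_MEMORY_SP example.

```python
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    # Accepts decimal or 0x-prefixed hex with an optional K/M/G suffix.
    suffix = text[-1].upper()
    if suffix in UNITS:
        return int(text[:-1], 0) * UNITS[suffix]
    return int(text, 0)


def fake_mem_ranges(value):
    # Returns a list of (start, end_exclusive, attribute) tuples for a
    # value of the form nn[KMG]@ss[KMG]:aa[,nn[KMG]@ss[KMG]:aa,..].
    ranges = []
    for entry in value.split(","):
        region, attr = entry.split(":")
        size, start = (parse_size(part) for part in region.split("@"))
        ranges.append((start, start + size, int(attr, 0)))
    return ranges
```

Feeding it the first example value reproduces the two ranges listed above, 0x100000000-0x180000000 and 0x10a0000000-0x1120000000.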
2016-07-08 19:13:12 +03:00
efivar_ssdt= [EFI; X86] Name of an EFI variable that contains an SSDT
that is to be dynamically loaded by Linux. If there are
multiple variables with the same name but with different
vendor GUIDs, all of them will be loaded. See
2019-06-07 15:54:32 -03:00
Documentation/admin-guide/acpi/ssdt-overlays.rst for details.
2016-07-08 19:13:12 +03:00
2005-04-16 15:20:36 -07:00
eisa_irq_edge= [PARISC,HW]
See header of drivers/parisc/eisa.c.
2007-07-31 00:37:59 -07:00
elanfreq= [X86-32]
2005-04-16 15:20:36 -07:00
See comment before function elanfreq_setup() in
2008-07-04 09:59:43 -07:00
arch/x86/kernel/cpu/cpufreq/elanfreq.c.
2005-04-16 15:20:36 -07:00
2011-10-30 15:16:37 +01:00
elfcorehdr=[size[KMG]@]offset[KMG] [IA64,PPC,SH,X86,S390]
2005-10-23 12:57:11 -07:00
Specifies physical address of start of kernel core
2011-10-30 15:16:37 +01:00
image elf header and optionally the size. Generally
kexec loader will pass this option to capture kernel.
2019-06-13 15:21:39 -03:00
See Documentation/admin-guide/kdump/kdump.rst for details.
2005-04-16 15:20:36 -07:00
2009-04-05 15:55:22 -07:00
enable_mtrr_cleanup [X86]
The kernel tries to adjust the MTRR layout from
continuous to discrete so that the X server driver can
add a WB entry later. This parameter enables that.
2009-05-06 16:02:58 -07:00
enable_timer_pin_1 [X86]
2009-04-05 15:55:22 -07:00
Enable PIN 1 of APIC timer
Can be useful to work around chipset bugs
(in particular on some ATI chipsets).
The kernel tries to set a reasonable default.
2005-04-16 15:20:36 -07:00
enforcing [SELINUX] Set initial enforcing status.
Format: {"0" | "1"}
See security/selinux/Kconfig help text.
0 -- permissive (log only, no denials).
1 -- enforcing (deny and log).
Default value is 0.
2020-01-07 11:35:04 -05:00
Value can be changed at runtime via
/sys/fs/selinux/enforce.
2005-04-16 15:20:36 -07:00
2010-05-18 14:35:21 +08:00
erst_disable [ACPI]
Disable Error Record Serialization Table (ERST)
support.
2005-04-16 15:20:36 -07:00
ether= [HW,NET] Ethernet cards parameters
This option is obsoleted by the "netdev=" option, which
has equivalent usage. See its documentation for details.
2011-05-12 18:33:20 -04:00
evm= [EVM]
Format: { "fix" }
Permit 'security.evm' to be updated regardless of
current integrity status.
2006-12-08 02:39:42 -08:00
failslab=
fail_page_alloc=
fail_make_request=[KNL]
General fault injection mechanism.
Format: <interval>,<probability>,<space>,<times>
2011-08-15 02:02:26 +02:00
See also Documentation/fault-injection/.
2006-12-08 02:39:42 -08:00
2005-04-16 15:20:36 -07:00
floppy= [HW]
2019-06-18 11:47:10 -03:00
See Documentation/admin-guide/blockdev/floppy.rst.
2005-04-16 15:20:36 -07:00
2008-05-08 14:03:23 -06:00
force_pal_cache_flush
[IA-64] Avoid check_sal_cache_flush which may hang on
buggy SAL_CACHE_FLUSH implementations. Using this
parameter will force ia64_sal_cache_flush to call
ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
2018-04-18 20:51:39 +02:00
forcepae [X86-32]
2014-03-07 18:40:42 +07:00
Forcefully enable Physical Address Extension (PAE).
Many Pentium M systems disable PAE but may have a
functionally usable PAE implementation.
Warning: use of this parameter will taint the kernel
and may cause unknown problems.
2008-11-01 19:57:37 +01:00
ftrace=[tracer]
2009-05-28 13:37:24 -04:00
[FTRACE] will set and start the specified tracer
2008-11-01 19:57:37 +01:00
as early as possible in order to facilitate early
boot debugging.
2010-04-18 19:08:41 +02:00
ftrace_dump_on_oops[=orig_cpu]
2009-05-28 13:37:24 -04:00
[FTRACE] will dump the trace buffers on oops.
2010-04-18 19:08:41 +02:00
If no parameter is passed, ftrace will dump
buffers of all CPUs, but if you pass orig_cpu, it will
dump only the buffer of the CPU that triggered the
oops.
2009-05-28 13:37:24 -04:00
ftrace_filter=[function-list]
[FTRACE] Limit the functions traced by the function
tracer at boot up. function-list is a comma separated
list of functions. This list can be changed at run
time by the set_ftrace_filter file in the debugfs
2011-08-13 12:34:52 -07:00
tracing directory.
2009-05-28 13:37:24 -04:00
ftrace_notrace=[function-list]
[FTRACE] Do not trace the functions specified in
function-list. This list can be changed at run time
by the set_ftrace_notrace file in the debugfs
tracing directory.
2008-11-01 19:57:37 +01:00
2009-10-12 22:17:21 +02:00
ftrace_graph_filter=[function-list]
[FTRACE] Limit the top-level caller functions traced
by the function graph tracer at boot up.
function-list is a comma separated list of functions
that can be changed at run time by the
set_graph_function file in the debugfs tracing directory.
2014-06-13 01:23:50 +09:00
ftrace_graph_notrace=[function-list]
[FTRACE] Do not trace from the functions specified in
function-list. This list is a comma separated list of
functions that can be changed at run time by the
set_graph_notrace file in the debugfs tracing directory.
2017-03-02 16:12:15 -08:00
ftrace_graph_max_depth=<uint>
[FTRACE] Used with the function graph tracer. This is
the max depth it will trace into a function. This value
can be changed at run time by the max_graph_depth file
in the tracefs tracing directory. default: 0 (no limit)
2020-02-21 17:40:35 -08:00
fw_devlink= [KNL] Create device links between consumer and supplier
devices by scanning the firmware to infer the
consumer/supplier relationships. This feature is
especially useful when drivers are loaded as modules as
it ensures proper ordering of tasks like device probing
(suppliers first, then consumers), supplier boot state
clean up (only after all consumers have probed),
suspend/resume & runtime PM (consumers first, then
suppliers).
Format: { off | permissive | on | rpm }
off -- Don't create device links from firmware info.
permissive -- Create device links from firmware info
but use it only for ordering boot state clean
up (sync_state() calls).
on -- Create device links from firmware info and use it
to enforce probe and suspend/resume ordering.
rpm -- Like "on", but also use it to order runtime PM.
2005-04-16 15:20:36 -07:00
gamecon.map[2|3]=
[HW,JOY] Multisystem joystick and NES/SNES/PSX pad
support via parallel port (up to 5 devices per port)
Format: <port#>,<pad1>,<pad2>,<pad3>,<pad4>,<pad5>
2017-10-10 12:36:23 -05:00
See also Documentation/input/devices/joystick-parport.rst
2005-04-16 15:20:36 -07:00
gamma= [HW,DRM]
2018-04-18 20:51:39 +02:00
gart_fix_e820= [X86_64] disable the fix e820 for K8 GART
2008-01-30 13:33:09 +01:00
Format: off | on
default: on
2009-06-17 16:28:08 -07:00
gcov_persist= [GCOV] When non-zero (default), profiling data for
kernel modules is saved and remains accessible via
debugfs, even when the module is unloaded/reloaded.
When zero, profiling data is discarded and associated
debugfs files are removed at module unload time.
2017-02-15 11:11:50 +01:00
goldfish [X86] Enable the goldfish android emulator platform.
Don't use this when you are not running on the
android emulator
2005-04-16 15:20:36 -07:00
gpt [EFI] Forces disk with valid GPT signature but
2014-01-23 15:56:03 -08:00
invalid Protective MBR to be treated as GPT. If the
primary GPT is corrupted, it enables the backup/alternate
GPT to be used instead.
2005-04-16 15:20:36 -07:00
2012-11-15 08:47:14 +01:00
grcan.enable0= [HW] Configuration of physical interface 0. Determines
the "Enable 0" bit of the configuration register.
Format: 0 | 1
Default: 0
grcan.enable1= [HW] Configuration of physical interface 1. Determines
the "Enable 1" bit of the configuration register.
Format: 0 | 1
Default: 0
grcan.select= [HW] Select which physical interface to use.
Format: 0 | 1
Default: 0
grcan.txsize= [HW] Sets the size of the tx buffer.
Format: <unsigned int> such that (txsize & ~0x1fffc0) == 0.
Default: 1024
grcan.rxsize= [HW] Sets the size of the rx buffer.
Format: <unsigned int> such that (rxsize & ~0x1fffc0) == 0.
Default: 1024
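
The buffer size constraint given for grcan.txsize= and grcan.rxsize= can be expressed as a one-line check. This is a sketch of the stated condition only; the constant and function names are hypothetical and do not come from the driver.

```python
GRCAN_SIZE_MASK = 0x1fffc0  # mask taken from the Format lines above


def is_valid_grcan_size(size):
    # Valid when no bit outside the mask is set: (size & ~0x1fffc0) == 0.
    # In practice this means a multiple of 64 that is below 0x200000.
    return size >= 0 and (size & ~GRCAN_SIZE_MASK) == 0
```

The default of 1024 passes this check, while a value such as 1000 does not because it is not a multiple of 64.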
2016-08-31 11:45:46 +02:00
gpio-mockup.gpio_mockup_ranges
[HW] Sets the ranges of the gpiochips for this device.
Format: <start1>,<end1>,<start2>,<end2>...
2015-11-05 18:44:41 -08:00
hardlockup_all_cpu_backtrace=
[KNL] Should the hard-lockup detector generate
backtraces on all cpus.
2020-06-07 21:40:42 -07:00
Format: 0 | 1
2015-11-05 18:44:41 -08:00
2005-04-16 15:20:36 -07:00
hashdist= [KNL,NUMA] Large hashes allocated during boot
are distributed across NUMA nodes. Defaults on
2011-08-13 12:34:52 -07:00
for 64-bit NUMA, off otherwise.
2005-10-23 12:57:11 -07:00
Format: 0 | 1 (for off | on)
2005-04-16 15:20:36 -07:00
hcl= [IA-64] SGI's Hardware Graph compatibility layer
hd= [EIDE] (E)IDE hard drive subsystem geometry
Format: <cyl>,<head>,<sect>
2010-05-18 14:35:15 +08:00
hest_disable [ACPI]
Disable Hardware Error Source Table (HEST) support;
corresponding firmware-first mode error processing
logic will be disabled.
2005-04-16 15:20:36 -07:00
highmem=nn[KMG] [KNL,BOOT] forces the highmem zone to have an exact
size of <nn>. This works even on boxes that have no
highmem otherwise. This also works to reduce highmem
size on bigger boxes.
2007-02-16 01:28:11 -08:00
highres= [KNL] Enable/disable high resolution timer mode.
Valid parameters: "on", "off"
Default: "on"
2009-04-05 15:55:22 -07:00
hlt [BUGS=ARM,SH]
hpet= [X86-32,HPET] option to control HPET usage
Format: { enable (default) | disable | force |
verbose }
disable: disable HPET and use PIT instead
force: allow force-enabling of undocumented chips (ICH4,
VIA, nVidia)
verbose: show contents of HPET registers during setup
2013-11-12 15:08:33 -08:00
hpet_mmap= [X86, HPET_MMAP] Allow userspace to mmap HPET
registers. Default set by CONFIG_HPET_MMAP_DEFAULT.
2020-04-10 14:32:45 -07:00
hugetlb_cma= [HW] The size of a cma area used for allocation
of gigantic hugepages.
Format: nn[KMGTPE]
Reserve a cma area of given size and allocate gigantic
hugepages using the cma allocator. If enabled, the
boot-time allocation of gigantic hugepages is skipped.
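The nn[KMGTPE] format used by hugetlb_cma= (and the shorter nn[KMG] form used by other parameters) can be illustrated with a small sketch. This is not the kernel's parser: the function name parse_size is hypothetical, only decimal numbers are accepted here, and the suffixes are assumed to be binary multiples (K = 2^10, M = 2^20, and so on).

```python
import re

# Binary shift for each suffix named in the "nn[KMGTPE]" format
# (assumption: K = 2**10, M = 2**20, ... E = 2**60).
_SHIFT = {"": 0, "K": 10, "M": 20, "G": 30, "T": 40, "P": 50, "E": 60}


def parse_size(text):
    """Return the byte count for a string such as "10G" or "512M".

    Hypothetical helper for illustration; it accepts decimal digits
    followed by at most one suffix letter.
    """
    match = re.fullmatch(r"(\d+)([KMGTPE]?)", text.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a size: {text!r}")
    number, suffix = match.groups()
    return int(number) << _SHIFT[suffix.upper()]


# How many 1G gigantic pages fit in an area reserved with hugetlb_cma=10G.
print(parse_size("10G") // parse_size("1G"))
```

Under these assumptions a 10G area has room for ten 1G pages, provided the architecture supports that page size.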
hugepages= [HW] Number of HugeTLB pages to allocate at boot.
If this follows hugepagesz (below), it specifies
the number of pages of hugepagesz to be allocated.
If this is the first HugeTLB parameter on the command
line, it specifies the number of pages to allocate for
the default huge page size. See also
Documentation/admin-guide/mm/hugetlbpage.rst.
Format: <integer>
hugepagesz=
[HW] The size of the HugeTLB pages. This is used in
conjunction with hugepages (above) to allocate huge
pages of a specific size at boot. The pair
hugepagesz=X hugepages=Y can be specified once for
each supported huge page size. Huge page sizes are
architecture dependent. See also
Documentation/admin-guide/mm/hugetlbpage.rst.
Format: size[KMG]
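The ordering rule for hugepagesz= and hugepages= can be sketched as follows. This is a simplified illustration of the text above, not the kernel's implementation: the function name hugepage_requests and the "default" key are hypothetical, and repeated or malformed parameters are not diagnosed.

```python
def hugepage_requests(cmdline):
    """Map each huge page size to the page count requested for it.

    The key "default" stands for the default huge page size, which
    applies when hugepages= appears before any hugepagesz=.
    """
    requests = {}
    current_size = "default"
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # flag-style parameter without a value
        if key == "hugepagesz":
            current_size = value  # later hugepages= refers to this size
        elif key == "hugepages":
            requests[current_size] = int(value)
    return requests


print(hugepage_requests("hugepages=8 hugepagesz=1G hugepages=4"))
```

In this example the first hugepages= applies to the default huge page size and the second one to the 1G size named just before it.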
hung_task_panic=
[KNL] Whether the hung task detector should generate panics.
Format: 0 | 1

A value of 1 instructs the kernel to panic when a
hung task is detected. The default value is controlled
by the CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time
option. The value selected by this boot parameter can
be changed later by the kernel.hung_task_panic sysctl.
hvc_iucv= [S390] Number of z/VM IUCV hypervisor console (HVC)
terminal devices. Valid values: 0..8
hvc_iucv_allow= [S390] Comma-separated list of z/VM user IDs.
If specified, z/VM IUCV HVC accepts connections
from listed z/VM user IDs only.
hv_nopvspin [X86,HYPER_V] Disables the paravirt spinlock optimizations
which allow the hypervisor to 'idle' the
guest on lock contention.
keep_bootcon [KNL]
Do not unregister boot console at start. This is only
useful for debugging when something happens in the window
between unregistering the boot console and initializing
the real console.
i2c_bus= [HW] Override the default board specific I2C bus speed
or register an additional I2C bus that is not
registered from board initialization code.
Format:
<bus_id>,<clkrate>
i8042.debug [HW] Toggle i8042 debug mode
i8042.unmask_kbd_data
[HW] Enable printing of interrupt data from the KBD port
(disabled by default, and as a pre-condition
requires that i8042.debug=1 be enabled)
i8042.direct [HW] Put keyboard port into non-translated mode
i8042.dumbkbd [HW] Pretend that controller can only read data from
keyboard and cannot control its state
(Don't attempt to blink the leds)
i8042.noaux [HW] Don't check for auxiliary (== mouse) port
i8042.nokbd [HW] Don't check/create keyboard port
i8042.noloop [HW] Disable the AUX Loopback command while probing
for the AUX port
i8042.nomux [HW] Don't check presence of an active multiplexing
controller
i8042.nopnp [HW] Don't use ACPIPnP / PnPBIOS to discover KBD/AUX
controllers
i8042.notimeout [HW] Ignore timeout condition signalled by controller
i8042.reset [HW] Reset the controller during init, cleanup and
suspend-to-ram transitions, only during s2r
transitions, or never reset
Format: { 1 | Y | y | 0 | N | n }
1, Y, y: always reset controller
0, N, n: don't ever reset controller
Default: only on s2r transitions on x86; most other
architectures force reset to be always executed
i8042.unlock [HW] Unlock (ignore) the keylock
i8042.kbdreset [HW] Reset device connected to KBD port
i810= [HW,DRM]
i8k.ignore_dmi [HW] Continue probing hardware even if DMI data
indicates that the driver is running on unsupported
hardware.
i8k.force [HW] Activate i8k driver even if SMM BIOS signature
does not match list of supported models.
i8k.power_status
[HW] Report power status in /proc/i8k
(disabled by default)
i8k.restricted [HW] Allow controlling fans only if SYS_ADMIN
capability is set.
i915.invert_brightness=
[DRM] Invert the sense of the variable that is used to
set the brightness of the panel backlight. Normally a
brightness value of 0 indicates backlight switched off,
and the maximum of the brightness value sets the backlight
to maximum brightness. If this parameter is set to 0
(default) and the machine requires it, or this parameter
is set to 1, a brightness value of 0 sets the backlight
to maximum brightness, and the maximum of the brightness
value switches the backlight off.
-1 -- never invert brightness
0 -- machine default
1 -- force brightness inversion
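The three settings of i915.invert_brightness can be read as a small decision rule. The sketch below follows the description above; the function names and the machine_requires_inversion flag are hypothetical and stand in for whatever per-machine information the driver uses.

```python
def brightness_inverted(param, machine_requires_inversion):
    """Decide whether brightness is inverted for a given parameter value.

    param: -1 (never invert), 0 (machine default), 1 (force inversion).
    machine_requires_inversion: hypothetical per-machine flag.
    """
    if param < 0:
        return False
    if param > 0:
        return True
    return machine_requires_inversion


def hardware_level(requested, maximum, inverted):
    """Translate a requested brightness into the value sent to the panel."""
    return maximum - requested if inverted else requested


# With forced inversion, a request of 0 drives the panel at its maximum.
print(hardware_level(0, 255, brightness_inverted(1, False)))
```

The maximum of 255 is only an example value; the real range depends on the panel.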
icn= [HW,ISDN]
Format: <io>[,<membase>[,<icn_id>[,<icn_id2>]]]
ide-core.nodma= [HW] (E)IDE subsystem
Format: =0.0 to prevent dma on hda, =0.1 hdb =1.0 hdc
.vlb_clock .pci_clock .noflush .nohpa .noprobe .nowerr
.cdrom .chs .ignore_cable are additional options
See Documentation/ide/ide.rst.

ide-generic.probe-mask= [HW] (E)IDE subsystem
Format: <int>
Probe mask for legacy ISA IDE ports. Depending on
platform up to 6 ports are supported, enabled by
setting corresponding bits in the mask to 1. The
default value is 0x0, which has a special meaning.
On systems that have PCI, it triggers scanning the
PCI bus for the first and the second port, which
are then probed. On systems without PCI the value
of 0x0 enables probing the two first ports as if it
was 0x3.
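The mask semantics of ide-generic.probe-mask can be sketched like this. The function name ports_to_probe is hypothetical, ports are numbered from 0 with bit 0 standing for the first port (an assumption consistent with 0x3 meaning the first two ports), and the PCI scanning case is represented by returning None because its outcome depends on the hardware found.

```python
MAX_PORTS = 6  # "up to 6 ports are supported", depending on platform


def ports_to_probe(mask, has_pci):
    """Return the legacy ISA IDE port indexes selected by the mask.

    Returns None when the default mask 0x0 is used on a system with
    PCI, where the ports to probe are found by scanning the PCI bus.
    """
    if mask == 0:
        if has_pci:
            return None
        mask = 0x3  # without PCI, 0x0 behaves like 0x3
    return [port for port in range(MAX_PORTS) if mask & (1 << port)]


print(ports_to_probe(0x0, has_pci=False))
print(ports_to_probe(0x5, has_pci=True))
```

A non-zero mask selects ports directly, regardless of whether PCI is present.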
ide-pci-generic.all-generic-ide [HW] (E)IDE subsystem
Claim all unknown PCI IDE storage controllers.
idle= [X86]
Format: idle=poll, idle=halt, idle=nomwait
Poll forces a polling idle loop that can slightly
improve the performance of waking up an idle CPU, but
will use a lot of power and make the system run hot.
Not recommended.
idle=halt: Halt is forced to be used for CPU idle.
In that case C2/C3 won't be used.
idle=nomwait: Disable mwait for CPU C-states

ieee754= [MIPS] Select IEEE Std 754 conformance mode
Format: { strict | legacy | 2008 | relaxed }
Default: strict
Choose which programs will be accepted for execution
based on the IEEE 754 NaN encoding(s) supported by
the FPU and the NaN encoding requested with the value
of an ELF file header flag individually set by each
binary. Hardware implementations are permitted to
support either or both of the legacy and the 2008 NaN
encoding mode.
Available settings are as follows:
strict accept binaries that request a NaN encoding
supported by the FPU
legacy only accept legacy-NaN binaries, if supported
by the FPU
2008 only accept 2008-NaN binaries, if supported
by the FPU
relaxed accept any binaries regardless of whether
supported by the FPU
The FPU emulator is always able to support both NaN
encodings, so if no FPU hardware is present or it has
been disabled with 'nofpu', then the settings of
'legacy' and '2008' strap the emulator accordingly,
'relaxed' straps the emulator for both legacy-NaN and
2008-NaN, whereas 'strict' enables legacy-NaN only on
legacy processors and both NaN encodings on MIPS32 or
MIPS64 CPUs.
The setting for ABS.fmt/NEG.fmt instruction execution
mode generally follows that for the NaN encoding,
except where unsupported by hardware.
ignore_loglevel [KNL]
Ignore loglevel setting - this will print /all/
kernel messages to the console. Useful for debugging.
This is also available as a printk module parameter, so it
can be changed at run time, usually via
/sys/module/printk/parameters/ignore_loglevel.

ignore_rlimit_data
Ignore RLIMIT_DATA setting for data mappings,
print warning at first misuse. Can be changed via
/sys/module/kernel/parameters/ignore_rlimit_data.
ihash_entries= [KNL]
Set number of hash buckets for inode cache.
ima_appraise= [IMA] appraise integrity measurements
Format: { "off" | "enforce" | "fix" | "log" }
default: "enforce"
ima_appraise_tcb [IMA] Deprecated. Use ima_policy= instead.
The builtin appraise policy appraises all files
owned by uid=0.
ima_canonical_fmt [IMA]
Use the canonical format for the binary runtime
measurements, instead of host native format.
ima_hash= [IMA]
Format: { md5 | sha1 | rmd160 | sha256 | sha384
| sha512 | ... }
default: "sha1"
The list of supported hash algorithms is defined
in crypto/hash_info.h.
ima_policy= [IMA]
The builtin policies to load during IMA setup.
Format: "tcb | appraise_tcb | secure_boot |
fail_securely"
The "tcb" policy measures all programs exec'd, files
mmap'd for exec, and all files opened with the read
mode bit set by either the effective uid (euid=0) or
uid=0.
The "appraise_tcb" policy appraises the integrity of
all files owned by root.

The "secure_boot" policy appraises the integrity
of files (e.g. kexec kernel image, kernel modules,
firmware, policy, etc.) based on file signatures.

The "fail_securely" policy forces file signature
verification failure also on privileged mounted
filesystems with the SB_I_UNVERIFIABLE_SIGNATURE
flag.
ima_tcb [IMA] Deprecated. Use ima_policy= instead.
Load a policy which meets the needs of the Trusted
Computing Base. This means IMA will measure all
programs exec'd, files mmap'd for exec, and all files
opened for read by uid=0.
ima_template= [IMA]
Select one of the defined IMA measurement template formats.
Formats: { "ima" | "ima-ng" | "ima-sig" }
Default: "ima-ng"
ima_template_fmt=
[IMA] Define a custom template format.
Format: { "field1|...|fieldN" }
ima.ahash_minsize= [IMA] Minimum file size for asynchronous hash usage
Format: <min_file_size>
Set the minimal file size for using asynchronous hash.
If left unspecified, ahash usage is disabled.
ahash performance varies for different data sizes on
different crypto accelerators. This option can be used
to achieve the best performance for a particular HW.
ima.ahash_bufsize= [IMA] Asynchronous hash buffer size
Format: <bufsize>
Set hashing buffer size. Default: 4k.
ahash performance varies for different chunk sizes on
different crypto accelerators. This option can be used
to achieve best performance for particular HW.
init= [KNL]
Format: <full_path>
Run specified binary instead of /sbin/init as init
process.
initcall_debug [KNL] Trace initcalls as they are executed. Useful
for working out where the kernel is dying during
startup.
initcall_blacklist= [KNL] Do not execute a comma-separated list of
initcall functions. Useful for debugging built-in
modules and initcalls.
initrd= [BOOT] Specify the location of the initial ramdisk
initrdmem= [KNL] Specify a physical address and size from which to
load the initrd. If an initrd is compiled in or
specified in the bootparams, it takes priority over this
setting.
Format: ss[KMG],nn[KMG]
Default is 0, 0
init_on_alloc= [MM] Fill newly allocated pages and heap objects with
zeroes.
Format: 0 | 1
Default set by CONFIG_INIT_ON_ALLOC_DEFAULT_ON.
init_on_free= [MM] Fill freed pages and heap objects with zeroes.
Format: 0 | 1
Default set by CONFIG_INIT_ON_FREE_DEFAULT_ON.
init_pkru= [x86] Specify the default memory protection keys rights
register contents for all processes. 0x55555554 by
default (disallow access to all but pkey 0). Can
override in debugfs after boot.
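To see why 0x55555554 means "disallow access to all but pkey 0", the value can be decoded two bits at a time. The sketch below relies on the x86 PKRU layout, which is not spelled out in this file: 16 keys, with bit 2*i as the access-disable bit and bit 2*i+1 as the write-disable bit for key i. The function name decode_pkru is hypothetical.

```python
PKEY_COUNT = 16  # x86 protection keys 0..15, two PKRU bits per key


def decode_pkru(value):
    """Return a list of (access_disabled, write_disabled) per pkey."""
    rights = []
    for key in range(PKEY_COUNT):
        access_disabled = bool((value >> (2 * key)) & 1)
        write_disabled = bool((value >> (2 * key + 1)) & 1)
        rights.append((access_disabled, write_disabled))
    return rights


rights = decode_pkru(0x55555554)
# Keys whose access-disable bit is clear, i.e. keys that remain usable.
print([key for key, (ad, _wd) in enumerate(rights) if not ad])
```

With the default value only key 0 has its access-disable bit clear; an init_pkru value of 0 would leave every key accessible.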
inport.irq= [HW] Inport (ATI XL and Microsoft) busmouse driver
Format: <irq>
int_pln_enable [x86] Enable power limit notification interrupt

integrity_audit= [IMA]
Format: { "0" | "1" }
0 -- basic integrity auditing messages. (Default)
1 -- additional integrity auditing messages.
intel_iommu= [DMAR] Intel IOMMU driver (DMAR) option
on
Enable intel iommu driver.
off
Disable intel iommu driver.
igfx_off [Default Off]
By default, gfx is mapped as a normal device. If a gfx
device has a dedicated DMAR unit, the DMAR unit is
bypassed by not enabling DMAR with this option. In
this case, the gfx device will use physical addresses
for DMA.
2007-10-21 16:41:53 -07:00
forcedac [x86_64]
With this option iommu will not optimize to look
2011-08-13 12:34:52 -07:00
for io virtual address below 32-bit forcing dual
2007-10-21 16:41:53 -07:00
address cycle on pci bus for cards supporting greater
2011-08-13 12:34:52 -07:00
than 32-bit addressing. The default is to look
for translation below 32-bit and if not available
2007-10-21 16:41:53 -07:00
then look in the higher range.
2008-03-04 15:22:08 -08:00
strict [Default Off]
With this option on, every unmap_single operation will
result in a hardware IOTLB flush operation as opposed
to batching them for performance.
2011-05-25 19:13:49 +01:00
sp_off [Default Off]
By default, super page will be supported if Intel IOMMU
has the capability. With this option, super page will
not be supported.
2019-01-24 10:31:32 +08:00
sm_on [Default Off]
By default, scalable mode will be disabled even if the
2018-12-10 09:58:55 +08:00
hardware advertises that it has support for the scalable
mode translation. With this option set, scalable mode
2019-01-24 10:31:32 +08:00
will be used on hardware which claims to support it.
2017-04-26 09:18:35 -07:00
tboot_noforce [Default Off]
Do not force the Intel IOMMU enabled under tboot.
By default, tboot will force Intel IOMMU on, which
could harm performance of some high-throughput
devices like 40GBit network cards, even if identity
mapping is enabled.
Note that using this option lowers the security
provided by tboot because it makes the system
vulnerable to DMA attacks.
2019-09-06 14:14:49 +08:00
nobounce [Default off]
2020-02-19 12:21:33 -07:00
Disable bounce buffer for untrusted devices such as
2019-09-06 14:14:49 +08:00
the Thunderbolt devices. This will treat the untrusted
devices as the trusted ones, hence might expose security
risks of DMA attacks.
2011-12-15 01:18:52 +09:00
intel_idle.max_cstate= [KNL,HW,ACPI,X86]
0 disables intel_idle and falls back on acpi_idle.
2016-07-11 09:57:37 +08:00
1 to 9 specify maximum depth of C-state.
2011-12-15 01:18:52 +09:00
2018-04-18 20:51:39 +02:00
intel_pstate= [X86]
disable
Do not enable intel_pstate as the default
scaling driver for the supported processors
passive
Use intel_pstate as a scaling driver, but configure it
to work with generic cpufreq governors (instead of
enabling its internal governor). This mode cannot be
used along with the hardware-managed P-states (HWP)
feature.
force
Enable intel_pstate on systems that prohibit it by default
in favor of acpi-cpufreq. Forcing the intel_pstate driver
instead of acpi-cpufreq may disable platform features, such
as thermal controls and power capping, that rely on ACPI
P-States information being indicated to OSPM and therefore
should be used with caution. This option does not work with
processors that aren't supported by the intel_pstate driver
or on platforms that use pcc-cpufreq instead of acpi-cpufreq.
no_hwp
Do not enable hardware P state control (HWP)
if available.
hwp_only
Only load intel_pstate on systems which support
hardware P state control (HWP) if available.
support_acpi_ppc
Enforce ACPI _PPC performance limits. If the Fixed ACPI
Description Table specifies the preferred power management
profile as "Enterprise Server" or "Performance Server",
then this feature is turned on by default.
per_cpu_perf_limits
Allow per-logical-CPU P-State performance control limits using
cpufreq sysfs interface
2013-02-15 22:55:10 +01:00
2010-07-20 11:06:49 -07:00
intremap= [X86-64, Intel-IOMMU]
on enable Interrupt Remapping (default)
off disable Interrupt Remapping
nosid disable Source ID checking
2011-08-23 17:05:18 -07:00
no_x2apic_optout
BIOS x2APIC opt-out request will be ignored
2015-09-18 22:29:56 +08:00
nopost disable Interrupt Posting
2010-07-20 11:06:49 -07:00
2009-04-05 15:55:22 -07:00
iomem= Disable strict checking of access to MMIO memory
strict regions from userspace.
relaxed
iommu= [x86]
off
force
noforce
biomerge
panic
nopanic
merge
nomerge
soft
2018-07-20 11:02:23 -07:00
pt [x86]
nopt [x86]
2014-10-23 19:19:35 -02:00
nobypass [PPC/POWERNV]
Disable IOMMU bypass, using IOMMU for PCI devices.
2011-10-21 15:56:24 -04:00
2018-09-20 17:10:23 +01:00
iommu.strict= [ARM64] Configure TLB invalidation behaviour
Format: { "0" | "1" }
0 - Lazy mode.
Request that DMA unmap operations use deferred
invalidation of hardware TLBs, for increased
throughput at the cost of reduced device isolation.
Will fall back to strict mode if not supported by
the relevant IOMMU driver.
1 - Strict mode (default).
DMA unmap operations invalidate IOMMU hardware TLBs
synchronously.
2017-01-05 18:38:26 +00:00
iommu.passthrough=
2019-08-19 15:22:56 +02:00
[ARM64, X86] Configure DMA to bypass the IOMMU by default.
2017-01-05 18:38:26 +00:00
Format: { "0" | "1" }
0 - Use IOMMU translation for DMA.
1 - Bypass the IOMMU for DMA.
2018-09-20 14:14:26 +01:00
unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.
2009-04-05 15:55:22 -07:00
io7= [HW] IO7 for Marvel based alpha systems
See comment before marvel_specify_io7 in
arch/alpha/kernel/core_marvel.c.
2009-04-14 14:03:43 +05:30
io_delay= [X86] I/O delay method
2008-01-30 13:30:05 +01:00
0x80
Standard port 0x80 based delay
0xed
Alternate port 0xed based delay (needed on some systems)
2008-01-30 13:30:05 +01:00
udelay
2008-01-30 13:30:05 +01:00
Simple two microseconds delay
none
No delay
2008-01-30 13:30:05 +01:00
2005-04-16 15:20:36 -07:00
ip= [IP_PNP]
2020-02-12 19:13:32 +01:00
See Documentation/admin-guide/nfs/nfsroot.rst.
2005-04-16 15:20:36 -07:00
2019-05-14 15:46:29 -07:00
ipcmni_extend [KNL] Extend the maximum number of unique System V
IPC identifiers from 32,768 to 16,777,216.
2016-02-03 19:52:23 +01:00
irqaffinity= [SMP] Set the default irq affinity mask
2016-10-11 13:51:35 -07:00
The argument is a cpu list, as described above.
2016-02-03 19:52:23 +01:00
2017-10-27 10:34:22 +02:00
irqchip.gicv2_force_probe=
[ARM, ARM64]
Format: <bool>
Force the kernel to look for the second 4kB page
of a GICv2 controller even if the memory range
exposed by the device tree is too small.
2018-02-25 11:27:04 +00:00
irqchip.gicv3_nolpi=
[ARM, ARM64]
Force the kernel to ignore the availability of
LPIs (and by consequence ITSs). Intended for systems
that use the kernel as a bootloader, and thus want
to leave secondary kernels in charge of setting up
LPIs.
2019-01-31 14:59:03 +00:00
irqchip.gicv3_pseudo_nmi= [ARM64]
Enables support for pseudo-NMIs in the kernel. This
requires the kernel to be built with
CONFIG_ARM64_PSEUDO_NMI.
2005-06-28 20:45:18 -07:00
irqfixup [HW]
When an interrupt is not handled search all handlers
for it. Intended to get systems with badly broken
firmware running.
irqpoll [HW]
When an interrupt is not handled search all handlers
for it. Also check all handlers each timer
interrupt. Intended to get systems with badly broken
firmware running.
2005-04-16 15:20:36 -07:00
isapnp= [ISAPNP]
2005-10-23 12:57:11 -07:00
Format: <RDP>,<reset>,<pci_scan>,<verbosity>
2005-04-16 15:20:36 -07:00
2017-12-14 19:18:27 +01:00
isolcpus= [KNL,SMP,ISOL] Isolate a given set of CPUs from disturbance.
2017-10-31 04:18:34 +01:00
[Deprecated - use cpusets instead]
Format: [flag-list,]<cpu-list>
Specify one or more CPUs to isolate from disturbances
specified in the flag list (default: domain):
nohz
Disable the tick when a single task runs.
2018-02-21 05:17:29 +01:00
A residual 1Hz tick is offloaded to workqueues, which you
need to affine to housekeeping through the global
workqueue's affinity configured via the
/sys/devices/virtual/workqueue/cpumask sysfs file, or
by using the 'domain' flag described below.
NOTE: by default the global workqueue runs on all CPUs,
so to protect individual CPUs the 'cpumask' file has to
be configured manually after bootup.
2017-10-31 04:18:34 +01:00
domain
Isolate from the general SMP balancing and scheduling
algorithms. Note that performing domain isolation this way
is irreversible: it's not possible to bring back a CPU to
the domains once isolated through isolcpus. It's strongly
advised to use cpusets instead to disable scheduler load
balancing through the "cpuset.sched_load_balance" file.
It offers a much more flexible interface where CPUs can
move in and out of an isolated set anytime.
You can move a process onto or off an "isolated" CPU via
the CPU affinity syscalls or cpuset.
<cpu number> begins at 0 and the maximum value is
"number of CPUs in system - 1".
2020-01-20 17:16:25 +08:00
managed_irq
Isolate from being targeted by managed interrupts
which have an interrupt mask containing isolated
CPUs. The affinity of managed interrupts is
handled by the kernel and cannot be changed via
the /proc/irq/* interfaces.
This isolation is best effort and only effective
if the automatically assigned interrupt mask of a
device queue contains isolated and housekeeping
CPUs. If housekeeping CPUs are online then such
interrupts are directed to the housekeeping CPU
so that IO submitted on the housekeeping CPU
cannot disturb the isolated CPU.
If a queue's affinity mask contains only isolated
CPUs then this parameter has no effect on the
interrupt routing decision, though interrupts are
only delivered when tasks running on those
isolated CPUs submit IO. IO submitted on
housekeeping CPUs has no influence on those
queues.
2005-04-16 15:20:36 -07:00
2020-01-20 17:16:25 +08:00
The format of <cpu-list> is described above.
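The "[flag-list,]<cpu-list>" format can be illustrated with a small parser. This is a sketch under stated assumptions: it recognises only the three flags documented here (nohz, domain, managed_irq), falls back to "domain" when no flag is given, and handles only the simple "N" and "N-M" forms of a cpu list. The function name parse_isolcpus is hypothetical.

```python
# Sketch only: split an isolcpus= value into its flag list and CPU numbers.
KNOWN_FLAGS = {"nohz", "domain", "managed_irq"}


def parse_isolcpus(value):
    """Return (flags, cpus) for a value such as 'nohz,domain,1-3,5'."""
    tokens = value.split(",")
    flags = []
    # Leading tokens that name a known flag form the flag list.
    while tokens and tokens[0] in KNOWN_FLAGS:
        flags.append(tokens.pop(0))
    if not flags:
        flags = ["domain"]  # documented default
    cpus = set()
    for token in tokens:
        first, _, last = token.partition("-")
        low = int(first)
        high = int(last) if last else low
        if high < low:
            raise ValueError(f"descending range: {token!r}")
        cpus.update(range(low, high + 1))
    return flags, sorted(cpus)


if __name__ == "__main__":
    print(parse_isolcpus("nohz,domain,1-3,5"))
    print(parse_isolcpus("2,4-5"))
```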
2005-04-16 15:20:36 -07:00
2005-10-23 12:57:11 -07:00
iucv= [HW,NET]
2005-04-16 15:20:36 -07:00
2013-04-09 21:27:19 +02:00
ivrs_ioapic [HW,X86_64]
Provide an override to the IOAPIC-ID<->DEVICE-ID
mapping provided in the IVRS ACPI table. For
example, to map IOAPIC-ID decimal 10 to
PCI device 00:14.0 write the parameter as:
ivrs_ioapic[10]=00:14.0
ivrs_hpet [HW,X86_64]
Provide an override to the HPET-ID<->DEVICE-ID
mapping provided in the IVRS ACPI table. For
example, to map HPET-ID decimal 0 to
PCI device 00:14.0 write the parameter as:
ivrs_hpet[0]=00:14.0
2016-04-01 09:06:01 -04:00
ivrs_acpihid [HW,X86_64]
Provide an override to the ACPI-HID:UID<->DEVICE-ID
mapping provided in the IVRS ACPI table. For
example, to map UART-HID:UID AMD0020:0 to
PCI device 00:14.5 write the parameter as:
ivrs_acpihid[00:14.5]=AMD0020:0
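The three ivrs_* overrides differ in which identifier goes inside the brackets, which is easy to get backwards. The helpers below are a hypothetical sketch that only assembles the strings shown in the examples above; they do not validate the IDs against real hardware.

```python
# Sketch only: build the ivrs_* override strings in the documented layout.
def ivrs_ioapic(ioapic_id, pci_device):
    """IOAPIC-ID (decimal) in brackets, PCI device on the right."""
    return f"ivrs_ioapic[{ioapic_id}]={pci_device}"


def ivrs_hpet(hpet_id, pci_device):
    """HPET-ID (decimal) in brackets, PCI device on the right."""
    return f"ivrs_hpet[{hpet_id}]={pci_device}"


def ivrs_acpihid(pci_device, hid, uid):
    """PCI device in brackets, ACPI HID:UID on the right."""
    return f"ivrs_acpihid[{pci_device}]={hid}:{uid}"


if __name__ == "__main__":
    print(ivrs_ioapic(10, "00:14.0"))
    print(ivrs_hpet(0, "00:14.0"))
    print(ivrs_acpihid("00:14.5", "AMD0020", 0))
```

Note that ivrs_acpihid reverses the order used by the other two: the PCI device is the bracketed key.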
2005-04-16 15:20:36 -07:00
js= [HW,JOY] Analog joystick
2017-10-10 12:36:23 -05:00
See Documentation/input/joydev/joystick.rst.
2005-04-16 15:20:36 -07:00
2016-06-13 15:10:02 -07:00
nokaslr [KNL]
When CONFIG_RANDOMIZE_BASE is set, this disables
kernel and module base offset ASLR (Address Space
Layout Randomization).
2014-06-13 13:30:36 -07:00
2017-03-31 15:12:04 -07:00
kasan_multi_shot
[KNL] Enforce KASAN (Kernel Address Sanitizer) to print
report on every invalid memory access. Without this
parameter KASAN will print report only for the first
invalid access.
2009-04-05 15:55:22 -07:00
keepinitrd [HW,ARM]
2016-03-15 14:55:22 -07:00
kernelcore= [KNL,X86,IA-64,PPC]
2018-04-05 16:23:09 -07:00
Format: nn[KMGTPE] | nn% | "mirror"
This parameter specifies the amount of memory usable by
the kernel for non-movable allocations. The requested
amount is spread evenly throughout all nodes in the
system as ZONE_NORMAL. The remaining memory is used for
movable memory in its own zone, ZONE_MOVABLE. In the
event a node is too small to have both ZONE_NORMAL and
ZONE_MOVABLE, kernelcore memory will take priority and
other nodes will have a larger ZONE_MOVABLE.
ZONE_MOVABLE is used for the allocation of pages that
may be reclaimed or moved by the page migration
subsystem. Note that allocations like PTEs-from-HighMem
still use the HighMem zone if it exists, and the Normal
2007-07-17 04:03:14 -07:00
zone if it does not.
2018-04-05 16:23:09 -07:00
It is possible to specify the exact amount of memory in
the form of "nn[KMGTPE]", a percentage of total system
memory in the form of "nn%", or "mirror". If "mirror"
2016-03-15 14:55:22 -07:00
option is specified, mirrored (reliable) memory is used
for non-movable allocations and remaining memory is used
2018-04-05 16:23:09 -07:00
for Movable pages. "nn[KMGTPE]", "nn%", and "mirror"
are exclusive, so you cannot specify multiple forms.
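The three mutually exclusive forms of kernelcore= can be sketched as a small parser. This is an illustration, not the kernel's implementation: it assumes the K/M/G/T/P/E suffixes are binary multiples (powers of 1024), that the percentage is taken of a caller-supplied total, and the name parse_kernelcore is hypothetical.

```python
# Sketch only: classify a kernelcore= value as bytes, percent, or mirror.
import re

# Assumed binary multipliers: K = 2**10, M = 2**20, and so on.
_SUFFIX_SHIFT = {"K": 10, "M": 20, "G": 30, "T": 40, "P": 50, "E": 60}


def parse_kernelcore(value, total_bytes):
    """Return (form, byte_count); byte_count is None for 'mirror'."""
    if value == "mirror":
        return ("mirror", None)
    match = re.fullmatch(r"(\d+)%", value)
    if match:
        return ("percent", total_bytes * int(match.group(1)) // 100)
    match = re.fullmatch(r"(\d+)([KMGTPE]?)", value)
    if match:
        shift = _SUFFIX_SHIFT.get(match.group(2), 0)
        return ("bytes", int(match.group(1)) << shift)
    raise ValueError(f"not one of nn[KMGTPE], nn%, or mirror: {value!r}")


if __name__ == "__main__":
    total = 16 << 30  # pretend the machine has 16 GiB
    for example in ("4G", "25%", "mirror"):
        print(example, parse_kernelcore(example, total))
```

Because each form is matched in full, a mixed value such as "4G,25%" is rejected, mirroring the rule that the forms cannot be combined.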
2007-07-17 04:03:14 -07:00
2010-05-20 21:04:31 -05:00
kgdbdbgp= [KGDB,HW] kgdb over EHCI usb debug port.
Format: <Controller#>[,poll interval]
The controller # is the number of the ehci usb debug
port as it is probed via PCI. The poll interval is
optional and is the number of seconds in between
each poll cycle to the debug port in case you need
the functionality for interrupting the kernel with
gdb or control-c on the dbgp connection. When
not using this parameter you use sysrq-g to break into
the kernel debugger.
2010-05-20 21:04:24 -05:00
kgdboc= [KGDB,HW] kgdb over consoles.
2010-05-20 21:04:24 -05:00
Requires a tty driver that supports console polling,
or a supported polling keyboard driver (non-usb).
2010-08-05 09:22:33 -05:00
Serial only format: <serial_device>[,baud]
keyboard only format: kbd
keyboard and serial format: kbd,<serial_device>[,baud]
Optional Kernel mode setting:
kms, kbd format: kms,kbd
kms, kbd and serial format: kms,kbd,<ser_dev>[,baud]
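The kgdboc= formats listed above follow a fixed ordering (kms, then kbd, then the serial device and optional baud rate). The helper below is a hypothetical sketch that assembles a value in that order; the device name "ttyS0" used in the demonstration is an assumed example, not something this entry prescribes.

```python
# Sketch only: assemble a kgdboc= value in the documented field order.
def kgdboc(serial_device=None, baud=None, kbd=False, kms=False):
    """Return a 'kgdboc=...' string for one of the documented formats."""
    if kms and not kbd:
        # The listed kms formats always include kbd.
        raise ValueError("kms is only documented together with kbd")
    if baud is not None and serial_device is None:
        raise ValueError("baud requires a serial device")
    parts = []
    if kms:
        parts.append("kms")
    if kbd:
        parts.append("kbd")
    if serial_device is not None:
        parts.append(serial_device)
        if baud is not None:
            parts.append(str(baud))
    if not parts:
        raise ValueError("need a serial device, kbd, or both")
    return "kgdboc=" + ",".join(parts)


if __name__ == "__main__":
    print(kgdboc("ttyS0", 115200))
    print(kgdboc(kbd=True))
    print(kgdboc("ttyS0", 115200, kbd=True, kms=True))
```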
2008-04-17 20:05:38 +02:00
2020-05-07 13:08:47 -07:00
kgdboc_earlycon= [KGDB,HW]
If the boot console provides the ability to read
characters and can work in polling mode, you can use
this parameter to tell kgdb to use it as a backend
until the normal console is registered. Intended to
be used together with the kgdboc parameter which
specifies the normal console to transition to.
The name of the early console should be specified
as the value of this parameter. Note that the name of
the early console might be different than the tty
name passed to kgdboc. It's OK to leave the value
blank and the first boot console that implements
read() will be picked.
2010-05-20 21:04:24 -05:00
kgdbwait [KGDB] Stop kernel execution and enter the
kernel debugger at the earliest opportunity.
2008-08-23 18:54:37 +02:00
kmac= [MIPS] korina ethernet MAC address.
Configure the RouterBoard 532 series on-chip
Ethernet adapter MAC address.
2009-06-11 13:22:39 +01:00
kmemleak= [KNL] Boot-time kmemleak enable/disable
Valid arguments: on, off
Default: on
2014-10-24 21:24:59 +09:00
Built with CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y,
the default is off.
2009-06-11 13:22:39 +01:00
2019-05-22 17:32:35 +09:00
kprobe_event=[probe-list]
[FTRACE] Add kprobe events and enable at boot time.
The probe-list is a semicolon delimited list of probe
definitions. Each definition is the same as in the
kprobe_events interface, but the parameters are comma delimited.
For example, to add a kprobe event on vfs_read with
arg1 and arg2, add to the command line:
kprobe_event=p,vfs_read,$arg1,$arg2
See also Documentation/trace/kprobetrace.rst "Kernel
Boot Parameter" section.
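The relationship between the two syntaxes can be sketched as a conversion: take space-delimited definitions as they would be written to the kprobe_events interface, replace the spaces with commas, and join the definitions with semicolons. The function name to_boot_param is hypothetical, and the second definition in the demonstration is only an illustrative input.

```python
# Sketch only: convert kprobe_events-style definitions to kprobe_event= form.
def to_boot_param(definitions):
    """Join space-delimited probe definitions into one boot parameter."""
    # Within a definition, whitespace becomes commas; definitions are
    # separated from each other by semicolons.
    converted = [",".join(definition.split()) for definition in definitions]
    return "kprobe_event=" + ";".join(converted)


if __name__ == "__main__":
    print(to_boot_param(["p vfs_read $arg1 $arg2"]))
    print(to_boot_param(["p vfs_read $arg1", "r vfs_write $retval"]))
```

When placing the result in a bootloader configuration, the "$" and ";" characters may need quoting or escaping, depending on the bootloader.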
2019-01-25 12:07:00 -06:00
kpti= [ARM64] Control page table isolation of user
and kernel address spaces.
Default: enabled on cores which need mitigation.
0: force disabled
1: force enabled
2009-07-10 14:20:35 +02:00
kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs.
Default is 0 (don't ignore, but inject #GP)
2018-03-12 13:12:47 +02:00
kvm.enable_vmware_backdoor=[KVM] Support VMware backdoor PV interface.
Default is false (don't support).
2010-09-20 22:17:48 +08:00
kvm.mmu_audit= [KVM] This is a R/W parameter which allows auditing
the KVM MMU at runtime.
2009-07-10 14:20:35 +02:00
Default is 0 (off)
2019-11-04 12:22:02 +01:00
kvm.nx_huge_pages=
[KVM] Controls the software workaround for the
X86_BUG_ITLB_MULTIHIT bug.
force : Always deploy workaround.
off : Never deploy workaround.
auto : Deploy workaround based on the presence of
X86_BUG_ITLB_MULTIHIT.
Default is 'auto'.
If the software workaround is enabled for the host,
guests do not need to enable it for nested guests.
2019-11-04 20:26:00 +01:00
kvm.nx_huge_pages_recovery_ratio=
[KVM] Controls how many 4KiB pages are periodically zapped
back to huge pages. 0 disables the recovery, otherwise if
the value is N KVM will zap 1/Nth of the 4KiB pages every
minute. The default is 60.
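To give a feel for the "1/Nth every minute" rule, the sketch below computes how many 4KiB pages would be zapped in one pass for a given ratio. It is arithmetic only: rounding up is an assumption made here so that a small non-zero count still makes progress, and the function name is hypothetical.

```python
# Sketch only: pages zapped per one-minute pass for a given recovery ratio.
def pages_zapped_per_minute(split_pages, ratio=60):
    """Return the number of 4KiB pages zapped in one pass."""
    if ratio == 0:
        return 0  # a ratio of 0 disables recovery
    # Ceiling division (assumed rounding): 1/ratio of the pages.
    return -(-split_pages // ratio)


if __name__ == "__main__":
    print(pages_zapped_per_minute(6000))      # default ratio of 60
    print(pages_zapped_per_minute(6000, 0))   # recovery disabled
```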
2009-07-10 14:20:35 +02:00
kvm-amd.nested= [KVM,AMD] Allow nested virtualization in KVM/SVM.
2010-09-20 22:16:45 +08:00
Default is 1 (enabled)
2009-07-10 14:20:35 +02:00
kvm-amd.npt= [KVM,AMD] Disable nested paging (virtualized MMU)
for all guests.
2011-08-13 12:34:52 -07:00
Default is 1 (enabled) if in 64-bit or 32-bit PAE mode.
2009-07-10 14:20:35 +02:00
2017-06-09 12:49:46 +01:00
kvm-arm.vgic_v3_group0_trap=
[KVM,ARM] Trap guest accesses to GICv3 group-0
system registers
2017-06-09 12:49:41 +01:00
kvm-arm.vgic_v3_group1_trap=
[KVM,ARM] Trap guest accesses to GICv3 group-1
system registers
2017-06-09 12:49:53 +01:00
kvm-arm.vgic_v3_common_trap=
[KVM,ARM] Trap guest accesses to GICv3 common
system registers
2017-10-27 15:28:54 +01:00
kvm-arm.vgic_v4_enable=
[KVM,ARM] Allow use of GICv4 for direct injection of
LPIs.
2009-07-10 14:20:35 +02:00
kvm-intel.ept= [KVM,Intel] Disable extended page tables
(virtualized MMU) support on capable Intel chips.
Default is 1 (enabled)
kvm-intel.emulate_invalid_guest_state=
[KVM,Intel] Enable emulation of invalid guest states
Default is 0 (disabled)
kvm-intel.flexpriority=
[KVM,Intel] Disable FlexPriority feature (TPR shadow).
Default is 1 (enabled)
2011-08-09 14:28:35 +03:00
kvm-intel.nested=
[KVM,Intel] Enable VMX nesting (nVMX).
Default is 0 (disabled)
2009-07-10 14:20:35 +02:00
kvm-intel.unrestricted_guest=
[KVM,Intel] Disable unrestricted guest feature
(virtualized real and unpaged mode) on capable
Intel chips. Default is 1 (enabled)
2018-07-02 12:29:30 +02:00
kvm-intel.vmentry_l1d_flush=[KVM,Intel] Mitigation for L1 Terminal Fault
CVE-2018-3620.
Valid arguments: never, cond, always
always: L1D cache flush on every VMENTER.
cond: Flush L1D on VMENTER only when the code between
VMEXIT and VMENTER can leak host memory.
never: Disables the mitigation
Default is cond (do L1 cache flush in specific instances)
2009-07-10 14:20:35 +02:00
kvm-intel.vpid= [KVM,Intel] Disable Virtual Processor Identification
feature (tagged TLBs) on capable Intel chips.
Default is 1 (enabled)
2018-07-13 16:23:25 +02:00
l1tf= [X86] Control mitigation of the L1TF vulnerability on
affected CPUs
The kernel PTE inversion protection is unconditionally
enabled and cannot be disabled.
full
Provides all available mitigations for the
L1TF vulnerability. Disables SMT and
enables all mitigations in the
hypervisors, i.e. unconditional L1D flush.
SMT control and L1D flush control via the
sysfs interface are still possible after
boot. Hypervisors will issue a warning
when the first VM is started in a
potentially insecure configuration,
i.e. SMT enabled or L1D flush disabled.
full,force
Same as 'full', but disables SMT and L1D
flush runtime control. Implies the
'nosmt=force' command line option.
(i.e. sysfs control of SMT is disabled.)
flush
Leaves SMT enabled and enables the default
hypervisor mitigation, i.e. conditional
L1D flush.
SMT control and L1D flush control via the
sysfs interface are still possible after
boot. Hypervisors will issue a warning
when the first VM is started in a
potentially insecure configuration,
i.e. SMT enabled or L1D flush disabled.
flush,nosmt
Disables SMT and enables the default
hypervisor mitigation.
SMT control and L1D flush control via the
sysfs interface is still possible after
boot. Hypervisors will issue a warning
when the first VM is started in a
potentially insecure configuration,
i.e. SMT enabled or L1D flush disabled.
flush,nowarn
Same as 'flush', but hypervisors will not
warn when a VM is started in a potentially
insecure configuration.
off
Disables hypervisor mitigations and doesn't
emit any warnings.
2018-11-13 19:49:10 +01:00
It also drops the swap size and available
RAM limit restriction on both hypervisor and
bare metal.
2018-07-13 16:23:25 +02:00
Default is 'flush'.
2019-02-19 11:10:49 +01:00
For details see: Documentation/admin-guide/hw-vuln/l1tf.rst
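As a rough summary of the l1tf= option descriptions above, the following Python sketch tabulates what each value selects. The L1tfMode record and its field names are invented for illustration and are not kernel data structures. The SMT and runtime-control entries for 'off' are assumptions, since the text does not state them.

```python
from collections import namedtuple

# Hypothetical summary record; the field names are illustrative only.
L1tfMode = namedtuple("L1tfMode", "smt_disabled l1d_flush runtime_control warns")

L1TF_MODES = {
    "full":         L1tfMode(True,  "unconditional", True,  True),
    "full,force":   L1tfMode(True,  "unconditional", False, True),
    "flush":        L1tfMode(False, "conditional",   True,  True),
    "flush,nosmt":  L1tfMode(True,  "conditional",   True,  True),
    "flush,nowarn": L1tfMode(False, "conditional",   True,  False),
    # The text only says hypervisor mitigations and warnings are off;
    # SMT left enabled and runtime control available are assumptions.
    "off":          L1tfMode(False, "off",           True,  False),
}

DEFAULT_L1TF = "flush"  # "Default is 'flush'."


def describe_l1tf(value=DEFAULT_L1TF):
    """Return a one-line summary of an l1tf= value."""
    mode = L1TF_MODES[value]
    smt = "disabled" if mode.smt_disabled else "enabled"
    return f"l1tf={value}: SMT {smt}, L1D flush {mode.l1d_flush}"


print(describe_l1tf("flush,nosmt"))
```

Calling describe_l1tf() with no argument summarizes the default mode.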
2018-07-13 16:23:25 +02:00
2005-04-16 15:20:36 -07:00
l2cr= [PPC]
2008-03-29 07:20:23 +11:00
l3cr= [PPC]
2007-07-31 00:37:59 -07:00
lapic [X86-32,APIC] Enable the local APIC even if BIOS
2005-10-23 12:57:11 -07:00
disabled it.
2005-04-16 15:20:36 -07:00
2012-10-22 14:37:58 -07:00
lapic= [X86,APIC] "notscdeadline" Do not use TSC deadline
value for LAPIC timer one-shot implementation. Default
back to the programmable timer unit in the LAPIC.
2009-04-14 14:03:43 +05:30
lapic_timer_c2_ok [X86,APIC] Trust the local APIC timer
2008-12-19 10:57:32 -08:00
in C2 power state.
2007-03-23 16:08:01 +01:00
2008-01-06 19:08:56 +01:00
libata.dma= [LIBATA] DMA control
libata.dma=0 Disable all PATA and SATA DMA
libata.dma=1 PATA and SATA Disk DMA only
libata.dma=2 ATAPI (CDROM) DMA only
2011-08-13 12:34:52 -07:00
libata.dma=4 Compact Flash DMA only
2008-01-06 19:08:56 +01:00
Combinations also work, so libata.dma=3 enables DMA
for disks and CDROMs, but not CFs.
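The libata.dma= value is a bitmask, which is why combinations such as 3 work. A minimal Python sketch of the decoding, using only the bit values listed above (the function name is hypothetical):

```python
# Bit values taken from the libata.dma= description above.
LIBATA_DMA_BITS = {
    1: "PATA and SATA disk",
    2: "ATAPI (CDROM)",
    4: "Compact Flash",
}


def decode_libata_dma(value):
    """Return the device classes for which libata.dma=value enables DMA."""
    return [name for bit, name in LIBATA_DMA_BITS.items() if value & bit]


print(decode_libata_dma(3))
```

A value of 0 yields an empty list, matching "Disable all PATA and SATA DMA".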
2011-08-13 12:34:52 -07:00
2009-08-06 00:14:10 +02:00
libata.ignore_hpa= [LIBATA] Ignore HPA limit
libata.ignore_hpa=0 keep BIOS limits (default)
libata.ignore_hpa=1 ignore limits, using full disk
2008-01-06 19:08:56 +01:00
2007-09-27 11:50:13 -04:00
libata.noacpi [LIBATA] Disables use of ACPI in libata suspend/resume
when set.
Format: <int>
2008-02-13 09:15:09 +09:00
libata.force= [LIBATA] Force configurations. The format is a
comma-separated list of "[ID:]VAL" where ID is
2010-04-21 12:17:12 +02:00
PORT[.DEVICE]. PORT and DEVICE are decimal numbers
2008-02-13 09:15:09 +09:00
matching port, link or device. Basically, it matches
the ATA ID string printed on console by libata. If
the whole ID part is omitted, the last PORT and DEVICE
values are used. If ID hasn't been specified yet, the
configuration applies to all ports, links and devices.
If only DEVICE is omitted, the parameter applies to
the port and all links and devices behind it. DEVICE
number of 0 either selects the first device or the
first fan-out link behind PMP device. It does not
select the host link. DEVICE number of 15 selects the
host link and device attached to it.
The VAL specifies the configuration to force. As long
as there's no ambiguity, shortcut notation is allowed.
For example, both 1.5 and 1.5G would work for 1.5Gbps.
The following configurations can be forced.
* Cable type: 40c, 80c, short40c, unk, ign or sata.
Any ID with matching PORT is used.
* SATA link speed limit: 1.5Gbps or 3.0Gbps.
* Transfer mode: pio[0-7], mwdma[0-4] and udma[0-7].
udma[/][16,25,33,44,66,100,133] notation is also
allowed.
* [no]ncq: Turn on or off NCQ.
2015-05-04 21:54:18 -04:00
* [no]ncqtrim: Turn on or off queued DSM TRIM.
2008-08-13 20:19:09 +09:00
* nohrst, nosrst, norst: suppress hard, soft
2018-04-18 20:51:39 +02:00
and both resets.
2008-08-13 20:19:09 +09:00
2012-06-21 23:41:41 -07:00
* rstonce: only attempt one reset during
hot-unplug link recovery
2010-05-23 12:59:11 +02:00
* dump_id: dump IDENTIFY data.
2013-05-21 22:30:58 +02:00
* atapi_dmadir: Enable ATAPI DMADIR bridge support
2013-12-16 09:31:19 -08:00
* disable: Disable this device.
2008-02-13 09:15:09 +09:00
If there are multiple matching configurations changing
the same attribute, the last one is used.
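The ID inheritance rule for libata.force= can be illustrated with a small Python sketch. This is not the kernel's parser. It only splits the "[ID:]VAL" items and carries the last PORT and DEVICE forward, using None to mean "not specified". The function name is hypothetical.

```python
def parse_libata_force(arg):
    """Split a libata.force= value into (port, device, val) tuples.

    port and device are None while no ID has been given (the setting
    applies to all ports, links and devices); device is None when only
    PORT was given (the setting applies to everything behind the port).
    """
    port = device = None
    result = []
    for item in arg.split(","):
        ident, sep, val = item.rpartition(":")
        if sep:  # an explicit ID precedes the value
            p, _, d = ident.partition(".")
            port = int(p)
            device = int(d) if d else None
        # Without an ID, the last PORT and DEVICE values are reused.
        result.append((port, device, val))
    return result


for entry in parse_libata_force("1.5G,1:40c,2.15:noncq,udma4"):
    print(entry)
```

In this example "1.5G" applies everywhere, "40c" to port 1, and both "noncq" and "udma4" to the host link of port 2 (DEVICE 15).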
2010-07-12 14:36:09 +10:00
memblock=debug [KNL] Enable memblock debug messages.
2009-01-06 14:42:44 -08:00
2005-04-16 15:20:36 -07:00
load_ramdisk= [RAM] List of ramdisks to load from floppy
2019-06-18 11:47:10 -03:00
See Documentation/admin-guide/blockdev/ramdisk.rst.
2005-04-16 15:20:36 -07:00
2006-01-14 13:21:19 -08:00
lockd.nlm_grace_period=P [NFS] Assign grace period.
Format: <integer>
2005-04-16 15:20:36 -07:00
2006-01-14 13:21:19 -08:00
lockd.nlm_tcpport=N [NFS] Assign TCP port.
Format: <integer>
lockd.nlm_timeout=T [NFS] Assign timeout value.
Format: <integer>
lockd.nlm_udpport=M [NFS] Assign UDP port.
Format: <integer>
2005-04-16 15:20:36 -07:00
2019-08-19 17:17:39 -07:00
lockdown= [SECURITY]
{ integrity | confidentiality }
Enable the kernel lockdown feature. If set to
integrity, kernel features that allow userland to
modify the running kernel are disabled. If set to
confidentiality, kernel features that allow userland
to extract confidential information from the kernel
are also disabled.
2014-09-12 10:50:01 -07:00
locktorture.nreaders_stress= [KNL]
Set the number of locking read-acquisition kthreads.
Defaults to being automatically set based on the
number of online CPUs.
locktorture.nwriters_stress= [KNL]
Set the number of locking write-acquisition kthreads.
locktorture.onoff_holdoff= [KNL]
Set time (s) after boot for CPU-hotplug testing.
locktorture.onoff_interval= [KNL]
Set time (s) between CPU-hotplug operations, or
zero to disable CPU-hotplug testing.
locktorture.shuffle_interval= [KNL]
Set task-shuffle interval (jiffies). Shuffling
tasks allows some CPUs to go into dyntick-idle
mode during the locktorture test.
locktorture.shutdown_secs= [KNL]
Set time (s) after boot for system shutdown. This
is useful for hands-off automated testing.
locktorture.stat_interval= [KNL]
Time (s) between statistics printk()s.
locktorture.stutter= [KNL]
Time (s) to stutter testing. For example,
specifying five seconds causes the test to run for
five seconds, wait for five seconds, and so on.
This tests the locking primitive's ability to
transition abruptly to and from idle.
locktorture.torture_type= [KNL]
Specify the locking implementation to test.
locktorture.verbose= [KNL]
Enable additional printk() statements.
2005-04-16 15:20:36 -07:00
logibm.irq= [HW,MOUSE] Logitech Bus Mouse Driver
Format: <irq>
loglevel= All Kernel Messages with a loglevel smaller than the
console loglevel will be printed to the console. It can
also be changed with klogd or other programs. The
loglevels are defined as follows:
0 (KERN_EMERG) system is unusable
1 (KERN_ALERT) action must be taken immediately
2 (KERN_CRIT) critical conditions
3 (KERN_ERR) error conditions
4 (KERN_WARNING) warning conditions
5 (KERN_NOTICE) normal but significant condition
6 (KERN_INFO) informational
7 (KERN_DEBUG) debug-level messages
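The "smaller than the console loglevel" rule can be checked with a short Python sketch. The level names come from the table above; the helper name is hypothetical.

```python
# Level numbers and names from the loglevel= table above.
KERN_LEVELS = {
    0: "KERN_EMERG",
    1: "KERN_ALERT",
    2: "KERN_CRIT",
    3: "KERN_ERR",
    4: "KERN_WARNING",
    5: "KERN_NOTICE",
    6: "KERN_INFO",
    7: "KERN_DEBUG",
}


def levels_printed(console_loglevel):
    """Return the message levels that reach the console for loglevel=N."""
    return [name for level, name in sorted(KERN_LEVELS.items())
            if level < console_loglevel]


print(levels_printed(4))
```

With loglevel=4, warnings themselves are not printed, because only levels strictly below 4 pass.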
2011-02-20 20:08:35 -08:00
log_buf_len=n[KMG] Sets the size of the printk ring buffer,
2014-08-06 16:08:56 -07:00
in bytes. n must be a power of two and greater
than the minimal size. The minimal size is defined
by LOG_BUF_SHIFT kernel config parameter. There is
also CONFIG_LOG_CPU_MAX_BUF_SHIFT config parameter
that allows increasing the default size depending on
the number of CPUs. See init/Kconfig for more details.
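The two constraints on log_buf_len (a power of two, and greater than the minimal size of 2^LOG_BUF_SHIFT bytes) can be sketched in Python as follows. The helper names are hypothetical, and the suffix handling is a simplified stand-in for the kernel's own size parsing.

```python
SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Parse an n[KMG] size such as '1M' or '0x20000' into bytes."""
    text = text.strip()
    if text and text[-1].upper() in SUFFIXES:
        return int(text[:-1], 0) * SUFFIXES[text[-1].upper()]
    return int(text, 0)


def is_valid_log_buf_len(text, log_buf_shift):
    """Check the power-of-two and minimal-size rules for log_buf_len=."""
    size = parse_size(text)
    is_power_of_two = size > 0 and (size & (size - 1)) == 0
    return is_power_of_two and size > (1 << log_buf_shift)


# Assumes LOG_BUF_SHIFT=17 (128K) purely as an example configuration.
print(is_valid_log_buf_len("1M", 17))
print(is_valid_log_buf_len("3M", 17))
```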
2005-04-16 15:20:36 -07:00
2007-10-16 01:29:37 -07:00
logo.nologo [FB] Disables display of the built-in Linux logo.
This may be used to provide more screen space for
kernel log messages and is useful when debugging
kernel boot problems.
2005-04-16 15:20:36 -07:00
lp=0 [LP] Specify parallel ports to use, e.g.,
lp=port[,port...] lp=none,parport0 (lp0 not configured, lp1 uses
lp=reset first parallel port). 'lp=0' disables the
lp=auto printer driver. 'lp=reset' (which can be
specified in addition to the ports) causes
attached printers to be reset. Using
lp=port1,port2,... specifies the parallel ports
to associate lp devices with, starting with
lp0. A port specification may be 'none' to skip
that lp device, or a parport name such as
'parport0'. Specifying 'lp=auto' instead of a
port specification list means that device IDs
from each port should be examined, to see if
an IEEE 1284-compliant printer is attached; if
so, the driver will manage that printer.
See also header of drivers/char/lp.c.
lpj=n [KNL]
Sets loops_per_jiffy to given constant, thus avoiding
time-consuming boot-time autodetection (up to 250 ms per
CPU). 0 enables autodetection (default). To determine
the correct value for your kernel, boot with normal
autodetection and see what value is printed. Note that
on SMP systems the preset will be applied to all CPUs,
which is likely to cause problems if your CPUs need
significantly divergent settings. An incorrect value
will cause delays in the kernel to be wrong, leading to
unpredictable I/O errors and other breakage. Although
unlikely, in the extreme case this might damage your
hardware.
ltpc= [NET]
Format: <io>,<irq>,<dma>
2018-10-10 17:18:25 -07:00
lsm.debug [SECURITY] Enable LSM initialization debugging output.
2018-09-19 17:30:09 -07:00
lsm=lsm1,...,lsmN
[SECURITY] Choose order of LSM initialization. This
2019-02-12 10:23:18 -08:00
overrides CONFIG_LSM, and the "security=" parameter.
2018-09-19 17:30:09 -07:00
2011-08-13 12:34:52 -07:00
machvec= [IA-64] Force the use of a particular machine-vector
2005-10-23 12:57:11 -07:00
(machvec) in a generic kernel.
2019-08-13 09:25:06 +02:00
Example: machvec=hpzx1
2005-04-16 15:20:36 -07:00
2009-07-02 23:27:12 +08:00
machtype= [Loongson] Share the same kernel image file between different
yeeloong laptops.
Example: machtype=lemote-yeeloong-2f-7inch
2009-04-05 15:55:22 -07:00
max_addr=nn[KMG] [KNL,BOOT,ia64] All physical memory greater
than or equal to this physical address is ignored.
2005-04-16 15:20:36 -07:00
maxcpus= [SMP] Maximum number of processors that an SMP kernel
2016-08-24 13:06:45 +08:00
will bring up during bootup. maxcpus=n : n >= 0 limits
the kernel to bring up 'n' processors. After bootup
you can still bring up the other plugged CPUs by executing
"echo 1 > /sys/devices/system/cpu/cpuX/online", so maxcpus
only takes effect during system bootup.
n=0 is a special case: it is equivalent to "nosmp",
which also disables the IO APIC.
2005-04-16 15:20:36 -07:00
2011-07-31 22:08:04 +02:00
max_loop= [LOOP] The number of loop block devices that get
(loop.max_loop) unconditionally pre-created at init time. The default
number is configured by BLK_DEV_LOOP_MIN_COUNT. Instead
of statically allocating a predefined number, loop
devices can be requested on-demand with the
/dev/loop-control interface.
2005-06-29 18:00:00 -07:00
2007-07-31 00:37:59 -07:00
mce [X86-32] Machine Check Exception
2005-04-16 15:20:36 -07:00
2019-06-07 15:54:32 -03:00
mce=option [X86-64] See Documentation/x86/x86_64/boot-options.rst
2007-10-17 18:04:38 +02:00
2005-04-16 15:20:36 -07:00
md= [HW] RAID subsystems devices and level
2016-11-03 12:10:10 +02:00
See Documentation/admin-guide/md.rst.
2005-10-23 12:57:11 -07:00
2005-04-16 15:20:36 -07:00
mdacon= [MDA]
Format: <first>,<last>
Specifies range of consoles to be captured by the MDA.
2005-10-23 12:57:11 -07:00
2019-02-18 22:04:08 +01:00
mds= [X86,INTEL]
Control mitigation for the Micro-architectural Data
Sampling (MDS) vulnerability.
Certain CPUs are vulnerable to an exploit against CPU
internal buffers which can forward information to a
disclosure gadget under certain conditions.
In vulnerable processors, the speculatively
forwarded data can be used in a cache side channel
attack, to access data to which the attacker does
not have direct access.
This parameter controls the MDS mitigation. The
options are:
2019-04-02 09:59:33 -05:00
full - Enable MDS mitigation on vulnerable CPUs
full,nosmt - Enable MDS mitigation and disable
SMT on vulnerable CPUs
off - Unconditionally disable MDS mitigation
2019-02-18 22:04:08 +01:00
2019-11-15 11:14:44 -05:00
On TAA-affected machines, mds=off can be prevented by
an active TAA mitigation, because both vulnerabilities
are mitigated with the same mechanism. To disable this
mitigation, you need to specify tsx_async_abort=off
too.
2019-02-18 22:04:08 +01:00
Not specifying this option is equivalent to
mds=full.
2019-02-19 00:02:31 +01:00
For details see: Documentation/admin-guide/hw-vuln/mds.rst
2005-04-16 15:20:36 -07:00
mem=nn[KMG] [KNL,BOOT] Force usage of a specific amount of memory
2020-04-06 20:06:50 -07:00
Amount of memory to be used in cases as follows:
1 for test;
2 when the kernel is not able to see the whole system memory;
3 memory that lies after 'mem=' boundary is excluded from
the hypervisor, then assigned to KVM guests.
2012-12-17 15:59:29 -08:00
[X86] Work as limiting max address. Use together
with memmap= to avoid physical address space collisions.
Without memmap= PCI devices could be placed at addresses
belonging to unused RAM.
2005-04-16 15:20:36 -07:00
2020-04-06 20:06:50 -07:00
Note that this only takes effect during boot time,
since in case 3 above, memory may need to be hot added after boot
if the system memory of the hypervisor is not sufficient.
2007-07-31 00:37:59 -07:00
mem=nopentium [BUGS=X86-32] Disable usage of 4MB pages for kernel
2005-04-16 15:20:36 -07:00
memory.
2008-09-21 17:14:42 +09:00
memchunk=nn[KMG]
[KNL,SH] Allow user to override the default size for
per-device physically contiguous DMA buffers.
2018-04-18 20:51:39 +02:00
memhp_default_state=online/offline
2016-05-19 17:13:06 -07:00
[KNL] Set the initial state for the memory hotplug
onlining policy. If not specified, the default value is
set according to the
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel config
option.
2019-06-07 15:54:32 -03:00
See Documentation/admin-guide/mm/memory-hotplug.rst.
2016-05-19 17:13:06 -07:00
2009-04-14 14:03:43 +05:30
memmap=exactmap [KNL,X86] Enable setting of an exact
2005-04-16 15:20:36 -07:00
E820 memory map, as specified by the user.
Such memmap=exactmap lines can be constructed based on
BIOS output or other requirements. See the memmap=nn@ss
option description.
memmap=nn[KMG]@ss[KMG]
2014-02-06 12:04:19 -08:00
[KNL] Force usage of a specific region of memory.
Region of memory to be used is from ss to ss+nn.
Documentation/kernel-parameters.txt: Update 'memmap=' boot option description
In commit:
9710f581bb4c ("x86, mm: Let "memmap=" take more entries one time")
... 'memmap=' was changed to adopt multiple, comma delimited values in a
single entry, so update the related description.
In the special case of only specifying size value without an offset,
like memmap=nn[KMG], memmap behaves similarly to mem=nn[KMG], so update
it too here.
Furthermore, for memmap=nn[KMG]$ss[KMG], an escape character needs be added
before '$' for some bootloaders. E.g in grub2, if we specify memmap=100M$5G
as suggested by the documentation, "memmap=100MG" gets passed to the kernel.
Clarify all this.
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dan.j.williams@intel.com
Cc: douly.fnst@cn.fujitsu.com
Cc: dyoung@redhat.com
Cc: m.mizuma@jp.fujitsu.com
Link: http://lkml.kernel.org/r/1494654390-23861-4-git-send-email-bhe@redhat.com
[ Various spelling fixes. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-13 13:46:30 +08:00
If @ss[KMG] is omitted, it is equivalent to mem=nn[KMG],
which limits max address to nn[KMG].
Multiple different regions can be specified,
comma delimited.
Example:
memmap=100M@2G,100M#3G,1G!1024G
2005-04-16 15:20:36 -07:00
memmap=nn[KMG]#ss[KMG]
[KNL,ACPI] Mark specific memory as ACPI data.
2014-02-06 12:04:19 -08:00
Region of memory to be marked is from ss to ss+nn.
2005-04-16 15:20:36 -07:00
memmap=nn[KMG]$ss[KMG]
[KNL,ACPI] Mark specific memory as reserved.
2014-02-06 12:04:19 -08:00
Region of memory to be reserved is from ss to ss+nn.
2008-03-24 12:29:43 -07:00
Example: Exclude memory from 0x18690000-0x1869ffff
memmap=64K$0x18690000
or
memmap=0x10000$0x18690000
2017-05-13 13:46:30 +08:00
Some bootloaders, such as GRUB 2, may need an escape
character before '$'; otherwise '$' and the following
number will be eaten.
2005-04-16 15:20:36 -07:00
2015-04-01 09:12:18 +02:00
memmap=nn[KMG]!ss[KMG]
[KNL,X86] Mark specific memory as protected.
Region of memory to be used, from ss to ss+nn.
The memory region may be marked as e820 type 12 (0xc)
and is NVDIMM or ADR memory.
2018-02-03 00:10:20 +01:00
memmap=<size>%<offset>-<oldtype>+<newtype>
[KNL,ACPI] Convert memory within the specified region
from <oldtype> to <newtype>. If "-<oldtype>" is left
out, the whole region will be marked as <newtype>,
even if previously unavailable. If "+<newtype>" is left
out, matching memory will be removed. Types are
specified as e820 types, e.g., 1 = RAM, 2 = reserved,
3 = ACPI, 12 = PRAM.
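The @, #, $ and ! forms of memmap= described above all follow the same size-separator-start pattern, with the region running from ss to ss+nn. The Python sketch below splits a comma-delimited value into regions and labels each with the e820 type number listed above. It is an illustration only, not the kernel's parser, and the function names are hypothetical.

```python
import re

SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

# Separator -> (description, e820 type) as given in the entries above.
MEMMAP_KINDS = {
    "@": ("RAM", 1),
    "#": ("ACPI data", 3),
    "$": ("reserved", 2),
    "!": ("protected (PRAM)", 12),
}


def parse_size(text):
    """Parse an nn[KMG] value such as '100M' or '0x18690000' into bytes."""
    if text[-1].upper() in SUFFIXES:
        return int(text[:-1], 0) * SUFFIXES[text[-1].upper()]
    return int(text, 0)


def parse_memmap(arg):
    """Return (description, e820_type, start, end) for each region."""
    regions = []
    for entry in arg.split(","):
        match = re.match(r"^([^@#$!]+)([@#$!])(.+)$", entry)
        if match is None:
            raise ValueError(f"unsupported memmap entry: {entry!r}")
        size, sep, start = match.groups()
        begin = parse_size(start)
        desc, e820_type = MEMMAP_KINDS[sep]
        regions.append((desc, e820_type, begin, begin + parse_size(size)))
    return regions


for region in parse_memmap("100M@2G,100M#3G,1G!1024G"):
    print(region)
```

The end address is exclusive here, so memmap=64K$0x18690000 covers 0x18690000 up to, but not including, 0x186a0000.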
2008-09-07 01:51:34 -07:00
memory_corruption_check=0/1 [X86]
Some BIOSes seem to corrupt the first 64k of
memory when doing things like suspend/resume.
Setting this option will scan the memory
looking for corruption. Enabling this will
both detect corruption and prevent the kernel
from using the memory being corrupted.
However, it's intended as a diagnostic tool; if
repeatable BIOS-originated corruption always
affects the same memory, you can use memmap=
to prevent the kernel from using that memory.
memory_corruption_check_size=size [X86]
By default it checks for corruption in the low
64k, making this memory unavailable for normal
use. Use this parameter to scan for
corruption in more or less memory.
memory_corruption_check_period=seconds [X86]
By default it checks for corruption every 60
seconds. Use this parameter to check at some
other rate. 0 disables periodic checking.
2018-09-28 15:39:20 +00:00
memtest= [KNL,X86,ARM,PPC] Enable memtest
2008-03-21 18:56:19 -07:00
Format: <integer>
default : 0 <disable>
2009-02-25 11:30:45 +01:00
Specifies the number of memtest passes to be
performed. Each pass selects another test
pattern from a given set of patterns. Memtest
fills the memory with this pattern, validates
memory contents and reserves bad memory
regions that are detected.
2008-03-21 18:56:19 -07:00
2017-07-17 16:09:58 -05:00
mem_encrypt= [X86-64] AMD Secure Memory Encryption (SME) control
Valid arguments: on, off
Default (depends on kernel configuration option):
on (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y)
off (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=n)
mem_encrypt=on: Activate SME
mem_encrypt=off: Do not activate SME
2019-07-24 09:24:49 +02:00
Refer to Documentation/virt/kvm/amd-memory-encryption.rst
2017-07-17 16:09:58 -05:00
for details on when memory encryption can be activated.
PM / sleep: System sleep state selection interface rework
There are systems in which the platform doesn't support any special
sleep states, so suspend-to-idle (PM_SUSPEND_FREEZE) is the only
available system sleep state. However, some user space frameworks
only use the "mem" and (sometimes) "standby" sleep state labels, so
the users of those systems need to modify user space in order to be
able to use system suspend at all and that may be a pain in practice.
Commit 0399d4db3edf (PM / sleep: Introduce command line argument for
sleep state enumeration) attempted to address this problem by adding
a command line argument to change the meaning of the "mem" string in
/sys/power/state to make it trigger suspend-to-idle (instead of
suspend-to-RAM).
However, there also are systems in which the platform does support
special sleep states, but suspend-to-idle is the preferred one anyway
(it even may save more energy than the platform-provided sleep states
in some cases) and the above commit doesn't help in those cases.
For this reason, rework the system sleep state selection interface
again (but preserve backwards compatibility). Namely, add a new
sysfs file, /sys/power/mem_sleep, that will control the system
suspend mode triggered by writing "mem" to /sys/power/state (in
analogy with what /sys/power/disk does for hibernation). Make it
select suspend-to-RAM ("deep" sleep) by default (if supported) and
fall back to suspend-to-idle ("s2idle") otherwise and add a new
command line argument, mem_sleep_default, allowing that default to
be overridden if need be.
At the same time, drop the relative_sleep_states command line
argument that doesn't make sense any more.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Mario Limonciello <mario.limonciello@dell.com>
2016-11-21 22:45:40 +01:00
mem_sleep_default= [SUSPEND] Default system suspend mode:
s2idle - Suspend-To-Idle
shallow - Power-On Suspend or equivalent (if supported)
deep - Suspend-To-RAM or equivalent (if supported)
2017-10-06 00:38:49 +02:00
See Documentation/admin-guide/pm/sleep-states.rst.
2016-11-21 22:45:40 +01:00
2005-04-16 15:20:36 -07:00
meye.*= [HW] Set MotionEye Camera parameters
2020-03-04 13:08:03 +01:00
See Documentation/admin-guide/media/meye.rst.
2005-04-16 15:20:36 -07:00
2007-10-12 23:04:06 +02:00
mfgpt_irq= [IA-32] Specify the IRQ to use for the
Multi-Function General Purpose Timers on AMD Geode
platforms.
2008-01-30 13:33:33 +01:00
mfgptfix [X86-32] Fix MFGPT timers on AMD Geode platforms when
the BIOS has incorrectly applied a workaround. TinyBIOS
version 0.98 is known to be affected, 0.99 fixes the
problem by letting the user disable the workaround.
2005-04-16 15:20:36 -07:00
mga= [HW,DRM]
2008-11-19 15:36:16 -08:00
min_addr=nn[KMG] [KNL,BOOT,ia64] All physical memory below this
physical address is ignored.
2009-05-20 11:10:31 +01:00
mini2440= [ARM,HW,KNL]
Format:[0..2][b][c][t]
Default: "0tb"
MINI2440 configuration specification:
0 - The attached screen is the 3.5" TFT
1 - The attached screen is the 7" TFT
2 - The VGA Shield is attached (1024x768)
Leaving out the screen size parameter will not load
the TFT driver, and the framebuffer will be left
unconfigured.
b - Enable backlight. The TFT backlight pin will be
linked to the kernel VESA blanking code and a GPIO
LED. This parameter is not necessary when using the
VGA shield.
c - Enable the s3c camera interface.
t - Reserved for enabling touchscreen support. The
touchscreen support is not enabled in the mainstream
kernel as of 2.6.30, a preliminary port can be found
in the "bleeding edge" mini2440 support kernel at
http://repo.or.cz/w/linux-2.6/mini2440.git
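The mini2440= format string "[0..2][b][c][t]" can be decoded as in the following Python sketch. It only mirrors the option letters documented above. The function name and the returned dictionary keys are hypothetical, and accepting the letters in any order is an assumption based on the default "0tb".

```python
SCREENS = {
    "0": '3.5" TFT',
    "1": '7" TFT',
    "2": "VGA Shield (1024x768)",
}
FLAGS = {"b": "backlight", "c": "camera", "t": "touchscreen"}


def parse_mini2440(arg="0tb"):
    """Decode a mini2440= value into a dictionary of selected features."""
    config = {"screen": None, "backlight": False,
              "camera": False, "touchscreen": False}
    for ch in arg:
        if ch in SCREENS:
            config["screen"] = SCREENS[ch]
        elif ch in FLAGS:
            config[FLAGS[ch]] = True
        else:
            raise ValueError(f"unknown mini2440 option: {ch!r}")
    return config


print(parse_mini2440())
print(parse_mini2440("bc"))
```

In the second call the screen stays None, matching the note that leaving out the screen size parameter leaves the TFT driver unloaded.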
2019-04-12 15:39:28 -05:00
mitigations=
2019-04-12 15:39:32 -05:00
[X86,PPC,S390,ARM64] Control optional mitigations for
CPU vulnerabilities. This is a set of curated,
2019-04-12 15:39:29 -05:00
arch-independent options, each of which is an
aggregation of existing arch-specific options.
2019-04-12 15:39:28 -05:00
off
Disable all optional CPU mitigations. This
improves system performance, but it may also
expose users to several CPU vulnerabilities.
2019-04-12 15:39:30 -05:00
Equivalent to: nopti [X86,PPC]
2019-04-12 15:39:32 -05:00
kpti=0 [ARM64]
2019-07-08 11:52:26 -05:00
nospectre_v1 [X86,PPC]
2019-04-12 15:39:31 -05:00
nobp=0 [S390]
2019-04-12 15:39:32 -05:00
nospectre_v2 [X86,PPC,S390,ARM64]
2019-04-12 15:39:29 -05:00
spectre_v2_user=off [X86]
2019-04-12 15:39:30 -05:00
spec_store_bypass_disable=off [X86,PPC]
2019-04-12 15:39:32 -05:00
ssbd=force-off [ARM64]
2019-04-12 15:39:29 -05:00
l1tf=off [X86]
2019-04-17 16:39:02 -05:00
mds=off [X86]
2019-10-23 12:32:55 +02:00
tsx_async_abort=off [X86]
2019-11-04 12:22:02 +01:00
kvm.nx_huge_pages=off [X86]
Exceptions:
This does not have any effect on
kvm.nx_huge_pages when
kvm.nx_huge_pages=force.
2019-04-12 15:39:28 -05:00
auto (default)
Mitigate all CPU vulnerabilities, but leave SMT
enabled, even if it's vulnerable. This is for
users who don't want to be surprised by SMT
getting disabled across kernel upgrades, or who
have other ways of avoiding SMT-based attacks.
2019-04-12 15:39:29 -05:00
Equivalent to: (default behavior)
2019-04-12 15:39:28 -05:00
auto,nosmt
Mitigate all CPU vulnerabilities, disabling SMT
if needed. This is for users who always want to
be fully mitigated, even if it means losing SMT.
2019-04-12 15:39:29 -05:00
Equivalent to: l1tf=flush,nosmt [X86]
2019-04-17 16:39:02 -05:00
mds=full,nosmt [X86]
2019-10-23 12:32:55 +02:00
tsx_async_abort=full,nosmt [X86]
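
The "Equivalent to" lists above can be read as a lookup table keyed by architecture. The following is a minimal sketch of that reading, not kernel code: the table contents are transcribed from this entry, while the names MITIGATIONS_OFF, MITIGATIONS_AUTO_NOSMT and equivalent_options() are hypothetical and exist only for illustration.

```python
# Illustrative only: tables transcribed from the mitigations= entry above.
# The names below are hypothetical helpers, not kernel symbols.
MITIGATIONS_OFF = [
    ("nopti", {"X86", "PPC"}),
    ("kpti=0", {"ARM64"}),
    ("nospectre_v1", {"X86", "PPC"}),
    ("nobp=0", {"S390"}),
    ("nospectre_v2", {"X86", "PPC", "S390", "ARM64"}),
    ("spectre_v2_user=off", {"X86"}),
    ("spec_store_bypass_disable=off", {"X86", "PPC"}),
    ("ssbd=force-off", {"ARM64"}),
    ("l1tf=off", {"X86"}),
    ("mds=off", {"X86"}),
    ("tsx_async_abort=off", {"X86"}),
    ("kvm.nx_huge_pages=off", {"X86"}),
]

MITIGATIONS_AUTO_NOSMT = [
    ("l1tf=flush,nosmt", {"X86"}),
    ("mds=full,nosmt", {"X86"}),
    ("tsx_async_abort=full,nosmt", {"X86"}),
]


def equivalent_options(value, arch):
    """Return the arch-specific options that a mitigations= value stands for."""
    tables = {
        "off": MITIGATIONS_OFF,
        "auto": [],  # documented as "(default behavior)", so nothing extra
        "auto,nosmt": MITIGATIONS_AUTO_NOSMT,
    }
    if value not in tables:
        raise ValueError("unknown mitigations= value: %r" % value)
    # Keep the documented order, filtered to the requested architecture.
    return [opt for opt, arches in tables[value] if arch in arches]


if __name__ == "__main__":
    print(equivalent_options("off", "ARM64"))
```

Note that the sketch does not model the documented exception for kvm.nx_huge_pages=force.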
2019-04-12 15:39:28 -05:00
2008-07-23 21:26:49 -07:00
mminit_loglevel=
[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
parameter allows control of the logging verbosity for
the additional memory initialisation checks. A value
of 0 disables mminit logging and a level of 4 will
log everything. Information is printed at KERN_DEBUG
so loglevel=8 may also need to be specified.
2012-09-26 10:09:40 +01:00
module.sig_enforce
[KNL] When CONFIG_MODULE_SIG is set, this means that
modules without (valid) signatures will fail to load.
2013-03-25 20:42:06 +01:00
Note that if CONFIG_MODULE_SIG_FORCE is set, that
2012-09-26 10:09:40 +01:00
is always true, so this option does nothing.
2016-07-21 15:37:56 +09:30
module_blacklist= [KNL] Do not load a comma-separated list of
modules. Useful for debugging problem modules.
2005-04-16 15:20:36 -07:00
mousedev.tap_time=
[MOUSE] Maximum time between finger touching and
leaving touchpad surface for touch to be considered
a tap and be reported as a left button click (for
touchpads working in absolute mode only).
Format: <msecs>
mousedev.xres= [MOUSE] Horizontal screen resolution, used for devices
reporting absolute coordinates, such as tablets
mousedev.yres= [MOUSE] Vertical screen resolution, used for devices
reporting absolute coordinates, such as tablets
2018-04-05 16:23:09 -07:00
movablecore= [KNL,X86,IA-64,PPC]
Format: nn[KMGTPE] | nn%
This parameter is the complement to kernelcore=; it
specifies the amount of memory used for migratable
allocations. If both kernelcore and movablecore are
specified, then kernelcore will be at *least* the
specified value but may be more. If movablecore on its
own is specified, the administrator must be careful
2009-04-05 15:55:22 -07:00
that the amount of memory usable for all allocations
is not too small.
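
As an illustration of the "nn[KMGTPE] | nn%" format, here is a minimal sketch of how such a value could be turned into a byte count. It assumes binary suffixes (K = 2^10, M = 2^20, and so on) and treats the percentage as a share of total system memory; parse_core_size() and total_bytes are hypothetical names for this example, and the kernel's own parsing and rounding may differ.

```python
# Hypothetical helper for illustration; not the kernel's parser.
# Assumption: suffixes are binary multiples (K = 2**10, ..., E = 2**60).
SUFFIX_SHIFT = {"K": 10, "M": 20, "G": 30, "T": 40, "P": 50, "E": 60}


def parse_core_size(arg, total_bytes):
    """Convert 'nn[KMGTPE]' or 'nn%' into a number of bytes."""
    arg = arg.strip()
    if not arg:
        raise ValueError("empty value")
    if arg.endswith("%"):
        # Percentage form: a share of total system memory.
        percent = int(arg[:-1])
        return total_bytes * percent // 100
    suffix = arg[-1].upper()
    if suffix in SUFFIX_SHIFT:
        # Suffixed form: scale the number by the binary multiple.
        return int(arg[:-1]) << SUFFIX_SHIFT[suffix]
    # Plain number: already in bytes.
    return int(arg)


if __name__ == "__main__":
    eight_gib = 8 << 30
    print(parse_core_size("2G", eight_gib))
    print(parse_core_size("25%", eight_gib))
```

With 8 GiB of total memory, "2G" and "25%" describe the same amount in this sketch; the percentage form avoids having to know the machine's capacity when writing the command line.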
2017-07-06 15:41:02 -07:00
movable_node [KNL] Boot-time switch to make hot-pluggable memory
NUMA nodes movable. This means that the memory
of such nodes will be usable only for movable
allocations, which rules out almost all kernel
allocations. Use with caution!
2013-11-12 15:08:10 -08:00
2005-04-16 15:20:36 -07:00
MTD_Partition= [MTD]
Format: <name>,<region-number>,<size>,<offset>
2005-10-23 12:57:11 -07:00
MTD_Region= [MTD] Format:
<name>,<region-number>[,<base>,<size>,<buswidth>,<altbuswidth>]
2005-04-16 15:20:36 -07:00
mtdparts= [MTD]
2020-02-18 16:02:19 +01:00
See drivers/mtd/parsers/cmdlinepart.c
2005-04-16 15:20:36 -07:00
2010-09-28 15:33:12 +00:00
multitce=off [PPC] This parameter disables the use of the pSeries
firmware feature for updating multiple TCE entries
at a time.
2009-05-12 13:46:57 -07:00
onenand.bdry= [HW,MTD] Flex-OneNAND Boundary Configuration
Format: [die0_boundary][,die0_lock][,die1_boundary][,die1_lock]
boundary - index of last SLC block on Flex-OneNAND.
The remaining blocks are configured as MLC blocks.
lock - Configure if Flex-OneNAND boundary should be locked.
Once locked, the boundary cannot be changed.
1 indicates lock status, 0 indicates unlock status.
2008-07-03 11:24:29 +01:00
mtdset= [ARM]
ARM/S3C2412 JIVE boot control
See arch/arm/mach-s3c2412/mach-jive.c
2005-04-16 15:20:36 -07:00
mtouchusb.raw_coordinates=
2005-10-23 12:57:11 -07:00
[HW] Make the MicroTouch USB driver use raw coordinates
('y', default) or cooked coordinates ('n')
2005-04-16 15:20:36 -07:00
2009-04-05 15:55:22 -07:00
mtrr_chunk_size=nn[KMG] [X86]
2009-04-27 15:06:31 +02:00
Used for mtrr cleanup. It is the largest continuous chunk
2009-04-05 15:55:22 -07:00
that could hold holes, aka UC entries.
mtrr_gran_size=nn[KMG] [X86]
Used for mtrr cleanup. It is the granularity of an mtrr block.
Default is 1.
A large value could prevent small alignment from
using up MTRRs.
mtrr_spare_reg_nr=n [X86]
Format: <integer>
Range: 0,7 : spare reg number
Default : 1
Used for mtrr cleanup. It is the number of spare mtrr entries.
Set to 2 or more if your graphics card needs more.
2005-04-16 15:20:36 -07:00
n2= [NET] SDL Inc. RISCom/N2 synchronous serial card
netdev= [NET] Network devices parameters
Format: <irq>,<io>,<mem_start>,<mem_end>,<name>
Note that mem_start is often overloaded to mean
something different and driver-specific.
2005-10-23 12:57:11 -07:00
This usage is only documented in each driver source
file if at all.
2008-07-21 10:01:34 -07:00
nf_conntrack.acct=
[NETFILTER] Enable connection tracking flow accounting
0 to disable accounting
1 to enable accounting
2010-06-25 14:46:56 +02:00
Default value is 0.
2008-07-21 10:01:34 -07:00
2010-09-17 10:54:37 -04:00
nfsaddrs= [NFS] Deprecated. Use ip= instead.
2020-02-12 19:13:32 +01:00
See Documentation/admin-guide/nfs/nfsroot.rst.
2005-04-16 15:20:36 -07:00
nfsroot= [NFS] nfs root filesystem for disk-less boxes.
2020-02-12 19:13:32 +01:00
See Documentation/admin-guide/nfs/nfsroot.rst.
2005-04-16 15:20:36 -07:00
2010-09-17 10:54:37 -04:00
nfsrootdebug [NFS] enable nfsroot debugging messages.
2020-02-12 19:13:32 +01:00
See Documentation/admin-guide/nfs/nfsroot.rst.
2010-09-17 10:54:37 -04:00
2016-08-29 20:03:52 -04:00
nfs.callback_nr_threads=
[NFSv4] set the total number of threads that the
NFS client will assign to service NFSv4 callback
requests.
2006-01-03 09:55:41 +01:00
nfs.callback_tcpport=
[NFS] set the TCP port on which the NFSv4 callback
channel should listen.
2009-08-19 18:12:27 -04:00
nfs.cache_getent=
[NFS] sets the pathname to the program which is used
to update the NFS client cache entries.
nfs.cache_getent_timeout=
[NFS] sets the timeout after which an attempt to
update a cache entry is deemed to have failed.
2006-01-03 09:55:57 +01:00
nfs.idmap_cache_timeout=
[NFS] set the maximum lifetime for idmapper cache
entries.
2007-10-09 12:01:04 -04:00
nfs.enable_ino64=
[NFS] enable 64-bit inode numbers.
If zero, the NFS client will fake up a 32-bit inode
number for the readdir() and stat() syscalls instead
of returning the full 64-bit number.
The default is to return 64-bit inode numbers.
2016-08-29 20:03:52 -04:00
nfs.max_session_cb_slots=
[NFSv4.1] Sets the maximum number of session
slots the client will assign to the callback
channel. This determines the maximum number of
callbacks the client will process in parallel for
a particular server.
2012-02-06 19:50:40 -05:00
nfs.max_session_slots=
[NFSv4.1] Sets the maximum number of session slots
the client will attempt to negotiate with the server.
This limits the number of simultaneous RPC requests
that the client can send to the NFSv4.1 server.
Note that there is little point in setting this
value higher than the max_tcp_slot_table_limit.
2011-02-22 15:44:32 -08:00
nfs.nfs4_disable_idmapping=
2012-01-09 13:46:26 -05:00
[NFSv4] When set to the default of '1', this option
ensures that both the RPC level authentication
scheme and the NFS level operations agree to use
numeric uids/gids if the mount is using the
'sec=sys' security flavour. In effect it is
disabling idmapping, which can make migration from
legacy NFSv2/v3 systems to NFSv4 easier.
Servers that do not support this mode of operation
will be autodetected by the client, and it will fall
back to using the idmapper.
To turn off this behaviour, set the value to '0'.
2012-09-14 17:24:41 -04:00
nfs.nfs4_unique_id=
[NFS4] Specify an additional fixed unique
identification string that NFSv4 clients can insert into
their nfs_client_id4 string. This is typically a
UUID that is generated at system install time.
2011-02-22 15:44:32 -08:00
2012-02-17 15:20:24 -05:00
nfs.send_implementation_id=
[NFSv4.1] Send client implementation identification
information in exchange_id requests.
If zero, no implementation identification information
will be sent.
The default is to send the implementation identification
information.
2016-11-03 12:10:10 +02:00
2013-09-04 10:08:54 -04:00
nfs.recover_lost_locks=
[NFSv4] Attempt to recover locks that were lost due
to a lease timeout on the server. Please note that
doing this risks data corruption, since there are
no guarantees that the file will remain unchanged
after the locks are lost.
If you want to enable the kernel legacy behaviour of
attempting to recover these locks, then set this
parameter to '1'.
The default parameter value of '0' causes the kernel
not to attempt recovery of lost locks.
2012-02-17 15:20:24 -05:00
2015-08-24 20:39:18 -04:00
nfs4.layoutstats_timer=
[NFSv4.2] Change the rate at which the kernel sends
layoutstats to the pNFS metadata server.
Setting this value to 0 causes the kernel to use
whatever value is the default set by the layout
driver. A non-zero value sets the minimum interval
in seconds between layoutstats transmissions.
2012-03-22 16:07:18 -04:00
nfsd.nfs4_disable_idmapping=
[NFSv4] When set to the default of '1', the NFSv4
server will return only numeric uids and gids to
clients using auth_sys, and will accept numeric uids
and gids from such clients. This is intended to ease
migration from NFSv2/v3.
2012-02-17 15:20:24 -05:00
2017-02-26 13:17:39 +01:00
nmi_debug= [KNL,SH] Specify one or more actions to take
2007-10-10 14:58:29 +02:00
when an NMI is triggered.
Format: [state][,regs][,debounce][,die]
2009-04-14 14:03:43 +05:30
nmi_watchdog= [KNL,BUGS=X86] Debugging features for SMP kernels
2011-03-22 16:34:16 -07:00
Format: [panic,][nopanic,][num]
2015-04-14 15:44:13 -07:00
Valid num: 0 or 1
2015-10-10 15:40:42 -04:00
0 - turn hardlockup detector in nmi_watchdog off
1 - turn hardlockup detector in nmi_watchdog on
2009-04-05 15:55:22 -07:00
When panic is specified, panic when an NMI watchdog
2019-05-21 10:32:08 +08:00
timeout occurs (or 'nopanic' to not panic on an NMI
watchdog, if CONFIG_BOOTPARAM_HARDLOCKUP_PANIC is set)
To disable both hard and soft lockup detectors,
2015-10-10 15:40:42 -04:00
please see 'nowatchdog'.
2009-04-05 15:55:22 -07:00
This is useful when you use a panic=... timeout and
need the box quickly up again.
2005-04-16 15:20:36 -07:00
2017-12-10 01:48:46 -06:00
These settings can be accessed at runtime via
the nmi_watchdog and hardlockup_panic sysctls.
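
To make the "[panic,][nopanic,][num]" format concrete, the sketch below splits an nmi_watchdog= value into its comma-separated tokens. It is an illustration of the documented format only, not the kernel's parser; parse_nmi_watchdog() and the returned dictionary keys are hypothetical names.

```python
# Hypothetical helper for illustration; not the kernel's parser.
def parse_nmi_watchdog(arg):
    """Split 'panic', 'nopanic' and '0'/'1' tokens from an nmi_watchdog= value.

    A setting left as None was not mentioned on the command line, so the
    built-in default would apply.
    """
    settings = {"panic": None, "enabled": None}
    for token in arg.split(","):
        if token == "panic":
            settings["panic"] = True
        elif token == "nopanic":
            settings["panic"] = False
        elif token in ("0", "1"):
            # Valid num: 0 turns the hardlockup detector off, 1 turns it on.
            settings["enabled"] = token == "1"
        elif token:
            raise ValueError("unexpected token: %r" % token)
    return settings


if __name__ == "__main__":
    print(parse_nmi_watchdog("panic,1"))
```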
2009-07-08 11:10:56 -07:00
netpoll.carrier_timeout=
[NET] Specifies amount of time (in seconds) that
netpoll should wait for a carrier. By default netpoll
waits 4 seconds.
2007-07-31 00:37:59 -07:00
no387 [BUGS=X86-32] Tells the kernel to use the 387 maths
2005-04-16 15:20:36 -07:00
emulation library even if a 387 maths coprocessor
is present.
2018-05-18 13:35:25 +03:00
no5lvl [X86-64] Disable 5-level paging mode. Forces
kernel to use 4-level paging instead.
2009-04-05 15:55:22 -07:00
no_console_suspend
[HW] Never suspend the console
Disable suspending of consoles during suspend and
hibernate operations. Once disabled, debugging
messages can reach various consoles while the rest
of the system is being put to sleep (i.e., while
debugging driver suspend/resume hooks). This may
not work reliably with all consoles, but is known
to work with serial and VGA consoles.
2011-10-31 17:11:27 -07:00
To facilitate more flexible debugging, we also add
console_suspend, a printk module parameter to control
it. Users could use console_suspend (usually
/sys/module/printk/parameters/console_suspend) to
turn it on/off dynamically.
2009-04-05 15:55:22 -07:00
2019-07-16 16:26:39 -07:00
novmcoredd [KNL,KDUMP]
Disable device dump. Device dump allows drivers to
append dump data to vmcore so you can collect driver
specified debug info. Drivers can append the data
without any limit and this data is stored in memory,
so this may cause significant memory stress. Disabling
device dump can help save memory but the driver debug
data will no longer be available. This parameter
is only available when CONFIG_PROC_VMCORE_DEVICE_DUMP
is set.
2007-05-31 00:40:47 -07:00
noaliencache [MM, NUMA, SLAB] Disables the allocation of alien
caches in the slab allocator. Saves per-node memory,
but will impact performance.
2006-12-06 20:32:16 -08:00
2005-10-23 12:57:11 -07:00
noalign [KNL,ARM]
2017-10-12 13:01:47 +02:00
noaltinstr [S390] Disables alternative instructions patching
(CPU alternatives feature).
2005-04-16 15:20:36 -07:00
noapic [SMP,APIC] Tells the kernel to not make use of any
IOAPICs that may be present in the system.
2010-11-30 14:18:03 +01:00
noautogroup Disable scheduler automatic task group creation.
2005-04-16 15:20:36 -07:00
nobats [PPC] Do not use BATs for mapping kernel lowmem
on "Classic" PPC cores.
nocache [ARM]
2005-10-23 12:57:11 -07:00
2009-04-05 15:55:22 -07:00
noclflush [BUGS=X86] Don't use the CLFLUSH instruction
2006-07-30 03:03:11 -07:00
nodelayacct [KNL] Disable per-task delay accounting
2008-09-21 17:14:42 +09:00
nodsp [SH] Disable hardware DSP at boot time.
2014-08-14 17:15:26 +08:00
noefi Disable EFI runtime services support.
2008-01-30 13:32:11 +01:00
2005-04-16 15:20:36 -07:00
noexec [IA-64]
2009-04-14 14:03:43 +05:30
noexec [X86]
2008-04-12 10:28:25 +02:00
On X86-32 available only on PAE configured kernels.
2005-04-16 15:20:36 -07:00
noexec=on: enable non-executable mappings (default)
2008-04-12 10:28:25 +02:00
noexec=off: disable non-executable mappings
2019-04-18 16:51:20 +10:00
nosmap [X86,PPC]
2012-09-21 12:43:13 -07:00
Disable SMAP (Supervisor Mode Access Prevention)
even if it is supported by the processor.
2019-04-18 16:51:19 +10:00
nosmep [X86,PPC]
2012-09-21 12:43:13 -07:00
Disable SMEP (Supervisor Mode Execution Prevention)
2011-05-11 16:51:05 -07:00
even if it is supported by the processor.
2008-04-12 10:28:25 +02:00
noexec32 [X86-64]
This affects only 32-bit executables.
noexec32=on: enable non-executable mappings (default)
read doesn't imply executable mappings
noexec32=off: disable non-executable mappings
read implies executable mappings
2005-04-16 15:20:36 -07:00
2015-04-03 23:23:34 +01:00
nofpu [MIPS,SH] Disable hardware FPU at boot time.
2008-09-21 17:14:42 +09:00
2007-07-31 00:37:59 -07:00
nofxsr [BUGS=X86-32] Disables x86 floating point extended
2006-03-23 02:59:34 -08:00
register save and restore. The kernel will only save
legacy floating-point registers on task switch.
2005-04-16 15:20:36 -07:00
2019-06-10 13:08:18 +10:00
nohugeiomap [KNL,x86,PPC] Disable kernel huge I/O mappings.
2015-04-14 15:47:20 -07:00
2016-04-05 12:53:38 +02:00
nosmt [KNL,S390] Disable symmetric multithreading (SMT).
Equivalent to smt=1.
2018-05-29 17:48:27 +02:00
[KNL,x86] Disable symmetric multithreading (SMT).
2018-06-29 16:05:47 +02:00
nosmt=force: Force disable SMT, cannot be undone
via the sysfs control file.
powerpc updates for 4.19
Notable changes:
- A fix for a bug in our page table fragment allocator, where a page table page
could be freed and reallocated for something else while still in use, leading
to memory corruption etc. The fix reuses pt_mm in struct page (x86 only) for
a powerpc only refcount.
- Fixes to our pkey support. Several are user-visible changes, but bring us in
to line with x86 behaviour and/or fix outright bugs. Thanks to Florian Weimer
for reporting many of these.
- A series to improve the hvc driver & related OPAL console code, which have
been seen to cause hardlockups at times. The hvc driver changes in particular
have been in linux-next for ~month.
- Increase our MAX_PHYSMEM_BITS to 128TB when SPARSEMEM_VMEMMAP=y.
- Remove Power8 DD1 and Power9 DD1 support, neither chip should be in use
anywhere other than as a paper weight.
- An optimised memcmp implementation using Power7-or-later VMX instructions
- Support for barrier_nospec on some NXP CPUs.
- Support for flushing the count cache on context switch on some IBM CPUs
(controlled by firmware), as a Spectre v2 mitigation.
- A series to enhance the information we print on unhandled signals to bring it
into line with other arches, including showing the offending VMA and dumping
the instructions around the fault.
Thanks to:
Aaro Koskinen, Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Alexey
Spirkov, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar,
Arnd Bergmann, Bartosz Golaszewski, Benjamin Herrenschmidt, Bharat Bhushan,
Bjoern Noetel, Boqun Feng, Breno Leitao, Bryant G. Ly, Camelia Groza,
Christophe Leroy, Christoph Hellwig, Cyril Bur, Dan Carpenter, Daniel Klamt,
Darren Stevens, Dave Young, David Gibson, Diana Craciun, Finn Thain, Florian
Weimer, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Geoff Levand,
Guenter Roeck, Gustavo Romero, Haren Myneni, Hari Bathini, Joel Stanley,
Jonathan Neuschäfer, Kees Cook, Madhavan Srinivasan, Mahesh Salgaonkar, Markus
Elfring, Mathieu Malaterre, Mauro S. M. Rodrigues, Michael Hanselmann, Michael
Neuling, Michael Schmitz, Mukesh Ojha, Murilo Opsfelder Araujo, Nicholas
Piggin, Parth Y Shah, Paul Mackerras, Paul Menzel, Ram Pai, Randy Dunlap,
Rashmica Gupta, Reza Arbab, Rodrigo R. Galvao, Russell Currey, Sam Bobroff,
Scott Wood, Shilpasri G Bhat, Simon Guo, Souptick Joarder, Stan Johnson,
Thiago Jung Bauermann, Tyrel Datwyler, Vaibhav Jain, Vasant Hegde, Venkat Rao
B, zhong jiang.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAlt2O6cTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgC7hD/4+cj796Df7GsVsIMxzQm7SS9dklIdO
JuKj2Nr5HRzTH59jWlXukLG9mfTNCFgFJB4gEpK1ArDOTcHTCI9RRsLZTZ/kum66
7Pd+7T40dLYXB5uecuUs0vMXa2fI3syKh1VLzACSXv3Dh9BBIKQBwW/aD2eww4YI
1fS5LnXZ2PSxfr6KNAC6ogZnuaiD0sHXOYrtGHq+S/TFC7+Z6ySa6+AnPS+hPVoo
/rHDE1Khr66aj7uk+PP2IgUrCFj6Sbj6hTVlS/iAuwbMjUl9ty6712PmvX9x6wMZ
13hJQI+g6Ci+lqLKqmqVUpXGSr6y4NJGPS/Hko4IivBTJApI+qV/tF2H9nxU+6X0
0RqzsMHPHy13n2torA1gC7ttzOuXPI4hTvm6JWMSsfmfjTxLANJng3Dq3ejh6Bqw
76EMowpDLexwpy7/glPpqNdsP4ySf2Qm8yq3mR7qpL4m3zJVRGs11x+s5DW8NKBL
Fl5SqZvd01abH+sHwv6NLaLkEtayUyohxvyqu2RU3zu5M5vi7DhqstybTPjKPGu0
icSPh7b2y10WpOUpC6lxpdi8Me8qH47mVc/trZ+SpgBrsuEmtJhGKszEnzRCOqos
o2IhYHQv3lQv86kpaAFQlg/RO+Lv+Lo5qbJ209V+hfU5nYzXpEulZs4dx1fbA+ze
fK8GEh+u0L4uJg==
=PzRz
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Notable changes:
- A fix for a bug in our page table fragment allocator, where a page
table page could be freed and reallocated for something else while
still in use, leading to memory corruption etc. The fix reuses
pt_mm in struct page (x86 only) for a powerpc only refcount.
- Fixes to our pkey support. Several are user-visible changes, but
bring us into line with x86 behaviour and/or fix outright bugs.
Thanks to Florian Weimer for reporting many of these.
- A series to improve the hvc driver & related OPAL console code,
which have been seen to cause hardlockups at times. The hvc driver
changes in particular have been in linux-next for about a month.
- Increase our MAX_PHYSMEM_BITS to 128TB when SPARSEMEM_VMEMMAP=y.
- Remove Power8 DD1 and Power9 DD1 support, neither chip should be in
use anywhere other than as a paper weight.
- An optimised memcmp implementation using Power7-or-later VMX
instructions.
- Support for barrier_nospec on some NXP CPUs.
- Support for flushing the count cache on context switch on some IBM
CPUs (controlled by firmware), as a Spectre v2 mitigation.
- A series to enhance the information we print on unhandled signals
to bring it into line with other arches, including showing the
offending VMA and dumping the instructions around the fault.
Thanks to: Aaro Koskinen, Akshay Adiga, Alastair D'Silva, Alexey
Kardashevskiy, Alexey Spirkov, Alistair Popple, Andrew Donnellan,
Aneesh Kumar K.V, Anju T Sudhakar, Arnd Bergmann, Bartosz Golaszewski,
Benjamin Herrenschmidt, Bharat Bhushan, Bjoern Noetel, Boqun Feng,
Breno Leitao, Bryant G. Ly, Camelia Groza, Christophe Leroy, Christoph
Hellwig, Cyril Bur, Dan Carpenter, Daniel Klamt, Darren Stevens, Dave
Young, David Gibson, Diana Craciun, Finn Thain, Florian Weimer,
Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Geoff Levand,
Guenter Roeck, Gustavo Romero, Haren Myneni, Hari Bathini, Joel
Stanley, Jonathan Neuschäfer, Kees Cook, Madhavan Srinivasan, Mahesh
Salgaonkar, Markus Elfring, Mathieu Malaterre, Mauro S. M. Rodrigues,
Michael Hanselmann, Michael Neuling, Michael Schmitz, Mukesh Ojha,
Murilo Opsfelder Araujo, Nicholas Piggin, Parth Y Shah, Paul
Mackerras, Paul Menzel, Ram Pai, Randy Dunlap, Rashmica Gupta, Reza
Arbab, Rodrigo R. Galvao, Russell Currey, Sam Bobroff, Scott Wood,
Shilpasri G Bhat, Simon Guo, Souptick Joarder, Stan Johnson, Thiago
Jung Bauermann, Tyrel Datwyler, Vaibhav Jain, Vasant Hegde, Venkat
Rao B, zhong jiang"
* tag 'powerpc-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (234 commits)
powerpc/mm/book3s/radix: Add mapping statistics
powerpc/uaccess: Enable get_user(u64, *p) on 32-bit
powerpc/mm/hash: Remove unnecessary do { } while(0) loop
powerpc/64s: move machine check SLB flushing to mm/slb.c
powerpc/powernv/idle: Fix build error
powerpc/mm/tlbflush: update the mmu_gather page size while iterating address range
powerpc/mm: remove warning about ‘type’ being set
powerpc/32: Include setup.h header file to fix warnings
powerpc: Move `path` variable inside DEBUG_PROM
powerpc/powermac: Make some functions static
powerpc/powermac: Remove variable x that's never read
cxl: remove a dead branch
powerpc/powermac: Add missing include of header pmac.h
powerpc/kexec: Use common error handling code in setup_new_fdt()
powerpc/xmon: Add address lookup for percpu symbols
powerpc/mm: remove huge_pte_offset_and_shift() prototype
powerpc/lib: Use patch_site to patch copy_32 functions once cache is enabled
powerpc/pseries: Fix endianness while restoring of r3 in MCE handler.
powerpc/fadump: merge adjacent memory ranges to reduce PT_LOAD segements
powerpc/fadump: handle crash memory ranges array index overflow
...
2018-08-17 11:32:50 -07:00
x86/speculation: Enable Spectre v1 swapgs mitigations
The previous commit added macro calls in the entry code which mitigate the
Spectre v1 swapgs issue if the X86_FEATURE_FENCE_SWAPGS_* features are
enabled. Enable those features where applicable.
The mitigations may be disabled with "nospectre_v1" or "mitigations=off".
There are different features which can affect the risk of attack:
- When FSGSBASE is enabled, unprivileged users are able to place any
value in GS, using the wrgsbase instruction. This means they can
write a GS value which points to any value in kernel space, which can
be useful with the following gadget in an interrupt/exception/NMI
handler:
if (coming from user space)
swapgs
mov %gs:<percpu_offset>, %reg1
// dependent load or store based on the value of %reg1
// for example: mov (%reg1), %reg2
If an interrupt is coming from user space, and the entry code
speculatively skips the swapgs (due to user branch mistraining), it
may speculatively execute the GS-based load and a subsequent dependent
load or store, exposing the kernel data to an L1 side channel leak.
Note that, on Intel, a similar attack exists in the above gadget when
coming from kernel space, if the swapgs gets speculatively executed to
switch back to the user GS. On AMD, this variant isn't possible
because swapgs is serializing with respect to future GS-based
accesses.
NOTE: The FSGSBASE patch set hasn't been merged yet, so the above case
doesn't exist quite yet.
- When FSGSBASE is disabled, the issue is mitigated somewhat because
unprivileged users must use prctl(ARCH_SET_GS) to set GS, which
restricts GS values to user space addresses only. That means the
gadget would need an additional step, since the target kernel address
needs to be read from user space first. Something like:
if (coming from user space)
swapgs
mov %gs:<percpu_offset>, %reg1
mov (%reg1), %reg2
// dependent load or store based on the value of %reg2
// for example: mov (%reg2), %reg3
It's difficult to audit for this gadget in all the handlers, so while
there are no known instances of it, it's entirely possible that it
exists somewhere (or could be introduced in the future). Without
tooling to analyze all such code paths, consider it vulnerable.
Effects of SMAP on the !FSGSBASE case:
- If SMAP is enabled, and the CPU reports RDCL_NO (i.e., not
susceptible to Meltdown), the kernel is prevented from speculatively
reading user space memory, even L1 cached values. This effectively
disables the !FSGSBASE attack vector.
- If SMAP is enabled, but the CPU *is* susceptible to Meltdown, SMAP
still prevents the kernel from speculatively reading user space
memory. But it does *not* prevent the kernel from reading the
user value from L1, if it has already been cached. This is probably
only a small hurdle for an attacker to overcome.
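The cases above can be condensed into a small decision sketch. The Python model below is illustrative only and is not the kernel's implementation: the Cpu type, its field names and both function names are hypothetical, and treating every non-AMD vendor like Intel is a conservative assumption made here, not something this message states.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    """Hypothetical summary of the CPU properties discussed above."""
    vendor: str      # "intel" or "amd" (assumed labels)
    fsgsbase: bool   # unprivileged wrgsbase is available
    smap: bool       # SMAP is enabled
    rdcl_no: bool    # CPU reports RDCL_NO (not susceptible to Meltdown)


def needs_user_entry_fence(cpu: Cpu) -> bool:
    """Fence the path where swapgs may be speculatively skipped."""
    if cpu.fsgsbase:
        # GS can point anywhere in kernel space, so SMAP does not help.
        return True
    # !FSGSBASE: only SMAP on a non-Meltdown CPU closes the vector.
    return not (cpu.smap and cpu.rdcl_no)


def needs_kernel_entry_fence(cpu: Cpu) -> bool:
    """Fence the path where swapgs may be speculatively executed."""
    # On AMD, swapgs is serializing with respect to later GS-based
    # accesses. Every other vendor is treated like Intel (assumption).
    return cpu.vendor != "amd"
```

Both functions return True whenever the text leaves a case open, following the commit's own advice to consider unaudited paths vulnerable.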
Thanks to Dave Hansen for contributing the speculative_smap() function.
Thanks to Andrew Cooper for providing the inside scoop on whether swapgs
is serializing on AMD.
[ tglx: Fixed the USER fence decision and polished the comment as suggested
by Dave Hansen ]
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
2019-07-08 11:52:26 -05:00
nospectre_v1 [X86,PPC] Disable mitigations for Spectre Variant 1
(bounds check bypass). With this option data leaks are
possible in the system.
2018-05-29 17:48:27 +02:00
2019-04-15 16:21:20 -05:00
nospectre_v2 [X86,PPC_FSL_BOOK3E,ARM64] Disable all mitigations for
the Spectre variant 2 (indirect branch prediction)
vulnerability. System may allow data leaks with this
option.
2018-01-11 21:46:26 +00:00
2018-04-25 22:04:21 -04:00
nospec_store_bypass_disable
[HW] Disable all mitigations for the Speculative Store Bypass vulnerability
2009-05-22 12:17:45 -07:00
noxsave [BUGS=X86] Disables x86 extended register state save
and restore using xsave. The kernel will fall back to
enabling legacy floating-point and SSE state.
2014-05-29 11:12:31 -07:00
noxsaveopt [X86] Disables xsaveopt used in saving x86 extended
register states. The kernel will fall back to using
xsave to save the states. By using this parameter,
performance of saving the states is degraded because
xsave doesn't support modified optimization while
xsaveopt supports it on xsaveopt-enabled systems.
noxsaves [X86] Disables xsaves and xrstors used in saving and
restoring x86 extended register state in compacted
form of xsave area. The kernel will fall back to using
xsaveopt and xrstor to save and restore the states
in standard form of xsave area. By using this
parameter, xsave area per process might occupy more
memory on xsaves-enabled systems.
2009-03-31 13:55:44 +01:00
nohlt [BUGS=ARM,SH] Tells the kernel that the sleep(SH) or
wfi(ARM) instruction doesn't work correctly and not to
use it. This is also useful when using a JTAG debugger.
2005-10-23 12:57:11 -07:00
file capabilities: add no_file_caps switch (v4)
Add a no_file_caps boot option when file capabilities are
compiled into the kernel (CONFIG_SECURITY_FILE_CAPABILITIES=y).
This allows distributions to ship a kernel with file capabilities
compiled in, without forcing users to use (and understand and
trust) them.
When no_file_caps is specified at boot, then when a process executes
a file, any file capabilities stored with that file will not be
used in the calculation of the process' new capability sets.
This means that booting with the no_file_caps boot option will
not be the same as booting a kernel with file capabilities
compiled out - in particular a task with CAP_SETPCAP will not
have any chance of passing capabilities to another task (which
isn't "really" possible anyway, and which may soon be killed
altogether by David Howells in any case), and it will instead
be able to put new capabilities in its pI. However since fI
will always be empty and pI is masked with fI, it gains the
task nothing.
We also support the extra prctl options, setting securebits and
dropping capabilities from the per-process bounding set.
The other remaining difference is that killpriv, task_setscheduler,
setioprio, and setnice will continue to be hooked. That will
be noticeable in the case where a root task changed its uid
while keeping some caps, and another task owned by the new uid
tries to change settings for the more privileged task.
Changelog:
Nov 05 2008: (v4) trivial port on top of always-start-\
with-clear-caps patch
Sep 23 2008: nixed file_caps_enabled when file caps are
not compiled in as it isn't used.
Document no_file_caps in kernel-parameters.txt.
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Andrew G. Morgan <morgan@kernel.org>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-05 16:08:52 -06:00
no_file_caps Tells the kernel not to honor file capabilities. The
only way then for a file to be executed with privilege
is to be setuid root or executed by root.
2005-04-16 15:20:36 -07:00
nohalt [IA-64] Tells the kernel not to use the power saving
function PAL_HALT_LIGHT when idle. This increases
power-consumption. On the positive side, it reduces
interrupt wake-up latency, which may improve performance
in certain environments such as networked servers or
real-time systems.
2014-06-13 13:30:35 -07:00
nohibernate [HIBERNATION] Disable hibernation and resume.
2007-02-16 01:28:03 -08:00
nohz= [KNL] Boottime enable/disable dynamic ticks
Valid arguments: on, off
Default: on
2017-12-14 19:18:27 +01:00
nohz_full= [KNL,BOOT,SMP,ISOL]
2016-10-11 13:51:35 -07:00
The argument is a cpu list, as described above.
2013-04-12 16:45:34 +02:00
In kernels built with CONFIG_NO_HZ_FULL=y, set
2012-12-18 17:32:19 +01:00
the specified list of CPUs whose tick will be stopped
2013-03-27 02:18:34 +01:00
whenever possible. The boot CPU will be forced outside
2017-06-02 11:26:43 -07:00
the range to maintain the timekeeping. Any CPUs
in this list will have their RCU callbacks offloaded,
just as if they had also been called out in the
rcu_nocbs= boot parameter.
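As a rough illustration of how such an argument could be interpreted, the sketch below parses a cpu list and then drops the boot CPU. It assumes the common "N,N-M" list form only (any grouped form is not handled) and assumes the boot CPU is CPU 0; both function names are hypothetical and this is not the kernel's parser.

```python
from typing import List


def parse_cpu_list(text: str) -> List[int]:
    """Parse a list such as "1,3-5" into sorted CPU numbers."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError("descending range: " + part)
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def nohz_full_cpus(text: str, boot_cpu: int = 0) -> List[int]:
    """CPUs whose tick may be stopped; the boot CPU is forced out."""
    return [cpu for cpu in parse_cpu_list(text) if cpu != boot_cpu]
```

For example, nohz_full_cpus("0-3") yields CPUs 1, 2 and 3, because the boot CPU keeps its tick for timekeeping.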
2012-12-18 17:32:19 +01:00
2009-04-02 12:31:16 +09:00
noiotrap [SH] Disables trapped I/O port accesses.
2007-07-31 00:37:59 -07:00
noirqdebug [X86-32] Disables the code which attempts to detect and
2005-04-16 15:20:36 -07:00
disable unhandled interrupt sources.
2009-04-14 14:03:43 +05:30
no_timer_check [X86,APIC] Disables the code which tests for
2006-12-07 02:14:09 +01:00
broken timer IRQ sources.
2005-04-16 15:20:36 -07:00
noisapnp [ISAPNP] Disables ISA PnP code.
noinitrd [RAM] Tells the kernel not to load any configured
initial RAM disk.
2009-04-17 16:42:15 +08:00
nointremap [X86-64, Intel-IOMMU] Do not enable interrupt
remapping.
2010-07-20 11:06:49 -07:00
[Deprecated - use intremap=off]
2009-04-17 16:42:15 +08:00
2005-04-16 15:20:36 -07:00
nointroute [IA-64]
2016-01-29 11:42:58 -08:00
noinvpcid [X86] Disable the INVPCID cpu feature.
2011-08-13 12:34:52 -07:00
nojitter [IA-64] Disables jitter checking for ITC timers.
2007-07-20 11:22:30 -07:00
2010-08-16 17:51:20 +02:00
no-kvmclock [X86,KVM] Disable paravirtualized KVM clock driver
2010-10-14 11:22:51 +02:00
no-kvmapf [X86,KVM] Disable paravirtualized asynchronous page
fault handling.
2016-10-28 00:54:32 -07:00
no-vmw-sched-clock
[X86,PV_OPS] Disable paravirtualized VMware scheduler
clock and use the default one.
2020-03-23 19:57:06 +00:00
no-steal-acc [X86,PV_OPS,ARM64] Disable paravirtualized steal time
2019-10-21 16:28:23 +01:00
accounting. Steal time is computed, but won't
influence scheduler behaviour.
2011-07-11 15:28:19 -04:00
2007-07-31 00:37:59 -07:00
nolapic [X86-32,APIC] Do not enable or use the local APIC.
2005-04-16 15:20:36 -07:00
2007-07-31 00:37:59 -07:00
nolapic_timer [X86-32,APIC] Do not use the local APIC timer.
2007-03-22 00:11:21 -08:00
2005-04-16 15:20:36 -07:00
noltlbs [PPC] Do not use large page/tlb entries for kernel
2016-02-09 17:07:52 +01:00
lowmem mapping on PPC40x and PPC8xx
2005-04-16 15:20:36 -07:00
2006-02-22 09:57:55 +09:00
nomca [IA-64] Disable machine check abort handling
2015-05-16 02:16:43 +09:00
nomce [X86-32] Disable Machine Check Exception
2006-04-01 01:36:09 +02:00
2007-10-12 23:04:06 +02:00
nomfgpt [X86-32] Disable Multi-Function General Purpose
Timer usage (for AMD Geode machines).
2011-10-13 15:14:27 -04:00
nonmi_ipi [X86] Disable using NMI IPIs during panic/reboot to
shutdown the other cpus. Instead use the REBOOT_VECTOR
irq.
2012-02-01 10:33:14 +08:00
nomodule Disable module load
2010-01-18 17:05:40 +01:00
nopat [X86] Disable PAT (page attribute table extension of
pagetables) support.
2017-06-29 08:53:20 -07:00
nopcid [X86-64] Disable the PCID cpu feature.
2009-04-05 15:55:22 -07:00
norandmaps Don't use address space randomization. Equivalent to
echo 0 > /proc/sys/kernel/randomize_va_space
2007-07-31 00:37:59 -07:00
noreplace-smp [X86-32,SMP] Don't replace SMP instructions
2007-05-02 19:27:13 +02:00
with UP alternatives
2014-05-11 20:25:20 -07:00
nordrand [X86] Disable kernel use of the RDRAND and
RDSEED instructions even if they are supported
by the processor. RDRAND and RDSEED are still
available to user space applications.
2011-07-31 14:02:19 -07:00
2005-10-23 12:57:11 -07:00
noresume [SWSUSP] Disables resume and restores original swap
space.
2005-04-16 15:20:36 -07:00
no-scroll [VGA] Disables scrollback.
This is required for the Braillex ib80-piezo Braille
reader made by F.H. Papenmeier (Germany).
nosbagart [IA-64]
2007-07-31 00:37:59 -07:00
nosep [BUGS=X86-32] Disables x86 SYSENTER/SYSEXIT support.
2006-03-23 02:59:34 -08:00
2007-08-16 03:34:22 -04:00
nosmp [SMP] Tells an SMP kernel to act as a UP kernel,
and disable the IO APIC. Legacy for "maxcpus=0".
2005-04-16 15:20:36 -07:00
2007-07-15 23:41:05 -07:00
nosoftlockup [KNL] Disable the soft-lockup detector.
2005-04-16 15:20:36 -07:00
nosync [HW,M68K] Disables sync negotiation for all devices.
2015-04-14 15:44:13 -07:00
nowatchdog [KNL] Disable both lockup detectors, i.e.
2018-04-18 20:51:39 +02:00
soft-lockup and NMI watchdog (hard-lockup).
2010-05-07 17:11:44 -04:00
2005-04-16 15:20:36 -07:00
nowb [ARM]
2005-10-23 12:57:11 -07:00
2009-04-17 16:42:12 +08:00
nox2apic [X86-64,APIC] Do not enable x2APIC mode.
2012-11-13 11:32:38 -08:00
cpu0_hotplug [X86] Turn on CPU0 hotplug feature when
CONFIG_BOOTPARAM_HOTPLUG_CPU0 is off.
Some features depend on CPU0. Known dependencies are:
1. Resume from suspend/hibernate depends on CPU0.
Suspend/hibernate will fail if CPU0 is offline and you
need to online CPU0 before suspend/hibernate.
2. PIC interrupts also depend on CPU0. CPU0 can't be
removed if a PIC interrupt is detected.
It's said poweroff/reboot may depend on CPU0 on some
machines although I haven't seen such issues so far
after CPU0 is offline on a few tested machines.
If the dependencies are under your control, you can
turn on cpu0_hotplug.
2018-04-18 20:51:39 +02:00
nps_mtm_hs_ctr= [KNL,ARC]
2017-06-15 11:43:57 +03:00
This parameter sets the maximum duration, in
cycles, each HW thread of the CTOP can run
without interruptions, before HW switches it.
The actual maximum duration is 16 times this
parameter's value.
Format: integer between 1 and 255
Default: 255
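The relationship between the parameter and the real limit is simple arithmetic. A minimal sketch follows; the function name is hypothetical and the range check simply mirrors the documented format.

```python
def nps_mtm_max_cycles(hs_ctr: int = 255) -> int:
    """Maximum uninterrupted run length, in cycles, of one HW thread."""
    if not 1 <= hs_ctr <= 255:
        raise ValueError("nps_mtm_hs_ctr must be between 1 and 255")
    # The actual maximum duration is 16 times the parameter's value.
    return 16 * hs_ctr
```

With the default of 255, the limit works out to 4080 cycles.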
2011-08-13 12:34:52 -07:00
nptcg= [IA-64] Override max number of concurrent global TLB
2008-03-14 13:57:08 -07:00
purges which is reported from either PAL_VM_SUMMARY or
SAL PALO.
2010-02-10 01:20:37 -08:00
nr_cpus= [SMP] Maximum number of processors that an SMP kernel
could support. nr_cpus=n : n >= 1 limits the kernel to
2016-08-24 13:06:45 +08:00
support 'n' processors. It may be larger than the
number of CPUs already plugged in at boot; at
runtime you can physically add extra CPUs until the count
reaches n. For this reason, some boot-time memory for per-cpu
variables needs to be pre-allocated for later physical CPU
hotplug.
2010-02-10 01:20:37 -08:00
2009-04-05 15:55:22 -07:00
nr_uarts= [SERIAL] maximum number of UARTs to be registered.
2012-11-22 11:16:36 +00:00
numa_balancing= [KNL,X86] Enable or disable automatic NUMA balancing.
Allowed values are enable and disable
2007-07-15 23:38:01 -07:00
numa_zonelist_order= [KNL, BOOT] Select zonelist order for NUMA.
2017-09-06 16:20:13 -07:00
'node', 'default' can be specified
2007-07-15 23:38:01 -07:00
This can be set from sysctl after boot.
2019-04-22 16:48:00 -03:00
See Documentation/admin-guide/sysctl/vm.rst for details.
2007-07-15 23:38:01 -07:00
2009-01-06 14:42:44 -08:00
ohci1394_dma=early [HW] enable debugging via the ohci1394 driver.
2020-05-01 17:37:50 +02:00
See Documentation/core-api/debugging-via-ohci1394.rst for more
2009-01-06 14:42:44 -08:00
info.
2008-04-29 00:59:53 -07:00
olpc_ec_timeout= [OLPC] ms delay when issuing EC commands
Rather than timing out after 20 ms if an EC
command is not properly ACKed, override the length
of the timeout. We have interrupts disabled while
waiting for the ACK, so if this is set too high
interrupts *may* be lost!
2009-12-11 16:16:32 -08:00
omap_mux= [OMAP] Override bootloader pin multiplexing.
Format: <mux_mode0.mode_name=value>...
For example, to override I2C bus2:
omap_mux=i2c2_scl.i2c2_scl=0x100,i2c2_sda.i2c2_sda=0x100
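To make the format concrete, here is a small sketch that splits such an argument into its parts. The function name and the tuple layout are hypothetical, and accepting any integer base for the value is an assumption.

```python
from typing import List, Tuple


def parse_omap_mux(arg: str) -> List[Tuple[str, str, int]]:
    """Split "mode0.mode_name=value,..." into (mode0, mode_name, value)."""
    entries = []
    for item in arg.split(","):
        name, _, value = item.partition("=")
        mode0, _, mode_name = name.partition(".")
        if not (mode0 and mode_name and value):
            raise ValueError("malformed omap_mux entry: " + item)
        # Base 0 lets int() accept the 0x prefix used in the example.
        entries.append((mode0, mode_name, int(value, 0)))
    return entries
```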
2005-04-16 15:20:36 -07:00
oprofile.timer= [HW]
Use timer interrupt instead of performance counters
2009-05-06 12:10:23 +02:00
oprofile.cpu_type= Force an oprofile cpu type
This might be useful if you have an older oprofile
userland or if you want common events.
2009-06-23 11:48:14 +02:00
Format: { arch_perfmon }
arch_perfmon: [X86] Force use of architectural
2009-05-06 12:10:23 +02:00
perfmon on Intel CPUs instead of the
CPU specific event set.
2011-10-11 19:39:16 +02:00
timer: [X86] Force use of architectural NMI
timer mode (see also oprofile.timer
for generic hr timer mode)
2009-04-27 17:44:11 +02:00
2011-04-04 15:02:24 -07:00
oops=panic Always panic on oopses. Default is to just kill the
process, but there is a small probability of
deadlocking the machine.
2011-03-22 16:34:04 -07:00
This will also cause panics on machine check exceptions.
Useful together with panic=30 to trigger a reboot.
mm: shuffle initial free memory to improve memory-side-cache utilization
Patch series "mm: Randomize free memory", v10.
This patch (of 3):
Randomization of the page allocator improves the average utilization of
a direct-mapped memory-side-cache. Memory side caching is a platform
capability that Linux has been previously exposed to in HPC
(high-performance computing) environments on specialty platforms. In
that instance it was a smaller pool of high-bandwidth-memory relative to
higher-capacity / lower-bandwidth DRAM. Now, this capability is going
to be found on general purpose server platforms where DRAM is a cache in
front of higher latency persistent memory [1].
Robert offered an explanation of the state of the art of Linux
interactions with memory-side-caches [2], and I copy it here:
It's been a problem in the HPC space:
http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/
A kernel module called zonesort is available to try to help:
https://software.intel.com/en-us/articles/xeon-phi-software
and this abandoned patch series proposed that for the kernel:
https://lkml.kernel.org/r/20170823100205.17311-1-lukasz.daniluk@intel.com
Dan's patch series doesn't attempt to ensure buffers won't conflict, but
also reduces the chance that the buffers will. This will make performance
more consistent, albeit slower than "optimal" (which is near impossible
to attain in a general-purpose kernel). That's better than forcing
users to deploy remedies like:
"To eliminate this gradual degradation, we have added a Stream
measurement to the Node Health Check that follows each job;
nodes are rebooted whenever their measured memory bandwidth
falls below 300 GB/s."
A replacement for zonesort was merged upstream in commit cc9aec03e58f
("x86/numa_emulation: Introduce uniform split capability"). With this
numa_emulation capability, memory can be split into cache sized
("near-memory" sized) numa nodes. A bind operation to such a node, and
disabling workloads on other nodes, enables full cache performance.
However, once the workload exceeds the cache size then cache conflicts
are unavoidable. While HPC environments might be able to tolerate
time-scheduling of cache sized workloads, for general purpose server
platforms, the oversubscribed cache case will be the common case.
The worst case scenario is that a server system owner benchmarks a
workload at boot with an un-contended cache only to see that performance
degrade over time, even below the average cache performance due to
excessive conflicts. Randomization clips the peaks and fills in the
valleys of cache utilization to yield steady average performance.
Here are some performance impact details of the patches:
1/ An Intel internal synthetic memory bandwidth measurement tool, saw a
3X speedup in a contrived case that tries to force cache conflicts.
The contrived case used the numa_emulation capability to force an
instance of the benchmark to be run in two of the near-memory sized
numa nodes. If both instances were placed on the same emulated node they
would fit and cause zero conflicts. While on separate emulated nodes
without randomization they underutilized the cache and conflicted
unnecessarily due to the in-order allocation per node.
2/ A well known Java server application benchmark was run with a heap
size that exceeded cache size by 3X. The cache conflict rate was 8%
for the first run and degraded to 21% after page allocator aging. With
randomization enabled the rate levelled out at 11%.
3/ A MongoDB workload did not observe measurable difference in
cache-conflict rates, but the overall throughput dropped by 7% with
randomization in one case.
4/ Mel Gorman ran his suite of performance workloads with randomization
enabled on platforms without a memory-side-cache and saw a mix of some
improvements and some losses [3].
While there is potentially significant improvement for applications that
depend on low latency access across a wide working-set, the performance
may be negligible to negative for other workloads. For this reason the
shuffle capability defaults to off unless a direct-mapped
memory-side-cache is detected. Even then, the page_alloc.shuffle=0
parameter can be specified to disable the randomization on those systems.
Outside of memory-side-cache utilization concerns there is potentially
security benefit from randomization. Some data exfiltration and
return-oriented-programming attacks rely on the ability to infer the
location of sensitive data objects. The kernel page allocator, especially
early in system boot, has predictable first-in-first-out behavior for
physical pages. Pages are freed in physical address order when first
onlined.
Quoting Kees:
"While we already have a base-address randomization
(CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and
memory layouts would certainly be using the predictability of
allocation ordering (i.e. for attacks where the base address isn't
important: only the relative positions between allocated memory).
This is common in lots of heap-style attacks. They try to gain
control over ordering by spraying allocations, etc.
I'd really like to see this because it gives us something similar
to CONFIG_SLAB_FREELIST_RANDOM but for the page allocator."
While SLAB_FREELIST_RANDOM reduces the predictability of some local slab
caches it leaves the vast bulk of memory to be predictably allocated in order.
However, it should be noted, the concrete security benefits are hard to
quantify, and no known CVE is mitigated by this randomization.
Introduce shuffle_free_memory(), and its helper shuffle_zone(), to perform
a Fisher-Yates shuffle of the page allocator 'free_area' lists when they
are initially populated with free memory at boot and at hotplug time. Do
this based on either the presence of a page_alloc.shuffle=Y command line
parameter, or autodetection of a memory-side-cache (to be added in a
follow-on patch).
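For readers unfamiliar with the algorithm, a generic Fisher-Yates shuffle can be sketched as follows. This operates on an ordinary Python list standing in for the free blocks; it is not the kernel's shuffle_zone(), and the function name and the rng parameter are choices made for this example.

```python
import random
from typing import List, Optional, TypeVar

T = TypeVar("T")


def fisher_yates_shuffle(blocks: List[T],
                         rng: Optional[random.Random] = None) -> List[T]:
    """Shuffle blocks in place so that every ordering is equally likely."""
    if rng is None:
        rng = random.Random()
    # Walk from the last slot down, swapping each slot with a uniformly
    # chosen slot at or below it (j may equal i, which leaves it alone).
    for i in range(len(blocks) - 1, 0, -1):
        j = rng.randint(0, i)
        blocks[i], blocks[j] = blocks[j], blocks[i]
    return blocks
```

Each element is swapped at most once per pass, so the shuffle is linear in the number of blocks.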
The shuffling is done in terms of CONFIG_SHUFFLE_PAGE_ORDER sized free
pages where the default CONFIG_SHUFFLE_PAGE_ORDER is MAX_ORDER-1, i.e. 10
(4MB); this trades off randomization granularity for time spent shuffling.
MAX_ORDER-1 was chosen to be minimally invasive to the page allocator
while still showing memory-side cache behavior improvements, and with the
expectation that the security implications of finer granularity
randomization are mitigated by CONFIG_SLAB_FREELIST_RANDOM. The
performance impact of the shuffling appears to be in the noise compared to
other memory initialization work.
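The 4MB figure follows from the page order. A quick check, assuming 4 KiB base pages and MAX_ORDER of 11 (both assumptions; other configurations give different sizes), with a hypothetical helper name:

```python
def order_block_bytes(order: int, page_size: int = 4096) -> int:
    """Size in bytes of one free block of the given page order."""
    # An order-n block spans 2**n contiguous base pages.
    return page_size << order


MAX_ORDER = 11  # assumed value for this example
SHUFFLE_ORDER = MAX_ORDER - 1
```

Under these assumptions, order_block_bytes(SHUFFLE_ORDER) is 4 MiB, so a shuffle over 16 GiB of memory would permute 4096 blocks.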
This initial randomization can be undone over time so a follow-on patch is
introduced to inject entropy on page free decisions. It is reasonable to
ask if the page free entropy is sufficient, but it is not enough due to
the in-order initial freeing of pages. At the start of that process
putting page1 in front or behind page0 still keeps them close together,
page2 is still near page1 and has a high chance of being adjacent. As
more pages are added ordering diversity improves, but there is still high
page locality for the low address pages and this leads to no significant
impact to the cache conflict rate.
[1]: https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/
[2]: https://lkml.kernel.org/r/AT5PR8401MB1169D656C8B5E121752FC0F8AB120@AT5PR8401MB1169.NAMPRD84.PROD.OUTLOOK.COM
[3]: https://lkml.org/lkml/2018/10/12/309
[dan.j.williams@intel.com: fix shuffle enable]
Link: http://lkml.kernel.org/r/154943713038.3858443.4125180191382062871.stgit@dwillia2-desk3.amr.corp.intel.com
[cai@lca.pw: fix SHUFFLE_PAGE_ALLOCATOR help texts]
Link: http://lkml.kernel.org/r/20190425201300.75650-1-cai@lca.pw
Link: http://lkml.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Robert Elliott <elliott@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 15:41:28 -07:00
page_alloc.shuffle=
[KNL] Boolean flag to control whether the page allocator
should randomize its free lists. The randomization may
be automatically enabled if the kernel detects it is
running on a platform with a direct-mapped memory-side
cache, and this parameter can be used to
override/disable that behavior. The state of the flag
can be read from sysfs at:
/sys/module/page_alloc/parameters/shuffle.
mm/page_owner: keep track of page owners
This is the page owner tracking code, which was introduced long ago. It
has been resident in Andrew's tree, but nobody tried to upstream it, so it
remained as is. Our company actively uses this feature to debug memory leaks
or to find memory hogs, so I decided to upstream it.
This functionality helps us to know who allocated a page. When
allocating a page, we store some information about the allocation in extra
memory. Later, if we need to know the status of all pages, we can get and
analyze it from this stored information.
In the previous version of this feature, the extra memory was statically
defined in struct page, but in this version the extra memory is allocated
outside of struct page. This enables us to turn the feature on or off at
boot time without considerable memory waste.
Although we already have tracepoints for tracing page allocation/free,
using them to analyze page owners is rather complex. We would need to enlarge
the trace buffer to prevent overwriting until the userspace program is
launched. The launched program would then have to continually dump the trace
buffer for later analysis, which is more likely to change system behaviour
than just keeping the data in memory, and so is bad for debugging.
Moreover, we can use the page_owner feature for various other purposes. For
example, we can use it for the fragmentation statistics implemented in this
patch. I also plan to implement a CMA failure debugging feature
using this interface.
I'd like to give credit to all the developers who contributed to this feature,
but that's not easy because I don't know the exact history. Sorry about that.
Below are the people who have a "Signed-off-by" in the patches in Andrew's tree.
Contributor:
Alexander Nyberg <alexn@dsv.su.se>
Mel Gorman <mgorman@suse.de>
Dave Hansen <dave@linux.vnet.ibm.com>
Minchan Kim <minchan@kernel.org>
Michal Nazarewicz <mina86@mina86.com>
Andrew Morton <akpm@linux-foundation.org>
Jungsoo Son <jungsoo.son@lge.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-12 16:56:01 -08:00
page_owner= [KNL] Boot-time page_owner enabling option.
Storage of the information about who allocated
each page is disabled by default. With this switch,
we can turn it on.
on: enable the feature
2016-03-15 14:56:27 -07:00
page_poison= [KNL] Boot-time parameter changing the state of
2018-08-21 21:53:10 -07:00
poisoning on the buddy allocator, available with
CONFIG_PAGE_POISONING=y.
off: turn off poisoning (default)
2016-03-15 14:56:27 -07:00
on: turn on poisoning
2011-04-04 15:02:24 -07:00
panic= [KNL] Kernel behaviour on panic: delay <timeout>
2011-07-26 16:08:52 -07:00
timeout > 0: seconds before rebooting
timeout = 0: wait forever
timeout < 0: reboot immediately
2005-04-16 15:20:36 -07:00
Format: <timeout>
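The three timeout cases can be restated as a tiny lookup. This only mirrors the table above; the function name is hypothetical and the strings are descriptive, not kernel output.

```python
def panic_behaviour(timeout: int) -> str:
    """Describe what the kernel does after a panic for a given panic= value."""
    if timeout > 0:
        return "reboot after %d seconds" % timeout
    if timeout == 0:
        return "wait forever"
    return "reboot immediately"
```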
2019-01-03 15:28:17 -08:00
panic_print= Bitmask for printing system info when panic happens.
The user can choose a combination of the following bits:
bit 0: print all tasks info
bit 1: print system memory info
bit 2: print timer info
bit 3: print locks info if CONFIG_LOCKDEP is on
bit 4: print ftrace buffer
2019-05-17 14:31:50 -07:00
bit 5: print all printk messages in buffer
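Composing the bitmask by hand is error-prone, so a small helper can do it. The sketch below uses only the bit meanings listed above; the helper names are hypothetical.

```python
from typing import List

PANIC_PRINT_BITS = {
    0: "all tasks info",
    1: "system memory info",
    2: "timer info",
    3: "locks info (if CONFIG_LOCKDEP is on)",
    4: "ftrace buffer",
    5: "all printk messages in buffer",
}


def panic_print_mask(*bits: int) -> int:
    """Combine bit numbers into the integer value for panic_print=."""
    mask = 0
    for bit in bits:
        if bit not in PANIC_PRINT_BITS:
            raise ValueError("unknown panic_print bit: %d" % bit)
        mask |= 1 << bit
    return mask


def describe_panic_print(mask: int) -> List[str]:
    """List what a given mask would print, lowest bit first."""
    return [text for bit, text in PANIC_PRINT_BITS.items()
            if mask & (1 << bit)]
```

For instance, selecting bits 0, 1 and 4 gives a mask of 19 (0x13).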
2019-01-03 15:28:17 -08:00
2020-06-07 21:40:17 -07:00
panic_on_taint= Bitmask for conditionally calling panic() in add_taint()
Format: <hex>[,nousertaint]
Hexadecimal bitmask representing the set of TAINT flags
that will cause the kernel to panic when add_taint() is
called with any of the flags in this set.
The optional switch "nousertaint" can be utilized to
prevent userspace forced crashes by writing to sysctl
/proc/sys/kernel/tainted any flagset matching with the
bitmask set on panic_on_taint.
See Documentation/admin-guide/tainted-kernels.rst for
extra details on the taint flags that users can pick
to compose the bitmask to assign to panic_on_taint.
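The "<hex>[,nousertaint]" format described above can be pictured with a small Python sketch. This is not the kernel's parser; parse_panic_on_taint() is a hypothetical helper that only mirrors the documented syntax, and the example mask values are arbitrary.

```python
from typing import Tuple


def parse_panic_on_taint(value: str) -> Tuple[int, bool]:
    """Split '<hex>[,nousertaint]' into (bitmask, nousertaint flag).

    Hypothetical helper that follows the documented format only.
    """
    mask_text, sep, option = value.partition(",")
    if sep and option != "nousertaint":
        raise ValueError("unknown option: %r" % option)
    # int(..., 16) accepts the mask with or without a leading 0x.
    return int(mask_text, 16), option == "nousertaint"


if __name__ == "__main__":
    mask, nousertaint = parse_panic_on_taint("0x20,nousertaint")
    print(hex(mask), nousertaint)
```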
2014-12-10 15:45:50 -08:00
panic_on_warn panic() instead of WARN(). Useful to cause kdump
on a WARN().
2014-06-06 14:37:07 -07:00
crash_kexec_post_notifiers
Run kdump after running panic-notifiers and dumping
kmsg. This is only for users who doubt that kdump
always succeeds in any situation.
Note that this also increases risks of kdump failure,
because some panic notifiers can make the crashed
kernel more unstable.
2005-04-16 15:20:36 -07:00
parkbd.port= [HW] Parallel port number the keyboard adapter is
connected to, default is 0.
Format: <parport#>
parkbd.mode= [HW] Parallel port keyboard adapter mode of operation,
0 for XT, 1 for AT (default is AT).
2005-10-23 12:57:11 -07:00
Format: <mode>
parport= [HW,PPT] Specify parallel ports. 0 disables.
Format: { 0 | auto | 0xBBB[,IRQ[,DMA]] }
Use 'auto' to force the driver to use any
IRQ/DMA settings detected (the default is to
ignore detected IRQ/DMA settings because of
possible conflicts). You can specify the base
address, IRQ, and DMA settings; IRQ and DMA
should be numbers, or 'auto' (for using detected
settings on that particular port), or 'nofifo'
(to avoid using a FIFO even if it is detected).
Parallel ports are assigned in the order they
are specified on the command line, starting
with parport0.
parport_init_mode= [HW,PPT]
Configure VIA parallel port to operate in
a specific mode. This is necessary on Pegasos
computer where firmware has no options for setting
up parallel port mode and sets it to spp.
Currently this function knows 686a and 8231 chips.
2005-04-16 15:20:36 -07:00
Format: [spp|ps2|epp|ecp|ecpepp]
2006-03-23 03:00:57 -08:00
pause_on_oops=
Halt all CPUs after the first oops has been printed for
the specified number of seconds. This is to be used if
your oopses keep scrolling off the screen.
2005-04-16 15:20:36 -07:00
pcbit= [HW,ISDN]
pcd. [PARIDE]
See header of drivers/block/paride/pcd.c.
2019-06-18 11:47:10 -03:00
See also Documentation/admin-guide/blockdev/paride.rst.
2005-04-16 15:20:36 -07:00
2018-07-30 10:18:37 -06:00
pci=option[,option...] [PCI] various PCI subsystem options.
Some options herein operate on a specific device
or a set of devices (<pci_dev>). These are
specified in one of the following formats:
2018-07-30 10:18:38 -06:00
[<domain>:]<bus>:<dev>.<func>[/<dev>.<func>]*
2018-07-30 10:18:37 -06:00
pci:<vendor>:<device>[:<subvendor>:<subdevice>]
Note: the first format specifies a PCI
bus/device/function address which may change
if new hardware is inserted, if motherboard
firmware changes, or due to changes caused
by other kernel parameters. If the
domain is left unspecified, it is
2018-07-30 10:18:38 -06:00
taken to be zero. Optionally, a path
to a device through multiple device/function
addresses can be specified after the base
address (this is more robust against
renumbering issues). The second format
2018-07-30 10:18:37 -06:00
selects devices using IDs from the
configuration space which may match multiple
devices in the system.
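The two <pci_dev> formats above can be told apart mechanically. The Python sketch below is an illustration only: parse_pci_dev() is a hypothetical helper, and it assumes every numeric field is written in hexadecimal, which the text above does not state explicitly.

```python
import re

_HEX = r"[0-9a-fA-F]+"
# [<domain>:]<bus>:<dev>.<func>[/<dev>.<func>]*
_ADDR = re.compile(
    rf"^(?:({_HEX}):)?({_HEX}):({_HEX})\.({_HEX})((?:/{_HEX}\.{_HEX})*)$"
)


def parse_pci_dev(spec):
    """Parse a <pci_dev> string into a dict (fields assumed to be hex)."""
    if spec.startswith("pci:"):
        # pci:<vendor>:<device>[:<subvendor>:<subdevice>]
        fields = spec[4:].split(":")
        if len(fields) not in (2, 4):
            raise ValueError("expected 2 or 4 ID fields: %r" % spec)
        ids = [int(f, 16) for f in fields] + [None] * (4 - len(fields))
        keys = ("vendor", "device", "subvendor", "subdevice")
        return dict(zip(keys, ids), kind="id")

    match = _ADDR.match(spec)
    if match is None:
        raise ValueError("not a valid <pci_dev>: %r" % spec)
    domain, bus, dev, func, rest = match.groups()
    path = [(int(dev, 16), int(func, 16))]
    for hop in filter(None, rest.split("/")):
        hop_dev, hop_func = hop.split(".")
        path.append((int(hop_dev, 16), int(hop_func, 16)))
    return {
        "kind": "address",
        # An unspecified domain is taken to be zero, as described above.
        "domain": int(domain, 16) if domain else 0,
        "bus": int(bus, 16),
        "path": path,
    }


if __name__ == "__main__":
    print(parse_pci_dev("03:00.0/1c.2"))
    print(parse_pci_dev("pci:8086:9c22:103c:198f"))
```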
2018-06-04 22:16:09 -04:00
earlydump dump PCI config space before the kernel
2018-04-18 20:51:39 +02:00
changes anything
2008-08-22 09:53:39 +02:00
off [X86] don't probe for the PCI bus
2007-07-31 00:37:59 -07:00
bios [X86-32] force use of PCI BIOS, don't access
2005-10-23 12:57:11 -07:00
the hardware directly. Use this if your machine
has a non-standard PCI host bridge.
2007-07-31 00:37:59 -07:00
nobios [X86-32] disallow use of PCI BIOS, only direct
2005-10-23 12:57:11 -07:00
hardware access methods are allowed. Use this
if you experience crashes upon bootup and you
suspect they are caused by the BIOS.
2016-01-13 16:48:51 +01:00
conf1 [X86] Force use of PCI Configuration Access
Mechanism 1 (config address in IO port 0xCF8,
data in IO port 0xCFC, both 32-bit).
conf2 [X86] Force use of PCI Configuration Access
Mechanism 2 (IO port 0xCF8 is an 8-bit port for
the function, IO port 0xCFA, also 8-bit, sets
bus number. The config space is then accessed
through ports 0xC000-0xCFFF).
See http://wiki.osdev.org/PCI for more info
on the configuration access mechanisms.
2007-10-05 13:17:58 -07:00
noaer [PCIE] If the PCIEAER kernel config parameter is
enabled, this kernel boot option can be used to
disable the use of PCIE advanced error reporting.
2007-10-11 16:57:27 -04:00
nodomains [PCI] Disable support for multiple PCI
root domains (aka PCI segments, in ACPI-speak).
2009-04-14 14:03:43 +05:30
nommconf [X86] Disable use of MMCONFIG for PCI
2006-02-15 15:17:43 -08:00
Configuration
2009-06-07 16:15:16 +02:00
check_enable_amd_mmconf [X86] check for and enable
properly configured MMIO access to PCI
config space on AMD family 10h CPU
2006-03-05 22:33:34 -07:00
nomsi [MSI] If the PCI_MSI kernel config parameter is
enabled, this kernel boot option can be used to
disable the use of MSI interrupts system-wide.
2008-06-11 16:35:14 +02:00
noioapicquirk [APIC] Disable all boot interrupt quirks.
Safety option to keep boot IRQs enabled. This
should never be necessary.
2008-06-11 16:35:15 +02:00
ioapicreroute [APIC] Enable rerouting of boot IRQs to the
primary IO-APIC for bridges that cannot disable
boot IRQs. This fixes a source of spurious IRQs
when the system masks IRQs.
2008-07-15 13:48:55 +02:00
noioapicreroute [APIC] Disable workaround that uses the
boot IRQ equivalent of an IRQ that connects to
a chipset where boot IRQs cannot be disabled.
The opposite of ioapicreroute.
2007-07-31 00:37:59 -07:00
biosirq [X86-32] Use PCI BIOS calls to get the interrupt
2005-10-23 12:57:11 -07:00
routing table. These calls are known to be buggy
on several machines and they hang the machine
when used, but on other computers it's the only
way to get the interrupt routing table. Try
this option if the kernel is unable to allocate
IRQs or discover secondary PCI buses on your
motherboard.
2008-08-22 09:53:39 +02:00
rom [X86] Assign address space to expansion ROMs.
2005-10-23 12:57:11 -07:00
Use with caution as certain devices share
address decoders between ROMs and other
resources.
2008-08-22 09:53:39 +02:00
norom [X86] Do not assign address space to
2008-05-12 13:57:46 -07:00
expansion ROMs that do not already have
BIOS assigned address ranges.
2010-05-12 11:14:32 -07:00
nobar [X86] Do not assign address space to the
BARs that weren't assigned by the BIOS.
2008-08-22 09:53:39 +02:00
irqmask=0xMMMM [X86] Set a bit mask of IRQs allowed to be
2005-10-23 12:57:11 -07:00
assigned automatically to PCI devices. You can
make the kernel exclude IRQs of your ISA cards
this way.
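To show how an irqmask value could be derived, the Python sketch below starts with all IRQs allowed and clears the bits of IRQs that should be kept away from PCI devices. The helper name irqmask_excluding(), the assumption that bit N corresponds to IRQ N, and the 16-IRQ width (suggested by the four hex digits in 0xMMMM) are assumptions made for this example.

```python
def irqmask_excluding(reserved_irqs, num_irqs=16):
    """Return a mask with every IRQ allowed except the reserved ones.

    Assumes bit N of the mask stands for IRQ N (an assumption of this sketch).
    """
    mask = (1 << num_irqs) - 1
    for irq in reserved_irqs:
        mask &= ~(1 << irq)
    return mask


if __name__ == "__main__":
    # Hypothetical ISA cards sitting on IRQ 5 and IRQ 10.
    print("pci=irqmask=%#06x" % irqmask_excluding([5, 10]))
```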
2008-08-22 09:53:39 +02:00
pirqaddr=0xAAAAA [X86] Specify the physical address
2005-10-23 12:57:11 -07:00
of the PIRQ table (normally generated
by the BIOS) if it is outside the
F0000h-100000h range.
2008-08-22 09:53:39 +02:00
lastbus=N [X86] Scan all buses through bus #N. Can be
2005-10-23 12:57:11 -07:00
useful if the kernel is unable to find your
secondary buses and you want to tell it
explicitly which ones they are.
2008-08-22 09:53:39 +02:00
assign-busses [X86] Always assign all PCI bus
2005-10-23 12:57:11 -07:00
numbers ourselves, overriding
whatever the firmware may have done.
2008-08-22 09:53:39 +02:00
usepirqmask [X86] Honor the possible IRQ mask stored
2005-10-23 12:57:11 -07:00
in the BIOS $PIR table. This is needed on
some systems with broken BIOSes, notably
some HP Pavilion N5400 and Omnibook XE3
notebooks. This will have no effect if ACPI
IRQ routing is enabled.
2008-08-22 09:53:39 +02:00
noacpi [X86] Do not use ACPI for IRQ routing
2005-10-23 12:57:11 -07:00
or for PCI scanning.
2010-02-23 10:24:41 -07:00
use_crs [X86] Use PCI host bridge window information
from ACPI. On BIOSes from 2008 or later, this
is enabled by default. If you need to use this,
please report a bug.
nocrs [X86] Ignore PCI host bridge windows from ACPI.
2018-04-18 20:51:39 +02:00
If you need to use this, please report a bug.
2005-10-23 12:57:11 -07:00
routeirq Do IRQ routing for all PCI devices.
This is normally done in pci_enable_device(),
so this option is a temporary workaround
for broken drivers that don't call it.
2008-03-27 01:31:18 -07:00
skip_isa_align [X86] Do not align I/O start addresses, so
that more PCI cards can be handled.
2006-09-26 10:52:41 +02:00
noearly [X86] Don't do any early type 1 scanning.
This might help on some broken boards which
machine check when some devices' config space
is read. But various workarounds are disabled
and some IOMMU drivers will not work.
PCI: optionally sort device lists breadth-first
Problem:
New Dell PowerEdge servers have 2 embedded ethernet ports, which are
labeled NIC1 and NIC2 on the chassis, in the BIOS setup screens, and
in the printed documentation. Assuming no other add-in ethernet ports
in the system, Linux 2.4 kernels name these eth0 and eth1
respectively. Many people have come to expect this naming. Linux 2.6
kernels name these eth1 and eth0 respectively (backwards from
expectations). I also have reports that various Sun and HP servers
have similar behavior.
Root cause:
Linux 2.4 kernels walk the pci_devices list, which happens to be
sorted in breadth-first order (or pcbios_find_device order on i386,
which most often is breadth-first also). 2.6 kernels have both the
pci_devices list and the pci_bus_type.klist_devices list, the latter
is what is walked at driver load time to match the pci_id tables; this
klist happens to be in depth-first order.
On systems where, for physical routing reasons, NIC1 appears on a
lower bus number than NIC2, but NIC2's bridge is discovered first in
the depth-first ordering, NIC2 will be discovered before NIC1. If the
list were sorted breadth-first, NIC1 would be discovered before NIC2.
A PowerEdge 1955 system has the following topology which easily
exhibits the difference between depth-first and breadth-first device
lists.
-[0000:00]-+-00.0 Intel Corporation 5000P Chipset Memory Controller Hub
+-02.0-[0000:03-08]--+-00.0-[0000:04-07]--+-00.0-[0000:05-06]----00.0-[0000:06]----00.0 Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (labeled NIC2, 2.4 kernel name eth1, 2.6 kernel name eth0)
+-1c.0-[0000:01-02]----00.0-[0000:02]----00.0 Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (labeled NIC1, 2.4 kernel name eth0, 2.6 kernel name eth1)
Other factors, such as device driver load order and the presence of
PCI slots at various points in the bus hierarchy further complicate
this problem; I'm not trying to solve those here, just restore the
device order, and thus basic behavior, that 2.4 kernels had.
Solution:
The solution can come in multiple steps.
Suggested fix #1: kernel
Patch below optionally sorts the two device lists into breadth-first
ordering to maintain compatibility with 2.4 kernels. It adds two new
command line options:
pci=bfsort
pci=nobfsort
to force the sort order, or not, as you wish. It also adds DMI checks
for the specific Dell systems which exhibit "backwards" ordering, to
make them "right".
Suggested fix #2: udev rules from userland
Many people also have the expectation that embedded NICs are always
discovered before add-in NICs (which this patch does not try to do).
Using the PCI IRQ Routing Table provided by system BIOS, it's easy to
determine which PCI devices are embedded, or if add-in, which PCI slot
they're in. I'm working on a tool that would allow udev to name
ethernet devices in ascending embedded, slot 1 .. slot N order,
subsort by PCI bus/dev/fn breadth-first. It'll be possible to use it
independent of udev as well for those distributions that don't use
udev in their installers.
Suggested fix #3: system board routing rules
One can constrain the system board layout to put NIC1 ahead of NIC2
regardless of breadth-first or depth-first discovery order. This adds
a significant level of complexity to board routing, and may not be
possible in all instances (witness the above systems from several
major manufacturers). I don't want to encourage this particular train
of thought too far, at the expense of not doing #1 or #2 above.
Feedback appreciated. Patch tested on a Dell PowerEdge 1955 blade
with 2.6.18.
You'll also note I took some liberty and temporarily break the klist
abstraction to simplify and speed up the sort algorithm. I think
that's both safe and appropriate in this instance.
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-09-29 15:23:23 -05:00
bfsort Sort PCI devices into breadth-first order.
This sorting is done to get a device
order compatible with older (<= 2.4) kernels.
nobfsort Don't sort PCI devices into breadth-first order.
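The difference between the two orderings can be reproduced on a simplified copy of the PowerEdge 1955 topology quoted in the commit message above. The Python sketch below is only an illustration of depth-first versus breadth-first traversal; it is not the kernel's sorting code, and the node labels are shorthand invented for this example.

```python
from collections import deque

# Simplified tree: each key is a node, each value lists its children
# in scan order. Labels are shorthand for the bridges and NICs above.
TOPOLOGY = {
    "00": ["00:02.0", "00:1c.0"],
    "00:02.0": ["03:00.0"],
    "03:00.0": ["04:00.0"],
    "04:00.0": ["05:00.0"],
    "05:00.0": ["06:00.0 (NIC2)"],
    "00:1c.0": ["01:00.0"],
    "01:00.0": ["02:00.0 (NIC1)"],
}


def depth_first(root):
    """Visit a node, then fully descend into each child in turn."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(TOPOLOGY.get(node, [])))
    return order


def breadth_first(root):
    """Visit all nodes at one depth before going one level deeper."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(TOPOLOGY.get(node, []))
    return order


def nics(order):
    """Keep only the NIC entries, preserving discovery order."""
    return [node for node in order if "NIC" in node]


if __name__ == "__main__":
    print("depth-first:  ", nics(depth_first("00")))
    print("breadth-first:", nics(breadth_first("00")))
```

In this sketch the depth-first walk reaches NIC2 before NIC1, while the breadth-first walk reaches NIC1 first, matching the behaviour described for 2.6 and 2.4 kernels respectively.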
2013-01-30 09:40:52 +08:00
pcie_bus_tune_off Disable PCIe MPS (Max Payload Size)
tuning and use the BIOS-configured MPS defaults.
pcie_bus_safe Set every device's MPS to the largest value
supported by all devices below the root complex.
pcie_bus_perf Set device MPS to the largest allowable MPS
based on its parent bus. Also set MRRS (Max
Read Request Size) to the largest supported
value (no larger than the MPS that the device
or bus can support) for best performance.
pcie_bus_peer2peer Set every device's MPS to 128B, which
every device is guaranteed to support. This
configuration allows peer-to-peer DMA between
any pair of devices, possibly at the cost of
reduced performance. This also guarantees
that hot-added devices will work.
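The arithmetic behind pcie_bus_safe and pcie_bus_peer2peer is small enough to sketch in Python. The helper names and the example device capabilities are hypothetical; the sketch only restates the descriptions above ("largest value supported by all devices" and a fixed 128B) and does not model pcie_bus_perf or the kernel's actual tuning code.

```python
# Payload size that every device is guaranteed to support, per the text above.
PEER2PEER_MPS = 128


def pcie_bus_safe_mps(max_supported):
    """Largest MPS that every device can handle.

    max_supported holds each device's maximum supported payload size
    in bytes; the common value is the smallest of them.
    """
    return min(max_supported)


if __name__ == "__main__":
    # Hypothetical devices below one root complex.
    devices = [512, 256, 1024]
    print("pcie_bus_safe      ->", pcie_bus_safe_mps(devices))
    print("pcie_bus_peer2peer ->", PEER2PEER_MPS)
```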
2007-02-05 16:36:06 -08:00
cbiosize=nn[KMG] The fixed amount of bus space which is
reserved for the CardBus bridge's IO window.
The default value is 256 bytes.
cbmemsize=nn[KMG] The fixed amount of bus space which is
reserved for the CardBus bridge's memory
window. The default value is 64 megabytes.
2009-03-16 17:13:39 +09:00
resource_alignment=
Format:
2018-07-30 10:18:37 -06:00
[<order of align>@]<pci_dev>[; ...]
2009-03-16 17:13:39 +09:00
Specifies alignment and device to reassign
2018-07-30 10:18:37 -06:00
aligned memory resources. How to
specify the device is described above.
2009-03-16 17:13:39 +09:00
If <order of align> is not specified,
PAGE_SIZE is used as alignment.
2019-06-06 13:25:57 +10:00
A PCI-PCI bridge can be specified if resource
2009-03-16 17:13:39 +09:00
windows need to be expanded.
2016-08-09 10:33:31 +02:00
To specify the alignment for several
instances of a device, the PCI vendor,
device, subvendor, and subdevice may be
2019-06-06 13:25:57 +10:00
specified, e.g., 12@pci:8086:9c22:103c:198f
for 4096-byte alignment.
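The relation between <order of align> and the resulting byte alignment is a power of two, as the 12 -> 4096 example above shows. The Python sketch below spells that out; both helpers are hypothetical and only format the documented "[<order of align>@]<pci_dev>" syntax.

```python
def alignment_bytes(order):
    """Byte alignment for a given <order of align>: 2 ** order."""
    return 1 << order


def alignment_entry(pci_dev, order=None):
    """Format one '[<order of align>@]<pci_dev>' entry.

    With no order given, the entry is the bare device, which the text
    above says is aligned to PAGE_SIZE.
    """
    if order is None:
        return pci_dev
    return "%d@%s" % (order, pci_dev)


if __name__ == "__main__":
    entry = alignment_entry("pci:8086:9c22:103c:198f", 12)
    print("pci=resource_alignment=%s" % entry)
    print("alignment in bytes:", alignment_bytes(12))
```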
2009-04-22 16:52:09 -06:00
ecrc= Enable/disable PCIe ECRC (transaction layer
end-to-end CRC checking).
bios: Use BIOS/firmware settings. This is the
default.
off: Turn ECRC off
on: Turn ECRC on.
2013-01-23 20:29:06 +08:00
hpiosize=nn[KMG] The fixed amount of bus space which is
reserved for hotplug bridge's IO window.
Default size is 256 bytes.
2019-10-23 12:12:29 +00:00
hpmmiosize=nn[KMG] The fixed amount of bus space which is
reserved for hotplug bridge's MMIO window.
Default size is 2 megabytes.
hpmmioprefsize=nn[KMG] The fixed amount of bus space which is
reserved for hotplug bridge's MMIO_PREF window.
Default size is 2 megabytes.
2013-01-23 20:29:06 +08:00
hpmemsize=nn[KMG] The fixed amount of bus space which is
2019-10-23 12:12:29 +00:00
reserved for hotplug bridge's MMIO and
MMIO_PREF window.
2013-01-23 20:29:06 +08:00
Default size is 2 megabytes.
2016-07-21 21:40:28 -06:00
hpbussize=nn The minimum amount of additional bus numbers
reserved for buses below a hotplug bridge.
Default is 1.
2012-02-23 19:23:30 -08:00
realloc= Enable/disable reallocating PCI bridge resources
if allocations done by BIOS are too small to
accommodate resources required by all child
devices.
off: Turn realloc off
on: Turn realloc on
realloc same as realloc=on
2012-03-01 00:06:33 +01:00
noari do not use PCIe ARI.
2018-05-10 17:56:02 -05:00
noats [PCIE, Intel-IOMMU, AMD-IOMMU]
do not use PCIe ATS (and IOMMU device IOTLB).
2012-04-30 15:21:02 -06:00
pcie_scan_all Scan all possible PCIe devices. Otherwise we
only look for one device below a PCIe downstream
port.
2018-01-11 14:23:29 +01:00
big_root_window Try to add a big 64bit memory window to the PCIe
root complex on AMD CPUs. Some GFX hardware
can resize a BAR to allow access to all VRAM.
Adding the window is slightly risky (it may
conflict with unreported devices), so this
taints the kernel.
2018-07-30 10:18:40 -06:00
disable_acs_redir=<pci_dev>[; ...]
Specify one or more PCI devices (in the format
specified above) separated by semicolons.
Each device specified will have the PCI ACS
redirect capabilities forced off which will
allow P2P traffic between devices through
bridges without forcing it upstream. Note:
this removes isolation between devices and
may put more devices in an IOMMU group.
2019-02-26 16:07:32 +01:00
force_floating [S390] Force usage of floating interrupts.
2019-04-18 21:39:06 +02:00
nomio [S390] Do not use MIO instructions.
2020-04-01 11:12:24 +02:00
norid [S390] ignore the RID field and force use of
one PCI domain per PCI function
2008-09-24 20:40:34 -04:00
pcie_aspm= [PCIE] Forcibly enable or disable PCIe Active State Power
Management.
off Disable ASPM.
force Enable ASPM even on devices that claim not to support it.
WARNING: Forcing ASPM on may cause system lockups.
2018-03-09 11:21:28 -06:00
pcie_ports= [PCIE] PCIe port services handling:
native Use native PCIe services (PME, AER, DPC, PCIe hotplug)
even if the platform doesn't give the OS permission to
use them. This may cause conflicts if the platform
also tries to use these services.
PCI/DPC: Add "pcie_ports=dpc-native" to allow DPC without AER control
Prior to eed85ff4c0da7 ("PCI/DPC: Enable DPC only if AER is available"),
Linux handled DPC events regardless of whether firmware had granted it
ownership of AER or DPC, e.g., via _OSC.
PCIe r5.0, sec 6.2.10, recommends that the OS link control of DPC to
control of AER, so after eed85ff4c0da7, Linux handles DPC events only if it
has control of AER.
On platforms that do not grant OS control of AER via _OSC, Linux DPC
handling worked before eed85ff4c0da7 but not after.
To make Linux DPC handling work on those platforms the same way they did
before, add a "pcie_ports=dpc-native" kernel parameter that makes Linux
handle DPC events regardless of whether it has control of AER.
[bhelgaas: commit log, move pcie_ports_dpc_native to drivers/pci/]
Link: https://lore.kernel.org/r/20191023192205.97024-1-olof@lixom.net
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-10-23 12:22:05 -07:00
dpc-native Use native PCIe service for DPC only. May
cause conflicts if firmware uses AER or DPC.
2018-03-09 11:21:28 -06:00
compat Disable native PCIe services (PME, AER, DPC, PCIe
hotplug).
2010-08-21 01:51:44 +02:00
2016-06-02 11:17:12 +03:00
pcie_port_pm= [PCIE] PCIe port power management handling:
off Disable power management of all PCIe ports
force Forcibly enable power management of all PCIe ports
2010-02-17 23:39:08 +01:00
pcie_pme= [PCIE,PM] Native PCIe PME signaling options:
2010-02-17 23:40:07 +01:00
nomsi Do not use MSI for native PCIe PME signaling (this makes
PCI: PCIe: Ask BIOS for control of all native services at once
After commit 852972acff8f10f3a15679be2059bb94916cba5d (ACPI: Disable
ASPM if the platform won't provide _OSC control for PCIe) control of
the PCIe Capability Structure is unconditionally requested by
acpi_pci_root_add(), which in principle may cause problems to
happen in two ways. First, the BIOS may refuse to give control of
the PCIe Capability Structure if it is not asked for any of the
_OSC features depending on it at the same time. Second, the BIOS may
assume that control of the _OSC features depending on the PCIe
Capability Structure will be requested in the future and may behave
incorrectly if that doesn't happen. For this reason, control of
the PCIe Capability Structure should always be requested along with
control of any other _OSC features that may depend on it (ie. PCIe
native PME, PCIe native hot-plug, PCIe AER).
Rework the PCIe port driver so that (1) it checks which native PCIe
port services can be enabled, according to the BIOS, and (2) it
requests control of all these services simultaneously. In
particular, this causes pcie_portdrv_probe() to fail if the BIOS
refuses to grant control of the PCIe Capability Structure, which
means that no native PCIe port services can be enabled for the PCIe
Root Complex the given port belongs to. If that happens, ASPM is
disabled to avoid problems with mishandling it by the part of the
PCIe hierarchy for which control of the PCIe Capability Structure
has not been received.
Make it possible to override this behavior using 'pcie_ports=native'
(use the PCIe native services regardless of the BIOS response to the
control request), or 'pcie_ports=compat' (do not use the PCIe native
services at all).
Accordingly, rework the existing PCIe port service drivers so that
they don't request control of the services directly.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-08-21 22:02:38 +02:00
all PCIe root ports use INTx for all services).
2010-02-17 23:39:08 +01:00
2005-04-16 15:20:36 -07:00
pcmv= [HW,PCMCIA] BadgePAD 4
2014-03-28 10:50:21 +05:30
pd_ignore_unused
[PM]
Keep all power-domains already enabled by bootloader on,
even if no driver has claimed them. This is useful
for debug and development, but should not be
needed on a platform with proper driver support.
2005-04-16 15:20:36 -07:00
pd. [PARIDE]
2019-06-18 11:47:10 -03:00
See Documentation/admin-guide/blockdev/paride.rst.
2005-04-16 15:20:36 -07:00
pdcchassis= [PARISC,HW] Disable/Enable PDC Chassis Status codes at
boot time.
Format: { 0 | 1 }
See arch/parisc/kernel/pdc_chassis.c
2009-08-14 15:00:50 +09:00
percpu_alloc= Select which percpu first chunk allocator to use.
2009-08-14 15:00:53 +09:00
Currently supported values are "embed" and "page".
Archs may support subset or none of the selections.
See comments in mm/percpu.c for details on each
allocator. This parameter is primarily for debugging
and performance comparison.
2009-06-22 11:56:24 +09:00
2005-04-16 15:20:36 -07:00
pf. [PARIDE]
2019-06-18 11:47:10 -03:00
See Documentation/admin-guide/blockdev/paride.rst.
2005-04-16 15:20:36 -07:00
pg. [PARIDE]
2019-06-18 11:47:10 -03:00
See Documentation/admin-guide/blockdev/paride.rst.
2005-04-16 15:20:36 -07:00
pirq= [SMP,APIC] Manual mp-table setup
2019-06-07 15:54:32 -03:00
See Documentation/x86/i386/IO-APIC.rst.
2005-04-16 15:20:36 -07:00
plip= [PPT,NET] Parallel port network link
Format: { parport<nr> | timid | 0 }
2017-10-10 12:36:16 -05:00
See also Documentation/admin-guide/parport.rst.
2005-04-16 15:20:36 -07:00
2011-08-13 12:34:52 -07:00
pmtmr= [X86] Manual setup of pmtmr I/O Port.
2008-07-12 05:33:30 +02:00
Override pmtimer IOPort with a hex value.
e.g. pmtmr=0x508
2020-04-02 15:56:52 +08:00
pm_debug_messages [SUSPEND,KNL]
Enable suspend/resume debug messages during boot up.
2011-08-11 12:14:05 -06:00
pnp.debug=1 [PNP]
Enable PNP debug messages (depends on the
CONFIG_PNP_DEBUG_MESSAGES option). Change at run-time
via /sys/module/pnp/parameters/debug. We always show
current resource usage; turning this on also shows
possible settings and some assignment information.
2008-08-19 16:53:41 -06:00
2005-04-16 15:20:36 -07:00
pnpacpi= [ACPI]
{ off }
pnpbios= [ISAPNP]
{ on | off | curr | res | no-curr | no-res }
pnp_reserve_irq=
[ISAPNP] Exclude IRQs for the autoconfiguration
pnp_reserve_dma=
[ISAPNP] Exclude DMAs for the autoconfiguration
pnp_reserve_io= [ISAPNP] Exclude I/O ports for the autoconfiguration
2005-10-23 12:57:11 -07:00
Ranges are in pairs (I/O port base and size).
2005-04-16 15:20:36 -07:00
pnp_reserve_mem=
2005-10-23 12:57:11 -07:00
[ISAPNP] Exclude memory regions for the
autoconfiguration.
2005-04-16 15:20:36 -07:00
Ranges are in pairs (memory base and size).
2009-04-17 18:30:28 -07:00
ports= [IP_VS_FTP] IPVS ftp helper module
Default is 21.
Up to 8 (IP_VS_APP_MAX_PORTS) ports
may be specified.
Format: <port>,<port>....
2016-12-02 00:08:26 +11:00
powersave=off [PPC] This option disables power saving features.
It specifically disables cpuidle and sets the
platform machine description specific power_save
function to NULL. On Idle the CPU just reduces
execution priority.
2015-10-29 11:44:06 +11:00
ppc_strict_facility_enable
[PPC] This option catches any kernel floating point,
Altivec, VSX and SPE outside of regions specifically
allowed (eg kernel_enable_fpu()/kernel_disable_fpu()).
There is some performance impact when enabling this.
2017-10-12 21:17:16 +11:00
ppc_tm= [PPC]
Format: {"off"}
Disable Hardware Transactional Memory
2007-07-15 23:40:10 -07:00
print-fatal-signals=
[KNL] debug: print fatal signals
2009-11-09 00:46:42 +09:00
If enabled, warn about various signal handling
related application anomalies: too many signals,
too many POSIX.1 timers, fatal signals causing a
coredump - etc.
If you hit the warning due to signal overflow,
you might want to try "ulimit -i unlimited".
2007-07-15 23:40:10 -07:00
default: off.
2012-03-05 14:59:10 -08:00
printk.always_kmsg_dump=
Trigger kmsg_dump for cases other than kernel oops or
panics
Format: <bool> (1/Y/y=enable, 0/N/n=disable)
default: disabled
2016-08-02 14:04:07 -07:00
printk.devkmsg={on,off,ratelimit}
Control writing to /dev/kmsg.
on - unlimited logging to /dev/kmsg from userspace
off - logging to /dev/kmsg disabled
ratelimit - ratelimit the logging
Default: ratelimit
2007-07-15 23:40:25 -07:00
printk.time= Show timing data prefixed to each printk message line
Format: <bool> (1/Y/y=enable, 0/N/n=disable)
2009-04-05 15:55:22 -07:00
processor.max_cstate= [HW,ACPI]
Limit processor to maximum C-state
max_cstate=9 overrides any DMI blacklist limit.
processor.nocst [HW,ACPI]
Ignore the _CST method to determine C-states,
instead using the legacy FADT method
2005-04-16 15:20:36 -07:00
profile= [KNL] Enable kernel profiling via /proc/profile
2017-11-19 21:08:11 -08:00
Format: [<profiletype>,]<number>
Param: <profiletype>: "schedule", "sleep", or "kvm"
[defaults to kernel profiling]
2005-10-23 12:57:11 -07:00
Param: "schedule" - profile schedule points.
2007-10-24 18:23:50 +02:00
Param: "sleep" - profile D-state sleeping (millisecs).
Requires CONFIG_SCHEDSTATS
2007-10-20 03:08:22 +02:00
Param: "kvm" - profile VM exits.
2017-11-19 21:08:11 -08:00
Param: <number> - step/bucket size as a power of 2 for
statistical time based profiling.
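The "[<profiletype>,]<number>" format can be read as an optional type followed by a power-of-two exponent. The Python sketch below follows only the documented format; parse_profile() is a hypothetical helper, and the label "kernel" for the default type is a name chosen for this example.

```python
PROFILE_TYPES = ("schedule", "sleep", "kvm")


def parse_profile(value):
    """Split '[<profiletype>,]<number>' into (type, number, step size).

    The step/bucket size is 2 ** number, per the description above.
    """
    head, sep, tail = value.partition(",")
    if sep:
        if head not in PROFILE_TYPES:
            raise ValueError("unknown profile type: %r" % head)
        profile_type, number = head, int(tail)
    else:
        # No explicit type: defaults to kernel profiling.
        profile_type, number = "kernel", int(head)
    return profile_type, number, 1 << number


if __name__ == "__main__":
    print(parse_profile("schedule,2"))
    print(parse_profile("2"))
```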
2005-04-16 15:20:36 -07:00
prompt_ramdisk= [RAM] List of RAM disks to prompt for floppy disk
before loading.
2019-06-18 11:47:10 -03:00
See Documentation/admin-guide/blockdev/ramdisk.rst.
2005-04-16 15:20:36 -07:00
2019-10-23 13:56:36 +02:00
prot_virt= [S390] enable hosting protected virtual machines
isolated from the hypervisor (if hardware supports
that).
Format: <bool>
2018-11-30 14:09:58 -08:00
psi= [KNL] Enable or disable pressure stall information
tracking.
Format: <bool>
2005-10-23 12:57:11 -07:00
psmouse.proto= [HW,MOUSE] Highest PS2 mouse protocol extension to
probe for; one of (bare|imps|exps|lifebook|any).
2005-04-16 15:20:36 -07:00
psmouse.rate= [HW,MOUSE] Set desired mouse report rate, in reports
per second.
2005-10-23 12:57:11 -07:00
psmouse.resetafter= [HW,MOUSE]
Try to reset the device after so many bad packets
2005-04-16 15:20:36 -07:00
(0 = never).
psmouse.resolution=
[HW,MOUSE] Set desired mouse resolution, in dpi.
psmouse.smartscroll=
2005-10-23 12:57:11 -07:00
[HW,MOUSE] Controls Logitech smartscroll autorepeat.
2005-04-16 15:20:36 -07:00
0 = disabled, 1 = enabled (default).
2011-07-21 16:57:55 -04:00
pstore.backend= Specify the name of the pstore backend to use
2005-04-16 15:20:36 -07:00
pt. [PARIDE]
2019-06-18 11:47:10 -03:00
See Documentation/admin-guide/blockdev/paride.rst.
2005-04-16 15:20:36 -07:00
2018-01-05 09:44:36 -08:00
pti= [X86_64] Control Page Table Isolation of user and
kernel address spaces. Disabling this feature
removes hardening, but improves performance of
system calls and interrupts.
on - unconditionally enable
off - unconditionally disable
auto - kernel detects whether your CPU model is
vulnerable to issues that PTI mitigates
Not specifying this option is equivalent to pti=auto.
nopti [X86_64]
Equivalent to pti=off
2017-12-12 14:39:52 +01:00
2007-08-15 12:25:38 +02:00
pty.legacy_count=
[KNL] Number of legacy ptys. Overrides the compiled-in
default number.
2006-09-29 02:01:02 -07:00
quiet [KNL] Disable most log messages
2005-10-23 12:57:11 -07:00
2005-04-16 15:20:36 -07:00
r128= [HW,DRM]
raid= [HW,RAID]
2016-11-03 12:10:10 +02:00
See Documentation/admin-guide/md.rst.
2005-04-16 15:20:36 -07:00
ramdisk_size= [RAM] Sizes of RAM disks in kilobytes
2019-06-18 11:47:10 -03:00
See Documentation/admin-guide/blockdev/ramdisk.rst.
2005-04-16 15:20:36 -07:00
2018-08-27 14:51:54 -07:00
random.trust_cpu={on,off}
[KNL] Enable or disable trusting the use of the
CPU's random number generator (if available) to
fully seed the kernel's CRNG. Default is controlled
by CONFIG_RANDOM_TRUST_CPU.
2017-03-27 11:33:02 +02:00
ras=option[,option,...] [KNL] RAS-specific options
cec_disable [X86]
Disable the Correctable Errors Collector,
see CONFIG_RAS_CEC help text.
2013-10-08 20:23:47 -07:00
rcu_nocbs= [KNL]
2019-03-05 15:28:19 -08:00
The argument is a cpu list, as described above,
except that the string "all" can be used to
specify every CPU on the system.
2016-10-11 13:51:35 -07:00
2012-08-19 21:35:53 -07:00
In kernels built with CONFIG_RCU_NOCB_CPU=y, set
the specified list of CPUs to be no-callback CPUs.
2018-07-02 08:25:57 -07:00
Invocation of these CPUs' RCU callbacks will be
offloaded to "rcuox/N" kthreads created for that
purpose, where "x" is "p" for RCU-preempt, and
"s" for RCU-sched, and "N" is the CPU number.
This reduces OS jitter on the offloaded CPUs,
which can be useful for HPC and real-time
workloads. It can also improve energy efficiency
for asymmetric multiprocessors.
2012-08-19 21:35:53 -07:00
2013-10-08 20:23:47 -07:00
rcu_nocb_poll [KNL]
2012-08-19 21:35:53 -07:00
Rather than requiring that offloaded CPUs
(specified by rcu_nocbs= above) explicitly
awaken the corresponding "rcuox/N" kthreads,
make these kthreads poll for callbacks.
This improves the real-time response for the
offloaded CPUs by relieving them of the need to
wake up the corresponding kthread, but degrades
energy efficiency by requiring that the kthreads
periodically wake up to do the polling.
2013-10-08 20:23:47 -07:00
rcutree.blimit= [KNL]
2013-10-27 09:44:03 -07:00
Set maximum number of finished RCU callbacks to
process in one batch.
2006-03-07 21:55:33 -08:00
2015-04-20 11:40:50 -07:00
rcutree.dump_tree= [KNL]
Dump the structure of the rcu_node combining tree
out at early boot. This is used for diagnostic
purposes, to verify correct tree setup.
2015-03-10 18:33:20 -07:00
rcutree.gp_cleanup_delay= [KNL]
Set the number of jiffies to delay each step of
2017-05-10 14:36:55 -07:00
RCU grace-period cleanup.
2015-03-10 18:33:20 -07:00
2015-01-22 18:24:08 -08:00
rcutree.gp_init_delay= [KNL]
Set the number of jiffies to delay each step of
2017-05-10 14:36:55 -07:00
RCU grace-period initialization.
2015-03-10 18:33:20 -07:00
rcutree.gp_preinit_delay= [KNL]
Set the number of jiffies to delay each step of
RCU grace-period pre-initialization, that is,
the propagation of recent CPU-hotplug changes up
2017-05-10 14:36:55 -07:00
the rcu_node combining tree.
2015-01-22 18:24:08 -08:00
2019-03-20 22:13:33 +01:00
rcutree.use_softirq= [KNL]
If set to zero, move all RCU_SOFTIRQ processing to
per-CPU rcuc kthreads. Defaults to a non-zero
value, meaning that RCU_SOFTIRQ is used by default.
Specify rcutree.use_softirq=0 to use rcuc kthreads.
2015-04-20 10:27:15 -07:00
rcutree.rcu_fanout_exact= [KNL]
Disable autobalancing of the rcu_node combining
tree. This is used by rcutorture, and might
possibly be useful for architectures having high
cache-to-cache transfer latencies.
2015-01-22 18:24:08 -08:00
2013-10-08 20:23:47 -07:00
rcutree.rcu_fanout_leaf= [KNL]
2015-07-31 08:28:35 -07:00
Change the number of CPUs assigned to each
leaf rcu_node structure. Useful for very
large systems, which will choose the value 64,
and for NUMA systems with large remote-access
latencies, which will choose a value aligned
with the appropriate hardware boundaries.
2012-04-23 15:52:53 -07:00
2020-05-25 23:47:52 +02:00
rcutree.rcu_min_cached_objs= [KNL]
Minimum number of objects which are cached and
maintained per one CPU. Object size is equal
to PAGE_SIZE. The cache reduces pressure on the
page allocator and helps the whole algorithm
behave better under low-memory conditions.
2013-10-08 20:23:47 -07:00
rcutree.jiffies_till_first_fqs= [KNL]
2012-12-28 11:30:36 -08:00
Set delay from grace-period initialization to
first attempt to force quiescent states.
Units are jiffies, minimum value is zero,
and maximum value is HZ.
2013-10-08 20:23:47 -07:00
rcutree.jiffies_till_next_fqs= [KNL]
2012-12-28 11:30:36 -08:00
Set delay between subsequent attempts to force
quiescent states. Units are jiffies, minimum
value is one, and maximum value is HZ.
2018-11-20 10:22:00 -08:00
rcutree.jiffies_till_sched_qs= [KNL]
Set required age in jiffies for a
given grace period before RCU starts
soliciting quiescent-state help from
rcu_note_context_switch() and cond_resched().
If not specified, the kernel will calculate
a value based on the most recent settings
of rcutree.jiffies_till_first_fqs
and rcutree.jiffies_till_next_fqs.
This calculated value may be viewed in
rcutree.jiffies_to_sched_qs. Any attempt to set
rcutree.jiffies_to_sched_qs will be cheerfully
overwritten.
2014-09-12 21:21:09 -05:00
rcutree.kthread_prio= [KNL,BOOT]
2015-01-20 23:54:59 -08:00
Set the SCHED_FIFO priority of the RCU per-CPU
kthreads (rcuc/N). This value is also used for
the priority of the RCU boost threads (rcub/N)
and for the RCU grace-period kthreads (rcu_bh,
rcu_preempt, and rcu_sched). If RCU_BOOST is
set, valid values are 1-99 and the default is 1
(the least-favored priority). Otherwise, when
RCU_BOOST is not set, valid values are 0-99 and
the default is zero (non-realtime operation).
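The valid ranges and defaults described above for rcutree.kthread_prio can be summarized in a small sketch. The function names are hypothetical, and the boolean rcu_boost stands for whether RCU_BOOST is set in the kernel configuration. How the kernel treats an out-of-range value is not stated in the text, so the sketch only reports validity.

```python
def kthread_prio_range(rcu_boost):
    """Return (lowest valid, highest valid, default) priority."""
    if rcu_boost:
        # With RCU_BOOST set: 1-99, default 1 (least-favored priority).
        return 1, 99, 1
    # Without RCU_BOOST: 0-99, default 0 (non-realtime operation).
    return 0, 99, 0


def is_valid_kthread_prio(prio, rcu_boost):
    """Check a requested priority against the documented range."""
    low, high, _default = kthread_prio_range(rcu_boost)
    return low <= prio <= high


print(kthread_prio_range(True))
print(is_valid_kthread_prio(0, True))
print(is_valid_kthread_prio(0, False))
```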
2014-09-12 21:21:09 -05:00
2019-04-02 08:05:55 -07:00
rcutree.rcu_nocb_gp_stride= [KNL]
Set the number of NOCB callback kthreads in
each group, which defaults to the square root
of the number of CPUs. Larger numbers reduce
the wakeup overhead on the global grace-period
kthread, but increase that same overhead on
each group's NOCB grace-period kthread.
2014-06-24 09:26:11 -07:00
2013-10-08 20:23:47 -07:00
rcutree.qhimark= [KNL]
2013-10-27 09:44:03 -07:00
Set threshold of queued RCU callbacks beyond which
batch limiting is disabled.
2006-03-07 21:55:33 -08:00
2013-10-08 20:23:47 -07:00
rcutree.qlowmark= [KNL]
2008-02-03 15:20:26 +02:00
Set threshold of queued RCU callbacks below which
batch limiting is re-enabled.
2006-03-07 21:55:33 -08:00
2019-10-30 11:56:10 -07:00
rcutree.qovld= [KNL]
Set threshold of queued RCU callbacks beyond which
RCU's force-quiescent-state scan will aggressively
enlist help from cond_resched() and sched IPIs to
help CPUs more quickly reach quiescent states.
Set to less than zero to make this be set based
on rcutree.qhimark at boot time and to zero to
disable more aggressive help enlistment.
2013-10-08 20:23:47 -07:00
rcutree.rcu_idle_gp_delay= [KNL]
2012-12-28 11:30:36 -08:00
Set wakeup interval for idle CPUs that have
RCU callbacks (RCU_FAST_NO_HZ=y).
2012-06-26 20:45:57 -07:00
2013-10-08 20:23:47 -07:00
rcutree.rcu_idle_lazy_gp_delay= [KNL]
2012-12-28 11:30:36 -08:00
Set wakeup interval for idle CPUs that have
only "lazy" RCU callbacks (RCU_FAST_NO_HZ=y).
Lazy RCU callbacks are those which RCU can
prove do nothing more than free memory.
2012-06-26 20:45:57 -07:00
2017-01-06 15:14:11 -08:00
rcutree.rcu_kick_kthreads= [KNL]
Cause the grace-period kthread to get an extra
wake_up() if it sleeps three times longer than
it should at force-quiescent-state time.
This wake_up() will be accompanied by a
WARN_ONCE() splat and an ftrace_dump().
2018-12-12 12:32:06 -08:00
rcutree.sysrq_rcu= [KNL]
Commandeer a sysrq key to dump out Tree RCU's
rcu_node tree with an eye towards determining
why a new grace period has not yet started.
2017-04-17 12:47:10 -07:00
rcuperf.gp_async= [KNL]
Measure performance of asynchronous
grace-period primitives such as call_rcu().
rcuperf.gp_async_max= [KNL]
Specify the maximum number of outstanding
callbacks per writer thread. When a writer
thread exceeds this limit, it invokes the
corresponding flavor of rcu_barrier() to allow
previously posted callbacks to drain.
2016-01-01 13:47:19 -08:00
rcuperf.gp_exp= [KNL]
Measure performance of expedited synchronous
grace-period primitives.
2016-01-30 20:56:38 -08:00
rcuperf.holdoff= [KNL]
Set test-start holdoff period. The purpose of
this parameter is to delay the start of the
test until boot completes in order to avoid
interference.
2019-08-30 12:36:29 -04:00
rcuperf.kfree_rcu_test= [KNL]
Set to measure performance of kfree_rcu() flooding.
rcuperf.kfree_nthreads= [KNL]
The number of threads running loops of kfree_rcu().
rcuperf.kfree_alloc_num= [KNL]
Number of allocations and frees done in an iteration.
rcuperf.kfree_loops= [KNL]
Number of loops doing rcuperf.kfree_alloc_num number
of allocations and frees.
2016-01-01 13:47:19 -08:00
rcuperf.nreaders= [KNL]
Set number of RCU readers. The value -1 selects
N, where N is the number of CPUs. A value
"n" less than -1 selects N+n+1, where N is again
the number of CPUs. For example, -2 selects N-1,
-3 selects N-2, and so on.
A value of "n" less than or equal to -N selects
a single reader.
rcuperf.nwriters= [KNL]
Set number of RCU writers. The values operate
the same as for rcuperf.nreaders.
2017-04-25 15:12:56 -07:00
rcuperf.perf_type= [KNL]
Specify the RCU implementation to test.
2016-01-01 13:47:19 -08:00
rcuperf.shutdown= [KNL]
Shut the system down after performance tests
complete. This is useful for hands-off automated
testing.
rcuperf.verbose= [KNL]
Enable additional printk() statements.
2017-04-25 15:12:56 -07:00
rcuperf.writer_holdoff= [KNL]
Write-side holdoff between grace periods,
in microseconds. The default of zero says
no holdoff.
2013-10-08 20:23:47 -07:00
rcutorture.fqs_duration= [KNL]
2015-05-14 17:29:51 -07:00
Set duration of force_quiescent_state bursts
in microseconds.
2012-04-23 10:54:45 -07:00
2013-10-08 20:23:47 -07:00
rcutorture.fqs_holdoff= [KNL]
2015-05-14 17:29:51 -07:00
Set holdoff time within force_quiescent_state bursts
in microseconds.
2012-04-23 10:54:45 -07:00
2013-10-08 20:23:47 -07:00
rcutorture.fqs_stutter= [KNL]
2015-05-14 17:29:51 -07:00
Set wait time between force_quiescent_state bursts
in seconds.
2018-10-01 08:38:54 -07:00
rcutorture.fwd_progress= [KNL]
Enable RCU grace-period forward-progress testing
for the types of RCU supporting this notion.
rcutorture.fwd_progress_div= [KNL]
Specify the fraction of a CPU-stall-warning
period to do tight-loop forward-progress testing.
rcutorture.fwd_progress_holdoff= [KNL]
Number of seconds to wait between successive
forward-progress tests.
rcutorture.fwd_progress_need_resched= [KNL]
Enclose cond_resched() calls within checks for
need_resched() during tight-loop forward-progress
testing.
2015-05-14 17:29:51 -07:00
rcutorture.gp_cond= [KNL]
Use conditional/asynchronous update-side
primitives, if available.
2012-04-23 10:54:45 -07:00
2013-10-08 20:23:47 -07:00
rcutorture.gp_exp= [KNL]
2015-05-14 17:29:51 -07:00
Use expedited update-side primitives, if available.
2013-10-08 20:23:47 -07:00
rcutorture.gp_normal= [KNL]
2015-05-14 17:29:51 -07:00
Use normal (non-expedited) asynchronous
update-side primitives, if available.
rcutorture.gp_sync= [KNL]
Use normal (non-expedited) synchronous
update-side primitives, if available. If all
of rcutorture.gp_cond=, rcutorture.gp_exp=,
rcutorture.gp_normal=, and rcutorture.gp_sync=
are zero, rcutorture acts as if they are all
non-zero.
2012-04-23 10:54:45 -07:00
2013-10-08 20:23:47 -07:00
rcutorture.n_barrier_cbs= [KNL]
2012-04-23 10:54:45 -07:00
Set callbacks/threads for rcu_barrier() testing.
2013-10-08 20:23:47 -07:00
rcutorture.nfakewriters= [KNL]
2012-04-23 10:54:45 -07:00
Set number of concurrent RCU writers. These just
stress RCU, they don't participate in the actual
test, hence the "fake".
2013-10-08 20:23:47 -07:00
rcutorture.nreaders= [KNL]
2015-03-12 13:55:48 -07:00
Set number of RCU readers. The value -1 selects
N-1, where N is the number of CPUs. A value
"n" less than -1 selects N-n-2, where N is again
the number of CPUs. For example, -2 selects N
(the number of CPUs), -3 selects N+1, and so on.
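The arithmetic for negative rcutorture.nreaders values can be checked with a short sketch. The function name is hypothetical. Treating a non-negative value as a literal reader count is an assumption, since the text only describes negative values, and any lower bound the kernel may apply on very small systems is not modeled.

```python
def rcutorture_reader_count(nreaders, ncpus):
    """Number of readers implied by rcutorture.nreaders on an N-CPU system."""
    if nreaders >= 0:
        # Assumption: a non-negative value is used as given.
        return nreaders
    # -1 selects N-1, and any n below -1 selects N-n-2.
    # The same expression covers both, since N-(-1)-2 equals N-1.
    return ncpus - nreaders - 2


for n in (-1, -2, -3):
    print(n, rcutorture_reader_count(n, ncpus=8))
```

On an 8-CPU system this gives 7, 8 and 9 readers for -1, -2 and -3, matching the examples in the text.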
2012-04-23 10:54:45 -07:00
2013-10-08 20:23:47 -07:00
rcutorture.object_debug= [KNL]
Enable debug-object double-call_rcu() testing.
rcutorture.onoff_holdoff= [KNL]
2012-04-23 10:54:45 -07:00
Set time (s) after boot for CPU-hotplug testing.
2013-10-08 20:23:47 -07:00
rcutorture.onoff_interval= [KNL]
2018-05-08 09:20:34 -07:00
Set time (jiffies) between CPU-hotplug operations,
or zero to disable CPU-hotplug testing.
2012-04-23 10:54:45 -07:00
2013-10-08 20:23:47 -07:00
rcutorture.shuffle_interval= [KNL]
2012-04-23 10:54:45 -07:00
Set task-shuffle interval (s). Shuffling tasks
allows some CPUs to go into dyntick-idle mode
during the rcutorture test.
2013-10-08 20:23:47 -07:00
rcutorture.shutdown_secs= [KNL]
2012-04-23 10:54:45 -07:00
Set time (s) after boot for system shutdown. This
is useful for hands-off automated testing.
2013-10-08 20:23:47 -07:00
rcutorture.stall_cpu= [KNL]
2012-04-23 10:54:45 -07:00
Duration of CPU stall (s) to test RCU CPU stall
warnings, zero to disable.
2020-03-11 17:39:12 -07:00
rcutorture.stall_cpu_block= [KNL]
Sleep while stalling if set. This will result
in warnings from preemptible RCU in addition
to any other stall-related activity.
2013-10-08 20:23:47 -07:00
rcutorture.stall_cpu_holdoff= [KNL]
2012-04-23 10:54:45 -07:00
Time to wait (s) after boot before inducing stall.
2017-08-18 16:11:37 -07:00
rcutorture.stall_cpu_irqsoff= [KNL]
Disable interrupts while stalling if set.
2020-04-01 19:57:52 -07:00
rcutorture.stall_gp_kthread= [KNL]
Duration (s) of forced sleep within RCU
grace-period kthread to test RCU CPU stall
warnings, zero to disable. If both stall_cpu
and stall_gp_kthread are specified, the
kthread is starved first, then the CPU.
2013-10-08 20:23:47 -07:00
rcutorture.stat_interval= [KNL]
2012-04-23 10:54:45 -07:00
Time (s) between statistics printk()s.
2013-10-08 20:23:47 -07:00
rcutorture.stutter= [KNL]
2012-04-23 10:54:45 -07:00
Time (s) to stutter testing, for example, specifying
five seconds causes the test to run for five seconds,
wait for five seconds, and so on. This tests RCU's
ability to transition abruptly to and from idle.
2013-10-08 20:23:47 -07:00
rcutorture.test_boost= [KNL]
2012-04-23 10:54:45 -07:00
Test RCU priority boosting? 0=no, 1=maybe, 2=yes.
"Maybe" means test if the RCU implementation
under test supports RCU priority boosting.
2013-10-08 20:23:47 -07:00
rcutorture.test_boost_duration= [KNL]
2012-04-23 10:54:45 -07:00
Duration (s) of each individual boost test.
2013-10-08 20:23:47 -07:00
rcutorture.test_boost_interval= [KNL]
2012-04-23 10:54:45 -07:00
Interval (s) between each boost test.
2013-10-08 20:23:47 -07:00
rcutorture.test_no_idle_hz= [KNL]
2012-04-23 10:54:45 -07:00
Test RCU's dyntick-idle handling. See also the
rcutorture.shuffle_interval parameter.
2013-10-08 20:23:47 -07:00
rcutorture.torture_type= [KNL]
2012-04-23 10:54:45 -07:00
Specify the RCU implementation to test.
2013-10-08 20:23:47 -07:00
rcutorture.verbose= [KNL]
2012-04-23 10:54:45 -07:00
Enable additional printk() statements.
2019-06-13 15:30:49 -07:00
rcupdate.rcu_cpu_stall_ftrace_dump= [KNL]
Dump ftrace buffer after reporting RCU CPU
stall warning.
2015-11-24 15:44:06 -08:00
rcupdate.rcu_cpu_stall_suppress= [KNL]
Suppress RCU CPU stall warning messages.
2019-12-05 11:29:01 -08:00
rcupdate.rcu_cpu_stall_suppress_at_boot= [KNL]
Suppress RCU CPU stall warning messages and
rcutorture writer stall warnings that occur
during early boot, that is, during the time
before the init task is spawned.
2015-11-24 15:44:06 -08:00
rcupdate.rcu_cpu_stall_timeout= [KNL]
Set timeout for RCU CPU stall warning messages.
2013-10-08 20:23:47 -07:00
rcupdate.rcu_expedited= [KNL]
Use expedited grace-period primitives, for
example, synchronize_rcu_expedited() instead
of synchronize_rcu(). This reduces latency,
but can increase CPU utilization, degrade
real-time latency, and degrade energy efficiency.
2015-12-07 13:09:52 -08:00
No effect on CONFIG_TINY_RCU kernels.
2013-10-08 20:23:47 -07:00
2015-11-24 15:44:06 -08:00
rcupdate.rcu_normal= [KNL]
Use only normal grace-period primitives,
for example, synchronize_rcu() instead of
synchronize_rcu_expedited(). This improves
2015-12-07 13:09:52 -08:00
real-time latency, CPU utilization, and
energy efficiency, but can expose users to
increased grace-period latency. This parameter
overrides rcupdate.rcu_expedited. No effect on
CONFIG_TINY_RCU kernels.
2013-10-08 20:23:47 -07:00
2015-11-25 18:56:00 -08:00
rcupdate.rcu_normal_after_boot= [KNL]
Once boot has completed (that is, after
rcu_end_inkernel_boot() has been invoked), use
2015-12-07 13:09:52 -08:00
only normal grace-period primitives. No effect
on CONFIG_TINY_RCU kernels.
2015-11-25 18:56:00 -08:00
2020-03-17 11:39:26 -07:00
rcupdate.rcu_task_ipi_delay= [KNL]
Set time in jiffies during which RCU tasks will
avoid sending IPIs, starting with the beginning
of a given grace period. Setting a large
number avoids disturbing real-time workloads,
but lengthens grace periods.
2014-07-01 18:16:30 -07:00
rcupdate.rcu_task_stall_timeout= [KNL]
Set timeout in jiffies for RCU task stall warning
messages. Disable with a value less than or equal
to zero.
2014-09-19 11:34:09 -04:00
rcupdate.rcu_self_test= [KNL]
Run the RCU early boot self tests
2005-09-06 15:17:19 -07:00
rdinit= [KNL]
Format: <full_path>
Run specified binary instead of /init from the ramdisk,
used for early userspace startup. See initrd.
2019-08-19 15:52:35 +00:00
rdrand= [X86]
force - Override the decision by the kernel to hide the
advertisement of RDRAND support (this affects
certain AMD processors because of buggy BIOS
support, specifically around the suspend/resume
path).
2017-08-24 09:26:51 -07:00
rdt= [HW,X86,RDT]
Turn on/off individual RDT features. List is:
2017-12-20 14:57:24 -08:00
cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
mba.
2017-08-24 09:26:51 -07:00
E.g. to turn on cmt and turn off mba use:
rdt=cmt,!mba
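To make the on/off syntax concrete, the following sketch splits an rdt= value into features to turn on and features to turn off, using the "!" prefix shown in the example. The function name parse_rdt is hypothetical, and rejecting unknown names is a choice made for this sketch, not a statement about kernel behavior.

```python
RDT_FEATURES = ("cmt", "mbmtotal", "mbmlocal", "l3cat", "l3cdp",
                "l2cat", "l2cdp", "mba")


def parse_rdt(value):
    """Return (features to turn on, features to turn off)."""
    turn_on, turn_off = [], []
    for token in value.split(","):
        token = token.strip()
        negated = token.startswith("!")
        name = token[1:] if negated else token
        if name not in RDT_FEATURES:
            # Sketch-only policy: refuse names outside the documented list.
            raise ValueError(f"unknown RDT feature: {name!r}")
        (turn_off if negated else turn_on).append(name)
    return turn_on, turn_off


print(parse_rdt("cmt,!mba"))
```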
2013-07-08 16:01:42 -07:00
reboot= [KNL]
Format (x86 or x86_64):
[w[arm] | c[old] | h[ard] | s[oft] | g[pio]] \
[[,]s[mp]####] \
[[,]b[ios] | a[cpi] | k[bd] | t[riple] | e[fi] | p[ci]] \
[[,]f[orce]]
2019-05-14 15:45:37 -07:00
Where reboot_mode is one of warm (soft) or cold (hard) or gpio
(prefix with 'panic_' to set mode for panic
reboot only),
2013-07-08 16:01:42 -07:00
reboot_type is one of bios, acpi, kbd, triple, efi, or pci,
reboot_force is either force or not specified,
reboot_cpu is s[mp]#### with #### being the processor
to be used for rebooting.
2005-04-16 15:20:36 -07:00
2008-07-04 10:00:09 -07:00
relax_domain_level=
[KNL, SMP] Set scheduler's default relax_domain_level.
2019-06-27 13:08:35 -03:00
See Documentation/admin-guide/cgroup-v1/cpusets.rst.
2008-07-04 10:00:09 -07:00
2017-12-01 11:50:33 -06:00
reserve= [KNL,BUGS] Force kernel to ignore I/O ports or memory
Format: <base1>,<size1>[,<base2>,<size2>,...]
Reserve I/O ports or memory so the kernel won't use
them. If <base> is less than 0x10000, the region
is assumed to be I/O ports; otherwise it is memory.
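The rule that separates I/O ports from memory can be sketched as follows. The function name parse_reserve and the "io"/"mem" labels are hypothetical, and parsing numbers with Python's int(text, 0) (decimal or 0x-prefixed hex) is an assumption about the accepted notation rather than a description of the kernel parser.

```python
IO_PORT_LIMIT = 0x10000  # bases below this are treated as I/O ports


def parse_reserve(value):
    """Split a reserve= value into (kind, base, size) regions."""
    fields = [int(field, 0) for field in value.split(",")]
    if len(fields) % 2:
        raise ValueError("reserve= expects <base>,<size> pairs")
    regions = []
    for base, size in zip(fields[0::2], fields[1::2]):
        kind = "io" if base < IO_PORT_LIMIT else "mem"
        regions.append((kind, base, size))
    return regions


print(parse_reserve("0x300,32,0x100000,0x1000"))
```

Here the first pair (base 0x300) falls below 0x10000 and is classified as I/O ports, while the second pair (base 0x100000) is classified as memory.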
2005-04-16 15:20:36 -07:00
2007-07-31 00:37:59 -07:00
reservetop= [X86-32]
2006-09-25 23:32:25 -07:00
Format: nn[KMG]
Reserves a hole at the top of the kernel virtual
address space.
2010-08-25 16:38:20 -07:00
reservelow= [X86]
Format: nn[K]
Set the amount of memory to reserve for BIOS at
the bottom of the address space.
2006-09-27 01:50:44 -07:00
reset_devices [KNL] Force drivers to reset the underlying device
during initialization.
2005-10-23 12:57:11 -07:00
resume= [SWSUSP]
Specify the partition device for software suspend
2012-05-14 21:45:31 +02:00
Format:
{/dev/<dev> | PARTUUID=<uuid> | <int>:<int> | <hex>}
2005-04-16 15:20:36 -07:00
2006-12-06 20:34:13 -08:00
resume_offset= [SWSUSP]
Specify the offset from the beginning of the partition
given by "resume=" at which the swap header is located,
in <PAGE_SIZE> units (needed only for swap files).
2019-06-13 07:10:36 -03:00
See Documentation/power/swsusp-and-swap-files.rst
2006-12-06 20:34:13 -08:00
2011-10-10 23:38:41 +02:00
resumedelay= [HIBERNATION] Delay (in seconds) to pause before attempting to
read the resume files
2011-10-06 20:34:46 +02:00
resumewait [HIBERNATION] Wait (indefinitely) for resume device to show up.
Useful for devices that are detected asynchronously
(e.g. USB and MMC devices).
2010-09-09 23:06:23 +02:00
hibernate= [HIBERNATION]
noresume Don't check if there's a hibernation image
present during boot.
nocompress Don't compress/decompress hibernation images.
2014-06-13 13:30:35 -07:00
no Disable hibernation and resume.
2016-07-10 02:12:10 +02:00
protect_image Turn on image protection during restoration
(that will set all pages holding image data
during restoration read-only).
2010-09-09 23:06:23 +02:00
2007-02-10 01:44:33 -08:00
retain_initrd [RAM] Keep initrd memory after extraction
2015-01-09 20:24:55 +00:00
rfkill.default_state=
0 "airplane mode". All wifi, bluetooth, wimax, gps, fm,
etc. communication is blocked by default.
1 Unblocked.
rfkill.master_switch_mode=
0 The "airplane mode" button does nothing.
1 The "airplane mode" button toggles between everything
blocked and the previous configuration.
2 The "airplane mode" button toggles between everything
blocked and everything unblocked.
2005-04-16 15:20:36 -07:00
rhash_entries= [KNL,NET]
Set number of hash buckets for route cache
2017-01-20 14:22:36 +01:00
ring3mwait=disable
[KNL] Disable ring 3 MONITOR/MWAIT feature on supported
CPUs.
2005-04-16 15:20:36 -07:00
ro [KNL] Mount root device read-only on boot
2016-02-17 14:41:13 -08:00
rodata= [KNL]
on Mark read-only kernel memory as read-only (default).
off Leave read-only kernel memory writable for debugging.
2016-02-22 12:55:01 +01:00
rockchip.usb_uart
Enable the uart passthrough on the designated usb port
on Rockchip SoCs. When active, the signals of the
debug-uart get routed to the D+ and D- pins of the usb
port and the regular usb controller gets disabled.
2005-04-16 15:20:36 -07:00
root= [KNL] Root filesystem
2011-08-03 16:21:08 -07:00
See name_to_dev_t comment in init/do_mounts.c.
2005-04-16 15:20:36 -07:00
rootdelay= [KNL] Delay (in seconds) to pause before attempting to
mount the root filesystem
rootflags= [KNL] Set root filesystem mount option string
rootfstype= [KNL] Set root filesystem type
2007-07-15 23:40:35 -07:00
rootwait [KNL] Wait (indefinitely) for root device to show up.
Useful for devices that are detected asynchronously
(e.g. USB and MMC devices).
2013-03-28 18:41:46 -07:00
rproc_mem=nn[KMG][@address]
[KNL,ARM,CMA] Remoteproc physical memory block.
Memory area to be used by remote processor image,
managed by CMA.
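The nn[KMG][@address] notation can be decoded as sketched below. The helper names are hypothetical, and the sketch accepts only the upper-case suffixes shown in the format line; it is not the kernel's parser.

```python
import re

_NUM = r"0[xX][0-9a-fA-F]+|\d+"
_PATTERN = re.compile(rf"({_NUM})([KMG]?)(?:@({_NUM}))?")
_SHIFT = {"": 0, "K": 10, "M": 20, "G": 30}  # binary multipliers


def _to_int(text):
    """Parse a decimal or 0x-prefixed hexadecimal number."""
    return int(text, 16) if text[:2].lower() == "0x" else int(text, 10)


def parse_size_at_address(value):
    """Return (size_in_bytes, address_or_None) for nn[KMG][@address]."""
    match = _PATTERN.fullmatch(value)
    if match is None:
        raise ValueError("expected nn[KMG][@address], got %r" % value)
    number, suffix, address = match.groups()
    size = _to_int(number) << _SHIFT[suffix]
    return size, (None if address is None else _to_int(address))


if __name__ == "__main__":
    # Hypothetical value: 64 MiB placed at physical address 0x90000000.
    print(parse_size_at_address("64M@0x90000000"))
```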
2005-04-16 15:20:36 -07:00
rw [KNL] Mount root device read-write on boot
S [KNL] Run init in single mode
2014-07-18 17:37:08 +02:00
s390_iommu= [HW,S390]
Set s390 IOTLB flushing mode
strict
With strict flushing every unmap operation will result in
an IOTLB flush. Default is lazy flushing before reuse,
which is faster.
2005-04-16 15:20:36 -07:00
sa1100ir [NET]
See drivers/net/irda/sa1100_ir.c.
sbni= [NET] Granch SBNI12 leased line adapter
2005-10-23 12:57:11 -07:00
2009-11-17 18:22:15 -06:00
sched_debug [KNL] Enables verbose scheduler debug messages.
2016-02-05 09:08:36 +00:00
schedstats= [KNL,X86] Enable or disable scheduler statistics.
Allowed values are enable and disable. This feature
incurs a small amount of overhead in the scheduler
but is useful for debugging and performance tuning.
2009-11-17 18:22:15 -06:00
2020-02-21 19:52:13 -05:00
sched_thermal_decay_shift=
[KNL, SMP] Set a decay shift for scheduler thermal
pressure signal. Thermal pressure signal follows the
default decay period of other scheduler PELT
signals (usually 32 ms but configurable). Setting
sched_thermal_decay_shift will left shift the decay
period for the thermal pressure signal by the shift
value.
i.e. with the default PELT decay period of 32 ms:
sched_thermal_decay_shift    thermal pressure decay period
1                            64 ms
2                            128 ms
and so on.
Format: integer between 0 and 10
Default is 0.
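The left shift described above amounts to doubling the decay period once per shift step. A small sketch, assuming the default 32 ms PELT period (the function name is hypothetical):

```python
DEFAULT_PELT_PERIOD_MS = 32  # default period quoted in the text above


def thermal_decay_period_ms(shift, base_ms=DEFAULT_PELT_PERIOD_MS):
    """Return the thermal pressure decay period for a given shift value."""
    if not 0 <= shift <= 10:
        raise ValueError("sched_thermal_decay_shift must be between 0 and 10")
    return base_ms << shift


if __name__ == "__main__":
    for shift in range(0, 4):
        print(shift, thermal_decay_period_ms(shift), "ms")
```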
2012-05-08 12:20:58 +02:00
skew_tick= [KNL] Offset the periodic timer tick per cpu to mitigate
xtime_lock contention on larger systems, and/or RCU lock
contention on all systems with CONFIG_MAXSMP set.
Format: { "0" | "1" }
0 -- disable. (may be 1 via CONFIG_CMDLINE="skew_tick=1")
1 -- enable.
Note: increases power consumption, thus should only be
enabled if running jitter sensitive (HPC/RT) workloads.
2019-02-12 10:23:18 -08:00
security= [SECURITY] Choose a legacy "major" security module to
enable at boot. This has been deprecated by the
"lsm=" parameter.
2009-04-05 15:55:22 -07:00
selinux= [SELINUX] Disable or enable SELinux at boot time.
2005-04-16 15:20:36 -07:00
Format: { "0" | "1" }
See security/selinux/Kconfig help text.
0 -- disable.
1 -- enable.
2020-01-07 11:35:04 -05:00
Default value is 1.
2005-04-16 15:20:36 -07:00
2010-07-29 14:48:09 -07:00
apparmor= [APPARMOR] Disable or enable AppArmor at boot time
Format: { "0" | "1" }
See security/apparmor/Kconfig help text
0 -- disable.
1 -- enable.
Default value is set via kernel config option.
2007-07-31 00:37:59 -07:00
serialnumber [BUGS=X86-32]
2005-04-16 15:20:36 -07:00
shapers= [NET]
Maximal number of shapers.
2005-10-23 12:57:11 -07:00
2005-04-16 15:20:36 -07:00
simeth= [IA-64]
simscsi=
2005-10-23 12:57:11 -07:00
2005-04-16 15:20:36 -07:00
slram= [HW,MTD]
2014-10-09 15:26:22 -07:00
slab_nomerge [MM]
Disable merging of slabs with similar size. May be
necessary if there is some reason to distinguish
2017-07-06 15:36:40 -07:00
allocs to different slabs, especially in hardened
environments where the risk of heap overflows and
layout control by attackers can usually be
frustrated by disabling merging. This will reduce
most of the exposure of a heap attack to a single
cache (risks via metadata attacks are mostly
unchanged). Debug options disable merging on their
own.
2018-03-21 21:22:47 +02:00
For more information see Documentation/vm/slub.rst.
2014-10-09 15:26:22 -07:00
2011-10-18 22:09:28 -07:00
slab_max_order= [MM, SLAB]
Determines the maximum allowed order for slabs.
A high setting may cause OOMs due to memory
fragmentation. Defaults to 1 for systems with
more than 32MB of RAM, 0 otherwise.
2007-07-15 23:38:14 -07:00
slub_debug[=options[,slabs]] [MM, SLUB]
Enabling slub_debug allows one to determine the
culprit if slab objects become corrupted. Enabling
slub_debug can create guard zones around objects and
may poison objects when not in use. Also tracks the
last alloc / free. For more information see
2018-03-21 21:22:47 +02:00
Documentation/vm/slub.rst.
2007-05-31 00:40:47 -07:00
2017-02-22 15:41:39 -08:00
slub_memcg_sysfs= [MM, SLUB]
Determines whether to enable sysfs directories for
memory cgroup sub-caches. 1 to enable, 0 to disable.
The default is determined by CONFIG_SLUB_MEMCG_SYSFS_ON.
Enabling this can lead to a very high number of debug
directories and files being created under
/sys/kernel/slub.
2007-05-31 00:40:47 -07:00
slub_max_order= [MM, SLUB]
2007-07-15 23:38:14 -07:00
Determines the maximum allowed order for slabs.
A high setting may cause OOMs due to memory
fragmentation. For more information see
2018-03-21 21:22:47 +02:00
Documentation/vm/slub.rst.
2007-05-31 00:40:47 -07:00
slub_min_objects= [MM, SLUB]
2007-07-15 23:38:14 -07:00
The minimum number of objects per slab. SLUB will
increase the slab order up to slub_max_order to
generate a sufficiently large slab able to contain
the number of objects indicated. The higher the number
of objects the smaller the overhead of tracking slabs
and the less frequently locks need to be acquired.
2018-03-21 21:22:47 +02:00
For more information see Documentation/vm/slub.rst.
2007-05-31 00:40:47 -07:00
slub_min_order= [MM, SLUB]
2012-02-15 00:26:42 +09:00
Determines the minimum page order for slabs. Must be
2007-07-15 23:38:14 -07:00
lower than slub_max_order.
2018-03-21 21:22:47 +02:00
For more information see Documentation/vm/slub.rst.
2007-05-31 00:40:47 -07:00
slub_nomerge [MM, SLUB]
2014-10-09 15:26:22 -07:00
Same as slab_nomerge. Supported for legacy reasons.
See slab_nomerge for more information.
2007-05-31 00:40:47 -07:00
2005-04-16 15:20:36 -07:00
smart2= [HW]
Format: <io1>[,<io2>[,...,<io8>]]
2007-05-08 00:36:05 -07:00
smsc-ircc2.nopnp [HW] Don't use PNP to discover SMC devices
smsc-ircc2.ircc_cfg= [HW] Device configuration I/O port
smsc-ircc2.ircc_sir= [HW] SIR base I/O port
smsc-ircc2.ircc_fir= [HW] FIR base I/O port
smsc-ircc2.ircc_irq= [HW] IRQ line
smsc-ircc2.ircc_dma= [HW] DMA channel
smsc-ircc2.ircc_transceiver= [HW] Transceiver type:
0: Toshiba Satellite 1800 (GP data pin select)
1: Fast pin select (default)
2: ATC IRMode
2016-04-05 12:53:38 +02:00
smt [KNL,S390] Set the maximum number of threads (logical
CPUs) to use per physical CPU on systems capable of
symmetric multithreading (SMT). Will be capped to the
actual hardware limit.
Format: <integer>
Default: -1 (no limit)
2008-05-12 21:21:04 +02:00
softlockup_panic=
[KNL] Should the soft-lockup detector generate panics.
2020-06-07 21:40:42 -07:00
Format: 0 | 1
2008-05-12 21:21:04 +02:00
2020-06-07 21:40:42 -07:00
A value of 1 instructs the soft-lockup detector
2020-03-10 15:36:49 -03:00
to panic the machine when a soft-lockup occurs. It is
also controlled by the kernel.softlockup_panic sysctl
and CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC, which is the
respective build-time switch to that functionality.
2017-10-03 17:54:07 +02:00
2014-06-23 13:22:05 -07:00
softlockup_all_cpu_backtrace=
[KNL] Should the soft-lockup detector generate
backtraces on all cpus.
2020-06-07 21:40:42 -07:00
Format: 0 | 1
2014-06-23 13:22:05 -07:00
2005-04-16 15:20:36 -07:00
sonypi.*= [HW] Sony Programmable I/O Control Device driver
2019-06-13 15:07:43 -03:00
See Documentation/admin-guide/laptops/sonypi.rst
2005-04-16 15:20:36 -07:00
2018-01-11 21:46:26 +00:00
spectre_v2= [X86] Control mitigation of Spectre variant 2
(indirect branch speculation) vulnerability.
2018-11-25 19:33:45 +01:00
The default operation protects the kernel from
user space attacks.
2018-01-11 21:46:26 +00:00
2018-11-25 19:33:45 +01:00
on - unconditionally enable, implies
spectre_v2_user=on
off - unconditionally disable, implies
spectre_v2_user=off
2018-01-11 21:46:26 +00:00
auto - kernel detects whether your CPU model is
vulnerable
Selecting 'on' will, and 'auto' may, choose a
mitigation method at run time according to the
CPU, the available microcode, the setting of the
CONFIG_RETPOLINE configuration option, and the
compiler with which the kernel was built.
2018-11-25 19:33:45 +01:00
Selecting 'on' will also enable the mitigation
against user space to user space task attacks.
Selecting 'off' will disable both the kernel and
the user space protections.
2018-01-11 21:46:26 +00:00
Specific mitigations can also be selected manually:
retpoline - replace indirect branches
retpoline,generic - Google's original retpoline
retpoline,amd - AMD-specific minimal thunk
Not specifying this option is equivalent to
spectre_v2=auto.
2018-11-25 19:33:45 +01:00
spectre_v2_user=
[X86] Control mitigation of Spectre variant 2
(indirect branch speculation) vulnerability between
user space tasks
on - Unconditionally enable mitigations. Is
enforced by spectre_v2=on
off - Unconditionally disable mitigations. Is
enforced by spectre_v2=off
2018-11-25 19:33:54 +01:00
prctl - Indirect branch speculation is enabled,
but mitigation can be enabled via prctl
per thread. The mitigation control state
is inherited on fork.
2018-11-25 19:33:56 +01:00
prctl,ibpb
- Like "prctl" above, but only STIBP is
controlled per thread. IBPB is issued
always when switching between different user
space processes.
2018-11-25 19:33:55 +01:00
seccomp
- Same as "prctl" above, but all seccomp
threads will enable the mitigation unless
they explicitly opt out.
2018-11-25 19:33:56 +01:00
seccomp,ibpb
- Like "seccomp" above, but only STIBP is
controlled per thread. IBPB is issued
always when switching between different
user space processes.
2018-11-25 19:33:45 +01:00
auto - Kernel selects the mitigation depending on
the available CPU features and vulnerability.
2018-11-25 19:33:55 +01:00
Default mitigation:
If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"
2018-11-25 19:33:45 +01:00
Not specifying this option is equivalent to
spectre_v2_user=auto.
2018-04-25 22:04:21 -04:00
spec_store_bypass_disable=
[HW] Control Speculative Store Bypass (SSB) Disable mitigation
(Speculative Store Bypass vulnerability)
Certain CPUs are vulnerable to an exploit against
a common industry-wide performance optimization known
as "Speculative Store Bypass" in which recent stores
to the same memory location may not be observed by
later loads during speculative execution. The idea
is that such stores are unlikely and that they can
be detected prior to instruction retirement at the
end of a particular speculative execution window.
In vulnerable processors, the speculatively forwarded
store can be used in a cache side channel attack, for
example to read memory to which the attacker does not
directly have access (e.g. inside sandboxed code).
This parameter controls whether the Speculative Store
Bypass optimization is used.
2018-07-10 12:08:36 +10:00
On x86 the options are:
2018-05-03 14:37:54 -07:00
on - Unconditionally disable Speculative Store Bypass
off - Unconditionally enable Speculative Store Bypass
auto - Kernel detects whether the CPU model contains an
implementation of Speculative Store Bypass and
picks the most appropriate mitigation. If the
CPU is not vulnerable, "off" is selected. If the
CPU is vulnerable the default mitigation is
architecture and Kconfig dependent. See below.
prctl - Control Speculative Store Bypass per thread
via prctl. Speculative Store Bypass is enabled
for a process by default. The state of the control
is inherited on fork.
seccomp - Same as "prctl" above, but all seccomp threads
will disable SSB unless they explicitly opt out.
2018-04-25 22:04:21 -04:00
2018-05-03 14:37:54 -07:00
Default mitigations:
X86: If CONFIG_SECCOMP=y "seccomp", otherwise "prctl"
2018-07-10 12:08:36 +10:00
On powerpc the options are:
on,auto - On Power8 and Power9 insert a store-forwarding
barrier on kernel entry and exit. On Power7
perform a software flush on kernel entry and
exit.
off - No action.
Not specifying this option is equivalent to
spec_store_bypass_disable=auto.
2005-04-16 15:20:36 -07:00
spia_io_base= [HW,MTD]
spia_fio_base=
spia_pedr=
spia_peddr=
2020-01-26 12:05:35 -08:00
split_lock_detect=
[X86] Enable split lock detection
When enabled (and if hardware support is present), atomic
instructions that access data across cache line
boundaries will result in an alignment check exception.
off - not enabled
warn - the kernel will emit rate limited warnings
about applications triggering the #AC
exception. This mode is the default on CPUs
that support split lock detection.
fatal - the kernel will send SIGBUS to applications
that trigger the #AC exception.
If an #AC exception is hit in the kernel or in
firmware (i.e. not while executing in user mode)
the kernel will oops in either "warn" or "fatal"
mode.
2020-04-16 17:54:04 +02:00
srbds= [X86,INTEL]
Control the Special Register Buffer Data Sampling
(SRBDS) mitigation.
Certain CPUs are vulnerable to an MDS-like
exploit which can leak bits from the random
number generator.
By default, this issue is mitigated by
microcode. However, the microcode fix can cause
the RDRAND and RDSEED instructions to become
much slower. Among other effects, this will
result in reduced throughput from /dev/urandom.
The microcode mitigation can be disabled with
the following option:
off: Disable mitigation and remove
performance impact to RDRAND and RDSEED
srcu: Prevent sdp->srcu_gp_seq_needed counter wrap
If a given CPU never happens to ever start an SRCU grace period, the
grace-period sequence counter might wrap. If this CPU were to decide to
finally start a grace period, the state of its sdp->srcu_gp_seq_needed
might make it appear that it has already requested this grace period,
which would prevent starting the grace period. If no other CPU ever started
a grace period again, this would look like a grace-period hang. Even
if some other CPU took pity and started the needed grace period, the
leaf rcu_node structure's ->srcu_data_have_cbs field won't have record
of the fact that this CPU has a callback pending, which would look like
a very localized grace-period hang.
This might seem very unlikely, but SRCU grace periods can take less than
a microsecond on small systems, which means that overflow can happen
in much less than an hour on a 32-bit embedded system. And embedded
systems are especially likely to have long-term idle CPUs. Therefore,
it makes sense to prevent this scenario from happening.
This commit therefore scans each srcu_data structure occasionally,
with frequency controlled by the srcutree.counter_wrap_check kernel
boot parameter. This parameter can be set to something like 255
in order to exercise the counter-wrap-prevention code.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-05-03 15:35:32 -07:00
srcutree.counter_wrap_check [KNL]
Specifies how frequently to check for
grace-period sequence counter wrap for the
srcu_data structure's ->srcu_gp_seq_needed field.
The greater the number of bits set in this kernel
parameter, the less frequently counter wrap will
be checked for. Note that the bottom two bits
are ignored.
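The relationship between set bits and check frequency can be modelled as below. This is a model of the documented behaviour only: the assumption that a check is made when the sequence number ANDed with the parameter is zero, and both function names, are illustrative and not taken from the text above.

```python
def wrap_check_due(gp_seq, counter_wrap_check):
    """Model: a check is due when no masked bit is set in the sequence."""
    effective_mask = counter_wrap_check & ~0x3  # bottom two bits are ignored
    return (gp_seq & effective_mask) == 0


def checks_in(sequence_count, counter_wrap_check):
    """Count how many of the first sequence_count values trigger a check."""
    return sum(wrap_check_due(seq, counter_wrap_check)
               for seq in range(sequence_count))


if __name__ == "__main__":
    for mask in (0x0, 0xFF, 0xFFF):
        print(hex(mask), checks_in(1024, mask))
```

In this model a mask with more bits set yields fewer checks, and a mask of 3 behaves like a mask of 0.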
2017-04-25 14:03:11 -07:00
srcutree.exp_holdoff [KNL]
Specifies how many nanoseconds must elapse
since the end of the last SRCU grace period for
a given srcu_struct until the next normal SRCU
grace period will be considered for automatic
expediting. Set to zero to disable automatic
expediting.
2018-05-29 13:11:09 +01:00
ssbd= [ARM64,HW]
Speculative Store Bypass Disable control
On CPUs that are vulnerable to the Speculative
Store Bypass vulnerability and offer a
firmware based mitigation, this parameter
indicates how the mitigation should be used:
force-on: Unconditionally enable mitigation for
both kernel and userspace
force-off: Unconditionally disable mitigation for
both kernel and userspace
kernel: Always enable mitigation in the
kernel, and offer a prctl interface
to allow userspace to register its
interest in being mitigated too.
mm: larger stack guard gap, between vmas
Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs like gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.
This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunately.
Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.
One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications. For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).
Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.
Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and the vma tree's subtree_gap support for that.
Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-19 04:03:24 -07:00
stack_guard_gap= [MM]
Override the default stack gap protection. The value
is in page units and defines how many pages before
(for stacks growing down) or after (for stacks
growing up) the main stack are reserved for no other
mapping. Default value is 256 pages.
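Since the value is given in pages, the size of the gap in bytes depends on the page size. A short sketch, assuming 4 KiB pages unless another size is passed in (the function name is hypothetical):

```python
DEFAULT_GAP_PAGES = 256  # documented default for stack_guard_gap=


def stack_guard_gap_bytes(pages=DEFAULT_GAP_PAGES, page_size=4096):
    """Return the stack guard gap in bytes for a given page count."""
    if pages < 0:
        raise ValueError("page count cannot be negative")
    return pages * page_size


if __name__ == "__main__":
    print(stack_guard_gap_bytes() // 1024, "KiB with 4 KiB pages")
    print(stack_guard_gap_bytes(page_size=65536) // 1024, "KiB with 64 KiB pages")
```

With the default of 256 pages and 4 KiB pages the gap is 1 MiB; larger base pages give a proportionally larger gap.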
2008-12-16 23:06:40 -05:00
stacktrace [FTRACE]
Enable the stack tracer on boot up.
2011-12-19 22:01:00 -05:00
stacktrace_filter=[function-list]
[FTRACE] Limit the functions that the stack tracer
will trace at boot up. function-list is a comma separated
list of functions. This list can be changed at run
time by the stack_trace_filter file in the debugfs
tracing directory. Note, this enables stack tracing
and the stacktrace above is not needed.
2005-04-16 15:20:36 -07:00
sti= [PARISC,HW]
Format: <num>
Set the STI (builtin display/keyboard on the HP-PARISC
machines) console (graphic card) which should be used
as the initial boot-console.
See also comment in drivers/video/console/sticore.c.
sti_font= [HW]
See comment in drivers/video/console/sticore.c.
stifb= [HW]
Format: bpp:<bpp1>[:<bpp2>[:<bpp3>...]]
2009-08-09 15:06:19 -04:00
sunrpc.min_resvport=
sunrpc.max_resvport=
[NFS,SUNRPC]
SunRPC servers often require that client requests
originate from a privileged port (i.e. a port in the
range 0 < portnr < 1024).
An administrator who wishes to reserve some of these
ports for other uses may adjust the range that the
kernel's sunrpc client considers to be privileged
using these two parameters to set the minimum and
maximum port values.
2016-06-24 10:55:50 -04:00
sunrpc.svc_rpc_per_connection_limit=
[NFS,SUNRPC]
Limit the number of requests that the server will
process in parallel from a single connection.
The default value is 0 (no limit).
2007-03-06 01:42:23 -08:00
sunrpc.pool_mode=
[NFS]
Control how the NFS server code allocates CPUs to
service thread pools. Depending on how many NICs
you have and where their interrupts are bound, this
option will affect which CPUs will do NFS serving.
Note: this parameter cannot be changed while the
NFS server is running.
auto the server chooses an appropriate mode
automatically using heuristics
global a single global pool contains all CPUs
percpu one pool for each CPU
pernode one pool for each NUMA node (equivalent
to global on non-NUMA machines)
2009-08-09 15:06:19 -04:00
sunrpc.tcp_slot_table_entries=
sunrpc.udp_slot_table_entries=
[NFS,SUNRPC]
Sets the upper limit on the number of simultaneous
RPC calls that can be sent from the client to a
server. Increasing these values may allow you to
improve throughput, but will also increase the
amount of memory reserved for use by the client.
PM / sleep: add configurable delay for pm_test
When CONFIG_PM_DEBUG=y, we provide a sysfs file (/sys/power/pm_test) for
selecting one of a few suspend test modes, where rather than entering a
full suspend state, the kernel will perform some subset of suspend
steps, wait 5 seconds, and then resume back to normal operation.
This mode is useful for (among other things) observing the state of the
system just before entering a sleep mode, for debugging or analysis
purposes. However, a constant 5 second wait is not sufficient for some
sorts of analysis; for example, on an SoC, one might want to use
external tools to probe the power states of various on-chip controllers
or clocks.
This patch turns this 5 second delay into a configurable module
parameter, so users can determine how long to wait in this
pseudo-suspend state before resuming the system.
Example (wait 30 seconds):
# echo 30 > /sys/module/suspend/parameters/pm_test_delay
# echo core > /sys/power/pm_test
# time echo mem > /sys/power/state
...
[ 17.583625] suspend debug: Waiting for 30 second(s).
...
real 0m30.381s
user 0m0.017s
sys 0m0.080s
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Kevin Cernekee <cernekee@chromium.org>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-22 21:16:49 -08:00
suspend.pm_test_delay=
[SUSPEND]
Sets the number of seconds to remain in a suspend test
mode before resuming the system (see
/sys/power/pm_test). Only available when CONFIG_PM_DEBUG
is set. Default value is 5.
2019-08-19 23:13:14 -03:00
svm= [PPC]
Format: { on | off | y | n | 1 | 0 }
This parameter controls use of the Protected
Execution Facility on pSeries.
2013-08-22 16:35:46 -07:00
swapaccount=[0|1]
2010-11-24 12:57:08 -08:00
[KNL] Enable accounting of swap in memory resource
controller if no parameter or 1 is given or disable
2019-06-27 13:08:35 -03:00
it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst)
2010-11-24 12:57:08 -08:00
2013-11-27 13:48:09 +01:00
swiotlb= [ARM,IA-64,PPC,MIPS,X86]
2016-12-16 14:28:42 +01:00
Format: { <int> | force | noforce }
2013-11-27 13:48:09 +01:00
<int> -- Number of I/O TLB slabs
force -- force the use of bounce buffers even if they
wouldn't be automatically used by the kernel
2016-12-16 14:28:42 +01:00
noforce -- Never use bounce buffers (for debugging)
2005-10-23 12:57:11 -07:00
2005-04-16 15:20:36 -07:00
switches= [HW,M68k]
kernel/sysctl: support setting sysctl parameters from kernel command line
Patch series "support setting sysctl parameters from kernel command line", v3.
This series adds support for something that seems like many people
always wanted but nobody added it yet, so here's the ability to set
sysctl parameters via kernel command line options in the form of
sysctl.vm.something=1
The important part is Patch 1. The second, not so important part is an
attempt to clean up legacy one-off parameters that do the same thing as
a sysctl. I don't want to remove them completely for compatibility
reasons, but with generic sysctl support the idea is to remove the
one-off param handlers and treat the parameters as aliases for the
sysctl variants.
I have identified several parameters that mention sysctl counterparts in
Documentation/admin-guide/kernel-parameters.txt but there might be more.
The conversion also has varying level of success:
- numa_zonelist_order is converted in Patch 2 together with adding the
necessary infrastructure. It's easy as it doesn't really do anything
but warn on deprecated value these days.
- hung_task_panic is converted in Patch 3, but there's a downside that
now it only accepts 0 and 1, while previously it was any integer
value
- nmi_watchdog maps to two sysctls nmi_watchdog and hardlockup_panic,
so there's no straightforward conversion possible
- traceoff_on_warning is a flag without value and it would be required
to handle that somehow in the conversion infrastructure, which seems
pointless for a single flag
This patch (of 5):
A recently proposed patch to add vm_swappiness command line parameter in
addition to existing sysctl [1] made me wonder why we don't have a
general support for passing sysctl parameters via command line.
Googling found only somebody else wondering the same [2], but I haven't
found any prior discussion with reasons why not to do this.
Setting the vm_swappiness issue aside (the underlying issue might be
solved in a different way), quick search of kernel-parameters.txt shows
there are already some that exist as both sysctl and kernel parameter -
hung_task_panic, nmi_watchdog, numa_zonelist_order, traceoff_on_warning.
A general mechanism would remove the need to add more of those one-offs
and might be handy in situations where configuration by e.g.
/etc/sysctl.d/ is impractical.
Hence, this patch adds a new parse_args() pass that looks for parameters
prefixed by 'sysctl.' and tries to interpret them as writes to the
corresponding sys/ files using a temporary in-kernel procfs mount.
This mechanism was suggested by Eric W. Biederman [3], as it handles
all dynamically registered sysctl tables, even though we don't handle
modular sysctls. Errors due to e.g. invalid parameter name or value
are reported in the kernel log.
The processing is hooked right before the init process is loaded, as
some handlers might be more complicated than simple setters and might
need some subsystems to be initialized. If at that point the init process
can be started and eventually execute a process writing to /proc/sys/,
then it should also be fine to do that from the kernel.
Sysctls registered later on module load time are not set by this
mechanism - it's expected that in such scenarios, setting sysctl values
from userspace is practical enough.
[1] https://lore.kernel.org/r/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/
[2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter
[3] https://lore.kernel.org/r/87bloj2skm.fsf@x220.int.ebiederm.org/
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: "Guilherme G . Piccoli" <gpiccoli@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Link: http://lkml.kernel.org/r/20200427180433.7029-1-vbabka@suse.cz
Link: http://lkml.kernel.org/r/20200427180433.7029-2-vbabka@suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-07 21:40:24 -07:00
sysctl.*= [KNL]
Set a sysctl parameter, right before loading the init
process, as if the value was written to the respective
/proc/sys/... file. Both '.' and '/' are recognized as
separators. Unrecognized parameters and invalid values
are reported in the kernel log. Sysctls registered
later by a loaded module cannot be set this way.
Example: sysctl.vm.swappiness=40
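The mapping from a sysctl.* option to its /proc/sys file can be sketched as follows. The function name is hypothetical, and the sketch only mirrors the separator rule stated above; it does not model how the kernel performs the write.

```python
PREFIX = "sysctl."


def sysctl_param_to_proc_path(param):
    """Return (path under /proc/sys, value) for a sysctl.<name>=<value> option."""
    name, separator, value = param.partition("=")
    if not name.startswith(PREFIX) or not separator:
        raise ValueError("expected sysctl.<name>=<value>, got %r" % param)
    key = name[len(PREFIX):]
    # Both '.' and '/' are recognized as separators.
    return "/proc/sys/" + key.replace(".", "/"), value


if __name__ == "__main__":
    print(sysctl_param_to_proc_path("sysctl.vm.swappiness=40"))
```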
2010-09-08 16:54:17 +02:00
sysfs.deprecated=0|1 [KNL]
Enable/disable old style sysfs layout for old udev
on older distributions. When this option is enabled,
very new udev will no longer work. When this option
is disabled (or CONFIG_SYSFS_DEPRECATED is not compiled
in), older udev will no longer work.
Default depends on CONFIG_SYSFS_DEPRECATED_V2 set in
the kernel configuration.
2006-12-13 00:34:36 -08:00
sysrq_always_enabled
[KNL]
Ignore sysrq setting - this boot parameter will
neutralize any effect of /proc/sys/kernel/sysrq.
Useful for debugging.
2014-11-06 19:46:50 +01:00
tcpmhash_entries= [KNL,NET]
Set the number of tcp_metrics_hash slots.
Default value is 8192 or 16384 depending on total
ram pages. This is used to specify the TCP metrics
2020-04-28 00:01:49 +02:00
cache size. See Documentation/networking/ip-sysctl.rst
2014-11-06 19:46:50 +01:00
"tcp_no_metrics_save" section for more details.
2005-04-16 15:20:36 -07:00
tdfx= [HW,DRM]
2014-09-02 11:54:41 -07:00
test_suspend= [SUSPEND][,N]
2008-07-23 21:28:33 -07:00
Specify "mem" (for Suspend-to-RAM) or "standby" (for
2014-09-02 11:54:41 -07:00
standby suspend) or "freeze" (for suspend type freeze)
as the system sleep state during system startup with
the optional capability to repeat N number of times.
The system is woken from this state using a
wakeup-capable RTC alarm.
2008-07-23 21:28:33 -07:00
2005-04-16 15:20:36 -07:00
thash_entries= [KNL,NET]
Set number of hash buckets for TCP connection
2007-08-12 00:12:54 -04:00
thermal.act= [HW,ACPI]
-1: disable all active trip points in all thermal zones
<degrees C>: override all lowest active trip points
2007-08-14 15:49:32 -04:00
thermal.crt= [HW,ACPI]
-1: disable all critical trip points in all thermal zones
2008-10-17 02:41:20 -04:00
<degrees C>: override all critical trip points
2007-08-14 15:49:32 -04:00
2007-08-12 00:12:44 -04:00
thermal.nocrt= [HW,ACPI]
Set to disable actions on ACPI thermal zone
critical and hot trip points.
2007-08-12 00:12:17 -04:00
thermal.off= [HW,ACPI]
1: disable ACPI thermal control
2007-08-12 00:12:35 -04:00
thermal.psv= [HW,ACPI]
-1: disable all passive trip points
2008-12-19 10:57:32 -08:00
<degrees C>: override all passive trip points to this
value
2007-08-12 00:12:35 -04:00
2007-08-12 00:12:26 -04:00
thermal.tzp= [HW,ACPI]
Specify global default ACPI thermal zone polling rate
<deci-seconds>: poll all thermal zones at this interval
(e.g. 300 = every 30 seconds)
0: no polling (default)
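Since thermal.tzp is given in deci-seconds (tenths of a second), it is easy to be off by a factor of ten. The following is a minimal sketch of the conversion from seconds; the helper names are hypothetical and exist only for illustration.

```python
def tzp_from_seconds(seconds: float) -> int:
    """Convert a polling interval in seconds to deci-seconds for thermal.tzp."""
    if seconds < 0:
        raise ValueError("polling interval must not be negative")
    # 1 second = 10 deci-seconds; 0 keeps the documented "no polling" meaning.
    return round(seconds * 10)


def tzp_param(seconds: float) -> str:
    """Format the boot parameter string for the given interval in seconds."""
    return f"thermal.tzp={tzp_from_seconds(seconds)}"


if __name__ == "__main__":
    print(tzp_param(30))
```

Passing 0 yields thermal.tzp=0, which matches the documented default of no polling.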
2011-02-23 23:52:23 +00:00
threadirqs [KNL]
Force threading of all interrupt handlers except those
2012-02-15 00:26:42 +09:00
marked explicitly IRQF_NO_THREAD.
2011-02-23 23:52:23 +00:00
2008-12-25 13:39:23 +01:00
topology= [S390]
Format: {off | on}
Specify if the kernel should make use of the cpu
2011-04-04 15:04:46 -07:00
topology information if the hardware supports this.
The scheduler will make use of this information and
2008-12-25 13:39:23 +01:00
e.g. base its process migration decisions on it.
2010-10-25 16:10:43 +02:00
Default is on.
2008-12-25 13:39:23 +01:00
2014-10-10 09:04:49 -07:00
topology_updates= [KNL, PPC, NUMA]
Format: {off}
Specify if the kernel should ignore (off)
topology updates sent by the hypervisor to this
LPAR.
2019-12-06 15:02:59 -08:00
torture.disable_onoff_at_boot= [KNL]
Prevent the CPU-hotplug component of torturing
until after init has spawned.
2005-04-16 15:20:36 -07:00
tp720= [HW,PS2]
2010-03-25 00:55:32 -03:00
tpm_suspend_pcr= [HW,TPM]
Format: integer pcr id
Specify that at suspend time, the tpm driver
should extend the specified pcr with zeros,
as a workaround for some chips which fail to
flush the last written pcr on TPM_SaveState.
This will guarantee that all the other pcrs
are saved.
2009-06-24 17:33:15 +08:00
trace_buf_size=nn[KMG]
2014-12-03 10:39:20 +09:00
[FTRACE] will set tracing buffer size on each cpu.
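The nn[KMG] notation used here (and by other size parameters such as vmalloc=) is a number with an optional binary-multiple suffix. The sketch below shows how such a value expands to bytes, assuming K, M and G mean 2^10, 2^20 and 2^30 and that the suffix is case-insensitive; parse_size is a hypothetical helper, not the kernel's own parser.

```python
# Binary multiples assumed for the K, M and G suffixes.
_SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text: str) -> int:
    """Expand a value such as '16M' into a byte count."""
    text = text.strip()
    if not text:
        raise ValueError("empty size")
    suffix = text[-1].upper()
    if suffix in _SUFFIXES:
        # base 0 accepts decimal as well as 0x-prefixed hex numbers
        return int(text[:-1], 0) * _SUFFIXES[suffix]
    return int(text, 0)


if __name__ == "__main__":
    print(parse_size("16M"))
```

For example, trace_buf_size=16M would request 16777216 bytes per CPU under these assumptions.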
2009-03-10 13:57:10 +09:00
2009-07-01 10:47:05 +08:00
trace_event=[event-list]
[FTRACE] Set and start specified trace events in order
2016-05-23 13:37:58 -07:00
to facilitate early boot debugging. The event-list is a
comma separated list of trace events to enable. See
2018-05-08 15:14:57 -03:00
also Documentation/trace/events.rst
2009-07-01 10:47:05 +08:00
2012-11-01 22:56:07 -04:00
trace_options=[option-list]
[FTRACE] Enable or disable tracer options at boot.
The option-list is a comma delimited list of options
that can be enabled or disabled just as if you were
to echo the option name into
/sys/kernel/debug/tracing/trace_options
For example, to enable stacktrace option (to dump the
stack trace of each event), add to the command line:
trace_options=stacktrace
2018-05-08 15:14:57 -03:00
See also Documentation/trace/ftrace.rst "trace options"
2012-11-01 22:56:07 -04:00
section.
2014-12-12 22:27:10 -05:00
tp_printk [FTRACE]
Have the tracepoints sent to printk as well as the
tracing ring buffer. This is useful for early boot up
where the system hangs or reboots and does not give the
option for reading the tracing buffer or performing a
ftrace_dump_on_oops.
To turn off having tracepoints sent to printk,
echo 0 > /proc/sys/kernel/tracepoint_printk
Note, echoing 1 into this file without the
tp_printk kernel cmdline option has no effect.
** CAUTION **
Having tracepoints sent to printk() and activating high
frequency tracepoints such as irq or sched, can cause
the system to live lock.
2013-06-14 16:21:43 -04:00
traceoff_on_warning
[FTRACE] enable this option to disable tracing when a
warning is hit. This turns off "tracing_on". Tracing can
be enabled again by echoing '1' into the "tracing_on"
file located in /sys/kernel/debug/tracing/
This option is useful, as it disables the trace before
the WARNING dump is called, which prevents the trace from
being filled with content caused by the warning output.
This option can also be set at run time via the sysctl
option: kernel/traceoff_on_warning
2012-03-21 16:34:02 -07:00
transparent_hugepage=
[KNL]
Format: [always|madvise|never]
Can be used to control the default behavior of the system
with respect to transparent hugepages.
2018-05-14 11:13:40 +03:00
See Documentation/admin-guide/mm/transhuge.rst
for more details.
2012-03-21 16:34:02 -07:00
2009-08-17 16:40:47 -07:00
tsc= Disable clocksource stability checks for TSC.
2008-10-24 17:22:01 -07:00
Format: <string>
[x86] reliable: mark tsc clocksource as reliable, this
2009-08-17 16:40:47 -07:00
disables clocksource verification at runtime, as well
as the stability checks done at bootup. Used to enable
high-resolution timer mode on older hardware, and in
virtualized environment.
2010-10-04 17:03:20 -07:00
[x86] noirqtime: Do not use TSC to do irq accounting.
Used to disable IRQ_TIME_ACCOUNTING at run time on any
platforms where RDTSC is slow and this accounting
can add overhead.
2017-10-09 17:03:33 +08:00
[x86] unstable: mark the TSC clocksource as unstable, this
marks the TSC unconditionally unstable at bootup and
avoids any further wobbles once the TSC watchdog notices.
2019-03-07 13:09:13 +01:00
[x86] nowatchdog: disable clocksource watchdog. Used
in situations with strict latency requirements (where
interruptions from clocksource watchdog are not
acceptable).
2008-10-24 17:22:01 -07:00
2020-01-23 16:09:26 +00:00
tsc_early_khz= [X86] Skip early TSC calibration and use the given
value instead. Useful when the early TSC frequency discovery
procedure is not reliable, such as on overclocked systems
with CPUID.16h support and partial CPUID.15h support.
Format: <unsigned int>
2019-10-23 11:01:53 +02:00
tsx= [X86] Control Transactional Synchronization
Extensions (TSX) feature in Intel processors that
support TSX control.
This parameter controls the TSX feature. The options are:
on - Enable TSX on the system. Although there are
mitigations for all known security vulnerabilities,
TSX has been known to be an accelerator for
several previous speculation-related CVEs, and
so there may be unknown security risks associated
with leaving it enabled.
off - Disable TSX on the system. (Note that this
option takes effect only on newer CPUs which are
not vulnerable to MDS, i.e., have
MSR_IA32_ARCH_CAPABILITIES.MDS_NO=1 and which get
the new IA32_TSX_CTRL MSR through a microcode
update. This new MSR allows for the reliable
deactivation of the TSX functionality.)
2019-10-23 12:28:57 +02:00
auto - Disable TSX if X86_BUG_TAA is present,
otherwise enable TSX on the system.
2019-10-23 11:01:53 +02:00
Not specifying this option is equivalent to tsx=off.
See Documentation/admin-guide/hw-vuln/tsx_async_abort.rst
for more details.
2019-10-23 12:32:55 +02:00
tsx_async_abort= [X86,INTEL] Control mitigation for the TSX Async
Abort (TAA) vulnerability.
Similar to Micro-architectural Data Sampling (MDS)
certain CPUs that support Transactional
Synchronization Extensions (TSX) are vulnerable to an
exploit against CPU internal buffers which can forward
information to a disclosure gadget under certain
conditions.
In vulnerable processors, the speculatively forwarded
data can be used in a cache side channel attack, to
access data to which the attacker does not have direct
access.
This parameter controls the TAA mitigation. The
options are:
full - Enable TAA mitigation on vulnerable CPUs
if TSX is enabled.
full,nosmt - Enable TAA mitigation and disable SMT on
vulnerable CPUs. If TSX is disabled, SMT
is not disabled because CPU is not
vulnerable to cross-thread TAA attacks.
off - Unconditionally disable TAA mitigation
2019-11-15 11:14:44 -05:00
On MDS-affected machines, tsx_async_abort=off can be
prevented by an active MDS mitigation as both vulnerabilities
are mitigated with the same mechanism so in order to disable
this mitigation, you need to specify mds=off too.
2019-10-23 12:32:55 +02:00
Not specifying this option is equivalent to
tsx_async_abort=full. On CPUs which are MDS affected
and deploy MDS mitigation, TAA mitigation is not
required and doesn't provide any additional
mitigation.
For details see:
Documentation/admin-guide/hw-vuln/tsx_async_abort.rst
2005-10-23 12:57:11 -07:00
turbografx.map[2|3]= [HW,JOY]
TurboGraFX parallel port interface
Format:
<port#>,<js1>,<js2>,<js3>,<js4>,<js5>,<js6>,<js7>
2017-10-10 12:36:23 -05:00
See also Documentation/input/devices/joystick-parport.rst
2005-04-16 15:20:36 -07:00
2011-05-31 15:22:05 +00:00
udbg-immortal [PPC] When debugging early kernel crashes that
2016-11-03 12:10:10 +02:00
happen after console_init() and before a proper
2011-05-31 15:22:05 +00:00
console driver takes over, this boot option might
help "seeing" what's going on.
2009-10-07 00:37:59 +00:00
uhash_entries= [KNL,NET]
Set number of hash buckets for UDP/UDP-Lite connections
2006-12-05 16:29:55 -05:00
uhci-hcd.ignore_oc=
[USB] Ignore overcurrent events (default N).
Some badly-designed motherboards generate lots of
bogus events, for ports that aren't wired to
anything. Set this parameter to avoid log spamming.
Note that genuine overcurrent events won't be
reported either.
2008-07-19 23:32:54 +01:00
unknown_nmi_panic
2011-04-04 15:02:24 -07:00
[X86] Cause panic on unknown NMI.
2008-07-19 23:32:54 +01:00
2011-05-31 21:31:08 +02:00
usbcore.authorized_default=
[USB] Default USB device authorization:
(default -1 = authorized except for wireless USB,
2019-02-16 23:21:51 -08:00
0 = not authorized, 1 = authorized, 2 = authorized
if device connected to internal port)
2011-05-31 21:31:08 +02:00
2007-02-20 15:00:53 -05:00
usbcore.autosuspend=
[USB] The autosuspend time delay (in seconds) used
for newly-detected USB devices (default 2). This
is the time required before an idle device will be
autosuspended. Devices for which the delay is set
2007-03-13 16:39:15 -04:00
to a negative value won't be autosuspended at all.
2007-02-20 15:00:53 -05:00
2008-10-10 16:24:45 +02:00
usbcore.usbfs_snoop=
[USB] Set to log all usbfs traffic (default 0 = off).
2015-11-20 13:53:22 -05:00
usbcore.usbfs_snoop_max=
[USB] Maximum number of bytes to snoop in each URB
(default = 65536).
2008-10-10 16:24:45 +02:00
usbcore.blinkenlights=
[USB] Set to cycle leds on hubs (default 0 = off).
usbcore.old_scheme_first=
[USB] Start with the old device initialization
2020-04-22 16:13:08 -04:00
scheme (default 0 = off).
2008-10-10 16:24:45 +02:00
2011-11-17 16:41:35 -05:00
usbcore.usbfs_memory_mb=
[USB] Memory limit (in MB) for buffers allocated by
usbfs (default = 16, 0 = max = 2047).
2008-10-10 16:24:45 +02:00
usbcore.use_both_schemes=
[USB] Try the other device initialization scheme
if the first one fails (default 1 = enabled).
usbcore.initial_descriptor_timeout=
[USB] Specifies timeout for the initial 64-byte
2018-04-18 20:51:39 +02:00
USB_REQ_GET_DESCRIPTOR request in milliseconds
2008-10-10 16:24:45 +02:00
(default 5000 = 5.0 seconds).
2015-12-03 15:03:32 +01:00
usbcore.nousb [USB] Disable the USB subsystem
2018-03-20 00:26:06 +08:00
usbcore.quirks=
[USB] A list of quirk entries to augment the built-in
usb core quirk list. List entries are separated by
commas. Each entry has the form
VendorID:ProductID:Flags. The IDs are 4-digit hex
numbers and Flags is a set of letters. Each letter
will change the built-in quirk; setting it if it is
clear and clearing it if it is set. The letters have
the following meanings:
a = USB_QUIRK_STRING_FETCH_255 (string
descriptors must not be fetched using
a 255-byte read);
b = USB_QUIRK_RESET_RESUME (device can't resume
correctly so reset it instead);
c = USB_QUIRK_NO_SET_INTF (device can't handle
Set-Interface requests);
d = USB_QUIRK_CONFIG_INTF_STRINGS (device can't
handle its Configuration or Interface
strings);
e = USB_QUIRK_RESET (device can't be reset
(e.g. morph devices), don't use reset);
f = USB_QUIRK_HONOR_BNUMINTERFACES (device has
more interface descriptions than the
bNumInterfaces count, and can't handle
talking to these interfaces);
g = USB_QUIRK_DELAY_INIT (device needs a pause
during initialization, after we read
the device descriptor);
h = USB_QUIRK_LINEAR_UFRAME_INTR_BINTERVAL (For
high speed and super speed interrupt
endpoints, the USB 2.0 and USB 3.0 spec
require the interval in microframes (1
microframe = 125 microseconds) to be
calculated as interval = 2 ^
(bInterval-1).
Devices with this quirk report their
bInterval as the result of this
calculation instead of the exponent
variable used in the calculation);
i = USB_QUIRK_DEVICE_QUALIFIER (device can't
handle device_qualifier descriptor
requests);
j = USB_QUIRK_IGNORE_REMOTE_WAKEUP (device
generates spurious wakeup, ignore
remote wakeup capability);
k = USB_QUIRK_NO_LPM (device can't handle Link
Power Management);
l = USB_QUIRK_LINEAR_FRAME_INTR_BINTERVAL
(Device reports its bInterval as linear
frames instead of the USB 2.0
calculation);
m = USB_QUIRK_DISCONNECT_SUSPEND (Device needs
to be disconnected before suspend to
2018-03-24 03:26:36 +08:00
prevent spurious wakeup);
n = USB_QUIRK_DELAY_CTRL_MSG (Device needs a
pause after every control message);
2018-10-19 16:14:50 +08:00
o = USB_QUIRK_HUB_SLOW_RESET (Hub needs extra
delay after resetting its port);
2018-03-20 00:26:06 +08:00
Example: quirks=0781:5580:bk,0a5c:5834:gij
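To make the VendorID:ProductID:Flags format concrete, the following sketch decodes a usbcore.quirks value into vendor and product IDs plus the flag names from the list above. The table is copied from that list; parse_usbcore_quirks is a hypothetical helper for illustration and is not how the kernel itself parses the parameter.

```python
# Letter-to-flag table taken from the usbcore.quirks description.
USBCORE_QUIRK_FLAGS = {
    "a": "USB_QUIRK_STRING_FETCH_255",
    "b": "USB_QUIRK_RESET_RESUME",
    "c": "USB_QUIRK_NO_SET_INTF",
    "d": "USB_QUIRK_CONFIG_INTF_STRINGS",
    "e": "USB_QUIRK_RESET",
    "f": "USB_QUIRK_HONOR_BNUMINTERFACES",
    "g": "USB_QUIRK_DELAY_INIT",
    "h": "USB_QUIRK_LINEAR_UFRAME_INTR_BINTERVAL",
    "i": "USB_QUIRK_DEVICE_QUALIFIER",
    "j": "USB_QUIRK_IGNORE_REMOTE_WAKEUP",
    "k": "USB_QUIRK_NO_LPM",
    "l": "USB_QUIRK_LINEAR_FRAME_INTR_BINTERVAL",
    "m": "USB_QUIRK_DISCONNECT_SUSPEND",
    "n": "USB_QUIRK_DELAY_CTRL_MSG",
    "o": "USB_QUIRK_HUB_SLOW_RESET",
}


def parse_usbcore_quirks(value):
    """Return a list of (vendor_id, product_id, [flag names]) tuples."""
    entries = []
    for item in value.split(","):
        parts = item.split(":")
        if len(parts) != 3:
            raise ValueError(f"expected VendorID:ProductID:Flags, got {item!r}")
        vendor, product, letters = parts
        if len(vendor) != 4 or len(product) != 4:
            raise ValueError(f"IDs must be 4-digit hex numbers: {item!r}")
        unknown = [c for c in letters if c not in USBCORE_QUIRK_FLAGS]
        if unknown:
            raise ValueError(f"unknown quirk letters {unknown} in {item!r}")
        flags = [USBCORE_QUIRK_FLAGS[c] for c in letters]
        entries.append((int(vendor, 16), int(product, 16), flags))
    return entries


if __name__ == "__main__":
    for vid, pid, flags in parse_usbcore_quirks("0781:5580:bk,0a5c:5834:gij"):
        print(f"{vid:04x}:{pid:04x} -> {', '.join(flags)}")
```

Applied to the example value, the first entry toggles the reset-resume and no-LPM quirks for device 0781:5580.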
2005-04-16 15:20:36 -07:00
usbhid.mousepoll=
[USBHID] The interval which mice are to be polled at.
2005-10-23 12:57:11 -07:00
2017-02-25 20:27:27 +01:00
usbhid.jspoll=
[USBHID] The interval which joysticks are to be polled at.
2018-03-21 17:28:25 +01:00
usbhid.kbpoll=
[USBHID] The interval which keyboards are to be polled at.
2008-11-10 14:07:45 -05:00
usb-storage.delay_use=
[UMS] The delay in seconds before a new device is
2014-11-04 13:00:15 +00:00
scanned for Logical Units (default 1).
2008-11-10 14:07:45 -05:00
usb-storage.quirks=
[UMS] A list of quirks entries to supplement or
override the built-in unusual_devs list. List
entries are separated by commas. Each entry has
the form VID:PID:Flags where VID and PID are Vendor
and Product ID values (4-digit hex numbers) and
Flags is a set of characters, each corresponding
to a common usb-storage quirk flag as follows:
2008-12-15 10:40:06 -05:00
a = SANE_SENSE (collect more than 18 bytes
2019-11-14 12:27:58 +01:00
of sense data, not on uas);
2009-12-07 16:39:16 -05:00
b = BAD_SENSE (don't collect more than 18
2019-11-14 12:27:58 +01:00
bytes of sense data, not on uas);
2008-11-10 14:07:45 -05:00
c = FIX_CAPACITY (decrease the reported
device capacity by one sector);
2011-05-18 21:42:34 +01:00
d = NO_READ_DISC_INFO (don't use
2019-11-14 12:27:58 +01:00
READ_DISC_INFO command, not on uas);
2011-05-18 21:42:34 +01:00
e = NO_READ_CAPACITY_16 (don't use
READ_CAPACITY_16 command);
2014-09-16 18:36:52 +02:00
f = NO_REPORT_OPCODES (don't use report opcodes
command, uas only);
2015-04-21 11:20:31 +02:00
g = MAX_SECTORS_240 (don't transfer more than
240 sectors at a time, uas only);
2008-12-15 10:40:06 -05:00
h = CAPACITY_HEURISTICS (decrease the
reported device capacity by one
sector if the number is odd);
2008-11-10 14:07:45 -05:00
i = IGNORE_DEVICE (don't bind to this
device);
2016-04-12 12:27:09 +02:00
j = NO_REPORT_LUNS (don't use report luns
command, uas only);
2008-11-10 14:07:45 -05:00
l = NOT_LOCKABLE (don't try to lock and
2019-11-14 12:27:58 +01:00
unlock ejectable media, not on uas);
2008-11-10 14:07:45 -05:00
m = MAX_SECTORS_64 (don't transfer more
2019-11-14 12:27:58 +01:00
than 64 sectors = 32 KB at a time,
not on uas);
2011-06-07 11:35:52 -04:00
n = INITIAL_READ10 (force a retry of the
2019-11-14 12:27:58 +01:00
initial READ(10) command, not on uas);
2008-12-15 10:40:06 -05:00
o = CAPACITY_OK (accept the capacity
2019-11-14 12:27:58 +01:00
reported by the device, not on uas);
2012-07-07 23:05:28 -04:00
p = WRITE_CACHE (the device cache is ON
2019-11-14 12:27:58 +01:00
by default, not on uas);
2008-11-10 14:07:45 -05:00
r = IGNORE_RESIDUE (the device reports
2019-11-14 12:27:58 +01:00
bogus residue values, not on uas);
2008-11-10 14:07:45 -05:00
s = SINGLE_LUN (the device has only one
Logical Unit);
2014-09-15 16:04:12 +02:00
t = NO_ATA_1X (don't allow ATA(12) and ATA(16)
commands, uas only);
2014-09-02 15:42:18 -04:00
u = IGNORE_UAS (don't bind to the uas driver);
2008-11-10 14:07:45 -05:00
w = NO_WP_DETECT (don't test whether the
medium is write-protected).
2016-09-12 15:19:41 +02:00
y = ALWAYS_SYNC (issue a SYNCHRONIZE_CACHE
2019-11-14 12:27:58 +01:00
even if the device claims no cache,
not on uas)
2008-11-10 14:07:45 -05:00
Example: quirks=0419:aaf5:rl,0421:0433:rc
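Going the other way, a usb-storage.quirks value can be assembled from flag names instead of being typed by hand. The sketch below does that with the letter table from the list above; build_usb_storage_quirks is a hypothetical helper, and the resulting string still has to be passed as usb-storage.quirks=... on the command line.

```python
# Flag-name-to-letter table taken from the usb-storage.quirks description.
USB_STORAGE_QUIRK_LETTERS = {
    "SANE_SENSE": "a", "BAD_SENSE": "b", "FIX_CAPACITY": "c",
    "NO_READ_DISC_INFO": "d", "NO_READ_CAPACITY_16": "e",
    "NO_REPORT_OPCODES": "f", "MAX_SECTORS_240": "g",
    "CAPACITY_HEURISTICS": "h", "IGNORE_DEVICE": "i",
    "NO_REPORT_LUNS": "j", "NOT_LOCKABLE": "l", "MAX_SECTORS_64": "m",
    "INITIAL_READ10": "n", "CAPACITY_OK": "o", "WRITE_CACHE": "p",
    "IGNORE_RESIDUE": "r", "SINGLE_LUN": "s", "NO_ATA_1X": "t",
    "IGNORE_UAS": "u", "NO_WP_DETECT": "w", "ALWAYS_SYNC": "y",
}


def build_usb_storage_quirks(devices):
    """Build the value part of usb-storage.quirks.

    devices is an iterable of (vid, pid, [flag names]) tuples.
    """
    entries = []
    for vid, pid, flags in devices:
        if not (0 <= vid <= 0xFFFF and 0 <= pid <= 0xFFFF):
            raise ValueError("VID and PID must fit in 4 hex digits")
        letters = "".join(USB_STORAGE_QUIRK_LETTERS[name] for name in flags)
        entries.append(f"{vid:04x}:{pid:04x}:{letters}")
    return ",".join(entries)


if __name__ == "__main__":
    value = build_usb_storage_quirks([
        (0x0419, 0xAAF5, ["IGNORE_RESIDUE", "NOT_LOCKABLE"]),
        (0x0421, 0x0433, ["IGNORE_RESIDUE", "FIX_CAPACITY"]),
    ])
    print(f"usb-storage.quirks={value}")
```

With these inputs the helper reproduces the example value 0419:aaf5:rl,0421:0433:rc.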
2011-08-13 12:34:50 -07:00
user_debug= [KNL,ARM]
Format: <int>
See arch/arm/Kconfig.debug help text.
1 - undefined instruction events
2 - system calls
4 - invalid data aborts
8 - SIGSEGV faults
16 - SIGBUS faults
Example: user_debug=31
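The user_debug value is a bitmask, so individual event classes are combined by adding (OR-ing) the numbers listed above. As a small illustration, this sketch decodes a value back into the classes it enables; the function name is hypothetical.

```python
# Bit values and descriptions as listed for user_debug.
USER_DEBUG_BITS = {
    1: "undefined instruction events",
    2: "system calls",
    4: "invalid data aborts",
    8: "SIGSEGV faults",
    16: "SIGBUS faults",
}


def decode_user_debug(value: int):
    """Return the descriptions of all bits set in a user_debug value."""
    return [name for bit, name in USER_DEBUG_BITS.items() if value & bit]


if __name__ == "__main__":
    for name in decode_user_debug(31):
        print(name)
```

The example user_debug=31 is 1 + 2 + 4 + 8 + 16, so it enables all five classes.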
2010-02-17 10:38:10 +00:00
userpte=
[X86] Flags controlling user PTE allocations.
nohigh = do not allocate PTE pages in
HIGHMEM regardless of setting
of CONFIG_HIGHPTE.
2009-04-14 14:03:43 +05:30
vdso= [X86,SH]
2014-03-13 16:01:26 -07:00
On X86_32, this is an alias for vdso32=. Otherwise:
vdso=1: enable VDSO (the default)
2006-06-27 02:53:50 -07:00
vdso=0: disable VDSO mapping
2014-03-13 16:01:26 -07:00
vdso32= [X86] Control the 32-bit vDSO
vdso32=1: enable 32-bit VDSO
vdso32=0 or vdso32=2: disable 32-bit VDSO
See the help text for CONFIG_COMPAT_VDSO for more
details. If CONFIG_COMPAT_VDSO is set, the default is
vdso32=0; otherwise, the default is vdso32=1.
For compatibility with older kernels, vdso32=2 is an
alias for vdso32=0.
Try vdso32=0 if you encounter an error that says:
dl_main: Assertion `(void *) ph->p_vaddr == _rtld_local._dl_sysinfo_dso' failed!
2008-01-30 13:30:43 +01:00
2007-07-17 21:22:55 +09:00
vector= [IA-64,SMP]
vector=percpu: enable percpu vector domain
2005-04-16 15:20:36 -07:00
video= [FB] Frame buffer configuration
2019-06-12 14:52:45 -03:00
See Documentation/fb/modedb.rst.
2005-04-16 15:20:36 -07:00
2013-06-20 15:08:55 +08:00
video.brightness_switch_enabled= [0,1]
If set to 1, on receiving an ACPI notify event
generated by hotkey, video driver will adjust brightness
level and then send out the event to user space through
the allocated input device; If set to 0, video driver
will only send out the event without touching backlight
brightness level.
2014-07-14 19:35:45 +02:00
default: 1
2013-06-20 15:08:55 +08:00
2012-05-09 18:30:16 +01:00
virtio_mmio.device=
[VMMIO] Memory mapped virtio (platform) device.
<size>@<baseaddr>:<irq>[:<id>]
where:
<size> := size (can use standard suffixes
like K, M and G)
<baseaddr> := physical base address
<irq> := interrupt number (as passed to
request_irq())
<id> := (optional) platform device id
example:
virtio_mmio.device=1K@0x100b0000:48:7
Can be used multiple times for multiple devices.
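The <size>@<baseaddr>:<irq>[:<id>] syntax can be broken down mechanically. The following sketch splits a virtio_mmio.device value into its fields, assuming the K, M and G suffixes are binary multiples; parse_virtio_mmio_device is a hypothetical helper and not the driver's own parser.

```python
# Binary multiples assumed for the size suffixes.
_SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_virtio_mmio_device(value: str) -> dict:
    """Split '<size>@<baseaddr>:<irq>[:<id>]' into a dict of integers."""
    size_text, sep, rest = value.partition("@")
    if not sep or not size_text:
        raise ValueError(f"missing '<size>@' in {value!r}")
    fields = rest.split(":")
    if len(fields) not in (2, 3):
        raise ValueError(f"expected <baseaddr>:<irq>[:<id>] in {value!r}")
    suffix = size_text[-1].upper()
    if suffix in _SUFFIXES:
        size = int(size_text[:-1], 0) * _SUFFIXES[suffix]
    else:
        size = int(size_text, 0)
    return {
        "size": size,
        "baseaddr": int(fields[0], 0),
        "irq": int(fields[1], 0),
        # the platform device id is optional
        "id": int(fields[2], 0) if len(fields) == 3 else None,
    }


if __name__ == "__main__":
    print(parse_virtio_mmio_device("1K@0x100b0000:48:7"))
```

For the example above this gives a 1024-byte region at 0x100b0000 with interrupt 48 and platform device id 7.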
2007-07-31 00:37:59 -07:00
vga= [BOOT,X86-32] Select a particular video mode
2019-06-07 15:54:32 -03:00
See Documentation/x86/boot.rst and
2019-06-27 14:56:51 -03:00
Documentation/admin-guide/svga.rst.
2005-04-16 15:20:36 -07:00
Use vga=ask for menu.
This is actually a boot loader parameter; the value is
passed to the kernel using a special protocol.
2018-10-26 15:07:45 -07:00
vm_debug[=options] [KNL] Available with CONFIG_DEBUG_VM=y.
May slow down system boot speed, especially when
enabled on systems with a large amount of memory.
All options are enabled by default, and this
interface is meant to allow for selectively
enabling or disabling specific virtual memory
debugging features.
Available options are:
P Enable page structure init time poisoning
- Disable all of the above options
2005-10-23 12:57:11 -07:00
vmalloc=nn[KMG] [KNL,BOOT] Forces the vmalloc area to have an exact
2005-04-16 15:20:36 -07:00
size of <nn>. This can be used to increase the
minimum size (128MB on x86). It can also be used to
decrease the size and leave more room for directly
mapped kernel RAM.
2017-08-07 15:16:15 +02:00
vmcp_cma=nn[MG] [KNL,S390]
Sets the memory size reserved for contiguous memory
allocations for the vmcp device driver.
2006-06-29 15:08:25 +02:00
vmhalt= [KNL,S390] Perform z/VM CP command after system halt.
Format: <command>
2005-04-16 15:20:36 -07:00
2006-06-29 15:08:25 +02:00
vmpanic= [KNL,S390] Perform z/VM CP command after kernel panic.
Format: <command>
vmpoff= [KNL,S390] Perform z/VM CP command after power off.
Format: <command>
2005-10-23 12:57:11 -07:00
2011-08-10 11:15:32 -04:00
vsyscall= [X86-64]
Controls the behavior of vsyscalls (i.e. calls to
fixed addresses of 0xffffffffff600x00 from legacy
code). Most statically-linked binaries and older
versions of glibc use these calls. Because these
functions are at fixed addresses, they make nice
targets for exploits that can control RIP.
2011-11-07 16:33:41 -08:00
emulate [default] Vsyscalls turn into traps and are
2019-06-26 21:45:03 -07:00
emulated reasonably safely. The vsyscall
page is readable.
2011-08-10 11:15:32 -04:00
2019-06-26 21:45:03 -07:00
xonly Vsyscalls turn into traps and are
emulated reasonably safely. The vsyscall
page is not readable.
2011-08-10 11:15:32 -04:00
none Vsyscalls don't work at all. This makes
them quite hard to use for exploits but
might break your system.
2013-08-04 13:09:50 +02:00
vt.color= [VT] Default text color.
Format: 0xYX, X = foreground, Y = background.
Default: 0x07 = light gray on black.
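In the 0xYX format the low hex digit is the foreground color and the high hex digit is the background color. The sketch below packs and unpacks such a value; the helper names are hypothetical, and color indices are treated as plain numbers from 0 to 15 without assuming any names for them.

```python
def split_vt_color(value: int):
    """Return (foreground, background) for a vt.color value of the form 0xYX."""
    if not 0 <= value <= 0xFF:
        raise ValueError("vt.color must fit in one byte")
    return value & 0x0F, (value >> 4) & 0x0F


def vt_color_param(foreground: int, background: int) -> str:
    """Format a vt.color= parameter from two color indices (0-15)."""
    for index in (foreground, background):
        if not 0 <= index <= 15:
            raise ValueError("color indices must be in the range 0-15")
    return f"vt.color=0x{(background << 4) | foreground:02x}"


if __name__ == "__main__":
    print(split_vt_color(0x07))
    print(vt_color_param(7, 0))
```

The default 0x07 therefore splits into foreground 7 and background 0.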
2009-12-15 16:45:39 -08:00
vt.cur_default= [VT] Default cursor shape.
Format: 0xCCBBAA, where AA, BB, and CC are the same as
the parameters of the <Esc>[?A;B;Cc escape sequence;
see VGA-softcursor.txt. Default: 2 = underline.
2009-04-05 15:55:22 -07:00
vt.default_blu= [VT]
Format: <blue0>,<blue1>,<blue2>,...,<blue15>
Change the default blue palette of the console.
This is a 16-member array composed of values
ranging from 0-255.
vt.default_grn= [VT]
Format: <green0>,<green1>,<green2>,...,<green15>
Change the default green palette of the console.
This is a 16-member array composed of values
ranging from 0-255.
vt.default_red= [VT]
Format: <red0>,<red1>,<red2>,...,<red15>
Change the default red palette of the console.
This is a 16-member array composed of values
ranging from 0-255.
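Because the console palette is split across three parameters, one per color channel, it can be convenient to start from a list of 16 RGB triples and generate all three strings together. This is a minimal sketch under that assumption; vt_palette_params is a hypothetical helper and the palette passed in is whatever the caller chooses.

```python
def vt_palette_params(palette):
    """Turn 16 (red, green, blue) triples into the three vt.default_* parameters."""
    if len(palette) != 16:
        raise ValueError("the console palette has exactly 16 entries")
    for rgb in palette:
        if len(rgb) != 3 or any(not 0 <= c <= 255 for c in rgb):
            raise ValueError(f"each entry needs three values in 0-255: {rgb!r}")
    reds = ",".join(str(r) for r, _, _ in palette)
    greens = ",".join(str(g) for _, g, _ in palette)
    blues = ",".join(str(b) for _, _, b in palette)
    return [
        f"vt.default_red={reds}",
        f"vt.default_grn={greens}",
        f"vt.default_blu={blues}",
    ]


if __name__ == "__main__":
    # Illustrative 16-step gray ramp, not a recommended palette.
    gray_ramp = [(i * 17, i * 17, i * 17) for i in range(16)]
    print(" ".join(vt_palette_params(gray_ramp)))
```

Entry n of each parameter belongs to color index n, so the three lists must stay in the same order.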
vt.default_utf8=
[VT]
Format: <0|1>
Set system-wide default UTF-8 mode for all tty's.
Default is 1, i.e. UTF-8 mode is enabled for all
newly opened terminals.
2009-11-13 15:14:11 -05:00
vt.global_cursor_default=
[VT]
Format: <-1|0|1>
Set system-wide default for whether a cursor
is shown on new VTs. Default is -1,
i.e. cursors will be created by default unless
overridden by individual drivers. 0 will hide
cursors, 1 will display them.
2013-08-04 13:09:50 +02:00
vt.italic= [VT] Default color for italic text; 0-15.
Default: 2 = green.
vt.underline= [VT] Default color for underlined text; 0-15.
Default: 3 = cyan.
2010-05-03 11:42:52 -07:00
watchdog timers [HW,WDT] For information on watchdog timers,
2019-06-12 14:53:01 -03:00
see Documentation/watchdog/watchdog-parameters.rst
2010-05-03 11:42:52 -07:00
or other driver-specific files in the
Documentation/watchdog/ directory.
2005-04-16 15:20:36 -07:00
2018-11-01 09:30:18 -04:00
watchdog_thresh=
[KNL]
Set the hard lockup detector stall duration
threshold in seconds. The soft lockup detector
threshold is set to twice the value. A value of 0
disables both lockup detectors. Default is 10
seconds.
2015-12-08 11:28:04 -05:00
workqueue.watchdog_thresh=
If CONFIG_WQ_WATCHDOG is configured, workqueue can
warn about stall conditions and dump internal state to
help debugging. 0 disables workqueue stall
detection; otherwise, it's the stall threshold
duration in seconds. The default value is 30 and
it can be updated at runtime by writing to the
corresponding sysfs file.
2013-04-01 11:23:38 -07:00
workqueue.disable_numa
By default, all work items queued to unbound
workqueues are affine to the NUMA nodes they're
issued on, which results in better behavior in
general. If NUMA affinity needs to be disabled for
whatever reason, this option can be used. Note
that this also can be controlled per-workqueue for
workqueues visible under /sys/bus/workqueue/.
2013-04-08 16:45:40 +05:30
workqueue.power_efficient
Per-cpu workqueues are generally preferred because
they show better performance thanks to cache
locality; unfortunately, per-cpu workqueues tend to
be more power hungry than unbound workqueues.
Enabling this makes the per-cpu workqueues which
were observed to contribute significantly to power
consumption unbound, leading to measurably lower
power usage at the cost of small performance
overhead.
The default value of this parameter is determined by
the config option CONFIG_WQ_POWER_EFFICIENT_DEFAULT.
2016-02-09 17:59:38 -05:00
workqueue.debug_force_rr_cpu
Workqueue used to implicitly guarantee that work
items queued without explicit CPU specified are put
on the local CPU. This guarantee is no longer true
and while local CPU is still preferred work items
may be put on foreign CPUs. This debug option
forces round-robin CPU selection to flush out
usages which depend on the now broken guarantee.
When enabled, memory and cache locality will be
impacted.
2009-04-05 15:55:22 -07:00
x2apic_phys [X86-64,APIC] Use x2apic physical mode instead of
default x2apic cluster mode on platforms
supporting x2apic.
2013-10-17 15:35:29 -07:00
x86_intel_mid_timer= [X86-32,APBT]
Choose timer option for x86 Intel MID platform.
2009-09-02 07:37:17 -07:00
Two valid options are apbt timer only and lapic timer
plus one apbt timer for broadcast timer.
2013-10-17 15:35:29 -07:00
x86_intel_mid_timer=apbt_only | lapic_and_apbt
2009-09-02 07:37:17 -07:00
2015-07-17 06:51:36 +02:00
xen_512gb_limit [KNL,X86-64,XEN]
Restricts the kernel running paravirtualized under Xen
to use only up to 512 GB of RAM. The reason to do so is
that crash analysis tools and Xen tools for doing domain
save/restore/migration must be enabled to handle larger
domains.
2010-05-14 12:44:30 +01:00
xen_emul_unplug= [HW,X86,XEN]
Unplug Xen emulated devices
Format: [unplug0,][unplug1]
ide-disks -- unplug primary master IDE devices
aux-ide-disks -- unplug non-primary-master IDE devices
nics -- unplug network devices
all -- unplug all emulated devices (NICs and IDE disks)
2010-08-23 11:59:29 +01:00
unnecessary -- unplugging emulated devices is
unnecessary even if the host did not respond to
the unplug protocol
2010-08-23 11:59:28 +01:00
never -- do not unplug even if version check succeeds
2010-05-14 12:44:30 +01:00
2019-09-30 16:44:41 -04:00
xen_legacy_crash [X86,XEN]
Crash from Xen panic notifier, without executing late
panic() code such as dumping handler.
2013-09-25 10:07:20 -04:00
xen_nopvspin [X86,XEN]
Disables the ticketlock slowpath using Xen PV
optimizations.
2014-07-11 11:51:35 -04:00
xen_nopv [X86]
Disables the PV optimizations, forcing the HVM guest to
run as a generic HVM guest with no PV drivers.
This option is obsoleted by the "nopv" option, which
has an equivalent effect on the Xen platform.
xen_scrub_pages= [XEN]
Boolean option to control scrubbing pages before giving them back
to Xen, for use by other domains. Can also be changed at runtime
with /sys/devices/system/xen_memory/xen_memory0/scrub_pages.
Default value controlled with CONFIG_XEN_SCRUB_PAGES_DEFAULT.
xen_timer_slop= [X86-64,XEN]
Set the timer slop (in nanoseconds) for the virtual Xen
timers (default is 100000). This adjusts the minimum
delta of virtualized Xen timers, where lower values
improve timer resolution at the expense of processing
more timer interrupts.
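Because the value is given in nanoseconds, it is easy to be off by a factor
of 1000. The following is a minimal sketch in Python that builds the
argument from a slop expressed in microseconds; the helper name
timer_slop_arg is hypothetical, and the only value taken from the text
above is the default of 100000 ns.

```python
NSEC_PER_USEC = 1_000
DEFAULT_TIMER_SLOP_NS = 100_000  # default stated in the description above


def timer_slop_arg(slop_us):
    """Build a xen_timer_slop= argument from a slop given in microseconds."""
    slop_us = int(slop_us)
    if slop_us <= 0:
        raise ValueError("timer slop must be a positive number of microseconds")
    return f"xen_timer_slop={slop_us * NSEC_PER_USEC}"


if __name__ == "__main__":
    # 100 microseconds corresponds to the documented default of 100000 ns.
    print(timer_slop_arg(100))
```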
nopv= [X86,XEN,KVM,HYPER_V,VMWARE]
Disables the PV optimizations, forcing the guest to run
as a generic guest with no PV drivers. Currently supports
XEN HVM, KVM, HYPER_V and VMWARE guests.
xirc2ps_cs= [NET,PCMCIA]
Format:
<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
xive= [PPC]
By default on POWER9 and above, the kernel will
natively use the XIVE interrupt controller. This option
allows the fallback firmware mode to be used:
off Fall back to firmware control of XIVE interrupt
controller on both pseries and powernv
platforms. Only useful on POWER9 and above.
xhci-hcd.quirks [USB,KNL]
A hex value specifying a bitmask of supplemental xhci
host controller quirks. The meaning of each bit can be
found in the header drivers/usb/host/xhci.h.
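A bitmask like this is the bitwise OR of one bit per selected quirk. The
following is a minimal sketch in Python that assembles the hex value from
bit positions; the helper names quirks_mask and quirks_arg are
hypothetical, and the bit positions in the example are placeholders, not
real quirks. The actual bit assignments must be read from
drivers/usb/host/xhci.h.

```python
def quirks_mask(bit_positions):
    """OR together one bit per position and return the resulting integer."""
    mask = 0
    for bit in bit_positions:
        if bit < 0:
            raise ValueError("bit positions must be non-negative")
        mask |= 1 << bit
    return mask


def quirks_arg(bit_positions):
    """Format the mask as a hex xhci-hcd.quirks argument."""
    return f"xhci-hcd.quirks={quirks_mask(bit_positions):#x}"


if __name__ == "__main__":
    # Bits 0 and 4 are placeholders chosen only to show the arithmetic.
    print(quirks_arg([0, 4]))
```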
xmon [PPC]
Format: { early | on | rw | ro | off }
Controls whether the xmon debugger is enabled. Default is off.
Passing only "xmon" is equivalent to "xmon=early".
early Call xmon as early as possible on boot; xmon
debugger is called from setup_arch().
on xmon debugger hooks will be installed so xmon
is only called on a kernel crash. Default mode,
i.e. either "ro" or "rw" mode, is controlled
with CONFIG_XMON_DEFAULT_RO_MODE.
rw xmon debugger hooks will be installed so xmon
is called only on a kernel crash, mode is write,
meaning SPR registers, memory, and other data
can be written using xmon commands.
ro same as "rw" option above but SPR registers,
memory, and other data can't be written using
xmon commands.
off xmon is disabled.
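The rule that a bare "xmon" means "xmon=early" can be sketched as a small
normalization step. The following Python example is illustrative only and
is not how the kernel parses the option; the names XMON_MODES and
normalize_xmon are hypothetical, and the accepted values are the ones
listed in the Format line above.

```python
# Values taken from the Format line documented above.
XMON_MODES = {"early", "on", "rw", "ro", "off"}


def normalize_xmon(arg):
    """Return the xmon mode selected by a command-line argument.

    A bare "xmon" is treated as "xmon=early", as described above.
    """
    if arg == "xmon":
        return "early"
    key, _, value = arg.partition("=")
    if key != "xmon" or value not in XMON_MODES:
        raise ValueError(f"not a valid xmon argument: {arg!r}")
    return value


if __name__ == "__main__":
    for arg in ("xmon", "xmon=ro", "xmon=off"):
        print(arg, "->", normalize_xmon(arg))
```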