IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
When booting with retbleed=auto, if the kernel wasn't built with
CONFIG_CC_HAS_RETURN_THUNK, the mitigation falls back to IBPB. Make
sure a warning is printed in that case. The IBPB fallback check is done
twice, but it really only needs to be done once.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
jmp2ret mitigates the easy-to-attack case at relatively low overhead.
It mitigates the long speculation windows after a mispredicted RET, but
it does not mitigate the short speculation window from arbitrary
instruction boundaries.
On Zen2, there is a chicken bit which needs setting, which mitigates
"arbitrary instruction boundaries" down to just "basic block boundaries".
But there is no fix for the short speculation window on basic block
boundaries, other than to flush the entire BTB to evict all attacker
predictions.
On the spectrum of "fast & blurry" -> "safe", there is (on top of STIBP
or no-SMT):
1) Nothing System wide open
2) jmp2ret May stop a script kiddy
3) jmp2ret+chickenbit Raises the bar rather further
4) IBPB Only thing which can count as "safe".
Tentative numbers put IBPB-on-entry at a 2.5x hit on Zen2, and a 10x hit
on Zen1 according to lmbench.
[ bp: Fixup feature bit comments, document option, 32-bit build fix. ]
Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Having IBRS enabled while the SMT sibling is idle unnecessarily slows
down the running sibling. OTOH, disabling IBRS around idle takes two
MSR writes, which will increase the idle latency.
Therefore, only disable IBRS around deeper idle states. Shallow idle
states are bounded by the tick in duration, since NOHZ is not allowed
for them by virtue of their short target residency.
Only do this for mwait-driven idle, since that keeps interrupts disabled
across idle, which makes disabling IBRS vs IRQ-entry a non-issue.
Note: C6 is a random threshold, most importantly C1 probably shouldn't
disable IBRS, benchmarking needed.
Suggested-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
retbleed will depend on spectre_v2, while spectre_v2_user depends on
retbleed. Break this cycle.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
When changing SPEC_CTRL for user control, the WRMSR can be delayed
until return-to-user when KERNEL_IBRS has been enabled.
This avoids an MSR write during context switch.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Due to TIF_SSBD and TIF_SPEC_IB the actual IA32_SPEC_CTRL value can
differ from x86_spec_ctrl_base. As such, keep a per-CPU value
reflecting the current task's MSR content.
[jpoimboe: rename]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
For untrained return thunks to be fully effective, STIBP must be enabled
or SMT disabled.
Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Add the "retbleed=<value>" boot parameter to select a mitigation for
RETBleed. Possible values are "off", "auto" and "unret"
(JMP2RET mitigation). The default value is "auto".
Currently, "retbleed=auto" will select the unret mitigation on
AMD and Hygon and no mitigation on Intel (JMP2RET is not effective on
Intel).
[peterz: rebase; add hygon]
[jpoimboe: cleanups]
Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Note: needs to be in a section distinct from Retpolines such that the
Retpoline RET substitution cannot possibly use immediate jumps.
ORC unwinding for zen_untrain_ret() and __x86_return_thunk() is a
little tricky but works due to the fact that zen_untrain_ret() doesn't
have any stack ops and as such will emit a single ORC entry at the
start (+0x3f).
Meanwhile, unwinding an IP, including the __x86_return_thunk() one
(+0x40) will search for the largest ORC entry smaller or equal to the
IP, these will find the one ORC entry (+0x3f) and all works.
[ Alexandre: SVM part. ]
[ bp: Build fix, massages. ]
Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
In addition to teaching static_call about the new way to spell 'RET',
there is an added complication in that static_call() is allowed to
rewrite text before it is known which particular spelling is required.
In order to deal with this; have a static_call specific fixup in the
apply_return() 'alternative' patching routine that will rewrite the
static_call trampoline to match the definite sequence.
This in turn creates the problem of uniquely identifying static call
trampolines. Currently trampolines are 8 bytes, the first 5 being the
jmp.d32/ret sequence and the final 3 a byte sequence that spells out
'SCT'.
This sequence is used in __static_call_validate() to ensure it is
patching a trampoline and not a random other jmp.d32. That is,
false-positives shouldn't be plenty, but aren't a big concern.
OTOH the new __static_call_fixup() must not have false-positives, and
'SCT' decodes to the somewhat weird but semi plausible sequence:
push %rbx
rex.XB push %r12
Additionally, there are SLS concerns with immediate jumps. Combined it
seems like a good moment to change the signature to a single 3 byte
trap instruction that is unique to this usage and will not ever get
generated by accident.
As such, change the signature to: '0x0f, 0xb9, 0xcc', which decodes
to:
ud1 %esp, %ecx
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Instead of defaulting to patching NOP opcodes at init time, and leaving
it to the architectures to override this if this is not needed, switch
to a model where doing nothing is the default. This is the common case
by far, as only MIPS requires NOP patching at init time. On all other
architectures, the correct encodings are emitted by the compiler and so
no initial patching is needed.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220615154142.1574619-4-ardb@kernel.org
MIPS is the only remaining architecture that needs to patch jump label
NOP encodings to initialize them at load time. So let's move the module
patching part of that from generic code into arch/mips, and drop it from
the others.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220615154142.1574619-3-ardb@kernel.org
VMWARE_CMD_VCPU_RESERVED is bit 31 and that would mean undefined
behavior when shifting an int but the kernel is built with
-fno-strict-overflow which will wrap around using two's complement.
Use the BIT() macro to improve readability and avoid any potential
overflow confusion because it uses an unsigned long.
[ bp: Clarify commit message. ]
Signed-off-by: Shreenidhi Shedi <sshedi@vmware.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Link: https://lore.kernel.org/r/20220601101820.535031-1-sshedi@vmware.com
Make sure to free the platform device in the unlikely event that
registration fails.
Fixes: 7a67832c7e44 ("libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220620140723.9810-1-johan@kernel.org
kfree can handle NULL pointer as its argument.
According to coccinelle isnullfree check, remove NULL check
before kfree operation.
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Message-Id: <20220614133458.147314-1-dzm91@hust.edu.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Make RESERVE_BRK() work again with older binutils. The recent
'simplification' broke that.
- Make early #VE handling increment RIP when successful.
- Make the #VE code consistent vs. the RIP adjustments and add comments.
- Handle load_unaligned_zeropad() across page boundaries correctly in #VE
when the second page is shared.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKvIG4THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoaqCD/9NAUyHTjKDqdWuMD/ITU8ymDr+Ix8z
vUlysdXbxJg6MvT12ZbhJUFTKAsXskGAAnXz/EtZ8zTQQVzTjis/HooJh4XLeuO4
NLh9KV9FvH7w69e6Jg31MGkOUJU3BV+WYUx1f34zbQ8FHftxUwu+M47UYExPYKDR
VIbNeQIpqoBfjTSPVGXlWl/panuZG6RV+PRcvxV3yeRRA8zyCB/WTmNkoDjbw4fl
YCWwJF7/m4iT3LtoaFXWVGFzSRZoGHbhSdgEOZGIZ7sjvydoaQo402JuhW3WLI2m
oXLVZ+2wOPGBKp3WQ1t3mpfScBvCiN3SW4pSPDQ+E8fT/RQiRMb29c9S6ANdm3nT
27fYMJOq+xxex5gOYzdgLz7O99M08uOn2bxJwB+IBIr5jEFH9b4EffeEWsfdZBsi
1AzkXCi+Ib0ZYAndxUP068m+4iW0LtuApm0fg6LhtdDmBGquj+88OZOUK7Z/kW/N
IkjgCeqFgmdNb/+Z3XrdYobaAl6J4toIqA4A+O8yL6gJfn9PnaMGsYtA8c5yQchD
kFoTu5pCALY2KjZkKFRMuEbMH2oj3sjjb7f6mYAHxec6jikIx2c5HswA4sLmzHAN
GG2MDUH12bWoLfeA4IRwTRz/vh8IeZNq5ZzdCnS6KHUNk5OJRGLtRphKy8z+pOYx
+i9ThZFBV8pBzg==
=sRtG
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
- Make RESERVE_BRK() work again with older binutils. The recent
'simplification' broke that.
- Make early #VE handling increment RIP when successful.
- Make the #VE code consistent vs. the RIP adjustments and add
comments.
- Handle load_unaligned_zeropad() across page boundaries correctly in
#VE when the second page is shared.
* tag 'x86-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page
x86/tdx: Clarify RIP adjustments in #VE handler
x86/tdx: Fix early #VE handling
x86/mm: Fix RESERVE_BRK() for older binutils
- Remove obsolete CONFIG_X86_SMAP reference from objtool
- Fix overlapping text section failures in faddr2line for real
- Remove OBJECT_FILES_NON_STANDARD usage from x86 ftrace and replace it
with finegrained annotations so objtool can validate that code
correctly.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKvHbwTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoQlsEADC7BotE1Mi8HITScIlHT19bkZ7Bm6M
g46RCtUw0u+lo7BclIetNYGWQ0z+D+1DbmZTBv5D22N/Wd6MrwOuuwRVtJ2dEosF
l/EK1qKLxCGMdQ1x0KOmrjzHB4WmcUH0pQwDHa5N5s5IyrJRRFHtoLk3EYp6QIhi
mIZTJgGUEe/aS7BtFM5xARprm7JxdbhXgXVbnC1ctzpOi2y6O7F4Ek3x4j5zTGV/
49iuO2ljqmKdoFlPa1fhuw+O9sTUf+RmKLPEU+2nm7Wg+wlIwKIB3yDDBoZ9Q7lm
YdorW0/506x4qOwa8KAjxXEb2qgZkL/fQcMS0jQQKNNDD8GHU+3YdjCNbrmZjA+T
Ib0rs9YsxgLstnmqzU3QfarhgtdzIMhzF0cofoFDK0a8tHLqOhZF4j7wLmYo52ec
6Hrt8IvJPW2b3TC6KfRAF+uWsDDoaum0OOFVjVu1b9G6fFf0i8h5JGWdkJh8cau9
0qvAYg0sDjRf7zAZ6kbbGDVCljECyypIdCBF3ihA6yMhJX16VihEWTJyqIaoO+a0
0XzPvkhX+6mY0kqn+5HvvhLEY4POA8pFKeTPx4eABvKnw7m01I9cJfHa7HEBQHe6
ZJA7qWMBRrPfK2Ogrx8zohj2sOdHMHlIQU+6Ai+7+BHs13Nje4umGMu2YuZrnfIF
gQ1ayCN8OsUwVA==
=12KJ
-----END PGP SIGNATURE-----
Merge tag 'objtool-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull build tooling updates from Thomas Gleixner:
- Remove obsolete CONFIG_X86_SMAP reference from objtool
- Fix overlapping text section failures in faddr2line for real
- Remove OBJECT_FILES_NON_STANDARD usage from x86 ftrace and replace it
with finegrained annotations so objtool can validate that code
correctly.
* tag 'objtool-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ftrace: Remove OBJECT_FILES_NON_STANDARD usage
faddr2line: Fix overlapping text section failures, the sequel
objtool: Fix obsolete reference to CONFIG_X86_SMAP
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmKs17UUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vzlVQ//Qygje8u1x0qQqGFykMf3+cev2ECV
/sfOHtRoG9qZ4gKQOGShr9gyeV+9VFl0+CEJE4z9/YiAnoOr5JMAMixYRjHQU9MB
k3TFkDqD0KBgmXdAubVP5HQGZgA+mryvtEhr5rm45PooJJSjuh1ds87YYO+Z/t7s
c2AzvpHFLjECB6LRHqHieyp4CeWn5tw8im7uMUmfKkkXF5ckqw3e+7gwJzRrAukg
GgbLb2JZLzXSl1HOx/2GvPW8RzyXRbbJmpvu9LsNKeoqP006F3chDVIncqG3Q9QQ
LvrjudC/eY/2Fee3tpP6gjl9A5ALvXT/k4gTw4Lwm8OlaxrYZ3gwcBHZepp3F08s
qVCncjFooeHAMiJDkGvtf6N8k8VnOy4zvg2qDpKOl2NO3jH95nGi0LGMf0/GXvfh
P4bwqjGcWKSo9C2amagZ49rzaJBIQRms8ItM6WPvCYirjyYi3PeUMl/dylstbnuq
DQuprZtgOEGlPGUg/CO4fCpCIkKzpvOV6157z39mZS5HOT5ugvF4k+hwSxZd0hsM
rJI7Te48Z46Y/qoFesgOglJwwEZs4RqHAvMTGC+V5Ftj9gHe7wo2oOxlhCyFW1Qd
jec3UhXEzTVxnjsu7peHyfwwmGWFUkG16P17hWpncpG6azW1dybQOCsRy21/F+q2
i1nd61lXpuhKQKQ=
=5y1d
-----END PGP SIGNATURE-----
Merge tag 'pci-v5.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci fix from Bjorn Helgaas:
"Revert clipping of PCI host bridge windows to avoid E820 regions,
which broke several machines by forcing unnecessary BAR reassignments
(Hans de Goede)"
* tag 'pci-v5.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
x86/PCI: Revert "x86/PCI: Clip only host bridge windows for E820 regions"
This reverts commit 4c5e242d3e93.
Prior to 4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820
regions"), E820 regions did not affect PCI host bridge windows. We only
looked at E820 regions and avoided them when allocating new MMIO space.
If firmware PCI bridge window and BAR assignments used E820 regions, we
left them alone.
After 4c5e242d3e93, we removed E820 regions from the PCI host bridge
windows before looking at BARs, so firmware assignments in E820 regions
looked like errors, and we moved things around to fit in the space left
(if any) after removing the E820 regions. This unnecessary BAR
reassignment broke several machines.
Guilherme reported that Steam Deck fails to boot after 4c5e242d3e93. We
clipped the window that contained most 32-bit BARs:
BIOS-e820: [mem 0x00000000a0000000-0x00000000a00fffff] reserved
acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window] for e820 entry [mem 0xa0000000-0xa00fffff]
which forced us to reassign all those BARs, for example, this NVMe BAR:
pci 0000:00:01.2: PCI bridge to [bus 01]
pci 0000:00:01.2: bridge window [mem 0x80600000-0x806fffff]
pci 0000:01:00.0: BAR 0: [mem 0x80600000-0x80603fff 64bit]
pci 0000:00:01.2: can't claim window [mem 0x80600000-0x806fffff]: no compatible bridge window
pci 0000:01:00.0: can't claim BAR 0 [mem 0x80600000-0x80603fff 64bit]: no compatible bridge window
pci 0000:00:01.2: bridge window: assigned [mem 0xa0100000-0xa01fffff]
pci 0000:01:00.0: BAR 0: assigned [mem 0xa0100000-0xa0103fff 64bit]
All the reassignments were successful, so the devices should have been
functional at the new addresses, but some were not.
Andy reported a similar failure on an Intel MID platform. Benjamin
reported a similar failure on a VMWare Fusion VM.
Note: this is not a clean revert; this revert keeps the later change to
make the clipping dependent on a new pci_use_e820 bool, moving the checking
of this bool to arch_remove_reservations().
[bhelgaas: commit log, add more reporters and testers]
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216109
Reported-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reported-by: Benjamin Coddington <bcodding@redhat.com>
Reported-by: Jongman Heo <jongman.heo@gmail.com>
Fixes: 4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820 regions")
Link: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@redhat.com
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Stale Data.
They are a class of MMIO-related weaknesses which can expose stale data
by propagating it into core fill buffers. Data which can then be leaked
using the usual speculative execution methods.
Mitigations include this set along with microcode updates and are
similar to MDS and TAA vulnerabilities: VERW now clears those buffers
too.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKXMkMTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoWGPD/idalLIhhV5F2+hZIKm0WSnsBxAOh9K
7y8xBxpQQ5FUfW3vm7Pg3ro6VJp7w2CzKoD4lGXzGHriusn3qst3vkza9Ay8xu8g
RDwKe6hI+p+Il9BV9op3f8FiRLP9bcPMMReW/mRyYsOnJe59hVNwRAL8OG40PY4k
hZgg4Psfvfx8bwiye5efjMSe4fXV7BUCkr601+8kVJoiaoszkux9mqP+cnnB5P3H
zW1d1jx7d6eV1Y063h7WgiNqQRYv0bROZP5BJkufIoOHUXDpd65IRF3bDnCIvSEz
KkMYJNXb3qh7EQeHS53NL+gz2EBQt+Tq1VH256qn6i3mcHs85HvC68gVrAkfVHJE
QLJE3MoXWOqw+mhwzCRrEXN9O1lT/PqDWw8I4M/5KtGG/KnJs+bygmfKBbKjIVg4
2yQWfMmOgQsw3GWCRjgEli7aYbDJQjany0K/qZTq54I41gu+TV8YMccaWcXgDKrm
cXFGUfOg4gBm4IRjJ/RJn+mUv6u+/3sLVqsaFTs9aiib1dpBSSUuMGBh548Ft7g2
5VbFVSDaLjB2BdlcG7enlsmtzw0ltNssmqg7jTK/L7XNVnvxwUoXw+zP7RmCLEYt
UV4FHXraMKNt2ZketlomC8ui2hg73ylUp4pPdMXCp7PIXp9sVamRTbpz12h689VJ
/s55bWxHkR6S
=LBxT
-----END PGP SIGNATURE-----
Merge tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 MMIO stale data fixes from Thomas Gleixner:
"Yet another hw vulnerability with a software mitigation: Processor
MMIO Stale Data.
They are a class of MMIO-related weaknesses which can expose stale
data by propagating it into core fill buffers. Data which can then be
leaked using the usual speculative execution methods.
Mitigations include this set along with microcode updates and are
similar to MDS and TAA vulnerabilities: VERW now clears those buffers
too"
* tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation/mmio: Print SMT warning
KVM: x86/speculation: Disable Fill buffer clear within guests
x86/speculation/mmio: Reuse SRBDS mitigation for SBDS
x86/speculation/srbds: Update SRBDS mitigation selection
x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data
x86/speculation/mmio: Enable CPU Fill buffer clearing on idle
x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations
x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data
x86/speculation: Add a common function for MD_CLEAR mitigation update
x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug
Documentation: Add documentation for Processor MMIO Stale Data
With binutils 2.26, RESERVE_BRK() causes a build failure:
/tmp/ccnGOKZ5.s: Assembler messages:
/tmp/ccnGOKZ5.s:98: Error: missing ')'
/tmp/ccnGOKZ5.s:98: Error: missing ')'
/tmp/ccnGOKZ5.s:98: Error: missing ')'
/tmp/ccnGOKZ5.s:98: Error: junk at end of line, first unrecognized
character is `U'
The problem is this line:
RESERVE_BRK(early_pgt_alloc, INIT_PGT_BUF_SIZE)
Specifically, the INIT_PGT_BUF_SIZE macro which (via PAGE_SIZE's use
_AC()) has a "1UL", which makes older versions of the assembler unhappy.
Unfortunately the _AC() macro doesn't work for inline asm.
Inline asm was only needed here to convince the toolchain to add the
STT_NOBITS flag. However, if a C variable is placed in a section whose
name is prefixed with ".bss", GCC and Clang automatically set
STT_NOBITS. In fact, ".bss..page_aligned" already relies on this trick.
So fix the build failure (and simplify the macro) by allocating the
variable in C.
Also, add NOLOAD to the ".brk" output section clause in the linker
script. This is a failsafe in case the ".bss" prefix magic trick ever
stops working somehow. If there's a section type mismatch, the GNU
linker will force the ".brk" output section to be STT_NOBITS. The LLVM
linker will fail with a "section type mismatch" error.
Note this also changes the name of the variable from .brk.##name to
__brk_##name. The variable names aren't actually used anywhere, so it's
harmless.
Fixes: a1e2c031ec39 ("x86/mm: Simplify RESERVE_BRK()")
Reported-by: Joe Damato <jdamato@fastly.com>
Reported-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://lore.kernel.org/r/22d07a44c80d8e8e1e82b9a806ddc8c6bbb2606e.1654759036.git.jpoimboe@kernel.org
Remove vendor checks from prefer_mwait_c1_over_halt function. Restore
the decision tree to support MWAIT C1 as the default idle state based on
CPUID checks as done by Thomas Gleixner in
commit 09fd4b4ef5bc ("x86: use cpuid to check MWAIT support for C1")
The decision tree is removed in
commit 69fb3676df33 ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")
Prefer MWAIT when the following conditions are satisfied:
1. CPUID_Fn00000001_ECX [Monitor] should be set
2. CPUID_Fn00000005 should be supported
3. If CPUID_Fn00000005_ECX [EMX] is set then there should be
at least one C1 substate available, indicated by
CPUID_Fn00000005_EDX [MWaitC1SubStates] bits.
Otherwise use HLT for default_idle function.
HPC customers who want to optimize for lower latency are known to
disable Global C-States in the BIOS. In fact, some vendors allow
choosing a BIOS 'performance' profile which explicitly disables
C-States. In this scenario, the cpuidle driver will not be loaded and
the kernel will continue with the default idle state chosen at boot
time. On AMD systems currently the default idle state is HLT which has
a higher exit latency compared to MWAIT.
The reason for the choice of HLT over MWAIT on AMD systems is:
1. Families prior to 10h didn't support MWAIT
2. Families 10h-15h supported MWAIT, but not MWAIT C1. Hence it was
preferable to use HLT as the default state on these systems.
However, AMD Family 17h onwards supports MWAIT as well as MWAIT C1. And
it is preferable to use MWAIT as the default idle state on these
systems, as it has lower exit latencies.
The below table represents the exit latency for HLT and MWAIT on AMD
Zen 3 system. Exit latency is measured by issuing a wakeup (IPI) to
other CPU and measuring how many clock cycles it took to wakeup. Each
iteration measures 10K wakeups by pinning source and destination.
HLT:
25.0000th percentile : 1900 ns
50.0000th percentile : 2000 ns
75.0000th percentile : 2300 ns
90.0000th percentile : 2500 ns
95.0000th percentile : 2600 ns
99.0000th percentile : 2800 ns
99.5000th percentile : 3000 ns
99.9000th percentile : 3400 ns
99.9500th percentile : 3600 ns
99.9900th percentile : 5900 ns
Min latency : 1700 ns
Max latency : 5900 ns
Total Samples 9999
MWAIT:
25.0000th percentile : 1400 ns
50.0000th percentile : 1500 ns
75.0000th percentile : 1700 ns
90.0000th percentile : 1800 ns
95.0000th percentile : 1900 ns
99.0000th percentile : 2300 ns
99.5000th percentile : 2500 ns
99.9000th percentile : 3200 ns
99.9500th percentile : 3500 ns
99.9900th percentile : 4600 ns
Min latency : 1200 ns
Max latency : 4600 ns
Total Samples 9997
Improvement (99th percentile): 21.74%
Below is another result for context_switch2 micro-benchmark, which
brings out the impact of improved wakeup latency through increased
context-switches per second.
with HLT:
-------------------------------
50.0000th percentile : 190184
75.0000th percentile : 191032
90.0000th percentile : 192314
95.0000th percentile : 192520
99.0000th percentile : 192844
MIN : 190148
MAX : 192852
with MWAIT:
-------------------------------
50.0000th percentile : 277444
75.0000th percentile : 278268
90.0000th percentile : 278888
95.0000th percentile : 279164
99.0000th percentile : 280504
MIN : 273278
MAX : 281410
Improvement(99th percentile): ~ 45.46%
Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Link: https://ozlabs.org/~anton/junkcode/context_switch2.c
Link: https://lkml.kernel.org/r/0cc675d8fd1f55e41b510e10abf2e21b6e9803d5.1654538381.git-series.wyes.karny@amd.com
When kernel is booted with idle=nomwait do not use MWAIT as the
default idle state.
If the user boots the kernel with idle=nomwait, it is a clear
direction to not use mwait as the default idle state.
However, the current code does not take this into consideration
while selecting the default idle state on x86.
Fix it by checking for the idle=nomwait boot option in
prefer_mwait_c1_over_halt().
Also update the documentation around idle=nomwait appropriately.
[ dhansen: tweak commit message ]
Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lkml.kernel.org/r/fdc2dc2d0a1bc21c2f53d989ea2d2ee3ccbc0dbe.1654538381.git-series.wyes.karny@amd.com
A new 64-bit control field "tertiary processor-based VM-execution
controls", is defined [1]. It's controlled by bit 17 of the primary
processor-based VM-execution controls.
Different from its brother VM-execution fields, this tertiary VM-
execution controls field is 64 bit. So it occupies 2 vmx_feature_leafs,
TERTIARY_CTLS_LOW and TERTIARY_CTLS_HIGH.
Its companion VMX capability reporting MSR,MSR_IA32_VMX_PROCBASED_CTLS3
(0x492), is also semantically different from its brothers, whose 64 bits
consist of all allow-1, rather than 32-bit allow-0 and 32-bit allow-1 [1][2].
Therefore, its init_vmx_capabilities() is a little different from others.
[1] ISE 6.2 "VMCS Changes"
https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
[2] SDM Vol3. Appendix A.3
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419153240.11549-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The file-wide OBJECT_FILES_NON_STANDARD annotation is used with
CONFIG_FRAME_POINTER to tell objtool to skip the entire file when frame
pointers are enabled. However that annotation is now deprecated because
it doesn't work with IBT, where objtool runs on vmlinux.o instead of
individual translation units.
Instead, use more fine-grained function-specific annotations:
- The 'save_mcount_regs' macro does funny things with the frame pointer.
Use STACK_FRAME_NON_STANDARD_FP to tell objtool to ignore the
functions using it.
- The return_to_handler() "function" isn't actually a callable function.
Instead of being called, it's returned to. The real return address
isn't on the stack, so unwinding is already doomed no matter which
unwinder is used. So just remove the STT_FUNC annotation, telling
objtool to ignore it. That also removes the implicit
ANNOTATE_NOENDBR, which now needs to be made explicit.
Fixes the following warning:
vmlinux.o: warning: objtool: __fentry__+0x16: return with modified stack frame
Fixes: ed53a0d97192 ("x86/alternative: Use .ibt_endbr_seal to seal indirect calls")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/b7a7a42fe306aca37826043dac89e113a1acdbac.1654268610.git.jpoimboe@kernel.org
- fixes for material merged during this merge window
- cc:stable fixes for more longstanding issues
- minor mailmap and MAINTAINERS updates
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYpz1+QAKCRDdBJ7gKXxA
jrudAP9EvjTg4KhmXDoUpgJYc2oPg27nIhu1LWT8VFdsVQ6mPwEA//HPvPhjah8u
C1M183VxKL9trZf22DBn2BbD3kBDIAo=
=9LgC
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm hotfixes from Andrew Morton:
"Fixups for various recently-added and longer-term issues and a few
minor tweaks:
- fixes for material merged during this merge window
- cc:stable fixes for more longstanding issues
- minor mailmap and MAINTAINERS updates"
* tag 'mm-hotfixes-stable-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/oom_kill.c: fix vm_oom_kill_table[] ifdeffery
x86/kexec: fix memory leak of elf header buffer
mm/memremap: fix missing call to untrack_pfn() in pagemap_range()
mm: page_isolation: use compound_nr() correctly in isolate_single_pageblock()
mm: hugetlb_vmemmap: fix CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON
MAINTAINERS: add maintainer information for z3fold
mailmap: update Josh Poimboeuf's email
SGX enclave is accounted to the wrong memory control group.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKcd1MTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoaalD/0TdNTH+LiM0BpEZ4VHIAFhE9mgfaU/
1HIZcXEvAzPqS+iLMYAPo2dS7hNKv1GCCD8HcuOdEwC/CyTdrcpvhCNeQXCagF38
BHtzVCMFd/Y6U7ERNVsaHiuHFSkF+3QHef4Gzljzblgj1FK7s55z9tlQmE3pElOg
UGfRoD32ODUtQPmOCjlOhFjsUUtFpdpXFCbjPPFdOqJ80LbdKR2s/0IBpHMk1xoz
ESmS10tVC3a5np1/4Ge8vRCZnewOpulL/Is84Q8MbCvxI8NQh9pD7Imom/wRjSAS
19N+sWh7ywuUtAOVqJ23dDc6SOL3yjM4HbmsEYRGPsgzuJ5crezLcrKgCFeGmz/4
4zbU3R9hzzXQy8ZqNjbj71FKswfUDcMLb26GA/62d2N6zR7O0TSzfIrpIYp+GwJ3
5KaM0LiKoW/LXGfwEdEBWpCkK1OKgMXmZ5IQlr5bRz3Qihqzkk65Dgfo66XRt+jb
DhMHW+cMfLwSX72QER6LyP2jPfUSCZgy5Pn8LfXUH30fc084gyrAPq2eqtVnf0lf
Hq5/r1nMosPE0CtxHM1vNRj5M052nQxXhDhdsTcoO6PVBrvEjJbkanj3XbNRk66T
FDWGWmdtDC2su6p0ezwbARMxYnsSS40GVsp/DoOu+SHxlAm9VkomY+QDJ2FuoJnb
K0XfW5vV9MEsvw==
=cuLD
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SGX fix from Thomas Gleixner:
"A single fix for x86/SGX to prevent that memory which is allocated for
an SGX enclave is accounted to the wrong memory control group"
* tag 'x86-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Set active memcg prior to shmem allocation
- Disable late microcode loading by default. Unless the HW people get
their act together and provide a required minimum version in the
microcode header for making a halfways informed decision its just
lottery and broken.
- Warn and taint the kernel when microcode is loaded late
- Remove the old unused microcode loader interface
- Remove a redundant perf callback from the microcode loader
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKcdjETHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoUUTD/92cLMB0g7XP8yFN+ZiHl7uoaDtE0UR
WfapnMNL3tKKVnuwEMg83AjtyQ+O1JNZ6iS5K8jmiBnBByg6EQccz8pfe3jUQ6Ar
gOLWV1F4tRbJD2NqqiWOo/l5qs5hJaz/QeE+oyCP0fvw6DOZixepG5RzveFSSwAa
G2Q03GsGEu84SXlVAjagMSU6tYlBnlZBfKRB8NiNxkW8CLcJY0NfDCunbN6icEbH
AQHXeviM3GWMKJA9R9DeSvYq9PbN5o2UVmcFQWsDAzZ8Ne2qCqskjoGNjuQ9s+72
G35fm5d7dtIcrYg4PSJN0JDJP4HbcfSjhUrbdH4iAClTkGnNQERfuDV9O92/lYJE
hd9c8yCegD0NWQ4dMNNrM5PSbWbQK7ajqRYVvqqouJZpH+IDtajA1jxEe+1msB8P
xmXQDcdSMOyVs+Bw3Djf3tt8Qqhu4jxnf3y711oLklPwwh9lq9SvaWiX9ZFoYgdn
1HVtQUAOdgDmncs5BQ8dpuwtoYXH5p31n0wh57emyFXl7wA93eWouuFczQ6mSol9
LLdd9c+q9mBFIo0ult+fVhEOTDJF+27s3YXOpge6BAqei/SQIU4c5oq51CukV7ap
TPzWAayq0lsAwXn1k6r86Zkewh1C2SQTyk1J3zMehZlSVpwSWbEQASYlKywmh6YB
N+6/0XtHDAVK5Q==
=DQjr
-----END PGP SIGNATURE-----
Merge tag 'x86-microcode-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode updates from Thomas Gleixner:
- Disable late microcode loading by default. Unless the HW people get
their act together and provide a required minimum version in the
microcode header for making a halfways informed decision its just
lottery and broken.
- Warn and taint the kernel when microcode is loaded late
- Remove the old unused microcode loader interface
- Remove a redundant perf callback from the microcode loader
* tag 'x86-microcode-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode: Remove unnecessary perf callback
x86/microcode: Taint and warn on late loading
x86/microcode: Default-disable late loading
x86/microcode: Rip out the OLD_INTERFACE
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKcdNYTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYod9oD/9Y3JOKOKfXKz6VTZeIkaq8bwizjCxX
tzTdeDy6pCYHv1z63rhwTpB4lUl+dt5CUiaf9YjEU3bdgNBtba/C0j6rx2Zf7Qk0
hw2CjsPahEdFGRRgzbMF6fOUHxsOV/fZ5S4w9XcV7u5QAMkoj/w3rl7Eh2Vn9KbL
B8I7Cl+Vyec2feknWau3s9vt4GRt+EYR08YmjWL1bxzjFss8JTs0mpzVnuR/QurU
O20UIGS/167EvacmC15Ehht3EJEOykte3iWVFCMgEwFsZTOpByCQGGnDQCVik0o9
6gYESc6fRNfTbC+rRGMs2LWXwsYfJMAqYLkSQIfOIETqysxu2HoWoCFWhpmaKLYr
DEL3mVTy1LB7TqY6+C0P2UCeU9CuNr3fejQf4SsDIsNmTlUuv+FHrDCfi9cotX/G
gmRa/29BASMgoVzF/QnzrEUGvEqU5S7wJgBxAD0cTw4IwvXz80KgbHNEl9utOCjB
ceXoPh3zOaEnBZn7B5HXRq52r2KOA+T7dL/6blPfuYokZrKftq8z9fLDXEqKSLFY
2lSxtowzAZiUxZDI5Z6qoBmEeEXrbxK3r7ro42KXetvudDWAoCspf2Qz8kEgZOCV
ykDMPEnhesL8eE1LJzaJBSggz3LmQAslxIZ+CcZlFMAI+vOmbFEbMYVYxDzyJECN
LEK9uNoEY1mMeQ==
=gNpf
-----END PGP SIGNATURE-----
Merge tag 'x86-boot-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot update from Thomas Gleixner:
"Use strlcpy() instead of strscpy() in arch_setup()"
* tag 'x86-boot-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/setup: Use strscpy() to replace deprecated strlcpy()
of Peter Zijlstra was encountering with ptrace in his freezer rewrite
I identified some cleanups to ptrace_stop that make sense on their own
and move make resolving the other problems much simpler.
The biggest issue is the habbit of the ptrace code to change task->__state
from the tracer to suppress TASK_WAKEKILL from waking up the tracee. No
other code in the kernel does that and it is straight forward to update
signal_wake_up and friends to make that unnecessary.
Peter's task freezer sets frozen tasks to a new state TASK_FROZEN and
then it stores them by calling "wake_up_state(t, TASK_FROZEN)" relying
on the fact that all stopped states except the special stop states can
tolerate spurious wake up and recover their state.
The state of stopped and traced tasked is changed to be stored in
task->jobctl as well as in task->__state. This makes it possible for
the freezer to recover tasks in these special states, as well as
serving as a general cleanup. With a little more work in that
direction I believe TASK_STOPPED can learn to tolerate spurious wake
ups and become an ordinary stop state.
The TASK_TRACED state has to remain a special state as the registers for
a process are only reliably available when the process is stopped in
the scheduler. Fundamentally ptrace needs acess to the saved
register values of a task.
There are bunch of semi-random ptrace related cleanups that were found
while looking at these issues.
One cleanup that deserves to be called out is from commit 57b6de08b5f6
("ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs"). This
makes a change that is technically user space visible, in the handling
of what happens to a tracee when a tracer dies unexpectedly.
According to our testing and our understanding of userspace nothing
cares that spurious SIGTRAPs can be generated in that case.
The entire discussion can be found at:
https://lkml.kernel.org/r/87a6bv6dl6.fsf_-_@email.froward.int.ebiederm.org
Eric W. Biederman (11):
signal: Rename send_signal send_signal_locked
signal: Replace __group_send_sig_info with send_signal_locked
ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
ptrace: Remove arch_ptrace_attach
signal: Use lockdep_assert_held instead of assert_spin_locked
ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
ptrace: Document that wait_task_inactive can't fail
ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
ptrace: Don't change __state
ptrace: Always take siglock in ptrace_resume
Peter Zijlstra (1):
sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
arch/ia64/include/asm/ptrace.h | 4 --
arch/ia64/kernel/ptrace.c | 57 ----------------
arch/um/include/asm/thread_info.h | 2 +
arch/um/kernel/exec.c | 2 +-
arch/um/kernel/process.c | 2 +-
arch/um/kernel/ptrace.c | 8 +--
arch/um/kernel/signal.c | 4 +-
arch/x86/kernel/step.c | 3 +-
arch/xtensa/kernel/ptrace.c | 4 +-
arch/xtensa/kernel/signal.c | 4 +-
drivers/tty/tty_jobctrl.c | 4 +-
include/linux/ptrace.h | 7 --
include/linux/sched.h | 10 ++-
include/linux/sched/jobctl.h | 8 +++
include/linux/sched/signal.h | 20 ++++--
include/linux/signal.h | 3 +-
kernel/ptrace.c | 87 ++++++++---------------
kernel/sched/core.c | 5 +-
kernel/signal.c | 140 +++++++++++++++++---------------------
kernel/time/posix-cpu-timers.c | 6 +-
20 files changed, 140 insertions(+), 240 deletions(-)
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmKaXaYACgkQC/v6Eiaj
j0CgoA/+JncSQ6PY2D5Jh1apvHzmnRsFXzr3DRvtv/CVx4oIebOXRQFyVDeD5tRn
TmMgB29HpBlHRDLojlmlZRGAld1HR/aPEW9j8W1D3Sy/ZFO5L8lQitv9aDHO9Ntw
4lZvlhS1M0KhATudVVBqSPixiG6CnV5SsGmixqdOyg7xcXSY6G1l2nB7Zk9I3Tat
ZlmhuZ6R5Z5qsm4MEq0vUSrnsHiGxYrpk6uQOaVz8Wkv8ZFmbutt6XgxF0tsyZNn
mHSmWSiZzIgBjTlaibEmxi8urYJTPj3vGBeJQVYHblFwLFi6+Oy7bDxQbWjQvaZh
DsgWPScfBF4Jm0+8hhCiSYpvPp8XnZuklb4LNCeok/VFr+KfSmpJTIhn00kagQ1u
vxQDqLws8YLW4qsfGydfx9uUIFCbQE/V2VDYk5J3Re3gkUNDOOR1A56hPniKv6VB
2aqGO2Fl0RdBbUa3JF+XI5Pwq5y1WrqR93EUvj+5+u5W9rZL/8WLBHBMEz6gbmfD
DhwFE0y8TG2WRlWJVEDRId+5zo3di/YvasH0vJZ5HbrxhS2RE/yIGAd+kKGx/lZO
qWDJC7IHvFJ7Mw5KugacyF0SHeNdloyBM7KZW6HeXmgKn9IMJBpmwib92uUkRZJx
D8j/bHHqD/zsgQ39nO+c4M0MmhO/DsPLG/dnGKrRCu7v1tmEnkY=
=ZUuO
-----END PGP SIGNATURE-----
Merge tag 'ptrace_stop-cleanup-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ptrace_stop cleanups from Eric Biederman:
"While looking at the ptrace problems with PREEMPT_RT and the problems
Peter Zijlstra was encountering with ptrace in his freezer rewrite I
identified some cleanups to ptrace_stop that make sense on their own
and move make resolving the other problems much simpler.
The biggest issue is the habit of the ptrace code to change
task->__state from the tracer to suppress TASK_WAKEKILL from waking up
the tracee. No other code in the kernel does that and it is straight
forward to update signal_wake_up and friends to make that unnecessary.
Peter's task freezer sets frozen tasks to a new state TASK_FROZEN and
then it stores them by calling "wake_up_state(t, TASK_FROZEN)" relying
on the fact that all stopped states except the special stop states can
tolerate spurious wake up and recover their state.
The state of stopped and traced tasked is changed to be stored in
task->jobctl as well as in task->__state. This makes it possible for
the freezer to recover tasks in these special states, as well as
serving as a general cleanup. With a little more work in that
direction I believe TASK_STOPPED can learn to tolerate spurious wake
ups and become an ordinary stop state.
The TASK_TRACED state has to remain a special state as the registers
for a process are only reliably available when the process is stopped
in the scheduler. Fundamentally ptrace needs acess to the saved
register values of a task.
There are bunch of semi-random ptrace related cleanups that were found
while looking at these issues.
One cleanup that deserves to be called out is from commit 57b6de08b5f6
("ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs"). This
makes a change that is technically user space visible, in the handling
of what happens to a tracee when a tracer dies unexpectedly. According
to our testing and our understanding of userspace nothing cares that
spurious SIGTRAPs can be generated in that case"
* tag 'ptrace_stop-cleanup-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
ptrace: Always take siglock in ptrace_resume
ptrace: Don't change __state
ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
ptrace: Document that wait_task_inactive can't fail
ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
signal: Use lockdep_assert_held instead of assert_spin_locked
ptrace: Remove arch_ptrace_attach
ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
signal: Replace __group_send_sig_info with send_signal_locked
signal: Rename send_signal send_signal_locked
ordinary user mode tasks.
In commit 40966e316f86 ("kthread: Ensure struct kthread is present for
all kthreads") caused init and the user mode helper threads that call
kernel_execve to have struct kthread allocated for them. This struct
kthread going away during execve in turned made a use after free of
struct kthread possible.
The commit 343f4c49f243 ("kthread: Don't allocate kthread_struct for
init and umh") is enough to fix the use after free and is simple enough
to be backportable.
The rest of the changes pass struct kernel_clone_args to clean things
up and cause the code to make sense.
In making init and the user mode helpers tasks purely user mode tasks
I ran into two complications. The function task_tick_numa was
detecting tasks without an mm by testing for the presence of
PF_KTHREAD. The initramfs code in populate_initrd_image was using
flush_delayed_fput to ensuere the closing of all it's file descriptors
was complete, and flush_delayed_fput does not work in a userspace thread.
I have looked and looked and more complications and in my code review
I have not found any, and neither has anyone else with the code sitting
in linux-next.
Link: https://lkml.kernel.org/r/87mtfu4up3.fsf@email.froward.int.ebiederm.org
Eric W. Biederman (8):
kthread: Don't allocate kthread_struct for init and umh
fork: Pass struct kernel_clone_args into copy_thread
fork: Explicity test for idle tasks in copy_thread
fork: Generalize PF_IO_WORKER handling
init: Deal with the init process being a user mode process
fork: Explicitly set PF_KTHREAD
fork: Stop allowing kthreads to call execve
sched: Update task_tick_numa to ignore tasks without an mm
arch/alpha/kernel/process.c | 13 ++++++------
arch/arc/kernel/process.c | 13 ++++++------
arch/arm/kernel/process.c | 12 ++++++-----
arch/arm64/kernel/process.c | 12 ++++++-----
arch/csky/kernel/process.c | 15 ++++++-------
arch/h8300/kernel/process.c | 10 ++++-----
arch/hexagon/kernel/process.c | 12 ++++++-----
arch/ia64/kernel/process.c | 15 +++++++------
arch/m68k/kernel/process.c | 12 ++++++-----
arch/microblaze/kernel/process.c | 12 ++++++-----
arch/mips/kernel/process.c | 13 ++++++------
arch/nios2/kernel/process.c | 12 ++++++-----
arch/openrisc/kernel/process.c | 12 ++++++-----
arch/parisc/kernel/process.c | 18 +++++++++-------
arch/powerpc/kernel/process.c | 15 +++++++------
arch/riscv/kernel/process.c | 12 ++++++-----
arch/s390/kernel/process.c | 12 ++++++-----
arch/sh/kernel/process_32.c | 12 ++++++-----
arch/sparc/kernel/process_32.c | 12 ++++++-----
arch/sparc/kernel/process_64.c | 12 ++++++-----
arch/um/kernel/process.c | 15 +++++++------
arch/x86/include/asm/fpu/sched.h | 2 +-
arch/x86/include/asm/switch_to.h | 8 +++----
arch/x86/kernel/fpu/core.c | 4 ++--
arch/x86/kernel/process.c | 18 +++++++++-------
arch/xtensa/kernel/process.c | 17 ++++++++-------
fs/exec.c | 8 ++++---
include/linux/sched/task.h | 8 +++++--
init/initramfs.c | 2 ++
init/main.c | 2 +-
kernel/fork.c | 46 +++++++++++++++++++++++++++++++++-------
kernel/sched/fair.c | 2 +-
kernel/umh.c | 6 +++---
33 files changed, 234 insertions(+), 160 deletions(-)
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmKaR/MACgkQC/v6Eiaj
j0Aayg/7Bx66872d9c6igkJ+MPCTuh+v9QKCGwiYEmiU4Q5sVAFB0HPJO27qC14u
630X0RFNZTkPzNNEJNIW4kw6Dj8s8YRKf+FgQAVt4SzdRwT7eIPDjk1nGraopPJ3
O04pjvuTmUyidyViRyFcf2ptx/pnkrwP8jUSc+bGTgfASAKAgAokqKE5ecjewbBc
Y/EAkQ6QW7KxPjeSmpAHwI+t3BpBev9WEC4PbhRhsBCQFO2+PJiklvqdhVNBnIjv
qUezll/1xv9UYgniB15Q4Nb722SmnWSU3r8as1eFPugzTHizKhufrrpyP+KMK1A0
tdtEJNs5t2DZF7ZbGTFSPqJWmyTYLrghZdO+lOmnaSjHxK4Nda1d4NzbefJ0u+FE
tutewowvHtBX6AFIbx+H3O+DOJM2IgNMf+ReQDU/TyNyVf3wBrTbsr9cLxypIJIp
zze8npoLMlB7B4yxVo5ES5e63EXfi3iHl0L3/1EhoGwriRz1kWgVLUX/VZOUpscL
RkJHsW6bT8sqxPWAA5kyWjEN+wNR2PxbXi8OE4arT0uJrEBMUgDCzydzOv5tJB00
mSQdytxH9LVdsmxBKAOBp5X6WOLGA4yb1cZ6E/mEhlqXMpBDF1DaMfwbWqxSYi4q
sp5zU3SBAW0qceiZSsWZXInfbjrcQXNV/DkDRDO9OmzEZP4m1j0=
=x6fy
-----END PGP SIGNATURE-----
Merge tag 'kthread-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull kthread updates from Eric Biederman:
"This updates init and user mode helper tasks to be ordinary user mode
tasks.
Commit 40966e316f86 ("kthread: Ensure struct kthread is present for
all kthreads") caused init and the user mode helper threads that call
kernel_execve to have struct kthread allocated for them. This struct
kthread going away during execve in turned made a use after free of
struct kthread possible.
Here, commit 343f4c49f243 ("kthread: Don't allocate kthread_struct for
init and umh") is enough to fix the use after free and is simple
enough to be backportable.
The rest of the changes pass struct kernel_clone_args to clean things
up and cause the code to make sense.
In making init and the user mode helpers tasks purely user mode tasks
I ran into two complications. The function task_tick_numa was
detecting tasks without an mm by testing for the presence of
PF_KTHREAD. The initramfs code in populate_initrd_image was using
flush_delayed_fput to ensuere the closing of all it's file descriptors
was complete, and flush_delayed_fput does not work in a userspace
thread.
I have looked and looked and more complications and in my code review
I have not found any, and neither has anyone else with the code
sitting in linux-next"
* tag 'kthread-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
sched: Update task_tick_numa to ignore tasks without an mm
fork: Stop allowing kthreads to call execve
fork: Explicitly set PF_KTHREAD
init: Deal with the init process being a user mode process
fork: Generalize PF_IO_WORKER handling
fork: Explicity test for idle tasks in copy_thread
fork: Pass struct kernel_clone_args into copy_thread
kthread: Don't allocate kthread_struct for init and umh
When the system runs out of enclave memory, SGX can reclaim EPC pages
by swapping to normal RAM. These backing pages are allocated via a
per-enclave shared memory area. Since SGX allows unlimited over
commit on EPC memory, the reclaimer thread can allocate a large
number of backing RAM pages in response to EPC memory pressure.
When the shared memory backing RAM allocation occurs during
the reclaimer thread context, the shared memory is charged to
the root memory control group, and the shmem usage of the enclave
is not properly accounted for, making cgroups ineffective at
limiting the amount of RAM an enclave can consume.
For example, when using a cgroup to launch a set of test
enclaves, the kernel does not properly account for 50% - 75% of
shmem page allocations on average. In the worst case, when
nearly all allocations occur during the reclaimer thread, the
kernel accounts less than a percent of the amount of shmem used
by the enclave's cgroup to the correct cgroup.
SGX stores a list of mm_structs that are associated with
an enclave. Pick one of them during reclaim and charge that
mm's memcg with the shmem allocation. The one that gets picked
is arbitrary, but this list almost always only has one mm. The
cases where there is more than one mm with different memcg's
are not worth considering.
Create a new function - sgx_encl_alloc_backing(). This function
is used whenever a new backing storage page needs to be
allocated. Previously the same function was used for page
allocation as well as retrieving a previously allocated page.
Prior to backing page allocation, if there is a mm_struct associated
with the enclave that is requesting the allocation, it is set
as the active memory control group.
[ dhansen: - fix merge conflict with ELDU fixes
- check against actual ksgxd_tsk, not ->mm ]
Cc: stable@vger.kernel.org
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Link: https://lkml.kernel.org/r/20220520174248.4918-1-kristen@linux.intel.com
This is reported by kmemleak detector:
unreferenced object 0xffffc900002a9000 (size 4096):
comm "kexec", pid 14950, jiffies 4295110793 (age 373.951s)
hex dump (first 32 bytes):
7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 .ELF............
04 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00 ..>.............
backtrace:
[<0000000016a8ef9f>] __vmalloc_node_range+0x101/0x170
[<000000002b66b6c0>] __vmalloc_node+0xb4/0x160
[<00000000ad40107d>] crash_prepare_elf64_headers+0x8e/0xcd0
[<0000000019afff23>] crash_load_segments+0x260/0x470
[<0000000019ebe95c>] bzImage64_load+0x814/0xad0
[<0000000093e16b05>] arch_kexec_kernel_image_load+0x1be/0x2a0
[<000000009ef2fc88>] kimage_file_alloc_init+0x2ec/0x5a0
[<0000000038f5a97a>] __do_sys_kexec_file_load+0x28d/0x530
[<0000000087c19992>] do_syscall_64+0x3b/0x90
[<0000000066e063a4>] entry_SYSCALL_64_after_hwframe+0x44/0xae
In crash_prepare_elf64_headers(), a buffer is allocated via vmalloc() to
store elf headers. While it's not freed back to system correctly when
kdump kernel is reloaded or unloaded. Then memory leak is caused. Fix it
by introducing x86 specific function arch_kimage_file_post_load_cleanup(),
and freeing the buffer there.
And also remove the incorrect elf header buffer freeing code. Before
calling arch specific kexec_file loading function, the image instance has
been initialized. So 'image->elf_headers' must be NULL. It doesn't make
sense to free the elf header buffer in the place.
Three different people have reported three bugs about the memory leak on
x86_64 inside Redhat.
Link: https://lkml.kernel.org/r/20220223113225.63106-2-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Similar to MDS and TAA, print a warning if SMT is enabled for the MMIO
Stale Data vulnerability.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
c93dc84cbe32 ("perf/x86: Add a microcode revision check for SNB-PEBS")
checks whether the microcode revision has fixed PEBS issues.
This can happen either:
1. At PEBS init time, where the early microcode has been loaded already
2. During late loading, in the microcode_check() callback.
So remove the unnecessary call in the microcode loader init routine.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220525161232.14924-5-bp@alien8.de
Warn before it is attempted and taint the kernel. Late loading microcode
can lead to malfunction of the kernel when the microcode update changes
behaviour. There is no way for the kernel to determine whether its safe or
not.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220525161232.14924-4-bp@alien8.de
It is dangerous and it should not be used anyway - there's a nice early
loading already.
Requested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220525161232.14924-3-bp@alien8.de
- Add Tegra234 cpufreq support (Sumit Gupta).
- Clean up and enhance the Mediatek cpufreq driver (Wan Jiabing,
Rex-BC Chen, and Jia-Wei Chang).
- Fix up the CPPC cpufreq driver after recent changes (Zheng Bin,
Pierre Gondois).
- Minor update to dt-binding for Qcom's opp-v2-kryo-cpu (Yassine
Oudjana).
- Use list iterator only inside the list_for_each_entry loop (Xiaomeng
Tong, and Jakob Koschel).
- New APIs related to finding OPP based on interconnect bandwidth
(Krzysztof Kozlowski).
- Fix the missing of_node_put() in _bandwidth_supported() (Dan
Carpenter).
- Cleanups (Krzysztof Kozlowski, and Viresh Kumar).
- Add Out of Band mode description to the intel-speed-select utility
documentation (Srinivas Pandruvada).
- Add power sequences support to the system reboot and power off
code and make related platform-specific changes for multiple
platforms (Dmitry Osipenko, Geert Uytterhoeven).
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmKU8lESHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxVz0P91LNCbkDSt60jzNkXdEjsvUnI/YjJ+QJ
/+ta7iCwf90obb6s9soBkTyU8Ia7hJ/IWDJW/5xhdG0ySYF17hGNIGKK9xKGsJFK
tzzWtjFsvT3PeUZQERekqWp8OYskHYmQMj8o4jqqFF7DZD/AswTgkVLALUd7YhVL
UvLmcKsUA7eXy3ZrhtrGSzVSEbKOGXBLFyjy3IuWjfz6Uk/nGQRNKGf7byRWLM44
y7zb75/5+p4MPyyJP8M/uiXzEYDKuubRtfx9PdmLgBUSMbtho6eB1x47dZWooaxe
YKmcFjF80AmnwxHb+Te2rZHPeIYr+5hLBaEq7xaLQf/nAS3y5z1PIfI2wVQ5mXPz
D599jHHda/6oSAKCVTq2fKfnlR6fetm5j66xOQINpD+G5b5tNSpllXJDamFZxFgP
DiQAOFzdnRYnK7yTiLWVl1q76SVRxqsGz7/5Ak+NRj2OQK2wRkLzHuZfiV/8r0pk
ksi6Ew9TerXkstoTQsSToPQxB2VvosSajNU3Oy27pmM0oal1XxP0LIPz9sMor5/g
tfk5f6Yz/+FFIfXj3cZffZNdhsJgejmcqPdrSdCOV3sBrblnIMQNpHiYg4jGztoj
IjYKYPVpSaWiSZLQOaK2moTEvm9CfQz1TQCF+/Kz88LX6/7ZaDJFxHG2FDEob0sg
6KVbrZWweLI=
=PAh+
-----END PGP SIGNATURE-----
Merge tag 'pm-5.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These update the ARM cpufreq drivers and fix up the CPPC cpufreq
driver after recent changes, update the OPP code and PM documentation
and add power sequences support to the system reboot and power off
code.
Specifics:
- Add Tegra234 cpufreq support (Sumit Gupta)
- Clean up and enhance the Mediatek cpufreq driver (Wan Jiabing,
Rex-BC Chen, and Jia-Wei Chang)
- Fix up the CPPC cpufreq driver after recent changes (Zheng Bin,
Pierre Gondois)
- Minor update to dt-binding for Qcom's opp-v2-kryo-cpu (Yassine
Oudjana)
- Use list iterator only inside the list_for_each_entry loop
(Xiaomeng Tong, and Jakob Koschel)
- New APIs related to finding OPP based on interconnect bandwidth
(Krzysztof Kozlowski)
- Fix the missing of_node_put() in _bandwidth_supported() (Dan
Carpenter)
- Cleanups (Krzysztof Kozlowski, and Viresh Kumar)
- Add Out of Band mode description to the intel-speed-select utility
documentation (Srinivas Pandruvada)
- Add power sequences support to the system reboot and power off code
and make related platform-specific changes for multiple platforms
(Dmitry Osipenko, Geert Uytterhoeven)"
* tag 'pm-5.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (60 commits)
cpufreq: CPPC: Fix unused-function warning
cpufreq: CPPC: Fix build error without CONFIG_ACPI_CPPC_CPUFREQ_FIE
Documentation: admin-guide: PM: Add Out of Band mode
kernel/reboot: Change registration order of legacy power-off handler
m68k: virt: Switch to new sys-off handler API
kernel/reboot: Add devm_register_restart_handler()
kernel/reboot: Add devm_register_power_off_handler()
soc/tegra: pmc: Use sys-off handler API to power off Nexus 7 properly
reboot: Remove pm_power_off_prepare()
regulator: pfuze100: Use devm_register_sys_off_handler()
ACPI: power: Switch to sys-off handler API
memory: emif: Use kernel_can_power_off()
mips: Use do_kernel_power_off()
ia64: Use do_kernel_power_off()
x86: Use do_kernel_power_off()
sh: Use do_kernel_power_off()
m68k: Switch to new sys-off handler API
powerpc: Use do_kernel_power_off()
xen/x86: Use do_kernel_power_off()
parisc: Use do_kernel_power_off()
...
- The majority of the changes are for fixes and clean ups.
Noticeable changes:
- Rework trace event triggers code to be easier to interact with.
- Support for embedding bootconfig with the kernel (as suppose to having it
embedded in initram). This is useful for embedded boards without initram
disks.
- Speed up boot by parallelizing the creation of tracefs files.
- Allow absolute ring buffer timestamps handle timestamps that use more than
59 bits.
- Added new tracing clock "TAI" (International Atomic Time)
- Have weak functions show up in available_filter_function list as:
__ftrace_invalid_address___<invalid-offset>
instead of using the name of the function before it.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYpOgXRQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qjkKAQDbpemxvpFyJlZqT8KgEIXubu+ag2/q
p0XDHaPS0zF9OQEAjTxg6GMEbnFYl6fzxZtOoEbiaQ7ppfdhRI8t6sSMVA8=
=+nDD
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"The majority of the changes are for fixes and clean ups.
Notable changes:
- Rework trace event triggers code to be easier to interact with.
- Support for embedding bootconfig with the kernel (as suppose to
having it embedded in initram). This is useful for embedded boards
without initram disks.
- Speed up boot by parallelizing the creation of tracefs files.
- Allow absolute ring buffer timestamps handle timestamps that use
more than 59 bits.
- Added new tracing clock "TAI" (International Atomic Time)
- Have weak functions show up in available_filter_function list as:
__ftrace_invalid_address___<invalid-offset> instead of using the
name of the function before it"
* tag 'trace-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (52 commits)
ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function
tracing: Fix comments for event_trigger_separate_filter()
x86/traceponit: Fix comment about irq vector tracepoints
x86,tracing: Remove unused headers
ftrace: Clean up hash direct_functions on register failures
tracing: Fix comments of create_filter()
tracing: Disable kcov on trace_preemptirq.c
tracing: Initialize integer variable to prevent garbage return value
ftrace: Fix typo in comment
ftrace: Remove return value of ftrace_arch_modify_*()
tracing: Cleanup code by removing init "char *name"
tracing: Change "char *" string form to "char []"
tracing/timerlat: Do not wakeup the thread if the trace stops at the IRQ
tracing/timerlat: Print stacktrace in the IRQ handler if needed
tracing/timerlat: Notify IRQ new max latency only if stop tracing is set
kprobes: Fix build errors with CONFIG_KRETPROBES=n
tracing: Fix return value of trace_pid_write()
tracing: Fix potential double free in create_var_ref()
tracing: Use strim() to remove whitespace instead of doing it manually
ftrace: Deal with error return code of the ftrace_process_locs() function
...
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmKSSbcTHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXgJyCACeyMOcFws5lyqqdk0R0zGr2KFfKsJn
YQR9nvldT2p/1y0ykvU208UIq0HHmXOb9pD8gOUzGYGp4XlEaC1f4V37mmzgLcRu
vL/HcFqBl2cQEfaQxiXZrmsIIVszwbc57EGqpl93cS2er4hp/NXmredKCId7Mpt8
FjxjgVGzdhEUKbJZYjkDM5pYAnJ9QVwuK3MaarKMK86Oj1P5YtKgIb4ZSt/NHvsC
Mukx3nivSH29XfK3fRsFDJUQr9WNYh1cmTtyhB0tWVXQCYFc4angZRtCJwyXzkp2
P5GBIQoMZcXX2XWkUBTtA1w5g/aZZsBExb3YGhQjsQP+jb6MtDnvOEo9
=Z2E+
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed-20220528' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- Harden hv_sock driver (Andrea Parri)
- Harden Hyper-V PCI driver (Andrea Parri)
- Fix multi-MSI for Hyper-V PCI driver (Jeffrey Hugo)
- Fix Hyper-V PCI to reduce boot time (Dexuan Cui)
- Remove code for long EOL'ed Hyper-V versions (Michael Kelley, Saurabh
Sengar)
- Fix balloon driver error handling (Shradha Gupta)
- Fix a typo in vmbus driver (Julia Lawall)
- Ignore vmbus IMC device (Michael Kelley)
- Add a new error message to Hyper-V DRM driver (Saurabh Sengar)
* tag 'hyperv-next-signed-20220528' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (28 commits)
hv_balloon: Fix balloon_probe() and balloon_remove() error handling
scsi: storvsc: Removing Pre Win8 related logic
Drivers: hv: vmbus: fix typo in comment
PCI: hv: Fix synchronization between channel callback and hv_pci_bus_exit()
PCI: hv: Add validation for untrusted Hyper-V values
PCI: hv: Fix interrupt mapping for multi-MSI
PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()
drm/hyperv: Remove support for Hyper-V 2008 and 2008R2/Win7
video: hyperv_fb: Remove support for Hyper-V 2008 and 2008R2/Win7
scsi: storvsc: Remove support for Hyper-V 2008 and 2008R2/Win7
Drivers: hv: vmbus: Remove support for Hyper-V 2008 and Hyper-V 2008R2/Win7
x86/hyperv: Disable hardlockup detector by default in Hyper-V guests
drm/hyperv: Add error message for fb size greater than allocated
PCI: hv: Do not set PCI_COMMAND_MEMORY to reduce VM boot time
PCI: hv: Fix hv_arch_irq_unmask() for multi-MSI
Drivers: hv: vmbus: Refactor the ring-buffer iterator functions
Drivers: hv: vmbus: Accept hv_sock offers in isolated guests
hv_sock: Add validation for untrusted Hyper-V values
hv_sock: Copy packets sent by Hyper-V out of the ring buffer
hv_sock: Check hv_pkt_iter_first_raw()'s return value
...
- Add support for clearing memory error via pwrite(2) on DAX
- Fix 'security overwrite' support in the presence of media errors
- Miscellaneous cleanups and fixes for nfit_test (nvdimm unit tests)
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYpFPcQAKCRDfioYZHlFs
Z9A3AQCdfoT5sY3OK+I/3oTvJ//6lw2MtXrnXFM046ICKPi9sgD8CzR9mRAHA+vj
kxOtJEU2bA9naninXGORsDUndiNkwQo=
=gVIn
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm and DAX updates from Dan Williams:
"New support for clearing memory errors when a file is in DAX mode,
alongside with some other fixes and cleanups.
Previously it was only possible to clear these errors using a truncate
or hole-punch operation to trigger the filesystem to reallocate the
block, now, any page aligned write can opportunistically clear errors
as well.
This change spans x86/mm, nvdimm, and fs/dax, and has received the
appropriate sign-offs. Thanks to Jane for her work on this.
Summary:
- Add support for clearing memory error via pwrite(2) on DAX
- Fix 'security overwrite' support in the presence of media errors
- Miscellaneous cleanups and fixes for nfit_test (nvdimm unit tests)"
* tag 'libnvdimm-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
pmem: implement pmem_recovery_write()
pmem: refactor pmem_clear_poison()
dax: add .recovery_write dax_operation
dax: introduce DAX_RECOVERY_WRITE dax access mode
mce: fix set_mce_nospec to always unmap the whole page
x86/mce: relocate set{clear}_mce_nospec() functions
acpi/nfit: rely on mce->misc to determine poison granularity
testing: nvdimm: asm/mce.h is not needed in nfit.c
testing: nvdimm: iomap: make __nfit_test_ioremap a macro
nvdimm: Allow overwrite in the presence of disabled dimms
tools/testing/nvdimm: remove unneeded flush_workqueue
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmKP/tQUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vzH0xAAojQowrSWzZ5FKTqI+L/L9ZXoAb+e
9IvQljKc9taJldmXp+EB9wkS/5B+VtQcC2qUQuWEQXUoECF8qHlcB4l+XQyd1tWO
O0vZxETH22xjLLrjG2F3l5rrfkJZAf2nEugwbDk97YEgiimeOiRcv3bx6AUCtj6I
rPJ13Fop3Jke7sQMcXYJe3gQLT1o1AKiQGghiCFNi/gzx2lXI6mmHBgLxFoiqcby
WpfXbvbJti95HRaahUR3HaDFfHj4HVkQNLlTtIykJ3Tl2/rOhWEJjI8JOIQpAA+M
WBrWw9rfgbScTiGV+dZ3h7hKiPnHKl9YETIX7L0oA2sj0jZcIs0d6mSBZx0kYuI9
eAlx+qSK9xpbQQr/fdYaUdF1q4QdtU0BYOvOWOzWsqYCECMRJ1PUHFSMbmR/+PNB
P5lHnAbggRSoxdAtwFYv1HTr+VpGH9S+5oxHCz3ohpMjYy6mkCZwHpZn3doaU3ci
KG6yIoVKftm3fZdtFvL03qHl/I8+X24ZhT/T/278PRGjkhSyr56hZo8hg0gqqTct
ngip8qNABmSbqpr73/W6Vl42zAbYtNk1BykYahbKupgW8FbT7hqaZTB05V87pVu+
Ko1aJM6VoOP9rMlKHI9ba8eYCzDrZbLZUFn7ljNPDpzutf0tAwtgwzvZBXN3za6+
Z9+D5dxmvrZEIbA=
=hEti
-----END PGP SIGNATURE-----
Merge tag 'pci-v5.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci updates from Bjorn Helgaas:
"Resource management:
- Restrict E820 clipping to PCI host bridge windows (Bjorn Helgaas)
- Log E820 clipping better (Bjorn Helgaas)
- Add kernel cmdline options to enable/disable E820 clipping (Hans de
Goede)
- Disable E820 reserved region clipping for IdeaPads, Yoga, Yoga
Slip, Acer Spin 5, Clevo Barebone systems where clipping leaves no
usable address space for touchpads, Thunderbolt devices, etc (Hans
de Goede)
- Disable E820 clipping by default starting in 2023 (Hans de Goede)
PCI device hotplug:
- Include files to remove implicit dependencies (Christophe Leroy)
- Only put Root Ports in D3 if they can signal and wake from D3 so
AMD Yellow Carp doesn't miss hotplug events (Mario Limonciello)
Power management:
- Define pci_restore_standard_config() only for CONFIG_PM_SLEEP since
it's unused otherwise (Krzysztof Kozlowski)
- Power up devices completely, including anything platform firmware
needs to do, during runtime resume (Rafael J. Wysocki)
- Move pci_resume_bus() to PM callbacks so we observe the required
bridge power-up delays (Rafael J. Wysocki)
- Drop unneeded runtime_d3cold device flag (Rafael J. Wysocki)
- Split pci_raw_set_power_state() between pci_power_up() and a new
pci_set_low_power_state() (Rafael J. Wysocki)
- Set current_state to D3cold if config read returns ~0, indicating
the device is not accessible (Rafael J. Wysocki)
- Do not call pci_update_current_state() from pci_power_up() so BARs
and ASPM config are restored correctly (Rafael J. Wysocki)
- Write 0 to PMCSR in pci_power_up() in all cases (Rafael J. Wysocki)
- Split pci_power_up() to pci_set_full_power_state() to avoid some
redundant operations (Rafael J. Wysocki)
- Skip restoring BARs if device is not in D0 (Rafael J. Wysocki)
- Rearrange and clarify pci_set_power_state() (Rafael J. Wysocki)
- Remove redundant BAR restores from pci_pm_thaw_noirq() (Rafael J.
Wysocki)
Virtualization:
- Acquire device lock before config space access lock to avoid AB/BA
deadlock with sriov_numvfs_store() (Yicong Yang)
Error handling:
- Clear MULTI_ERR_COR/UNCOR_RCV bits, which a race could previously
leave permanently set (Kuppuswamy Sathyanarayanan)
Peer-to-peer DMA:
- Whitelist Intel Skylake-E Root Ports regardless of which devfn they
are (Shlomo Pongratz)
ASPM:
- Override L1 acceptable latency advertised by Intel DG2 so ASPM L1
can be enabled (Mika Westerberg)
Cadence PCIe controller driver:
- Set up device-specific register to allow PTM Responder to be
enabled by the normal architected bit (Christian Gmeiner)
- Override advertised FLR support since the controller doesn't
implement FLR correctly (Parshuram Thombare)
Cadence PCIe endpoint driver:
- Correct bitmap size for the ob_region_map of outbound window usage
(Dan Carpenter)
Freescale i.MX6 PCIe controller driver:
- Fix PERST# assertion/deassertion so we observe the required delays
before accessing device (Francesco Dolcini)
Freescale Layerscape PCIe controller driver:
- Add "big-endian" DT property (Hou Zhiqiang)
- Update SCFG DT property (Hou Zhiqiang)
- Add "aer", "pme", "intr" DT properties (Li Yang)
- Add DT compatible strings for ls1028a (Xiaowei Bao)
Intel VMD host bridge driver:
- Assign VMD IRQ domain before enumeration to avoid IOMMU interrupt
remapping errors when MSI-X remapping is disabled (Nirmal Patel)
- Revert VMD workaround that kept MSI-X remapping enabled when IOMMU
remapping was enabled (Nirmal Patel)
Marvell MVEBU PCIe controller driver:
- Add of_pci_get_slot_power_limit() to parse the
'slot-power-limit-milliwatt' DT property (Pali Rohár)
- Add mvebu support for sending Set_Slot_Power_Limit message (Pali
Rohár)
MediaTek PCIe controller driver:
- Fix refcount leak in mtk_pcie_subsys_powerup() (Miaoqian Lin)
MediaTek PCIe Gen3 controller driver:
- Reset PHY and MAC at probe time (AngeloGioacchino Del Regno)
Microchip PolarFlare PCIe controller driver:
- Add chained_irq_enter()/chained_irq_exit() calls to mc_handle_msi()
and mc_handle_intx() to avoid lost interrupts (Conor Dooley)
- Fix interrupt handling race (Daire McNamara)
NVIDIA Tegra194 PCIe controller driver:
- Drop tegra194 MSI register save/restore, which is unnecessary since
the DWC core does it (Jisheng Zhang)
Qualcomm PCIe controller driver:
- Add SM8150 SoC DT binding and support (Bhupesh Sharma)
- Fix pipe clock imbalance (Johan Hovold)
- Fix runtime PM imbalance on probe errors (Johan Hovold)
- Fix PHY init imbalance on probe errors (Johan Hovold)
- Convert DT binding to YAML (Dmitry Baryshkov)
- Update DT binding to show that resets aren't required for
MSM8996/APQ8096 platforms (Dmitry Baryshkov)
- Add explicit register names per chipset in DT binding (Dmitry
Baryshkov)
- Add sc7280-specific clock and reset definitions to DT binding
(Dmitry Baryshkov)
Rockchip PCIe controller driver:
- Fix bitmap size when searching for free outbound region (Dan
Carpenter)
Rockchip DesignWare PCIe controller driver:
- Remove "snps,dw-pcie" from rockchip-dwc DT "compatible" property
because it's not fully compatible with rockchip (Peter Geis)
- Reset rockchip-dwc controller at probe (Peter Geis)
- Add rockchip-dwc INTx support (Peter Geis)
Synopsys DesignWare PCIe controller driver:
- Return error instead of success if DMA mapping of MSI area fails
(Jiantao Zhang)
Miscellaneous:
- Change pci_set_dma_mask() documentation references to
dma_set_mask() (Alex Williamson)"
* tag 'pci-v5.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (64 commits)
dt-bindings: PCI: qcom: Add schema for sc7280 chipset
dt-bindings: PCI: qcom: Specify reg-names explicitly
dt-bindings: PCI: qcom: Do not require resets on msm8996 platforms
dt-bindings: PCI: qcom: Convert to YAML
PCI: qcom: Fix unbalanced PHY init on probe errors
PCI: qcom: Fix runtime PM imbalance on probe errors
PCI: qcom: Fix pipe clock imbalance
PCI: qcom: Add SM8150 SoC support
dt-bindings: pci: qcom: Document PCIe bindings for SM8150 SoC
x86/PCI: Disable E820 reserved region clipping starting in 2023
x86/PCI: Disable E820 reserved region clipping via quirks
x86/PCI: Add kernel cmdline options to use/ignore E820 reserved regions
PCI: microchip: Fix potential race in interrupt handling
PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits
PCI: cadence: Clear FLR in device capabilities register
PCI: cadence: Allow PTM Responder to be enabled
PCI: vmd: Revert 2565e5b69c44 ("PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled by IOMMU.")
PCI: vmd: Assign VMD IRQ domain before enumeration
PCI: Avoid pci_dev_lock() AB/BA deadlock with sriov_numvfs_store()
PCI: rockchip-dwc: Add legacy interrupt support
...
subsystems. Most notably some maintenance work in ocfs2 and initramfs.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYo/6xQAKCRDdBJ7gKXxA
jkD9AQCPczLBbRWpe1edL+5VHvel9ePoHQmvbHQnufdTh9rB5QEAu0Uilxz4q9cx
xSZypNhj2n9f8FCYca/ZrZneBsTnAA8=
=AJEO
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2022-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc updates from Andrew Morton:
"The non-MM patch queue for this merge window.
Not a lot of material this cycle. Many singleton patches against
various subsystems. Most notably some maintenance work in ocfs2
and initramfs"
* tag 'mm-nonmm-stable-2022-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (65 commits)
kcov: update pos before writing pc in trace function
ocfs2: dlmfs: fix error handling of user_dlm_destroy_lock
ocfs2: dlmfs: don't clear USER_LOCK_ATTACHED when destroying lock
fs/ntfs: remove redundant variable idx
fat: remove time truncations in vfat_create/vfat_mkdir
fat: report creation time in statx
fat: ignore ctime updates, and keep ctime identical to mtime in memory
fat: split fat_truncate_time() into separate functions
MAINTAINERS: add Muchun as a memcg reviewer
proc/sysctl: make protected_* world readable
ia64: mca: drop redundant spinlock initialization
tty: fix deadlock caused by calling printk() under tty_port->lock
relay: remove redundant assignment to pointer buf
fs/ntfs3: validate BOOT sectors_per_clusters
lib/string_helpers: fix not adding strarray to device's resource list
kernel/crash_core.c: remove redundant check of ck_cmdline
ELF, uapi: fixup ELF_ST_TYPE definition
ipc/mqueue: use get_tree_nodev() in mqueue_get_tree()
ipc: update semtimedop() to use hrtimer
ipc/sem: remove redundant assignments
...