IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
commit 74e19ef0ff8061ef55957c3abd71614ef0f42f47 upstream.
The results of "access_ok()" can be mis-speculated. The result is that
you can end speculatively:
if (access_ok(from, size))
// Right here
even for bad from/size combinations. On first glance, it would be ideal
to just add a speculation barrier to "access_ok()" so that its results
can never be mis-speculated.
But there are lots of system calls just doing access_ok() via
"copy_to_user()" and friends (example: fstat() and friends). Those are
generally not problematic because they do not _consume_ data from
userspace other than the pointer. They are also very quick and common
system calls that should not be needlessly slowed down.
"copy_from_user()" on the other hand uses a user-controller pointer and
is frequently followed up with code that might affect caches. Take
something like this:
if (!copy_from_user(&kernelvar, uptr, size))
do_something_with(kernelvar);
If userspace passes in an evil 'uptr' that *actually* points to a kernel
addresses, and then do_something_with() has cache (or other)
side-effects, it could allow userspace to infer kernel data values.
Add a barrier to the common copy_from_user() code to prevent
mis-speculated values which happen after the copy.
Also add a stub for architectures that do not define barrier_nospec().
This makes the macro usable in generic code.
Since the barrier is now usable in generic code, the x86 #ifdef in the
BPF code can also go away.
Reported-by: Jordy Zomer <jordyzomer@google.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Daniel Borkmann <daniel@iogearbox.net> # BPF bits
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 821de68c1f9c0236b0b9c10834cda900ae9b443c ]
Unsupported port speed can be set and cause error. Now fixing it
and return an error if setting unsupported speed.
This fix depends on the following, which was included in v6.2-rc1:
commit a61474c41e8c ("nfp: ethtool: support reporting link modes").
Fixes: 7c698737270f ("nfp: add support for .set_link_ksettings()")
Signed-off-by: Yu Xiao <yu.xiao@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a61474c41e8c530c54a26db4f5434f050ef7718d ]
Add support for reporting link modes,
including `Supported link modes` and `Advertised link modes`,
via ethtool $DEV.
A new command `SPCODE_READ_MEDIA` is added to read info from
management firmware. Also, the mapping table `nfp_eth_media_table`
associates the link modes between NFP and kernel. Both of them
help to support this ability.
Signed-off-by: Yu Xiao <yu.xiao@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Link: https://lore.kernel.org/r/20221125113030.141642-1-simon.horman@corigine.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stable-dep-of: 821de68c1f9c ("nfp: ethtool: fix the bug of setting unsupported port speed")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 111bcb37385353f0510e5847d5abcd1c613dba23 ]
If a relocatable kernel is loaded at a non-zero address and told not to
relocate to zero (kdump or RELOCATABLE_TEST), the mapping of the
interrupt code at zero is left with RWX permissions.
That is a security weakness, and leads to a warning at boot if
CONFIG_DEBUG_WX is enabled:
powerpc/mm: Found insecure W+X mapping at address 00000000056435bc/0xc000000000000000
WARNING: CPU: 1 PID: 1 at arch/powerpc/mm/ptdump/ptdump.c:193 note_page+0x484/0x4c0
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc1-00001-g8ae8e98aea82-dirty #175
Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,git-dd0dca hv:linux,kvm pSeries
NIP: c0000000004a1c34 LR: c0000000004a1c30 CTR: 0000000000000000
REGS: c000000003503770 TRAP: 0700 Not tainted (6.2.0-rc1-00001-g8ae8e98aea82-dirty)
MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 24000220 XER: 00000000
CFAR: c000000000545a58 IRQMASK: 0
...
NIP note_page+0x484/0x4c0
LR note_page+0x480/0x4c0
Call Trace:
note_page+0x480/0x4c0 (unreliable)
ptdump_pmd_entry+0xc8/0x100
walk_pgd_range+0x618/0xab0
walk_page_range_novma+0x74/0xc0
ptdump_walk_pgd+0x98/0x170
ptdump_check_wx+0x94/0x100
mark_rodata_ro+0x30/0x70
kernel_init+0x78/0x1a0
ret_from_kernel_thread+0x5c/0x64
The fix has two parts. Firstly the pages from zero up to the end of
interrupts need to be marked read-only, so that they are left with R-X
permissions. Secondly the mapping logic needs to be taught to ensure
there is a page boundary at the end of the interrupt region, so that the
permission change only applies to the interrupt text, and not the region
following it.
Fixes: c55d7b5e6426 ("powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE")
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230110124753.1325426-2-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 50aa870ba2f7735f556e52d15f61cd0f359c4c0b ]
Placing a declaration of evt_reset is pedantically invalid
according to the C standard. While GCC does not really care
and only warns with -Wpedantic, clang ignores the declaration
altogether with an error:
x86_64/xen_shinfo_test.c:965:2: error: expected expression
struct kvm_xen_hvm_attr evt_reset = {
^
x86_64/xen_shinfo_test.c:969:38: error: use of undeclared identifier evt_reset
vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &evt_reset);
^
Reported-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reported-by: Sean Christopherson <seanjc@google.com>
Fixes: a79b53aaaab5 ("KVM: x86: fix deadlock for KVM_XEN_EVTCHN_RESET", 2022-12-28)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a79b53aaaab53de017517bf9579b6106397a523c ]
While KVM_XEN_EVTCHN_RESET is usually called with no vCPUs running,
if that happened it could cause a deadlock. This is due to
kvm_xen_eventfd_reset() doing a synchronize_srcu() inside
a kvm->lock critical section.
To avoid this, first collect all the evtchnfd objects in an
array and free all of them once the kvm->lock critical section
is over and th SRCU grace period has expired.
Reported-by: Michal Luczaj <mhal@rbox.co>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fff758698842fb6722be37498d8773e0fb47f000 ]
The attribute __maybe_unused should remain only until the respective
info is not in the pciidlist. The info can't be added together
with its definition because that would cause the driver to automatically
probe for the device, while it's still not ready for that. However once
pciidlist contains it, the attribute can be removed.
Fixes: 7835303982d1 ("drm/i915/mtl: Add MeteorLake PCI IDs")
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221214194944.3670344-1-lucas.demarchi@intel.com
(cherry picked from commit 50490ce05b7a50b0bd4108fa7d6db3ca2972fa83)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b24cded8c065d7cef8690b2c7b82b828cce57708 ]
If the irq is enabled after the spi si registered, there can be a race
with the initialization of the devices on the spi bus.
Eg:
mtk-spi 1100a000.spi: spi-mem transfer timeout
spi-nor: probe of spi0.0 failed with error -110
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000010
...
Call trace:
mtk_spi_can_dma+0x0/0x2c
Fixes: c6f7874687f7 ("spi: mediatek: Enable irq when pdata is ready")
Reported-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/20221225-mtk-spi-fixes-v1-0-bb6c14c232f8@chromium.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8d8bee13ae9e316443c6666286360126a19c8d94 ]
There aren't enough resources to run these ports at 10G speeds. Disable
10G for these ports, reverting to the previous speed.
Fixes: 36926a7d70c2 ("powerpc: dts: t208x: Mark MAC1 and MAC2 as 10G")
Reported-by: Camelia Alexandra Groza <camelia.groza@nxp.com>
Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Reviewed-by: Camelia Groza <camelia.groza@nxp.com>
Tested-by: Camelia Groza <camelia.groza@nxp.com>
Link: https://lore.kernel.org/r/20221216172937.2960054-1-sean.anderson@seco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f006229135b7debf4037adb1eb93e358559593db ]
Debian's gcc-13 [1] throws the following error in
kvaser_usb_hydra_cmd_size():
[1] gcc version 13.0.0 20221214 (experimental) [master r13-4693-g512098a3316] (Debian 13-20221214-1)
| drivers/net/can/usb/kvaser_usb/kvaser_usb_hydra.c:502:65: error:
| array subscript ‘struct kvaser_cmd_ext[0]’ is partly outside array
| bounds of ‘unsigned char[32]’ [-Werror=array-bounds=]
| 502 | ret = le16_to_cpu(((struct kvaser_cmd_ext *)cmd)->len);
kvaser_usb_hydra_cmd_size() returns the size of given command. It
depends on the command number (cmd->header.cmd_no). For extended
commands (cmd->header.cmd_no == CMD_EXTENDED) the above shown code is
executed.
Help gcc to recognize that this code path is not taken in all cases,
by calling kvaser_usb_hydra_cmd_size() directly after assigning the
command number.
Fixes: aec5fb2268b7 ("can: kvaser_usb: Add support for Kvaser USB hydra family")
Cc: Jimmy Assarsson <extja@kvaser.com>
Cc: Anssi Hannula <anssi.hannula@bitwise.fi>
Link: https://lore.kernel.org/all/20221219110104.1073881-1-mkl@pengutronix.de
Tested-by: Jimmy Assarsson <extja@kvaser.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2e7eab81425ad6c875f2ed47c0ce01e78afc38a5 ]
According to Intel's document on Indirect Branch Restricted
Speculation, "Enabling IBRS does not prevent software from controlling
the predicted targets of indirect branches of unrelated software
executed later at the same predictor mode (for example, between two
different user applications, or two different virtual machines). Such
isolation can be ensured through use of the Indirect Branch Predictor
Barrier (IBPB) command." This applies to both basic and enhanced IBRS.
Since L1 and L2 VMs share hardware predictor modes (guest-user and
guest-kernel), hardware IBRS is not sufficient to virtualize
IBRS. (The way that basic IBRS is implemented on pre-eIBRS parts,
hardware IBRS is actually sufficient in practice, even though it isn't
sufficient architecturally.)
For virtual CPUs that support IBRS, add an indirect branch prediction
barrier on emulated VM-exit, to ensure that the predicted targets of
indirect branches executed in L1 cannot be controlled by software that
was executed in L2.
Since we typically don't intercept guest writes to IA32_SPEC_CTRL,
perform the IBPB at emulated VM-exit regardless of the current
IA32_SPEC_CTRL.IBRS value, even though the IBPB could technically be
deferred until L1 sets IA32_SPEC_CTRL.IBRS, if IA32_SPEC_CTRL.IBRS is
clear at emulated VM-exit.
This is CVE-2022-2196.
Fixes: 5c911beff20a ("KVM: nVMX: Skip IBPB when switching between vmcs01 and vmcs02")
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221019213620.1953281-3-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5c30e8101e8d5d020b1d7119117889756a6ed713 ]
Skip the WRMSR fastpath in SVM's VM-Exit handler if the next RIP isn't
valid, e.g. because KVM is running with nrips=false. SVM must decode and
emulate to skip the WRMSR if the CPU doesn't provide the next RIP.
Getting the instruction bytes to decode the WRMSR requires reading guest
memory, which in turn means dereferencing memslots, and that isn't safe
because KVM doesn't hold SRCU when the fastpath runs.
Don't bother trying to enable the fastpath for this case, e.g. by doing
only the WRMSR and leaving the "skip" until later. NRIPS is supported on
all modern CPUs (KVM has considered making it mandatory), and the next
RIP will be valid the vast, vast majority of the time.
=============================
WARNING: suspicious RCU usage
6.0.0-smp--4e557fcd3d80-skip #13 Tainted: G O
-----------------------------
include/linux/kvm_host.h:954 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by stable/206475:
#0: ffff9d9dfebcc0f0 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x8b/0x620 [kvm]
stack backtrace:
CPU: 152 PID: 206475 Comm: stable Tainted: G O 6.0.0-smp--4e557fcd3d80-skip #13
Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 10.48.0 01/27/2022
Call Trace:
<TASK>
dump_stack_lvl+0x69/0xaa
dump_stack+0x10/0x12
lockdep_rcu_suspicious+0x11e/0x130
kvm_vcpu_gfn_to_memslot+0x155/0x190 [kvm]
kvm_vcpu_gfn_to_hva_prot+0x18/0x80 [kvm]
paging64_walk_addr_generic+0x183/0x450 [kvm]
paging64_gva_to_gpa+0x63/0xd0 [kvm]
kvm_fetch_guest_virt+0x53/0xc0 [kvm]
__do_insn_fetch_bytes+0x18b/0x1c0 [kvm]
x86_decode_insn+0xf0/0xef0 [kvm]
x86_emulate_instruction+0xba/0x790 [kvm]
kvm_emulate_instruction+0x17/0x20 [kvm]
__svm_skip_emulated_instruction+0x85/0x100 [kvm_amd]
svm_skip_emulated_instruction+0x13/0x20 [kvm_amd]
handle_fastpath_set_msr_irqoff+0xae/0x180 [kvm]
svm_vcpu_run+0x4b8/0x5a0 [kvm_amd]
vcpu_enter_guest+0x16ca/0x22f0 [kvm]
kvm_arch_vcpu_ioctl_run+0x39d/0x900 [kvm]
kvm_vcpu_ioctl+0x538/0x620 [kvm]
__se_sys_ioctl+0x77/0xc0
__x64_sys_ioctl+0x1d/0x20
do_syscall_64+0x3d/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: 404d5d7bff0d ("KVM: X86: Introduce more exit_fastpath_completion enum values")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220930234031.1732249-1-seanjc@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 17122c06b86c9f77f45b86b8e62c3ed440847a59 ]
Treat any exception during instruction decode for EMULTYPE_SKIP as a
"full" emulation failure, i.e. signal failure instead of queuing the
exception. When decoding purely to skip an instruction, KVM and/or the
CPU has already done some amount of emulation that cannot be unwound,
e.g. on an EPT misconfig VM-Exit KVM has already processeed the emulated
MMIO. KVM already does this if a #UD is encountered, but not for other
exceptions, e.g. if a #PF is encountered during fetch.
In SVM's soft-injection use case, queueing the exception is particularly
problematic as queueing exceptions while injecting events can put KVM
into an infinite loop due to bailing from VM-Enter to service the newly
pending exception. E.g. multiple warnings to detect such behavior fire:
------------[ cut here ]------------
WARNING: CPU: 3 PID: 1017 at arch/x86/kvm/x86.c:9873 kvm_arch_vcpu_ioctl_run+0x1de5/0x20a0 [kvm]
Modules linked in: kvm_amd ccp kvm irqbypass
CPU: 3 PID: 1017 Comm: svm_nested_soft Not tainted 6.0.0-rc1+ #220
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:kvm_arch_vcpu_ioctl_run+0x1de5/0x20a0 [kvm]
Call Trace:
kvm_vcpu_ioctl+0x223/0x6d0 [kvm]
__x64_sys_ioctl+0x85/0xc0
do_syscall_64+0x2b/0x50
entry_SYSCALL_64_after_hwframe+0x46/0xb0
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
WARNING: CPU: 3 PID: 1017 at arch/x86/kvm/x86.c:9987 kvm_arch_vcpu_ioctl_run+0x12a3/0x20a0 [kvm]
Modules linked in: kvm_amd ccp kvm irqbypass
CPU: 3 PID: 1017 Comm: svm_nested_soft Tainted: G W 6.0.0-rc1+ #220
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:kvm_arch_vcpu_ioctl_run+0x12a3/0x20a0 [kvm]
Call Trace:
kvm_vcpu_ioctl+0x223/0x6d0 [kvm]
__x64_sys_ioctl+0x85/0xc0
do_syscall_64+0x2b/0x50
entry_SYSCALL_64_after_hwframe+0x46/0xb0
---[ end trace 0000000000000000 ]---
Fixes: 6ea6e84309ca ("KVM: x86: inject exceptions produced by x86_decode_insn")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220930233632.1725475-1-seanjc@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eb79f12b4c41dd2403a0d16772ee72fcd6416015 ]
The PMU instance will be called hisi_pcie<sicl>_core<core> rather than
hisi_pcie<sicl>_<core>. Fix this in the documentation.
Fixes: c8602008e247 ("docs: perf: Add description for HiSilicon PCIe PMU driver")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20221117084136.53572-3-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c6f7874687f7027d7c4b2f53ff6e4d22850f915d ]
If the device does not come straight from reset, we might receive an IRQ
before we are ready to handle it.
Fixes:
[ 0.832328] Unable to handle kernel read from unreadable memory at virtual address 0000000000000010
[ 1.040343] Call trace:
[ 1.040347] mtk_spi_can_dma+0xc/0x40
...
[ 1.262265] start_kernel+0x338/0x42c
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20221128-spi-mt65xx-v1-0-509266830665@chromium.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3c2673a09cf1181318c07b7dbc1bc532ba3d33e3 ]
SATA devices on an expander may be removed and not be found again when I_T
nexus reset and revalidation are processed simultaneously.
The issue comes from:
- Revalidation can remove SATA devices in link reset, e.g. in
hisi_sas_clear_nexus_ha().
- However, hisi_sas_debug_I_T_nexus_reset() polls the state of a SATA
device on an expander after sending link_reset, where it calls:
hisi_sas_debug_I_T_nexus_reset
sas_ata_wait_after_reset
ata_wait_after_reset
ata_wait_ready
smp_ata_check_ready
sas_ex_phy_discover
sas_ex_phy_discover_helper
sas_set_ex_phy
The ex_phy's change count is updated in sas_set_ex_phy(), so SATA
devices after a link reset may not be found later through revalidation.
A similar issue was reported in:
commit 0f3fce5cc77e ("[SCSI] libsas: fix ata_eh clobbering ex_phys via
smp_ata_check_ready")
commit 87c8331fcf72 ("[SCSI] libsas: prevent domain rediscovery competing
with ata error handling").
To address this issue, in hisi_sas_debug_I_T_nexus_reset(), we now call
smp_ata_check_ready_type() that only polls the device type while not
updating the ex_phy's data of libsas.
Fixes: 71453bd9d1bf ("scsi: hisi_sas: Use sas_ata_wait_after_reset() in IT nexus reset")
Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
Link: https://lore.kernel.org/r/20221118083714.4034612-5-zhanjie9@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9181ce3cb5d96f0ee28246a857ca651830fa3746 ]
Create function smp_ata_check_ready_type() for LLDDs to wait for SATA
devices to come up after a link reset.
Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
Link: https://lore.kernel.org/r/20221118083714.4034612-4-zhanjie9@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Stable-dep-of: 3c2673a09cf1 ("scsi: hisi_sas: Fix SATA devices missing issue during I_T nexus reset")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d7bf7f3b813e3755226bcb5114ad2ac477514ebf ]
add_latent_entropy() is called every time a process forks, in
kernel_clone(). This in turn calls add_device_randomness() using the
latent entropy global state. add_device_randomness() does two things:
2) Mixes into the input pool the latent entropy argument passed; and
1) Mixes in a cycle counter, a sort of measurement of when the event
took place, the high precision bits of which are presumably
difficult to predict.
(2) is impossible without CONFIG_GCC_PLUGIN_LATENT_ENTROPY=y. But (1) is
always possible. However, currently CONFIG_GCC_PLUGIN_LATENT_ENTROPY=n
disables both (1) and (2), instead of just (2).
This commit causes the CONFIG_GCC_PLUGIN_LATENT_ENTROPY=n case to still
do (1) by passing NULL (len 0) to add_device_randomness() when add_latent_
entropy() is called.
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Emese Revfy <re.emese@gmail.com>
Fixes: 38addce8b600 ("gcc-plugins: Add latent_entropy plugin")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 710ffe671e014d5ccbcff225130a178b088ef090 ]
Psi polling mechanism is trying to minimize the number of wakeups to
run psi_poll_work and is currently relying on timer_pending() to detect
when this work is already scheduled. This provides a window of opportunity
for psi_group_change to schedule an immediate psi_poll_work after
poll_timer_fn got called but before psi_poll_work could reschedule itself.
Below is the depiction of this entire window:
poll_timer_fn
wake_up_interruptible(&group->poll_wait);
psi_poll_worker
wait_event_interruptible(group->poll_wait, ...)
psi_poll_work
psi_schedule_poll_work
if (timer_pending(&group->poll_timer)) return;
...
mod_timer(&group->poll_timer, jiffies + delay);
Prior to 461daba06bdc we used to rely on poll_scheduled atomic which was
reset and set back inside psi_poll_work and therefore this race window
was much smaller.
The larger window causes increased number of wakeups and our partners
report visible power regression of ~10mA after applying 461daba06bdc.
Bring back the poll_scheduled atomic and make this race window even
narrower by resetting poll_scheduled only when we reach polling expiration
time. This does not completely eliminate the possibility of extra wakeups
caused by a race with psi_group_change however it will limit it to the
worst case scenario of one extra wakeup per every tracking window (0.5s
in the worst case).
This patch also ensures correct ordering between clearing poll_scheduled
flag and obtaining changed_states using memory barrier. Correct ordering
between updating changed_states and setting poll_scheduled is ensured by
atomic_xchg operation.
By tracing the number of immediate rescheduling attempts performed by
psi_group_change and the number of these attempts being blocked due to
psi monitor being already active, we can assess the effects of this change:
Before the patch:
Run#1 Run#2 Run#3
Immediate reschedules attempted: 684365 1385156 1261240
Immediate reschedules blocked: 682846 1381654 1258682
Immediate reschedules (delta): 1519 3502 2558
Immediate reschedules (% of attempted): 0.22% 0.25% 0.20%
After the patch:
Run#1 Run#2 Run#3
Immediate reschedules attempted: 882244 770298 426218
Immediate reschedules blocked: 881996 769796 426074
Immediate reschedules (delta): 248 502 144
Immediate reschedules (% of attempted): 0.03% 0.07% 0.03%
The number of non-blocked immediate reschedules dropped from 0.22-0.25%
to 0.03-0.07%. The drop is attributed to the decrease in the race window
size and the fact that we allow this race only when psi monitors reach
polling window expiration time.
Fixes: 461daba06bdc ("psi: eliminate kthread_worker from psi trigger scheduling mechanism")
Reported-by: Kathleen Chang <yt.chang@mediatek.com>
Reported-by: Wenju Xu <wenju.xu@mediatek.com>
Reported-by: Jonathan Chen <jonathan.jmchen@mediatek.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: SH Chen <show-hong.chen@mediatek.com>
Link: https://lore.kernel.org/r/20221028194541.813985-1-surenb@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7256d1f4618b40792d1e9b9b6cb1406a13cad2dd ]
Commit 036177310bac ("clk: mxl: Switch from direct readl/writel based IO
to regmap based IO") introduced code resulting in below warning issued
by the smatch static checker.
drivers/clk/x86/clk-lgm.c:441 lgm_cgu_probe() warn: passing zero to 'PTR_ERR'
Fix the warning by replacing incorrect IS_ERR_OR_NULL() with IS_ERR().
Fixes: 036177310bac ("clk: mxl: Switch from direct readl/writel based IO to regmap based IO")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rahul Tanwar <rtanwar@maxlinear.com>
Link: https://lore.kernel.org/r/49e339d4739e4ae4c92b00c1b2918af0755d4122.1666695221.git.rtanwar@maxlinear.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 36926a7d70c2d462fca1ed85bfee000d17fd8662 ]
On the T208X SoCs, MAC1 and MAC2 support XGMII. Add some new MAC dtsi
fragments, and mark the QMAN ports as 10G.
Fixes: da414bb923d9 ("powerpc/mpc85xx: Add FSL QorIQ DPAA FMan support to the SoC device tree(s)")
Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 106ef3bda21006fe37b62c85931230a6355d78d3 ]
One of the clock entry "dcl" clk has some HW limitations. One is that
its rate can only by changed by changing its parent clk's rate & two is
that HW does not support enable/disable for this clk.
Handle above two limitations by adding relevant flags. Add standard flag
CLK_SET_RATE_PARENT to handle rate change and add driver internal flag
DIV_CLK_NO_MASK to handle enable/disable.
Fixes: d058fd9e8984 ("clk: intel: Add CGU clock driver for a new SoC")
Reviewed-by: Yi xin Zhu <yzhu@maxlinear.com>
Signed-off-by: Rahul Tanwar <rtanwar@maxlinear.com>
Link: https://lore.kernel.org/r/a4770e7225f8a0c03c8ab2ba80434a4e8e9afb17.1665642720.git.rtanwar@maxlinear.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a5d49bd369b8588c0ee9d4d0a2c0160558a3ab69 ]
In MxL's LGM SoC, gate clocks can be controlled either from CGU clk driver
i.e. this driver or directly from power management driver/daemon. It is
dependent on the power policy/profile requirements of the end product.
To support such use cases, provide option to override gate clks enable/disable
by adding a flag GATE_CLK_HW which controls if these gate clks are controlled
by HW i.e. this driver or overridden in order to allow it to be controlled
by power profiles instead.
Reviewed-by: Yi xin Zhu <yzhu@maxlinear.com>
Signed-off-by: Rahul Tanwar <rtanwar@maxlinear.com>
Link: https://lore.kernel.org/r/bdc9c89317b5d338a6c4f1d49386b696e947a672.1665642720.git.rtanwar@maxlinear.com
[sboyd@kernel.org: Add braces on many line if-else]
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Stable-dep-of: 106ef3bda210 ("clk: mxl: Fix a clk entry by adding relevant flags")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eaabee88a88a26b108be8d120fc072dfaf462cef ]
Patch 1/4 of this patch series switches from direct readl/writel
based register access to regmap based register access. Instead
of using direct readl/writel, regmap API's are used to read, write
& read-modify-write clk registers. Regmap API's already use their
own spinlocks to serialize the register accesses across multiple
cores in which case additional driver spinlocks becomes redundant.
Hence, remove redundant spinlocks from driver in this patch 2/4.
Reviewed-by: Yi xin Zhu <yzhu@maxlinear.com>
Signed-off-by: Rahul Tanwar <rtanwar@maxlinear.com>
Link: https://lore.kernel.org/r/a8a02c8773b88924503a9fdaacd37dd2e6488bf3.1665642720.git.rtanwar@maxlinear.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Stable-dep-of: 106ef3bda210 ("clk: mxl: Fix a clk entry by adding relevant flags")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 036177310bac5534de44ff6a7b60a4d2c0b6567c ]
Earlier version of driver used direct io remapped register read
writes using readl/writel. But we need secure boot access which
is only possible when registers are read & written using regmap.
This is because the security bus/hook is written & coupled only
with regmap layer.
Switch the driver from direct readl/writel based register accesses
to regmap based register accesses.
Additionally, update the license headers to latest status.
Reviewed-by: Yi xin Zhu <yzhu@maxlinear.com>
Signed-off-by: Rahul Tanwar <rtanwar@maxlinear.com>
Link: https://lore.kernel.org/r/2610331918206e0e3bd18babb39393a558fb34f9.1665642720.git.rtanwar@maxlinear.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Stable-dep-of: 106ef3bda210 ("clk: mxl: Fix a clk entry by adding relevant flags")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 18feaf6d0784dcba888859109676adf1e0260dfd ]
HF-VSDB/SCDB has bits to advertise support for 16, 12 and 10 bpc.
If none of the bits are set, the minimum bpc supported with DSC is 8.
This patch corrects the min bpc supported to be 8, instead of 0.
Fixes: 76ee7b905678 ("drm/edid: Parse DSC1.2 cap fields from HFVSDB block")
Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
v2: s/DSC1.2/DSC 1.2
Signed-off-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220916100551.2531750-2-ankit.k.nautiyal@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 791082ec0ab843e0be07c8ce3678e4c2afd2e33d ]
Re-enable the function rtl8xxxu_gen2_report_connect.
It informs the firmware when connecting to a network. This makes the
firmware enable the rate control, which makes the upload faster.
It also informs the firmware when disconnecting from a network. In the
past this made reconnecting impossible because it was sending the
auth on queue 0x7 (TXDESC_QUEUE_VO) instead of queue 0x12
(TXDESC_QUEUE_MGNT):
wlp0s20f0u3: send auth to 90:55:de:__:__:__ (try 1/3)
wlp0s20f0u3: send auth to 90:55:de:__:__:__ (try 2/3)
wlp0s20f0u3: send auth to 90:55:de:__:__:__ (try 3/3)
wlp0s20f0u3: authentication with 90:55:de:__:__:__ timed out
Probably the firmware disables the unnecessary TX queues when it
knows it's disconnected.
However, this was fixed in commit edd5747aa12e ("wifi: rtl8xxxu: Fix
skb misuse in TX queue selection").
Fixes: c59f13bbead4 ("rtl8xxxu: Work around issue with 8192eu and 8723bu devices not reconnecting")
Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/43200afc-0c65-ee72-48f8-231edd1df493@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f74878433d5ade360447da5d92e9c2e535780d80 ]
Commit 26f3a021b37c ("ath11k: allocate smaller chunks of memory for
firmware") and commit f6f92968e1e5 ("ath11k: qmi: try to allocate a
big block of DMA memory first") change ath11k to allocate the memory
chunks for target twice while wlan load. It fails for the 1st time
because of large memory and then changed to allocate many small chunks
for the 2nd time sometimes as below log.
1st time failed:
[10411.640620] ath11k_pci 0000:05:00.0: qmi firmware request memory request
[10411.640625] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 6881280
[10411.640630] ath11k_pci 0000:05:00.0: qmi mem seg type 4 size 3784704
[10411.640658] ath11k_pci 0000:05:00.0: qmi dma allocation failed (6881280 B type 1), will try later with small size
[10411.640671] ath11k_pci 0000:05:00.0: qmi delays mem_request 2
[10411.640677] ath11k_pci 0000:05:00.0: qmi respond memory request delayed 1
2nd time success:
[10411.642004] ath11k_pci 0000:05:00.0: qmi firmware request memory request
[10411.642008] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
[10411.642012] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
[10411.642014] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
[10411.642016] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
[10411.642018] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
[10411.642020] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
[10411.642022] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
[10411.642024] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
[10411.642027] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
[10411.642029] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
[10411.642031] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 458752
[10411.642033] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 131072
[10411.642035] ath11k_pci 0000:05:00.0: qmi mem seg type 4 size 524288
[10411.642037] ath11k_pci 0000:05:00.0: qmi mem seg type 4 size 524288
[10411.642039] ath11k_pci 0000:05:00.0: qmi mem seg type 4 size 524288
[10411.642041] ath11k_pci 0000:05:00.0: qmi mem seg type 4 size 524288
[10411.642043] ath11k_pci 0000:05:00.0: qmi mem seg type 4 size 524288
[10411.642045] ath11k_pci 0000:05:00.0: qmi mem seg type 4 size 524288
[10411.642047] ath11k_pci 0000:05:00.0: qmi mem seg type 4 size 491520
[10411.642049] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
And then commit 5962f370ce41 ("ath11k: Reuse the available memory after
firmware reload") skip the ath11k_qmi_free_resource() which frees the
memory chunks while recovery, after that, when run recovery test on
WCN6855, a warning happened every time as below and finally leads fail
for recovery.
[ 159.570318] BUG: Bad page state in process kworker/u16:5 pfn:33300
[ 159.570320] page:0000000096ffdbb9 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x33300
[ 159.570324] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
[ 159.570329] raw: 000fffffc0000000 0000000000000000 dead000000000122 0000000000000000
[ 159.570332] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 159.570334] page dumped because: nonzero _refcount
[ 159.570440] firewire_ohci syscopyarea sysfillrect psmouse sdhci_pci ahci sysimgblt firewire_core fb_sys_fops libahci crc_itu_t cqhci drm sdhci e1000e wmi video
[ 159.570460] CPU: 2 PID: 217 Comm: kworker/u16:5 Kdump: loaded Tainted: G B 5.19.0-rc1-wt-ath+ #3
[ 159.570465] Hardware name: LENOVO 418065C/418065C, BIOS 83ET63WW (1.33 ) 07/29/2011
[ 159.570467] Workqueue: qmi_msg_handler qmi_data_ready_work [qmi_helpers]
[ 159.570475] Call Trace:
[ 159.570476] <TASK>
[ 159.570478] dump_stack_lvl+0x49/0x5f
[ 159.570486] dump_stack+0x10/0x12
[ 159.570493] bad_page+0xab/0xf0
[ 159.570502] check_free_page_bad+0x66/0x70
[ 159.570511] __free_pages_ok+0x530/0x9a0
[ 159.570517] ? __dev_printk+0x58/0x6b
[ 159.570525] ? _dev_printk+0x56/0x72
[ 159.570534] ? qmi_decode+0x119/0x470 [qmi_helpers]
[ 159.570543] __free_pages+0x91/0xd0
[ 159.570548] dma_free_contiguous+0x50/0x60
[ 159.570556] dma_direct_free+0xe5/0x140
[ 159.570564] dma_free_attrs+0x35/0x50
[ 159.570570] ath11k_qmi_msg_mem_request_cb+0x2ae/0x3c0 [ath11k]
[ 159.570620] qmi_invoke_handler+0xac/0xe0 [qmi_helpers]
[ 159.570630] qmi_handle_message+0x6d/0x180 [qmi_helpers]
[ 159.570643] qmi_data_ready_work+0x2ca/0x440 [qmi_helpers]
[ 159.570656] process_one_work+0x227/0x440
[ 159.570667] worker_thread+0x31/0x3d0
[ 159.570676] ? process_one_work+0x440/0x440
[ 159.570685] kthread+0xfe/0x130
[ 159.570692] ? kthread_complete_and_exit+0x20/0x20
[ 159.570701] ret_from_fork+0x22/0x30
[ 159.570712] </TASK>
The reason is because when wlan start to recovery, the type, size and
count is not same for the 1st and 2nd QMI_WLFW_REQUEST_MEM_IND message,
Then it leads the parameter size is not correct for the dma_free_coherent().
For the chunk[1], the actual dma size is 524288 which allocate in the
2nd time of the initial wlan load phase, and the size which pass to
dma_free_coherent() is 3784704 which is got in the 1st time of recovery
phase, then warning above happened.
Change to use prev_size of struct target_mem_chunk for the paramter of
dma_free_coherent() since prev_size is the real size of last load/recovery.
Also change to check both type and size of struct target_mem_chunk to
reuse the memory to avoid mismatch buffer size for target. Then the
warning disappear and recovery success. When the 1st QMI_WLFW_REQUEST_MEM_IND
for recovery arrived, the trunk[0] is freed in ath11k_qmi_alloc_target_mem_chunk()
and then dma_alloc_coherent() failed caused by large size, and then
trunk[1] is freed in ath11k_qmi_free_target_mem_chunk(), the left 18
trunks will be reuse for the 2nd QMI_WLFW_REQUEST_MEM_IND message.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
Fixes: 5962f370ce41 ("ath11k: Reuse the available memory after firmware reload")
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220928073832.16251-1-quic_wgong@quicinc.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d37c120b73128690434cc093952439eef9d56af1 ]
While the interface for the MMU mapping takes phys_addr_t to hold a
full 64bit address when necessary and MMUv2 is able to map physical
addresses with up to 40bit, etnaviv_iommu_map() truncates the address
to 32bits. Fix this by using the correct type.
Fixes: 931e97f3afd8 ("drm/etnaviv: mmuv2: support 40 bit phys address")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 9cec2aaffe969f2a3e18b5ec105fc20bb908e475 upstream.
The > needs be >= to prevent an out of bounds access.
Fixes: de5ca4c3852f ("net: sched: sch: Bounds check priority")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/Y+D+KN18FQI2DKLq@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f810d2b6b2fbdc5279644d8b2c140b1f7c9d43d upstream.
The HDaudio stream allocation is done first, and in a second step the
LOSIDV parameter is programmed for the multi-link used by a codec.
This leads to a possible stream_tag leak, e.g. if a DisplayAudio link
is not used. This would happen when a non-Intel graphics card is used
and userspace unconditionally uses the Intel Display Audio PCMs without
checking if they are connected to a receiver with jack controls.
We should first check that there is a valid multi-link entry to
configure before allocating a stream_tag. This change aligns the
dma_assign and dma_cleanup phases.
Complements: b0cd60f3e9f5 ("ALSA/ASoC: hda: clarify bus_get_link() and bus_link_get() helpers")
Link: https://github.com/thesofproject/linux/issues/4151
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Rander Wang <rander.wang@intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://lore.kernel.org/r/20230216162340.19480-1-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e917a849c3fc317c4a5f82bb18726000173d39e6 upstream.
The sysfs group containing the cmb attributes is registered before the
driver knows if they need to be visible or not. Update the group when
cmb attributes are known to exist so the visibility setting is correct.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217037
Fixes: 86adbf0cdb9ec65 ("nvme: simplify transport specific device attribute handling")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d125d1349abeb46945dc5e98f7824bf688266f13 upstream.
syzbot reported a RCU stall which is caused by setting up an alarmtimer
with a very small interval and ignoring the signal. The reproducer arms the
alarm timer with a relative expiry of 8ns and an interval of 9ns. Not a
problem per se, but that's an issue when the signal is ignored because then
the timer is immediately rearmed because there is no way to delay that
rearming to the signal delivery path. See posix_timer_fn() and commit
58229a189942 ("posix-timers: Prevent softirq starvation by small intervals
and SIG_IGN") for details.
The reproducer does not set SIG_IGN explicitely, but it sets up the timers
signal with SIGCONT. That has the same effect as explicitely setting
SIG_IGN for a signal as SIGCONT is ignored if there is no handler set and
the task is not ptraced.
The log clearly shows that:
[pid 5102] --- SIGCONT {si_signo=SIGCONT, si_code=SI_TIMER, si_timerid=0, si_overrun=316014, si_int=0, si_ptr=NULL} ---
It works because the tasks are traced and therefore the signal is queued so
the tracer can see it, which delays the restart of the timer to the signal
delivery path. But then the tracer is killed:
[pid 5087] kill(-5102, SIGKILL <unfinished ...>
...
./strace-static-x86_64: Process 5107 detached
and after it's gone the stall can be observed:
syzkaller login: [ 79.439102][ C0] hrtimer: interrupt took 68471 ns
[ 184.460538][ C1] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
...
[ 184.658237][ C1] rcu: Stack dump where RCU GP kthread last ran:
[ 184.664574][ C1] Sending NMI from CPU 1 to CPUs 0:
[ 184.669821][ C0] NMI backtrace for cpu 0
[ 184.669831][ C0] CPU: 0 PID: 5108 Comm: syz-executor192 Not tainted 6.2.0-rc6-next-20230203-syzkaller #0
...
[ 184.670036][ C0] Call Trace:
[ 184.670041][ C0] <IRQ>
[ 184.670045][ C0] alarmtimer_fired+0x327/0x670
posix_timer_fn() prevents that by checking whether the interval for
timers which have the signal ignored is smaller than a jiffie and
artifically delay it by shifting the next expiry out by a jiffie. That's
accurate vs. the overrun accounting, but slightly inaccurate
vs. timer_gettimer(2).
The comment in that function says what needs to be done and there was a fix
available for the regular userspace induced SIG_IGN mechanism, but that did
not work due to the implicit ignore for SIGCONT and similar signals. This
needs to be worked on, but for now the only available workaround is to do
exactly what posix_timer_fn() does:
Increase the interval of self-rearming timers, which have their signal
ignored, to at least a jiffie.
Interestingly this has been fixed before via commit ff86bf0c65f1
("alarmtimer: Rate limit periodic intervals") already, but that fix got
lost in a later rework.
Reported-by: syzbot+b9564ba6e8e00694511b@syzkaller.appspotmail.com
Fixes: f2c45807d399 ("alarmtimer: Switch over to generic set/get/rearm routine")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87k00q1no2.ffs@tglx
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4b4191b8ae1278bde3642acaaef8f92810ed111a upstream.
Now that KVM disables vPMU support on hybrid CPUs, WARN and return zeros
if perf_get_x86_pmu_capability() is invoked on a hybrid CPU. The helper
doesn't provide an accurate accounting of the PMU capabilities for hybrid
CPUs and needs to be enhanced if KVM, or anything else outside of perf,
wants to act on the PMU capabilities.
Cc: stable@vger.kernel.org
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/all/20220818181530.2355034-1-kan.liang@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230208204230.1360502-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c10b61421a28e95a46ab489fd56c0f442ff6952 upstream.
When calling the KVM_GET_DEBUGREGS ioctl, on some configurations, there
might be some unitialized portions of the kvm_debugregs structure that
could be copied to userspace. Prevent this as is done in the other kvm
ioctls, by setting the whole structure to 0 before copying anything into
it.
Bonus is that this reduces the lines of code as the explicit flag
setting and reserved space zeroing out can be removed.
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <x86@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable <stable@kernel.org>
Reported-by: Xingyuan Mo <hdthky0@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Message-Id: <20230214103304.3689213-1-gregkh@linuxfoundation.org>
Tested-by: Xingyuan Mo <hdthky0@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4d7404e5ee0066e9a9e8268675de8a273b568b08 upstream.
Disable KVM support for virtualizing PMUs on hosts with hybrid PMUs until
KVM gains a sane way to enumeration the hybrid vPMU to userspace and/or
gains a mechanism to let userspace opt-in to the dangers of exposing a
hybrid vPMU to KVM guests. Virtualizing a hybrid PMU, or at least part of
a hybrid PMU, is possible, but it requires careful, deliberate
configuration from userspace.
E.g. to expose full functionality, vCPUs need to be pinned to pCPUs to
prevent migrating a vCPU between a big core and a little core, userspace
must enumerate a reasonable topology to the guest, and guest CPUID must be
curated per vCPU to enumerate accurate vPMU capabilities.
The last point is especially problematic, as KVM doesn't control which
pCPU it runs on when enumerating KVM's vPMU capabilities to userspace,
i.e. userspace can't rely on KVM_GET_SUPPORTED_CPUID in it's current form.
Alternatively, userspace could enable vPMU support by enumerating the
set of features that are common and coherent across all cores, e.g. by
filtering PMU events and restricting guest capabilities. But again, that
requires userspace to take action far beyond reflecting KVM's supported
feature set into the guest.
For now, simply disable vPMU support on hybrid CPUs to avoid inducing
seemingly random #GPs in guests, and punt support for hybrid CPUs to a
future enabling effort.
Reported-by: Jianfeng Gao <jianfeng.gao@intel.com>
Cc: stable@vger.kernel.org
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/all/20220818181530.2355034-1-kan.liang@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230208204230.1360502-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 91c11d5f32547a08d462934246488fe72f3d44c3 ]
when starting error recovery there might be a authentication work
running, and it involves I/O commands. Given the controller is tearing
down there is no chance for the I/O to complete other than timing out
which may unnecessarily take a full io timeout.
So first tear down the queues, fail/cancel all inflight I/O (including
potentially authentication) and only then stop authentication. This
ensures that failover is not stalled due to blocked authentication I/O.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1f1a4f89562d3b33b6ca4fc8a4f3bd4cd35ab4ea ]
when starting error recovery there might be a authentication work
running, and it involves I/O commands. Given the controller is tearing
down there is no chance for the I/O to complete other than timing out
which may unnecessarily take a full io timeout.
So first tear down the queues, fail/cancel all inflight I/O (including
potentially authentication) and only then stop authentication. This
ensures that failover is not stalled due to blocked authentication I/O.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7fa0b526f865cb42aa33917fd02a92cb03746f4d ]
The result of nlmsg_find_attr() 'br_spec' is dereferenced in
nla_for_each_nested(), but it can take NULL value in nla_find() function,
which will result in an error.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 51616018dd1b ("i40e: Add support for getlink, setlink ndo ops")
Signed-off-by: Natalia Petrova <n.petrova@fintech.ru>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230209172833.3596034-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 3770e52fd4ec40ebee16ba19ad6c09dc0b52739b upstream.
After x86 enabled support for KMSAN, it has become possible to have larger
'struct page' than was expected when commit 5470dea49f53 ("mm: use
mm_zero_struct_page from SPARC on all 64b architectures") was merged:
include/linux/mm.h:156:10: warning: no case matching constant switch condition '96'
switch (sizeof(struct page)) {
Extend the maximum accordingly.
Link: https://lkml.kernel.org/r/20230130130739.563628-1-arnd@kernel.org
Fixes: 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b architectures")
Fixes: 4ca8cc8d1bbe ("x86: kmsan: enable KMSAN builds for x86")
Fixes: f80be4571b19 ("kmsan: add KMSAN runtime core")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aa1e6a932ca652a50a5df458399724a80459f521 upstream.
If we call folio_isolate_lru() successfully, we will get return value 0.
We need to add this folio to the movable_pages_list.
Link: https://lkml.kernel.org/r/20230131063206.28820-1-Kuan-Ying.Lee@mediatek.com
Fixes: 67e139b02d99 ("mm/gup.c: refactor check_and_migrate_movable_pages()")
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Andrew Yang <andrew.yang@mediatek.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8230680f36fd1525303d1117768c8852314c488c upstream.
Take into account the IPV6_TCLASS socket option (DSCP) in
tcp_v6_connect(). Otherwise fib6_rule_match() can't properly
match the DSCP value, resulting in invalid route lookup.
For example:
ip route add unreachable table main 2001:db8::10/124
ip route add table 100 2001:db8::10/124 dev eth0
ip -6 rule add dsfield 0x04 table 100
echo test | socat - TCP6:[2001:db8::11]:54321,ipv6-tclass=0x04
Without this patch, socat fails at connect() time ("No route to host")
because the fib-rule doesn't jump to table 100 and the lookup ends up
being done in the main table.
Fixes: 2cc67cc731d9 ("[IPV6] ROUTE: Routing by Traffic Class.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e010ae08c71fda8be3d6bda256837795a0b3ea41 upstream.
Take into account the IPV6_TCLASS socket option (DSCP) in
ip6_datagram_flow_key_init(). Otherwise fib6_rule_match() can't
properly match the DSCP value, resulting in invalid route lookup.
For example:
ip route add unreachable table main 2001:db8::10/124
ip route add table 100 2001:db8::10/124 dev eth0
ip -6 rule add dsfield 0x04 table 100
echo test | socat - UDP6:[2001:db8::11]:54321,ipv6-tclass=0x04
Without this patch, socat fails at connect() time ("No route to host")
because the fib-rule doesn't jump to table 100 and the lookup ends up
being done in the main table.
Fixes: 2cc67cc731d9 ("[IPV6] ROUTE: Routing by Traffic Class.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0967bf837784a11c65d66060623a74e65211af0b upstream.
Include the second VLAN HLEN into account when computing the maximum
MTU size as other drivers do.
Fixes: fabf1bce103a ("ixgbe: Prevent unsupported configurations with XDP")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 207ce626add80ddd941f62fc2fe5d77586e0801b upstream.
Fix handling of the tsync interrupt to compare the pin number with
IGB_N_SDP instead of IGB_N_EXTTS/IGB_N_PEROUT and fix the indexing to
the perout array.
Fixes: cf99c1dd7b77 ("igb: move PEROUT and EXTTS isr logic to separate functions")
Reported-by: Matt Corallo <ntp-lists@mattcorallo.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230213185822.3960072-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d54cb1767e06025819daa6769e0f18dcbc60936 upstream.
Commit a97f8783a937 ("igb: unbreak I2C bit-banging on i350") introduced
code to change I2C settings to bit banging unconditionally.
However, this patch introduced a regression: On an Intel S2600CWR
Server Board with three NICs:
- 1x dual-port copper
Intel I350 Gigabit Network Connection [8086:1521] (rev 01)
fw 1.63, 0x80000dda
- 2x quad-port SFP+ with copper SFP Avago ABCU-5700RZ
Intel I350 Gigabit Fiber Network Connection [8086:1522] (rev 01)
fw 1.52.0
the SFP NICs no longer get link at all. Reverting commit a97f8783a937
or switching to the Intel out-of-tree driver both fix the problem.
Per the igb out-of-tree driver, I2C bit banging on i350 depends on
support for an external thermal sensor (ETS). However, commit
a97f8783a937 added bit banging unconditionally. Additionally, the
out-of-tree driver always calls init_thermal_sensor_thresh on probe,
while our driver only calls init_thermal_sensor_thresh only in
igb_reset(), and only if an ETS is present, ignoring the internal
thermal sensor. The affected SFPs don't provide an ETS. Per Intel,
the behaviour is a result of i350 firmware requirements.
This patch fixes the problem by aligning the behaviour to the
out-of-tree driver:
- split igb_init_i2c() into two functions:
- igb_init_i2c() only performs the basic I2C initialization.
- igb_set_i2c_bb() makes sure that E1000_CTRL_I2C_ENA is set
and enables bit-banging.
- igb_probe() only calls igb_set_i2c_bb() if an ETS is present.
- igb_probe() calls init_thermal_sensor_thresh() unconditionally.
- igb_reset() aligns its behaviour to igb_probe(), i. e., call
igb_set_i2c_bb() if an ETS is present and call
init_thermal_sensor_thresh() unconditionally.
Fixes: a97f8783a937 ("igb: unbreak I2C bit-banging on i350")
Tested-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Co-developed-by: Jamie Bainbridge <jbainbri@redhat.com>
Signed-off-by: Jamie Bainbridge <jbainbri@redhat.com>
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230214185549.1306522-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fda6c89fe3d9aca073495a664e1d5aea28cd4377 upstream.
lianhui reports that when MPLS fails to register the sysctl table
under new location (during device rename) the old pointers won't
get overwritten and may be freed again (double free).
Handle this gracefully. The best option would be unregistering
the MPLS from the device completely on failure, but unfortunately
mpls_ifdown() can fail. So failing fully is also unreliable.
Another option is to register the new table first then only
remove old one if the new one succeeds. That requires more
code, changes order of notifications and two tables may be
visible at the same time.
sysctl point is not used in the rest of the code - set to NULL
on failures and skip unregister if already NULL.
Reported-by: lianhui tang <bluetlh@gmail.com>
Fixes: 0fae3bf018d9 ("mpls: handle device renames for per-device sysctls")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>