Commit Graph

1264564 Commits

Author SHA1 Message Date
Antonio Borneo
df41b65cee irqchip/stm32-exti: Mark events reserved with RIF configuration check
EXTI events availability depends on Resource Isolation Framework (RIF)
configuration.

RIF grants access to buses with Compartment ID (CID) filtering, secure and
privilege level. It also assigns EXTI events to one or several processors
(CID, Secure, Privilege).

EXTI events used by Linux must be CID-filtered (EnCIDCFGR.CFEN=1) and
statically assigned to CID1 (EnCIDCFR.CID=CID1).

EXTI events not filling these conditions are marked as reserved and can't
be used by Linux.

Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240415134926.1254428-7-antonio.borneo@foss.st.com
2024-04-23 00:28:15 +02:00
Antonio Borneo
c00a4cb15a irqchip/stm32-exti: Skip secure events
Secure OS can reserve some EXTI events, marking them as "secure" by setting
the corresponding bit in register SECCFGR (aka TZENR).  These events cannot
be used by Linux.

Read the list of reserved events and check it during interrupt domain
allocation.

Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240415134926.1254428-6-antonio.borneo@foss.st.com
2024-04-23 00:28:15 +02:00
Antonio Borneo
06d7e914ca irqchip/stm32-exti: Convert driver to standard PM
All driver's dependencies for suspend/resume have been fixed long
ago. There are no more reasons to use syscore PM for the part of this
driver related to Cortex-A MPU.

Switch to standard PM using NOIRQ_SYSTEM_SLEEP_PM_OPS, so all the registers
of the interrupt controller get resumed before any irq gets enabled.

A side effect of this change is to drop the only global variable
'stm32_host_data', used to keep the driver's data for syscore_ops.  This
makes the driver ready to support multiple EXTI instances.

Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240415134926.1254428-5-antonio.borneo@foss.st.com
2024-04-23 00:28:15 +02:00
Antonio Borneo
77ec258ffa irqchip/stm32-exti: Map interrupts through interrupts-extended
The mapping of EXTI events to its parent interrupt controller is both SoC
and instance dependent.

The current implementation requires adding a new mapping table to the
driver's code and a new compatible for each new EXTI instance.

Check for the presence of the optional interrupts-extended property and use
it to map EXTI events to the parent's interrupts.

For old device trees without the optional interrupts-extended property, the
driver's behavior is unchanged, thus keeps backward compatibility.

Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240415134926.1254428-4-antonio.borneo@foss.st.com
2024-04-23 00:28:15 +02:00
Antonio Borneo
e9c17d91e6 dt-bindings: interrupt-controller: stm32-exti: Add irq mapping to parent
The mapping of EXTI events to its parent interrupt controller is both SoC
and instance dependent.

The current implementation requires adding a new mapping table to the
driver's code and a new compatible for each new EXTI instance.

To avoid that use the interrupts-extended property to list, for each EXTI
event, the associated parent interrupt.

Co-developed-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com>
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com>
Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20240415134926.1254428-3-antonio.borneo@foss.st.com
2024-04-23 00:28:14 +02:00
Antonio Borneo
8661327f80 irqchip/stm32-exti: Fix minor indentation issue
Commit 046a6ee234 ("irqchip: Bulk conversion to
generic_handle_domain_irq()") incorrectly added a leading space character
in the line indentation.

Use only TAB for indentation, removing the leading space.

Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240415134926.1254428-2-antonio.borneo@foss.st.com
2024-04-23 00:28:14 +02:00
Stefan Wahren
8371696a97 irqchip/mxs: Declare icoll_handle_irq() as static
After commit 5bb578a0c1 ("ARM: 9298/1: Drop custom mdesc->handle_irq()")
the function icoll_handle_irq() is only used within irq-mxs.c.  So declare
it as static to fix the warning about a missing prototype when building
with W=1.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2024-04-23 00:28:14 +02:00
Baoqi Zhang
234a557e28 irqchip/loongson-pch-pic: Update interrupt registration policy
The current code is using a fixed mapping between the LS7A interrupt source
and the HT interrupt vector. This prevents the utilization of the full
interrupt vector space and therefore limits the number of interrupt source
in a system.

Replace the fixed mapping with a dynamic mapping which allocates a
vector when an interrupt source is set up. This avoids that unused
sources prevent vectors from being used for other devices.

Introduce a mapping table in struct pch_pic, where each interrupt source
will allocate an index as a 'hwirq' number from the table in the order of
application and set table value as interrupt source number. This hwirq
number will be configured as vector in the HT interrupt controller. For an
interrupt source, the validity period of the obtained hwirq will last until
the system reset.

Co-developed-by: Biao Dong <dongbiao@loongson.cn>
Signed-off-by: Biao Dong <dongbiao@loongson.cn>
Co-developed-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Signed-off-by: Baoqi Zhang <zhangbaoqi@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240422093830.27212-1-zhangtianyang@loongson.cn
2024-04-23 00:17:07 +02:00
Jinjie Ruan
bb58c1baa5 genirq: Simplify the checks for irq_set_percpu_devid_partition()
Since whether desc is NULL or desc->percpu_enabled is true, it returns
-EINVAL, check them together, and assign desc->percpu_affinity using a
ternary to simplify the code.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240417085356.3785381-1-ruanjinjie@huawei.com
2024-04-23 00:17:07 +02:00
Anup Patel
35d77eb7b9 irqchip/riscv-imsic: Fix boot time update effective affinity warning
Currently, the following warning is observed on the QEMU virt machine:
genirq: irq_chip APLIC-MSI-d000000.aplic did not update eff. affinity mask of irq 12

The above warning is because the IMSIC driver does not set the initial
value of effective affinity in the interrupt descriptor. To address this,
initialize the effective affinity in imsic_irq_domain_alloc().

Fixes: 027e125acd ("irqchip/riscv-imsic: Add device MSI domain support for platform devices")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240413065210.315896-1-apatel@ventanamicro.com
2024-04-14 13:28:49 +02:00
Bitao Hu
e9a9292e23 watchdog/softlockup: Report the most frequent interrupts
When the watchdog determines that the current soft lockup is due to an
interrupt storm based on CPU utilization, reporting the most frequent
interrupts could be good enough for further troubleshooting.

Below is an example of interrupt storm. The call tree does not provide
useful information, but analyzing which interrupt caused the soft lockup by
comparing the counts of interrupts during the lockup period allows to
identify the culprit.

[  638.870231] watchdog: BUG: soft lockup - CPU#9 stuck for 26s! [swapper/9:0]
[  638.870825] CPU#9 Utilization every 4s during lockup:
[  638.871194]  #1:   0% system,          0% softirq,   100% hardirq,     0% idle
[  638.871652]  #2:   0% system,          0% softirq,   100% hardirq,     0% idle
[  638.872107]  #3:   0% system,          0% softirq,   100% hardirq,     0% idle
[  638.872563]  #4:   0% system,          0% softirq,   100% hardirq,     0% idle
[  638.873018]  #5:   0% system,          0% softirq,   100% hardirq,     0% idle
[  638.873494] CPU#9 Detect HardIRQ Time exceeds 50%. Most frequent HardIRQs:
[  638.873994]  #1: 330945      irq#7
[  638.874236]  #2: 31          irq#82
[  638.874493]  #3: 10          irq#10
[  638.874744]  #4: 2           irq#89
[  638.874992]  #5: 1           irq#102
...
[  638.875313] Call trace:
[  638.875315]  __do_softirq+0xa8/0x364

Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Liu Song <liusong@linux.alibaba.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20240411074134.30922-6-yaoma@linux.alibaba.com
2024-04-12 17:08:06 +02:00
Bitao Hu
d7037381d0 watchdog/softlockup: Low-overhead detection of interrupt storm
The following softlockup is caused by interrupt storm, but it cannot be
identified from the call tree. Because the call tree is just a snapshot
and doesn't fully capture the behavior of the CPU during the soft lockup.
  watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
  ...
  Call trace:
    __do_softirq+0xa0/0x37c
    __irq_exit_rcu+0x108/0x140
    irq_exit+0x14/0x20
    __handle_domain_irq+0x84/0xe0
    gic_handle_irq+0x80/0x108
    el0_irq_naked+0x50/0x58

Therefore, it is necessary to report CPU utilization during the
softlockup_threshold period (report once every sample_period, for a total
of 5 reportings), like this:
  watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
  CPU#28 Utilization every 4s during lockup:
    #1: 0% system, 0% softirq, 100% hardirq, 0% idle
    #2: 0% system, 0% softirq, 100% hardirq, 0% idle
    #3: 0% system, 0% softirq, 100% hardirq, 0% idle
    #4: 0% system, 0% softirq, 100% hardirq, 0% idle
    #5: 0% system, 0% softirq, 100% hardirq, 0% idle
  ...

This is helpful in determining whether an interrupt storm has occurred or
in identifying the cause of the softlockup. The criteria for determination
are as follows:

  a. If the hardirq utilization is high, then interrupt storm should be
     considered and the root cause cannot be determined from the call tree.
  b. If the softirq utilization is high, then the call might not necessarily
     point at the root cause.
  c. If the system utilization is high, then analyzing the root
     cause from the call tree is possible in most cases.

The mechanism requires a considerable amount of global storage space
when configured for the maximum number of CPUs. Therefore, adding a
SOFTLOCKUP_DETECTOR_INTR_STORM Kconfig knob that defaults to "yes"
if the max number of CPUs is <= 128.

Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Liu Song <liusong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240411074134.30922-5-yaoma@linux.alibaba.com
2024-04-12 17:08:05 +02:00
Bitao Hu
25a4a01511 genirq: Avoid summation loops for /proc/interrupts
show_interrupts() unconditionally accumulates the per CPU interrupt
statistics to determine whether an interrupt was ever raised.

This can be avoided for all interrupts which are not strictly per CPU
and not of type NMI because those interrupts provide already an
accumulated counter. The required logic is already implemented in
kstat_irqs().

Split the inner access logic out of kstat_irqs() and use it for
kstat_irqs() and show_interrupts() to avoid the accumulation loop
when possible.

Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Liu Song <liusong@linux.alibaba.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20240411074134.30922-4-yaoma@linux.alibaba.com
2024-04-12 17:08:05 +02:00
Bitao Hu
99cf63c566 genirq: Provide a snapshot mechanism for interrupt statistics
The soft lockup detector lacks a mechanism to identify interrupt storms as
root cause of a lockup. To enable this the detector needs a mechanism to
snapshot the interrupt count statistics on a CPU when the detector observes
a potential lockup scenario and compare that against the interrupt count
when it warns about the lockup later on. The number of interrupts in that
period give a hint whether the lockup might have been caused by an interrupt
storm.

Instead of having extra storage in the lockup detector and accessing the
internals of the interrupt descriptor directly, add a snapshot member to
the per CPU irq_desc::kstat_irq structure and provide interfaces to take a
snapshot of all interrupts on the current CPU and to retrieve the delta of
a specific interrupt later on.

Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240411074134.30922-3-yaoma@linux.alibaba.com
2024-04-12 17:08:05 +02:00
Bitao Hu
86d2a2f51f genirq: Convert kstat_irqs to a struct
The irq_desc::kstat_irqs member is a per-CPU variable of type int, which is
only capable of counting. A snapshot mechanism for interrupt statistics
will be added soon, which requires an additional variable to store the
snapshot.

To facilitate expansion, convert kstat_irqs here to a struct containing
only the count.

Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240411074134.30922-2-yaoma@linux.alibaba.com
2024-04-12 17:08:05 +02:00
Andy Shevchenko
81e4cb0fd4 genirq: Update MAINTAINERS to include interrupt related header files
Interrupt related header files seems orphaned, add them to the respective
subsystem records.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240405185726.3931703-2-andriy.shevchenko@linux.intel.com
2024-04-11 12:29:40 +02:00
Andy Shevchenko
63752ad191 genirq: Fix trivial typo in the comment CPY ==> COPY
IRQ_SET_MASK_NOCOPY is defined with 'O' letter. Fix the comment.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240405185726.3931703-3-andriy.shevchenko@linux.intel.com
2024-04-11 12:29:40 +02:00
Tiezhu Yang
42a7d88766 irqchip/loongson: Select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP for IRQ_LOONGARCH_CPU
An interrupt's effective affinity can only be different from its configured
affinity if there are multiple CPUs. Make it clear that this option is only
meaningful when SMP is enabled. Otherwise, there exists "WARNING: unmet
direct dependencies detected for GENERIC_IRQ_EFFECTIVE_AFF_MASK" when make
menuconfig if CONFIG_SMP is not set on LoongArch.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240326121130.16622-3-yangtiezhu@loongson.cn
2024-04-09 11:03:16 +02:00
Tiezhu Yang
a64003da0e irqchip/loongson-eiointc: Set CPU affinity only on SMP machines for LoongArch
According to the code comment of "struct irq_chip", the member
"irq_set_affinity" is to set the CPU affinity on SMP machines, so define
and call eiointc_set_irq_affinity() only under CONFIG_SMP.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240326121130.16622-4-yangtiezhu@loongson.cn
2024-04-09 11:03:16 +02:00
Zenghui Yu
b327708798 irqchip/loongson-pch-msi: Fix off-by-one on allocation error path
When pch_msi_parent_domain_alloc() returns an error, there is an off-by-one
in the number of interrupts to be freed.

Fix it by passing the number of successfully allocated interrupts, instead of the
relative index of the last allocated one.

Fixes: 632dcc2c75 ("irqchip: Add Loongson PCH MSI controller")
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Link: https://lore.kernel.org/r/20240327142334.1098-1-yuzenghui@huawei.com
2024-04-09 11:03:16 +02:00
Zenghui Yu
ff3669a71a irqchip/alpine-msi: Fix off-by-one in allocation error path
When alpine_msix_gic_domain_alloc() fails, there is an off-by-one in the
number of interrupts to be freed.

Fix it by passing the number of successfully allocated interrupts, instead
of the relative index of the last allocated one.

Fixes: 3841245e84 ("irqchip/alpine-msi: Fix freeing of interrupts on allocation error path")
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240327142305.1048-1-yuzenghui@huawei.com
2024-04-09 11:03:15 +02:00
Colin Ian King
14ced47564 irqchip/riscv-aplic: Fix spelling mistake "forwared" -> "forwarded"
There is a spelling mistake in a dev_info message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240327110516.283738-1-colin.i.king@gmail.com
2024-04-09 11:03:15 +02:00
Andy Shevchenko
a2ea3cd783 irqdomain: Check virq for 0 before use in irq_dispose_mapping()
It's a bit hard to read the logic since the virq is used before checking it
for 0. Rearrange the code to make it better to understand.

This, in particular, should clearly answer the question whether the caller
needs to perform this check or not, and there are plenty of places for both
variants, confirming a confusion.

Fun fact that the new code is shorter:

  Function                                     old     new   delta
  irq_dispose_mapping                          278     271      -7
  Total: Before=11625, After=11618, chg -0.06%

when compiled by GCC on Debian for x86_64.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240405190105.3932034-1-andriy.shevchenko@linux.intel.com
2024-04-08 12:08:58 +02:00
Keguang Zhang
7b6f0f278d irqchip: Remove redundant irq_chip::name initialization
Since commit 021a8ca2ba ("genirq/generic-chip: Fix the irq_chip name for
/proc/interrupts"), the chip name of all chip types are set to the same
name by irq_init_generic_chip() now. So the initialization to the same
irq_chip name are no longer needed. Drop them.

Signed-off-by: Keguang Zhang <keguang.zhang@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://lore.kernel.org/r/20240311115344.72567-1-keguang.zhang@gmail.com
2024-03-25 17:38:29 +01:00
Anup Patel
f4e116b2c5 MAINTAINERS: Add entry for RISC-V AIA drivers
Add myself as maintainer for RISC-V AIA drivers including the
RISC-V INTC driver which supports both AIA and non-AIA platforms.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20240307140307.646078-10-apatel@ventanamicro.com
2024-03-25 17:38:29 +01:00
Anup Patel
0eebc69db3 RISC-V: Select APLIC and IMSIC drivers
The QEMU virt machine supports AIA emulation and quite a few RISC-V
platforms with AIA support are under development so select APLIC and
IMSIC drivers for all RISC-V platforms.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20240307140307.646078-9-apatel@ventanamicro.com
2024-03-25 17:38:29 +01:00
Anup Patel
ca8df97fe6 irqchip/riscv-aplic: Add support for MSI-mode
The RISC-V advanced platform-level interrupt controller (APLIC) has
two modes of operation: 1) Direct mode and 2) MSI mode.
(For more details, refer https://github.com/riscv/riscv-aia)

In APLIC MSI-mode, wired interrupts are forwared as message signaled
interrupts (MSIs) to CPUs via IMSIC.

Extend the existing APLIC irqchip driver to support MSI-mode for
RISC-V platforms having both wired interrupts and MSIs.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20240307140307.646078-8-apatel@ventanamicro.com
2024-03-25 17:38:29 +01:00
Anup Patel
2333df5ae5 irqchip: Add RISC-V advanced PLIC driver for direct-mode
The RISC-V advanced interrupt architecture (AIA) specification defines
advanced platform-level interrupt controller (APLIC) which has two modes
of operation: 1) Direct mode and 2) MSI mode.
(For more details, refer https://github.com/riscv/riscv-aia)

In APLIC direct-mode, wired interrupts are forwared to CPUs (or HARTs)
as a local external interrupt.

Add a platform irqchip driver for the RISC-V APLIC direct-mode to
support RISC-V platforms having only wired interrupts.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20240307140307.646078-7-apatel@ventanamicro.com
2024-03-25 17:38:29 +01:00
Anup Patel
3b806a5a1a dt-bindings: interrupt-controller: Add RISC-V advanced PLIC
Add DT bindings document for RISC-V advanced platform level interrupt
controller (APLIC) defined by the RISC-V advanced interrupt architecture
(AIA) specification.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20240307140307.646078-6-apatel@ventanamicro.com
2024-03-25 17:38:28 +01:00
Anup Patel
5c5a71d043 irqchip/riscv-imsic: Add device MSI domain support for PCI devices
The Linux PCI framework supports per-device MSI domains for PCI devices
so extend the IMSIC driver to allow PCI per-device MSI domains.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20240307140307.646078-5-apatel@ventanamicro.com
2024-03-25 17:38:28 +01:00
Anup Patel
027e125acd irqchip/riscv-imsic: Add device MSI domain support for platform devices
The Linux platform MSI support allows per-device MSI domains so add
a platform irqchip driver for RISC-V IMSIC which provides a base IRQ
domain with MSI parent support for platform device domains.

The IMSIC platform driver assumes that the IMSIC state is already
initialized by the IMSIC early driver.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20240307140307.646078-4-apatel@ventanamicro.com
2024-03-25 17:38:28 +01:00
Anup Patel
21a8f8a0eb irqchip: Add RISC-V incoming MSI controller early driver
The RISC-V advanced interrupt architecture (AIA) specification
defines a new MSI controller called incoming message signalled
interrupt controller (IMSIC) which manages MSI on per-HART (or
per-CPU) basis. It also supports IPIs as software injected MSIs.
(For more details refer https://github.com/riscv/riscv-aia)

Add an early irqchip driver for RISC-V IMSIC which sets up the
IMSIC state and provide IPIs.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20240307140307.646078-3-apatel@ventanamicro.com
2024-03-25 17:38:28 +01:00
Anup Patel
0151a8db49 dt-bindings: interrupt-controller: Add RISC-V incoming MSI controller
Add DT bindings document for the RISC-V incoming MSI controller (IMSIC)
defined by the RISC-V advanced interrupt architecture (AIA) specification.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240307140307.646078-2-apatel@ventanamicro.com
2024-03-25 17:38:28 +01:00
Biju Das
46efb3053f irqchip/renesas-rzg2l: Simplify rzg2l_irqc_irq_{en,dis}able()
Simplify rzg2l_irqc_irq_{en,dis}able() by moving common code to
rzg2l_tint_irq_endisable().

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2024-03-25 17:38:28 +01:00
Linus Torvalds
4cece76496 Linux 6.9-rc1 2024-03-24 14:10:05 -07:00
Linus Torvalds
ab8de2dbfc EFI fixes for v6.9 #2
- Fix logic that is supposed to prevent placement of the kernel image
   below LOAD_PHYSICAL_ADDR
 - Use the firmware stack in the EFI stub when running in mixed mode
 - Clear BSS only once when using mixed mode
 - Check efi.get_variable() function pointer for NULL before trying to
   call it
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCZgCRgwAKCRAwbglWLn0t
 XHozAP9jLdeGs1ReYZAn+W0QtW/SJHJznoPiHcktdNKG4rNX3QD9G3URu0f4jKCG
 yvjw8qHM1pC2cihXXjABjf7gL7g6LAE=
 =cNP7
 -----END PGP SIGNATURE-----

Merge tag 'efi-fixes-for-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fixes from Ard Biesheuvel:

 - Fix logic that is supposed to prevent placement of the kernel image
   below LOAD_PHYSICAL_ADDR

 - Use the firmware stack in the EFI stub when running in mixed mode

 - Clear BSS only once when using mixed mode

 - Check efi.get_variable() function pointer for NULL before trying to
   call it

* tag 'efi-fixes-for-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: fix panic in kdump kernel
  x86/efistub: Don't clear BSS twice in mixed mode
  x86/efistub: Call mixed mode boot services on the firmware's stack
  efi/libstub: fix efi_random_alloc() to allocate memory at alloc_min or higher address
2024-03-24 13:54:06 -07:00
Linus Torvalds
5e74df2f8f A set of x86 fixes:
- Ensure that the encryption mask at boot is properly propagated on
     5-level page tables, otherwise the PGD entry is incorrectly set to
     non-encrypted, which causes system crashes during boot.
 
   - Undo the deferred 5-level page table setup as it cannot work with
     memory encryption enabled.
 
   - Prevent inconsistent XFD state on CPU hotplug, where the MSR is reset
     to the default value but the cached variable is not, so subsequent
     comparisons might yield the wrong result and as a consequence the
     result prevents updating the MSR.
 
   - Register the local APIC address only once in the MPPARSE enumeration to
     prevent triggering the related WARN_ONs() in the APIC and topology code.
 
   - Handle the case where no APIC is found gracefully by registering a fake
     APIC in the topology code. That makes all related topology functions
     work correctly and does not affect the actual APIC driver code at all.
 
   - Don't evaluate logical IDs during early boot as the local APIC IDs are
     not yet enumerated and the invoked function returns an error
     code. Nothing requires the logical IDs before the final CPUID
     enumeration takes place, which happens after the enumeration.
 
   - Cure the fallout of the per CPU rework on UP which misplaced the
     copying of boot_cpu_data to per CPU data so that the final update to
     boot_cpu_data got lost which caused inconsistent state and boot
     crashes.
 
   - Use copy_from_kernel_nofault() in the kprobes setup as there is no
     guarantee that the address can be safely accessed.
 
   - Reorder struct members in struct saved_context to work around another
     kmemleak false positive
 
   - Remove the buggy code which tries to update the E820 kexec table for
     setup_data as that is never passed to the kexec kernel.
 
   - Update the resource control documentation to use the proper units.
 
   - Fix a Kconfig warning observed with tinyconfig
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmYAUH4THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoXzREAC/HVB7yzUEbjbh7dyYRBEgFU19bcyC
 JKf9HVmEHj03HstUxF1dxguUhwfHVPNTWpjmy/fRwxqgM9JG+QpV6T4DIldWqchv
 AUYFrQBMvql8hTKxRa/Ny75d2IqKPgEEGUuyU+ZHAzEEPwhKrbtVRDPuEiMxpd5I
 9B1Pya4EzUyOv1UhPIg7PRoya1msimBZ0mCw4In6ri6xVRm1uC3Ln4LZPylxn96l
 f77rz5UToUw0gfgDaezF0z4ml1phGEdSX0Z3hhD0PX12wbJGEdvPzL0qTgEq72Ad
 AeLmHx4K8z2zoHMHK7iTEwjoplQxGsWLoezh22cVEEJX0dtzHz6R0ftBCa6uzATJ
 C8FF1oDDHAhTL94YmVSTZHr6AdJ6LwgYHO3zXZUhxuB7PNXAT4FmT0zgU1fU3sC1
 U/1mIFdgOEUOlGll2Ra5uTUKc0K/dc+yC9dcbz37Kwj3KlfqTN+5BWocjySkHomr
 gcv37aU1TJGSC/D1lYWTDWGKVbbP5lk+KIGICT5SBKn0METa/wOo8dE6+T1kIwvS
 t2QTlJdzilLcWGVQ8GiNjjRxFtRKY5i9Shi4K+wUvCee4/XJzRrpxrCEY8w/qceV
 hc3kfUIon3TCv8+rnlSuNRZBvmFhXMYwMt0gQv4YywB+aOITKTzbGUOazLtRNKAH
 lFCnBRS55AB8mg==
 =WyQ2
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:

 - Ensure that the encryption mask at boot is properly propagated on
   5-level page tables, otherwise the PGD entry is incorrectly set to
   non-encrypted, which causes system crashes during boot.

 - Undo the deferred 5-level page table setup as it cannot work with
   memory encryption enabled.

 - Prevent inconsistent XFD state on CPU hotplug, where the MSR is reset
   to the default value but the cached variable is not, so subsequent
   comparisons might yield the wrong result and as a consequence the
   result prevents updating the MSR.

 - Register the local APIC address only once in the MPPARSE enumeration
   to prevent triggering the related WARN_ONs() in the APIC and topology
   code.

 - Handle the case where no APIC is found gracefully by registering a
   fake APIC in the topology code. That makes all related topology
   functions work correctly and does not affect the actual APIC driver
   code at all.

 - Don't evaluate logical IDs during early boot as the local APIC IDs
   are not yet enumerated and the invoked function returns an error
   code. Nothing requires the logical IDs before the final CPUID
   enumeration takes place, which happens after the enumeration.

 - Cure the fallout of the per CPU rework on UP which misplaced the
   copying of boot_cpu_data to per CPU data so that the final update to
   boot_cpu_data got lost which caused inconsistent state and boot
   crashes.

 - Use copy_from_kernel_nofault() in the kprobes setup as there is no
   guarantee that the address can be safely accessed.

 - Reorder struct members in struct saved_context to work around another
   kmemleak false positive

 - Remove the buggy code which tries to update the E820 kexec table for
   setup_data as that is never passed to the kexec kernel.

 - Update the resource control documentation to use the proper units.

 - Fix a Kconfig warning observed with tinyconfig

* tag 'x86-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/64: Move 5-level paging global variable assignments back
  x86/boot/64: Apply encryption mask to 5-level pagetable update
  x86/cpu: Add model number for another Intel Arrow Lake mobile processor
  x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD
  Documentation/x86: Document that resctrl bandwidth control units are MiB
  x86/mpparse: Register APIC address only once
  x86/topology: Handle the !APIC case gracefully
  x86/topology: Don't evaluate logical IDs during early boot
  x86/cpu: Ensure that CPU info updates are propagated on UP
  kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe address
  x86/pm: Work around false positive kmemleak report in msr_build_context()
  x86/kexec: Do not update E820 kexec table for setup_data
  x86/config: Fix warning for 'make ARCH=x86_64 tinyconfig'
2024-03-24 11:13:56 -07:00
Linus Torvalds
b136f68eb0 A single update for the documentation of the base_slice_ns tunable to
clarify that any value which is less than the tick slice has no effect
 because the scheduler tick is not guaranteed to happen within the set time
 slice.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmYASukTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoXsRD/41H+nJ9o2Mb5ImF8NfZr2u/bIzf9Rd
 BZxlWsGbTmzhvNN6WJXZ89nM61jUqYlQcJeAYhwzjmCgNDnC/BTa3umQjnWha+M5
 fiawvzJxPjMt4TzMVPAtyrjxpOmfD9iDNtaczr2gv4PwMl2Ko17iItOJCNX0rlrG
 OBFZ0s9cBc4X+OUmVnuetkoJVm4knsLxfHA1ETJ9E0DlWOwdGsAOnue+KsVpecmF
 VSi5bgWT7fMyayOxomBynoDhjxfMCBLygFyctwJqcCWPpiEkt0nMehBzV32V6p/V
 go2rSbBw2w7kl1v2lBngVCdFlVbvMzIGIYQ6zJweVjDKX51+UGs0JNwlU1bIVBih
 dL+6tms+RUqO39Td4xIf7MeVAfplxRJg7fvn/b9oxM+TwwiOsjSi/Pir3h9w6sm8
 vo/JwO2VQtGxAZZA7UJFlZ/8QE06WTeNLO/giatETL+6OILEsm71qqoRIl1h2/d6
 ePqewAqmo8WTSHYjLm2IKzgRfKU8ko04ZJ6Er+12D3MNa3+Ezhx+8lIX7X9j5vtn
 eczwxCEZjvS3oF0O+NKu8mcS57vrDBE1qmGW97CWNLO0FuxF8yzp3RZFApCGCfee
 rfD38Epda0cQyRe2f6F1A0jFyOA5rQDMjWzivrQHbyCULwEuwVQP+UFzaCorzWSU
 UCjInBTY2zLPoA==
 =K76J
 -----END PGP SIGNATURE-----

Merge tag 'sched-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler doc clarification from Thomas Gleixner:
 "A single update for the documentation of the base_slice_ns tunable to
  clarify that any value which is less than the tick slice has no effect
  because the scheduler tick is not guaranteed to happen within the set
  time slice"

* tag 'sched-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/doc: Update documentation for base_slice_ns and CONFIG_HZ relation
2024-03-24 11:11:05 -07:00
Linus Torvalds
864ad046c1 dma-mapping fixes for Linux 6.9
This has a set of swiotlb alignment fixes for sometimes very long
 standing bugs from Will.  We've been discussion them for a while and they
 should be solid now.
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmX/bmILHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYOuKQ//cUR3EywszAc04x8dIYsfegFGdQxUeJD0+1elAPss
 ELiqrlg5A/Yn4uHKpXjWbvJ+v1Ywh3o8+vlgUiG4aFeg4xEd+FsJqm2SDa3jhdMP
 2hV8pwB92kpkKCxyCAqx8O/4o4fY++KCFsOtnammEudFjurJaCrRTlauOn6D1t/i
 JsBYCFtjFIhIPHQe7jmZ6dNiLEfiIJ+q8ImW+UxuB+gOGgU8C4VVW3tHuo3KeU7n
 yVOcz4yJrQ4xYzG3RKtaU0FE0ybA860xwiA5oPvqpI9A2ISGovv7ik0QCUlHXhff
 z+iL8Lj/KsOucq5pBDhbRYeN2n4VVogEwb/hut6mgyqj1ESjqeZaLioVHqOTDbmB
 +vNTVBt6OGTOq1YkNKttK9vBBXs5RdZSBalzBG/QO1ewmrNVVZ7z8fWXVRDipoIl
 sAIXmI8xAy5TNL6UbJ+RDfYeLlTzHjXGKQGB49gumOA8s4w5P5v9diYegX6GcVZV
 PKkYLOvprwcyi8Xxx2mNxFDxh+LWqzMYqzwsN7AoRTW4TRc7Tel0G6Axs+V/cL/Y
 23IHfFfT2HqDUM5PuBfUcgCrtw1hinuD80xqXVcvaU+AYoQhrGHJFLHkj6lTwV2b
 hmuul170froI2A/vm8yGGqcn2Me55AexlpMab+UWL+iisGtqFTWi9b9vK/2Vi+Zj
 wBg=
 =Xaob
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:
 "This has a set of swiotlb alignment fixes for sometimes very long
  standing bugs from Will. We've been discussion them for a while and
  they should be solid now"

* tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE
  iommu/dma: Force swiotlb_max_mapping_size on an untrusted device
  swiotlb: Fix alignment checks when both allocation and DMA masks are present
  swiotlb: Honour dma_alloc_coherent() alignment in swiotlb_alloc()
  swiotlb: Enforce page alignment in swiotlb_alloc()
  swiotlb: Fix double-allocation of slots due to broken alignment handling
2024-03-24 10:45:31 -07:00
Oleksandr Tymoshenko
62b71cd73d efi: fix panic in kdump kernel
Check if get_next_variable() is actually valid pointer before
calling it. In kdump kernel this method is set to NULL that causes
panic during the kexec-ed kernel boot.

Tested with QEMU and OVMF firmware.

Fixes: bad267f9e1 ("efi: verify that variable services are supported")
Signed-off-by: Oleksandr Tymoshenko <ovt@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-03-24 09:28:33 +01:00
Ard Biesheuvel
df7ecce842 x86/efistub: Don't clear BSS twice in mixed mode
Clearing BSS should only be done once, at the very beginning.
efi_pe_entry() is the entrypoint from the firmware, which may not clear
BSS and so it is done explicitly. However, efi_pe_entry() is also used
as an entrypoint by the mixed mode startup code, in which case BSS will
already have been cleared, and doing it again at this point will corrupt
global variables holding the firmware's GDT/IDT and segment selectors.

So make the memset() conditional on whether the EFI stub is running in
native mode.

Fixes: b3810c5a2c ("x86/efistub: Clear decompressor BSS in native EFI entrypoint")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-03-24 09:28:33 +01:00
Ard Biesheuvel
cefcd4fe2e x86/efistub: Call mixed mode boot services on the firmware's stack
Normally, the EFI stub calls into the EFI boot services using the stack
that was live when the stub was entered. According to the UEFI spec,
this stack needs to be at least 128k in size - this might seem large but
all asynchronous processing and event handling in EFI runs from the same
stack and so quite a lot of space may be used in practice.

In mixed mode, the situation is a bit different: the bootloader calls
the 32-bit EFI stub entry point, which calls the decompressor's 32-bit
entry point, where the boot stack is set up, using a fixed allocation
of 16k. This stack is still in use when the EFI stub is started in
64-bit mode, and so all calls back into the EFI firmware will be using
the decompressor's limited boot stack.

Due to the placement of the boot stack right after the boot heap, any
stack overruns have gone unnoticed. However, commit

  5c4feadb00 ("x86/decompressor: Move global symbol references to C code")

moved the definition of the boot heap into C code, and now the boot
stack is placed right at the base of BSS, where any overruns will
corrupt the end of the .data section.

While it would be possible to work around this by increasing the size of
the boot stack, doing so would affect all x86 systems, and mixed mode
systems are a tiny (and shrinking) fraction of the x86 installed base.

So instead, record the firmware stack pointer value when entering from
the 32-bit firmware, and switch to this stack every time a EFI boot
service call is made.

Cc: <stable@kernel.org> # v6.1+
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-03-24 09:28:32 +01:00
Tom Lendacky
9843231c97 x86/boot/64: Move 5-level paging global variable assignments back
Commit 63bed96604 ("x86/startup_64: Defer assignment of 5-level paging
global variables") moved assignment of 5-level global variables to later
in the boot in order to avoid having to use RIP relative addressing in
order to set them. However, when running with 5-level paging and SME
active (mem_encrypt=on), the variables are needed as part of the page
table setup needed to encrypt the kernel (using pgd_none(), p4d_offset(),
etc.). Since the variables haven't been set, the page table manipulation
is done as if 4-level paging is active, causing the system to crash on
boot.

While only a subset of the assignments that were moved need to be set
early, move all of the assignments back into check_la57_support() so that
these assignments aren't spread between two locations. Instead of just
reverting the fix, this uses the new RIP_REL_REF() macro when assigning
the variables.

Fixes: 63bed96604 ("x86/startup_64: Defer assignment of 5-level paging global variables")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/2ca419f4d0de719926fd82353f6751f717590a86.1711122067.git.thomas.lendacky@amd.com
2024-03-24 05:00:36 +01:00
Tom Lendacky
4d0d7e7852 x86/boot/64: Apply encryption mask to 5-level pagetable update
When running with 5-level page tables, the kernel mapping PGD entry is
updated to point to the P4D table. The assignment uses _PAGE_TABLE_NOENC,
which, when SME is active (mem_encrypt=on), results in a page table
entry without the encryption mask set, causing the system to crash on
boot.

Change the assignment to use _PAGE_TABLE instead of _PAGE_TABLE_NOENC so
that the encryption mask is set for the PGD entry.

Fixes: 533568e06b ("x86/boot/64: Use RIP_REL_REF() to access early_top_pgt[]")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/8f20345cda7dbba2cf748b286e1bc00816fe649a.1711122067.git.thomas.lendacky@amd.com
2024-03-24 05:00:35 +01:00
Tony Luck
8a8a9c9047 x86/cpu: Add model number for another Intel Arrow Lake mobile processor
This one is the regular laptop CPU.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240322161725.195614-1-tony.luck@intel.com
2024-03-24 04:08:10 +01:00
Adamos Ttofari
10e4b5166d x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD
Commit 672365477a ("x86/fpu: Update XFD state where required") and
commit 8bf26758ca ("x86/fpu: Add XFD state to fpstate") introduced a
per CPU variable xfd_state to keep the MSR_IA32_XFD value cached, in
order to avoid unnecessary writes to the MSR.

On CPU hotplug MSR_IA32_XFD is reset to the init_fpstate.xfd, which
wipes out any stale state. But the per CPU cached xfd value is not
reset, which brings them out of sync.

As a consequence a subsequent xfd_update_state() might fail to update
the MSR which in turn can result in XRSTOR raising a #NM in kernel
space, which crashes the kernel.

To fix this, introduce xfd_set_state() to write xfd_state together
with MSR_IA32_XFD, and use it in all places that set MSR_IA32_XFD.

Fixes: 672365477a ("x86/fpu: Update XFD state where required")
Signed-off-by: Adamos Ttofari <attofari@amazon.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240322230439.456571-1-chang.seok.bae@intel.com

Closes: https://lore.kernel.org/lkml/20230511152818.13839-1-attofari@amazon.de
2024-03-24 04:03:54 +01:00
Tony Luck
a8ed59a3a8 Documentation/x86: Document that resctrl bandwidth control units are MiB
The memory bandwidth software controller uses 2^20 units rather than
10^6. See mbm_bw_count() which computes bandwidth using the "SZ_1M"
Linux define for 0x00100000.

Update the documentation to use MiB when describing this feature.
It's too late to fix the mount option "mba_MBps" as that is now an
established user interface.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240322182016.196544-1-tony.luck@intel.com
2024-03-24 03:58:43 +01:00
Linus Torvalds
70293240c5 Two regression fixes for the timer and timer migration code:
1) Prevent endless timer requeuing which is caused by two CPUs racing out
      of idle. This happens when the last CPU goes idle and therefore has to
      ensure to expire the pending global timers and some other CPU come out
      of idle at the same time and the other CPU wins the race and expires
      the global queue. This causes the last CPU to chase ghost timers
      forever and reprogramming it's clockevent device endlessly.
 
      Cure this by re-evaluating the wakeup time unconditionally.
 
   2) The split into local (pinned) and global timers in the timer wheel
      caused a regression for NOHZ full as it broke the idle tracking of
      global timers. On NOHZ full this prevents an self IPI being sent which
      in turn causes the timer to be not programmed and not being expired on
      time.
 
      Restore the idle tracking for the global timer base so that the self
      IPI condition for NOHZ full is working correctly again.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmX/Mn8THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYX0D/93fKa9qp+qBs72Vvctj4bJuk6auel7
 GRWx3Vc//kk4VOJFCteQ2eykr1/fsLuibPb67iiKp41stvaWKeJPj0kD+RUuVf8E
 dPGPpPU+qY5ynhoiqJekZL7+5NSA48y4bDc00a4U31MPpEcJB5y94zFOCKMiCWtk
 6Tf6I6168bsSFYvKqb2LImVoowu/bf7bXLVUk1HcdNnSC7bfx+yN8nkQ1zSy3K2M
 IyE1CQqMyDmdfKW9Vs68ooTIpuA0n7bxOuXbVaFdJyiJ035v3Z3+m2vQmrHHLDdz
 MfqHbFDEomDC+zfiugFvuxyxLIi2Gf/NXPibu6OxLkVe2Pu1KUJkhFbgZVUR2W6A
 EU6SmZr77zMPAMZyqG8OJqTqlCPiJfJX2KMWDF+ezbXBt+sbMe6LfazPqj9TopnN
 /ECMCt77xl1POCEnP81hPWizKsqf8HCTDDZEi9UlqbIxT3TgrZFlTnKfmmciwWiP
 uGoUXgZmi8qJ+lSGNTVUgbTmIbazvUz43sKgfndUy2yxeCb/SBlx1/8Ys2ntszOy
 XDOF8QroPH0zXlBaYo0QVZbOdB4O0/En1qZuGScBmoUY7bRr0NRD/C3ObhvQHI7C
 iguAsnB+zirwwZSTDzwhQDhXAtWgSaqBB7pb4aCDxC0AvnKtL5HKdDNeIafpPJeA
 4Xh40iu44u/V9w==
 =lY5O
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
 "Two regression fixes for the timer and timer migration code:

   - Prevent endless timer requeuing which is caused by two CPUs racing
     out of idle. This happens when the last CPU goes idle and therefore
     has to ensure to expire the pending global timers and some other
     CPU come out of idle at the same time and the other CPU wins the
     race and expires the global queue. This causes the last CPU to
     chase ghost timers forever and reprogramming it's clockevent device
     endlessly.

     Cure this by re-evaluating the wakeup time unconditionally.

   - The split into local (pinned) and global timers in the timer wheel
     caused a regression for NOHZ full as it broke the idle tracking of
     global timers. On NOHZ full this prevents an self IPI being sent
     which in turn causes the timer to be not programmed and not being
     expired on time.

     Restore the idle tracking for the global timer base so that the
     self IPI condition for NOHZ full is working correctly again"

* tag 'timers-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Fix removed self-IPI on global timer's enqueue in nohz_full
  timers/migration: Fix endless timer requeue after idle interrupts
2024-03-23 14:49:25 -07:00
Linus Torvalds
00164f477f A set of updates for clocksource and clockevent drivers:
- A fix for the prescaler of the ARM global timer where the prescaler
     mask define only covered 4 bits while it is actully 8 bits wide. This
     restricted obviously the possible range of the prescaler adjustments.
 
   - A fix for the RISC-V timer which prevents a timer interrupt being
     raised while the timer is initialized.
 
   - A set of device tree updates to support new system on chips in various
     drivers.
 
   - Kernel-doc and other cleanups all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmX/MDcTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYofUcD/9hZjJp11tNJNibLGjJrZUIg1Zpiu7c
 avlrG/KT9BgElyp8D4WG3GTY7ng9vzJ7XNXG5eXyrkMClzlDYmXt5oeoUM3KRFDg
 G/kX0ee/+vcP5IKVsYR3w1MQ7QGe7VfE5FU4CxtR3OjG3T5Gueim1AKBtM26QazK
 wHe1/yp0vIPjZO1awlWQm9CJ+DSD2Mb1oPo8c9Bitd5KXXjPg8uR3bk2kOVYMdzC
 B6xYvQWF3Pl7NXcQKOnIazqsNlphsYiBGc5ZbKp5zjDggmp/ChBedt6ePCccU3DO
 VXE6D4eyZ5hmQbFzSEZUYsWcNzkeH8ZWnxkjeAoG60Y2vErLVsceytaIja061VOl
 BP2j8QbQ7ztwLOTtZc8KwJsVfxBh1xhwAUwe5Mpl171m7K2n+zquR8Clq4SqL2DK
 mjUcHKIZzyhZ4HX3rRwVOpJmfbJlRxSRjiLp+oPulJBB9pLMjarxwQ6Ij68yrPst
 kVow4Ur45abmXsDRj3lc3+bDcfMr8AM3l/HjD7AZamC2TAvkp1hkYPghhazdolx5
 qQA3swdQl0tq49SNvvLbVRjRiCiQmUdpOGpnevvqxEO/BESGquVfPrihOYqIQuwh
 +yurmba5L8uxpsBCPtGfiMse9fbE7GVQJkMqhr9KNNOfER30rEe8bmqr6r71+uZJ
 19Jp2U76zNfVXA==
 =M9Gu
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull more clocksource updates from Thomas Gleixner:
 "A set of updates for clocksource and clockevent drivers:

   - A fix for the prescaler of the ARM global timer where the prescaler
     mask define only covered 4 bits while it is actully 8 bits wide.
     This obviously restricted the possible range of prescaler
     adjustments

   - A fix for the RISC-V timer which prevents a timer interrupt being
     raised while the timer is initialized

   - A set of device tree updates to support new system on chips in
     various drivers

   - Kernel-doc and other cleanups all over the place"

* tag 'timers-core-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers/timer-riscv: Clear timer interrupt on timer initialization
  dt-bindings: timer: Add support for cadence TTC PWM
  clocksource/drivers/arm_global_timer: Simplify prescaler register access
  clocksource/drivers/arm_global_timer: Guard against division by zero
  clocksource/drivers/arm_global_timer: Make gt_target_rate unsigned long
  dt-bindings: timer: add Ralink SoCs system tick counter
  clocksource: arm_global_timer: fix non-kernel-doc comment
  clocksource/drivers/arm_global_timer: Remove stray tab
  clocksource/drivers/arm_global_timer: Fix maximum prescaler value
  clocksource/drivers/imx-sysctr: Add i.MX95 support
  clocksource/drivers/imx-sysctr: Drop use global variables
  dt-bindings: timer: nxp,sysctr-timer: support i.MX95
  dt-bindings: timer: renesas: ostm: Document RZ/Five SoC
  dt-bindings: timer: renesas,tmu: Document input capture interrupt
  clocksource/drivers/ti-32K: Fix misuse of "/**" comment
  clocksource/drivers/stm32: Fix all kernel-doc warnings
  dt-bindings: timer: exynos4210-mct: Add google,gs101-mct compatible
  clocksource/drivers/imx: Fix -Wunused-but-set-variable warning
2024-03-23 14:42:45 -07:00
Linus Torvalds
1a39193137 A series of fixes for the Renesas RZG21 interrupt chip driver to prevent
spurious and misrouted interrupts.
 
   - Ensure that posted writes are flushed in the eoi() callback
 
   - Ensure that interrupts are masked at the chip level when the trigger
     type is changed
 
   - Clear the interrupt status register when setting up edge type trigger
     modes.
 
   - Ensure that the trigger type and routing information is set before the
     interrupt is enabled.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmX/LiETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYofNND/95Mf0Gu9HOYLYaq3vO37B8oA+LcyGP
 SAm5j7VT5ymTLxMGZgVJk5gpaw5jysUVH6TfMG9tk861vGvQGAoAVd7hVgJaZSWz
 +c7WgQQY9YTGOjZqE726GnB/uHZUr/m8Ls7z+H7IEliB9T9Gruvu3pMkfrAZjg6t
 exPxDMdr1d0/j7t/7WVGbincqkz2lJUxW/6BdPFgoJ7Ez0CpePb/TkTGmO7+GXHz
 WQi3xBAVhaPcsTJ4PZnMxm++XG0dAZqyAr4WVjaMKCuDwrultBpDUgHhYCFKHJ/D
 eNNCyPbXzrpEujO9KUdcDSrQv//J44SXejXDNhgGamtRQSkAsPLUNH/zHRtKVOjL
 oJWF7UKX6a1ySdmTs3PadgO/g8ZVkla8IL0u/yVIaVttT7ix32hf66XW8rIpfBw1
 cCc9GXmyXBQs8WRtQAf+OjBcPeyfD3+ZI0m1GdA3ATpK455+TImJ5D2++6O05u8T
 3m62UtJH5KQvKlEUESflSxGEhJCZz/MoGgJwEsJ6lG8b0gQgbhVl2l9r4qOJ6f6y
 rFUTvJKO+UTaPRRkH5xWITpf3JdxmAAHeAMBUzSND3JoPVv60qkVhgGpx/D0Xiwo
 lJVo36g09uEpN1igNTJwi6C/ZnhHa6tIZAaGs5/BNqE3+W7e9p2U8NqCxrRfArq3
 TInh1wSXok2rsw==
 =zIKG
 -----END PGP SIGNATURE-----

Merge tag 'irq-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "A series of fixes for the Renesas RZG21 interrupt chip driver to
  prevent spurious and misrouted interrupts.

   - Ensure that posted writes are flushed in the eoi() callback

   - Ensure that interrupts are masked at the chip level when the
     trigger type is changed

   - Clear the interrupt status register when setting up edge type
     trigger modes

   - Ensure that the trigger type and routing information is set before
     the interrupt is enabled"

* tag 'irq-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/renesas-rzg2l: Do not set TIEN and TINT source at the same time
  irqchip/renesas-rzg2l: Prevent spurious interrupts when setting trigger type
  irqchip/renesas-rzg2l: Rename rzg2l_irq_eoi()
  irqchip/renesas-rzg2l: Rename rzg2l_tint_eoi()
  irqchip/renesas-rzg2l: Flush posted write in irq_eoi()
2024-03-23 14:30:38 -07:00