5245 Commits

Author SHA1 Message Date
Catalin Marinas
0c5ade742e Merge branches 'for-next/reorg-va-space', 'for-next/rust-for-arm64', 'for-next/misc', 'for-next/daif-cleanup', 'for-next/kselftest', 'for-next/documentation', 'for-next/sysreg' and 'for-next/dpisa', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf: (39 commits)
  docs: perf: Fix build warning of hisi-pcie-pmu.rst
  perf: starfive: Only allow COMPILE_TEST for 64-bit architectures
  MAINTAINERS: Add entry for StarFive StarLink PMU
  docs: perf: Add description for StarFive's StarLink PMU
  dt-bindings: perf: starfive: Add JH8100 StarLink PMU
  perf: starfive: Add StarLink PMU support
  docs: perf: Update usage for target filter of hisi-pcie-pmu
  drivers/perf: hisi_pcie: Merge find_related_event() and get_event_idx()
  drivers/perf: hisi_pcie: Relax the check on related events
  drivers/perf: hisi_pcie: Check the target filter properly
  drivers/perf: hisi_pcie: Add more events for counting TLP bandwidth
  drivers/perf: hisi_pcie: Fix incorrect counting under metric mode
  drivers/perf: hisi_pcie: Introduce hisi_pcie_pmu_get_event_ctrl_val()
  drivers/perf: hisi_pcie: Rename hisi_pcie_pmu_{config,clear}_filter()
  drivers/perf: hisi: Enable HiSilicon Erratum 162700402 quirk for HIP09
  perf/arm_cspmu: Add devicetree support
  dt-bindings/perf: Add Arm CoreSight PMU
  perf/arm_cspmu: Simplify counter reset
  perf/arm_cspmu: Simplify attribute groups
  perf/arm_cspmu: Simplify initialisation
  ...

* for-next/reorg-va-space:
  : Reorganise the arm64 kernel VA space in preparation for LPA2 support
  : (52-bit VA/PA).
  arm64: kaslr: Adjust randomization range dynamically
  arm64: mm: Reclaim unused vmemmap region for vmalloc use
  arm64: vmemmap: Avoid base2 order of struct page size to dimension region
  arm64: ptdump: Discover start of vmemmap region at runtime
  arm64: ptdump: Allow all region boundaries to be defined at boot time
  arm64: mm: Move fixmap region above vmemmap region
  arm64: mm: Move PCI I/O emulation region above the vmemmap region

* for-next/rust-for-arm64:
  : Enable Rust support for arm64
  arm64: rust: Enable Rust support for AArch64
  rust: Refactor the build target to allow the use of builtin targets

* for-next/misc:
  : Miscellaneous arm64 patches
  ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512
  arm64: Remove enable_daif macro
  arm64/hw_breakpoint: Directly use ESR_ELx_WNR for an watchpoint exception
  arm64: cpufeatures: Clean up temporary variable to simplify code
  arm64: Update setup_arch() comment on interrupt masking
  arm64: remove unnecessary ifdefs around is_compat_task()
  arm64: ftrace: Don't forbid CALL_OPS+CC_OPTIMIZE_FOR_SIZE with Clang
  arm64/sme: Ensure that all fields in SMCR_EL1 are set to known values
  arm64/sve: Ensure that all fields in ZCR_EL1 are set to known values
  arm64/sve: Document that __SVE_VQ_MAX is much larger than needed
  arm64: make member of struct pt_regs and it's offset macro in the same order
  arm64: remove unneeded BUILD_BUG_ON assertion
  arm64: kretprobes: acquire the regs via a BRK exception
  arm64: io: permit offset addressing
  arm64: errata: Don't enable workarounds for "rare" errata by default

* for-next/daif-cleanup:
  : Clean up DAIF handling for EL0 returns
  arm64: Unmask Debug + SError in do_notify_resume()
  arm64: Move do_notify_resume() to entry-common.c
  arm64: Simplify do_notify_resume() DAIF masking

* for-next/kselftest:
  : Miscellaneous arm64 kselftest patches
  kselftest/arm64: Test that ptrace takes effect in the target process

* for-next/documentation:
  : arm64 documentation patches
  arm64/sme: Remove spurious 'is' in SME documentation
  arm64/fp: Clarify effect of setting an unsupported system VL
  arm64/sme: Fix cut'n'paste in ABI document
  arm64/sve: Remove bitrotted comment about syscall behaviour

* for-next/sysreg:
  : sysreg updates
  arm64/sysreg: Update ID_AA64DFR0_EL1 register
  arm64/sysreg: Update ID_DFR0_EL1 register fields
  arm64/sysreg: Add register fields for ID_AA64DFR1_EL1

* for-next/dpisa:
  : Support for 2023 dpISA extensions
  kselftest/arm64: Add 2023 DPISA hwcap test coverage
  kselftest/arm64: Add basic FPMR test
  kselftest/arm64: Handle FPMR context in generic signal frame parser
  arm64/hwcap: Define hwcaps for 2023 DPISA features
  arm64/ptrace: Expose FPMR via ptrace
  arm64/signal: Add FPMR signal handling
  arm64/fpsimd: Support FEAT_FPMR
  arm64/fpsimd: Enable host kernel access to FPMR
  arm64/cpufeature: Hook new identification registers up to cpufeature
2024-03-07 19:04:55 +00:00
Mark Brown
c1932cac79 arm64/hwcap: Define hwcaps for 2023 DPISA features
The 2023 architecture extensions include a large number of floating point
features, most of which simply add new instructions. Add hwcaps so that
userspace can enumerate these features.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-6-c568edc8ed7f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-07 17:14:54 +00:00
Mark Brown
8c46def444 arm64/signal: Add FPMR signal handling
Expose FPMR in the signal context on systems where it is supported. The
kernel validates the exact size of the FPSIMD registers so we can't readily
add it to fpsimd_context without disruption.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-4-c568edc8ed7f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-07 17:14:53 +00:00
Mark Brown
203f2b95a8 arm64/fpsimd: Support FEAT_FPMR
FEAT_FPMR defines a new EL0 accessible register FPMR use to configure the
FP8 related features added to the architecture at the same time. Detect
support for this register and context switch it for EL0 when present.

Due to the sharing of responsibility for saving floating point state
between the host kernel and KVM FP8 support is not yet implemented in KVM
and a stub similar to that used for SVCR is provided for FPMR in order to
avoid bisection issues. To make it easier to share host state with the
hypervisor we store FPMR as a hardened usercopy field in uw (along with
some padding).

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-3-c568edc8ed7f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-07 17:14:53 +00:00
Mark Brown
b6c0b424cb arm64/fpsimd: Enable host kernel access to FPMR
FEAT_FPMR provides a new generally accessible architectural register FPMR.
This is only accessible to EL0 and EL1 when HCRX_EL2.EnFPM is set to 1,
do this when the host is running. The guest part will be done along with
context switching the new register and exposing it via guest management.

Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-2-c568edc8ed7f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-07 17:14:52 +00:00
Mark Brown
cc9f69a3da arm64/cpufeature: Hook new identification registers up to cpufeature
The 2023 architecture extensions have defined several new ID registers,
hook them up to the cpufeature code so we can add feature checks and hwcaps
based on their contents.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-1-c568edc8ed7f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-07 17:14:52 +00:00
Jinjie Ruan
527db67a4d arm64: Remove enable_daif macro
Since commit bb8e93a287a5 ("arm64: entry: convert SError handlers to C"),
the enable_daif assembler macro is no longer used anywhere, so remove it.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20240229132802.1682026-2-ruanjinjie@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-01 17:41:37 +00:00
Anshuman Khandual
9d6b6789c8 arm64/hw_breakpoint: Directly use ESR_ELx_WNR for an watchpoint exception
Let's use existing ISS encoding for an watchpoint exception i.e ESR_ELx_WNR
This represents an instruction's either writing to or reading from a memory
location during an watchpoint exception. While here this drops non-standard
macro AARCH64_ESR_ACCESS_MASK.

Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20240229083431.356578-1-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-01 17:36:51 +00:00
Leonardo Bras
1984c80546 arm64: remove unnecessary ifdefs around is_compat_task()
Currently some parts of the codebase will test for CONFIG_COMPAT before
testing is_compat_task().

is_compat_task() is a inlined function only present on CONFIG_COMPAT.
On the other hand, for !CONFIG_COMPAT, we have in linux/compat.h:

 #define is_compat_task() (0)

Since we have this define available in every usage of is_compat_task() for
!CONFIG_COMPAT, it's unnecessary to keep the ifdefs, since the compiler is
smart enough to optimize-out those snippets on CONFIG_COMPAT=n

This requires some regset code as well as a few other defines to be made
available on !CONFIG_COMPAT, so some symbols can get resolved before
getting optimized-out.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20240109034651.478462-2-leobras@redhat.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-28 18:01:23 +00:00
Mark Brown
21eb468e9f arm64/sve: Document that __SVE_VQ_MAX is much larger than needed
__SVE_VQ_MAX is defined without comment as 512 but the actual
architectural maximum is 16, a substantial difference which might not
be obvious to readers especially given the several different units used
for specifying vector sizes in various contexts and the fact that it's
often used via macros.  In an effort to minimise surprises for users who
might assume the value is the architectural maximum and use it to do
things like size allocations add a comment noting the difference, and
add a note for SVE_VQ_MAX to aid discoverability.

Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Dave Martin <Dave.Martin@arm.com>
Link: https://lore.kernel.org/r/20240209-arm64-sve-vl-max-comment-v2-1-111b283469ee@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-22 19:32:47 +00:00
Mark Rutland
253751233b arm64: kretprobes: acquire the regs via a BRK exception
On arm64, kprobes always take an exception and so create a struct
pt_regs through the usual exception entry logic. Similarly kretprobes
taskes and exception for function entry, but for function returns it
uses a trampoline which attempts to create a struct pt_regs without
taking an exception.

This is problematic for a few reasons, including:

1) The kretprobes trampoline neither saves nor restores all of the
   portions of PSTATE. Before invoking the handler it saves a number of
   portions of PSTATE, and after returning from the handler it restores
   NZCV before returning to the original return address provided by the
   handler.

2) The kretprobe trampoline constructs the PSTATE value piecemeal from
   special purpose registers as it cannot read all of PSTATE atomically
   without taking an exception. This is somewhat fragile, and it's not
   possible to reliably recover PSTATE information which only exists on
   some physical CPUs (e.g. when SSBS support is mismatched).

   Today the kretprobes trampoline does not record:

   - BTYPE
   - SSBS
   - ALLINT
   - SS
   - PAN
   - UAO
   - DIT
   - TCO

   ... and this will only get worse with future architecture extensions
   which add more PSTATE bits.

3) The kretprobes trampoline doesn't store portions of struct pt_regs
   (e.g. the PMR value when using pseudo-NMIs). Due to this, helpers
   which operate on a struct pt_regs, such as interrupts_enabled(), may
   not work correctly.

4) The function entry and function exit handlers run in different
   contexts. The entry handler will always be run in a debug exception
   context (which is currently treated as an NMI), but the return will
   be treated as whatever context the instrumented function was executed
   in. The differences between these contexts are liable to cause
   problems (e.g. as the two can be differently interruptible or
   preemptible, adversely affecting synchronization between the
   handlers).

5) As the kretprobes trampoline runs in the same context as the code
   being probed, it is subject to the same single-stepping context,
   which may not be desirable if this is being driven by the kprobes
   handlers.

Overall, this is fragile, painful to maintain, and gets in the way of
supporting other things (e.g. RELIABLE_STACKTRACE, FEAT_NMI).

This patch addresses these issues by replacing the kretprobes trampoline
with a `BRK` instruction, and using an exception boundary to acquire and
restore the regs, in the same way as the regular kprobes trampoline.

Ive tested this atop v6.8-rc3:

| KTAP version 1
| 1..1
|     KTAP version 1
|     # Subtest: kprobes_test
|     # module: test_kprobes
|     1..7
|     ok 1 test_kprobe
|     ok 2 test_kprobes
|     ok 3 test_kprobe_missed
|     ok 4 test_kretprobe
|     ok 5 test_kretprobes
|     ok 6 test_stacktrace_on_kretprobe
|     ok 7 test_stacktrace_on_nested_kretprobe
| # kprobes_test: pass:7 fail:0 skip:0 total:7
| # Totals: pass:7 fail:0 skip:0 total:7
| ok 1 kprobes_test

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240208145916.2004154-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-20 18:13:57 +00:00
Mark Rutland
997d79eb93 arm64: Move do_notify_resume() to entry-common.c
Currently do_notify_resume() lives in arch/arm64/kernel/signal.c, but it would
make more sense for it to live in entry-common.c as it handles more than
signals, and is coupled with the rest of the return-to-userspace sequence (e.g.
with unusual DAIF masking that matches the exception return requirements).

Move do_notify_resume() to entry-common.c.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240206123848.1696480-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Itaru Kitayama <itaru.kitayama@linux.dev>
2024-02-20 18:12:13 +00:00
Mark Rutland
d044d6ba6f arm64: io: permit offset addressing
Currently our IO accessors all use register addressing without offsets,
but we could safely use offset addressing (without writeback) to
simplify and optimize the generated code.

To function correctly under a hypervisor which emulates IO accesses, we
must ensure that any faulting/trapped IO access results in an ESR_ELx
value with ESR_ELX.ISS.ISV=1 and with the tranfer register described in
ESR_ELx.ISS.SRT. This means that we can only use loads/stores of a
single general purpose register (or the zero register), and must avoid
writeback addressing modes. However, we can use immediate offset
addressing modes, as these still provide ESR_ELX.ISS.ISV=1 and a valid
ESR_ELx.ISS.SRT when those accesses fault at Stage-2.

Currently we only use register addressing without offsets. We use the
"r" constraint to place the address into a register, and manually
generate the register addressing by surrounding the resulting register
operand with square braces, e.g.

| static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr)
| {
|         asm volatile("str %x0, [%1]" : : "rZ" (val), "r" (addr));
| }

Due to this, sequences of adjacent accesses need to generate addresses
using separate instructions. For example, the following code:

| void writeq_zero_8_times(void *ptr)
| {
|        writeq_relaxed(0, ptr + 8 * 0);
|        writeq_relaxed(0, ptr + 8 * 1);
|        writeq_relaxed(0, ptr + 8 * 2);
|        writeq_relaxed(0, ptr + 8 * 3);
|        writeq_relaxed(0, ptr + 8 * 4);
|        writeq_relaxed(0, ptr + 8 * 5);
|        writeq_relaxed(0, ptr + 8 * 6);
|        writeq_relaxed(0, ptr + 8 * 7);
| }

... is compiled to:

| <writeq_zero_8_times>:
|     str     xzr, [x0]
|     add     x1, x0, #0x8
|     str     xzr, [x1]
|     add     x1, x0, #0x10
|     str     xzr, [x1]
|     add     x1, x0, #0x18
|     str     xzr, [x1]
|     add     x1, x0, #0x20
|     str     xzr, [x1]
|     add     x1, x0, #0x28
|     str     xzr, [x1]
|     add     x1, x0, #0x30
|     str     xzr, [x1]
|     add     x0, x0, #0x38
|     str     xzr, [x0]
|     ret

As described above, we could safely use immediate offset addressing,
which would allow the ADDs to be folded into the address generation for
the STRs, resulting in simpler and smaller generated assembly. We can do
this by using the "o" constraint to allow the compiler to generate
offset addressing (without writeback) for a memory operand, e.g.

| static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr)
| {
|         volatile u64 __iomem *ptr = addr;
|         asm volatile("str %x0, %1" : : "rZ" (val), "o" (*ptr));
| }

... which results in the earlier code sequence being compiled to:

| <writeq_zero_8_times>:
|     str     xzr, [x0]
|     str     xzr, [x0, #8]
|     str     xzr, [x0, #16]
|     str     xzr, [x0, #24]
|     str     xzr, [x0, #32]
|     str     xzr, [x0, #40]
|     str     xzr, [x0, #48]
|     str     xzr, [x0, #56]
|     ret

As Will notes at:

  https://lore.kernel.org/linux-arm-kernel/20240117160528.GA3398@willie-the-truck/

... some compilers struggle with a plain "o" constraint, so it's
preferable to use "Qo", where the additional "Q" constraint permits
using non-offset register addressing.

This patch modifies our IO write accessors to use "Qo" constraints,
resulting in the better code generation described above. The IO read
accessors are left as-is because ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE
requires that non-offset register addressing is used, as the LDAR
instruction does not support offset addressing.

When compiling v6.8-rc1 defconfig with GCC 13.2.0, this saves ~4KiB of
text:

| [mark@lakrids:~/src/linux]% ls -al vmlinux-*
| -rwxr-xr-x 1 mark mark 153960576 Jan 23 12:01 vmlinux-after
| -rwxr-xr-x 1 mark mark 153862192 Jan 23 11:57 vmlinux-before
|
| [mark@lakrids:~/src/linux]% size vmlinux-before vmlinux-after
|    text    data     bss     dec     hex filename
| 26708921        16690350         622736 44022007        29fb8f7 vmlinux-before
| 26704761        16690414         622736 44017911        29fa8f7 vmlinux-after

... though due to internal alignment of sections, this has no impact on
the size of the resulting Image:

| [mark@lakrids:~/src/linux]% ls -al Image-*
| -rw-r--r-- 1 mark mark 43590144 Jan 23 12:01 Image-after
| -rw-r--r-- 1 mark mark 43590144 Jan 23 11:57 Image-before

Aside from the better code generation, there should be no functional
change as a result of this patch. I have lightly tested this patch,
including booting under KVM (where some devices such as PL011 are
emulated).

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240124111259.874975-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-20 18:08:37 +00:00
Ard Biesheuvel
d432b8d57c arm64: mm: Reclaim unused vmemmap region for vmalloc use
The vmemmap array is statically sized based on the maximum supported
size of the virtual address space, but it is located inside the upper VA
region, which is statically sized based on the *minimum* supported size
of the VA space. This doesn't matter much when using 64k pages, which is
the only configuration that currently supports 52-bit virtual
addressing.

However, upcoming LPA2 support will change this picture somewhat, as in
that case, the vmemmap array will take up more than 25% of the upper VA
region when using 4k pages. Given that most of this space is never used
when running on a system that does not support 52-bit virtual
addressing, let's reclaim the unused vmemmap area in that case.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20231213084024.2367360-15-ardb@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
2024-02-09 10:56:12 +00:00
Ard Biesheuvel
32697ff382 arm64: vmemmap: Avoid base2 order of struct page size to dimension region
The placement and size of the vmemmap region in the kernel virtual
address space is currently derived from the base2 order of the size of a
struct page. This makes for nicely aligned constants with lots of
leading 0xf and trailing 0x0 digits, but given that the actual struct
pages are indexed as an ordinary array, this resulting region is
severely overdimensioned when the size of a struct page is just over a
power of 2.

This doesn't matter today, but once we enable 52-bit virtual addressing
for 4k pages configurations, the vmemmap region may take up almost half
of the upper VA region with the current struct page upper bound at 64
bytes. And once we enable KMSAN or other features that push the size of
a struct page over 64 bytes, we will run out of VMALLOC space entirely.

So instead, let's derive the region size from the actual size of a
struct page, and place the entire region 1 GB from the top of the VA
space, where it still doesn't share any lower level translation table
entries with the fixmap.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20231213084024.2367360-14-ardb@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
2024-02-09 10:56:12 +00:00
Ard Biesheuvel
b730b0f2b1 arm64: mm: Move fixmap region above vmemmap region
Move the fixmap region above the vmemmap region, so that the start of
the vmemmap delineates the end of the region available for vmalloc and
vmap allocations and the randomized placement of the kernel and modules.

In a subsequent patch, we will take advantage of this to reclaim most of
the vmemmap area when running a 52-bit VA capable build with 52-bit
virtual addressing disabled at runtime.

Note that the existing guard region of 256 MiB covers the fixmap and PCI
I/O regions as well, so we can reduce it 8 MiB, which is what we use in
other places too.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20231213084024.2367360-11-ardb@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
2024-02-09 10:56:11 +00:00
Ard Biesheuvel
031e011d8b arm64: mm: Move PCI I/O emulation region above the vmemmap region
Move the PCI I/O region above the vmemmap region in the kernel's VA
space. This will permit us to reclaim the lower part of the vmemmap
region for vmalloc/vmap allocations when running a 52-bit VA capable
build on a 48-bit VA capable system.

Also, given that PCI_IO_START is derived from VMEMMAP_END, use that
symbolic constant directly in ptdump rather than deriving it from
VMEMMAP_START and VMEMMAP_SIZE, as those definitions will change in
subsequent patches.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20231213084024.2367360-10-ardb@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
2024-02-09 10:56:10 +00:00
Kevin Brodsky
c7767f5c43 arm64: vdso32: Remove unused vdso32-offsets.h
Commit 2d071968a405 ("arm64: compat: Remove 32-bit sigreturn code
from the vDSO") removed all VDSO_* symbols in the compat vDSO. As a
result, vdso32-offsets.h is now empty and therefore unused. Time to
remove it.

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Link: https://lore.kernel.org/r/20240129154748.1727759-1-kevin.brodsky@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-01-30 11:59:17 +00:00
Linus Torvalds
18b5cb6cb8 arm64 fixes for -rc1
- Fix shadow call stack patching with LTO=full
 
 - Fix voluntary preemption of the FPSIMD registers from assembly code
 
 - Fix workaround for A520 CPU erratum #2966298 and extend to A510
 
 - Fix SME issues that resulted in corruption of the register state
 
 - Minor fixes (missing includes, formatting)
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmWqUgEQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNB+7B/0VDHq2F8KtOhW02XqcKJaqiDk8QggTZn0D
 3JxZs6P6y9KP88xa6gr3G+PzLYjKV66aP871oKPECtsQAAIJzMUfhB7C7+zJzxPL
 kxrP3fTCwGUUkBlH7+dhyoX4hmV174c0xp70vp/2+hG5IixwtpFVi4284pgU6RcC
 El6LH0UrRiHUI7oP5vLArk3vp1X8yFXxGRCeFCmP9mOBB4Auf9q5F0YoESPz0LBS
 ohb9L8vZw1eBYJxoSNiGo819FX4Q2nximR75byLYMB1+M0wlqFo1Or/AbfpZGPzY
 q5plHckTU25NxPEMWVvzXlu/O1gBkAfsWcxb0TIDpVWGDrL1+6Qm
 =9pba
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "I think the main one is fixing the dynamic SCS patching when full LTO
  is enabled (clang was silently getting this horribly wrong), but it's
  all good stuff.

  Rob just pointed out that the fix to the workaround for erratum
  #2966298 might not be necessary, but in the worst case it's harmless
  and since the official description leaves a little to be desired here,
  I've left it in.

  Summary:

   - Fix shadow call stack patching with LTO=full

   - Fix voluntary preemption of the FPSIMD registers from assembly code

   - Fix workaround for A520 CPU erratum #2966298 and extend to A510

   - Fix SME issues that resulted in corruption of the register state

   - Minor fixes (missing includes, formatting)"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Fix silcon-errata.rst formatting
  arm64/sme: Always exit sme_alloc() early with existing storage
  arm64/fpsimd: Remove spurious check for SVE support
  arm64/ptrace: Don't flush ZA/ZT storage when writing ZA via ptrace
  arm64: entry: simplify kernel_exit logic
  arm64: entry: fix ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD
  arm64: errata: Add Cortex-A510 speculative unprivileged load workaround
  arm64: Rename ARM64_WORKAROUND_2966298
  arm64: fpsimd: Bring cond_yield asm macro in line with new rules
  arm64: scs: Work around full LTO issue with dynamic SCS
  arm64: irq: include <linux/cpumask.h>
2024-01-19 13:36:15 -08:00
Linus Torvalds
80955ae955 Driver core changes for 6.8-rc1
Here are the set of driver core and kernfs changes for 6.8-rc1.  Nothing
 major in here this release cycle, just lots of small cleanups and some
 tweaks on kernfs that in the very end, got reverted and will come back
 in a safer way next release cycle.
 
 Included in here are:
   - more driver core 'const' cleanups and fixes
   - fw_devlink=rpm is now the default behavior
   - kernfs tiny changes to remove some string functions
   - cpu handling in the driver core is updated to work better on many
     systems that add topologies and cpus after booting
   - other minor changes and cleanups
 
 All of the cpu handling patches have been acked by the respective
 maintainers and are coming in here in one series.  Everything has been
 in linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZaeOrg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymtcwCffzvKKkSY9qAp6+0v2WQNkZm1JWoAoJCPYUwF
 If6wEoPLWvRfKx4gIoq9
 =D96r
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here are the set of driver core and kernfs changes for 6.8-rc1.
  Nothing major in here this release cycle, just lots of small cleanups
  and some tweaks on kernfs that in the very end, got reverted and will
  come back in a safer way next release cycle.

  Included in here are:

   - more driver core 'const' cleanups and fixes

   - fw_devlink=rpm is now the default behavior

   - kernfs tiny changes to remove some string functions

   - cpu handling in the driver core is updated to work better on many
     systems that add topologies and cpus after booting

   - other minor changes and cleanups

  All of the cpu handling patches have been acked by the respective
  maintainers and are coming in here in one series. Everything has been
  in linux-next for a while with no reported issues"

* tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (51 commits)
  Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock"
  kernfs: convert kernfs_idr_lock to an irq safe raw spinlock
  class: fix use-after-free in class_register()
  PM: clk: make pm_clk_add_notifier() take a const pointer
  EDAC: constantify the struct bus_type usage
  kernfs: fix reference to renamed function
  driver core: device.h: fix Excess kernel-doc description warning
  driver core: class: fix Excess kernel-doc description warning
  driver core: mark remaining local bus_type variables as const
  driver core: container: make container_subsys const
  driver core: bus: constantify subsys_register() calls
  driver core: bus: make bus_sort_breadthfirst() take a const pointer
  kernfs: d_obtain_alias(NULL) will do the right thing...
  driver core: Better advertise dev_err_probe()
  kernfs: Convert kernfs_path_from_node_locked() from strlcpy() to strscpy()
  kernfs: Convert kernfs_name_locked() from strlcpy() to strscpy()
  kernfs: Convert kernfs_walk_ns() from strlcpy() to strscpy()
  initramfs: Expose retained initrd as sysfs file
  fs/kernfs/dir: obey S_ISGID
  kernel/cgroup: use kernfs_create_dir_ns()
  ...
2024-01-18 09:48:40 -08:00
Linus Torvalds
09d1c6a80f Generic:
- Use memdup_array_user() to harden against overflow.
 
 - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures.
 
 - Clean up Kconfigs that all KVM architectures were selecting
 
 - New functionality around "guest_memfd", a new userspace API that
   creates an anonymous file and returns a file descriptor that refers
   to it.  guest_memfd files are bound to their owning virtual machine,
   cannot be mapped, read, or written by userspace, and cannot be resized.
   guest_memfd files do however support PUNCH_HOLE, which can be used to
   switch a memory area between guest_memfd and regular anonymous memory.
 
 - New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
   per-page attributes for a given page of guest memory; right now the
   only attribute is whether the guest expects to access memory via
   guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
   TDX or ARM64 pKVM is checked by firmware or hypervisor that guarantees
   confidentiality (AMD PSP, Intel TDX module, or EL2 in the case of pKVM).
 
 x86:
 
 - Support for "software-protected VMs" that can use the new guest_memfd
   and page attributes infrastructure.  This is mostly useful for testing,
   since there is no pKVM-like infrastructure to provide a meaningfully
   reduced TCB.
 
 - Fix a relatively benign off-by-one error when splitting huge pages during
   CLEAR_DIRTY_LOG.
 
 - Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf
   TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE.
 
 - Use more generic lockdep assertions in paths that don't actually care
   about whether the caller is a reader or a writer.
 
 - let Xen guests opt out of having PV clock reported as "based on a stable TSC",
   because some of them don't expect the "TSC stable" bit (added to the pvclock
   ABI by KVM, but never set by Xen) to be set.
 
 - Revert a bogus, made-up nested SVM consistency check for TLB_CONTROL.
 
 - Advertise flush-by-ASID support for nSVM unconditionally, as KVM always
   flushes on nested transitions, i.e. always satisfies flush requests.  This
   allows running bleeding edge versions of VMware Workstation on top of KVM.
 
 - Sanity check that the CPU supports flush-by-ASID when enabling SEV support.
 
 - On AMD machines with vNMI, always rely on hardware instead of intercepting
   IRET in some cases to detect unmasking of NMIs
 
 - Support for virtualizing Linear Address Masking (LAM)
 
 - Fix a variety of vPMU bugs where KVM fail to stop/reset counters and other state
   prior to refreshing the vPMU model.
 
 - Fix a double-overflow PMU bug by tracking emulated counter events using a
   dedicated field instead of snapshotting the "previous" counter.  If the
   hardware PMC count triggers overflow that is recognized in the same VM-Exit
   that KVM manually bumps an event count, KVM would pend PMIs for both the
   hardware-triggered overflow and for KVM-triggered overflow.
 
 - Turn off KVM_WERROR by default for all configs so that it's not
   inadvertantly enabled by non-KVM developers, which can be problematic for
   subsystems that require no regressions for W=1 builds.
 
 - Advertise all of the host-supported CPUID bits that enumerate IA32_SPEC_CTRL
   "features".
 
 - Don't force a masterclock update when a vCPU synchronizes to the current TSC
   generation, as updating the masterclock can cause kvmclock's time to "jump"
   unexpectedly, e.g. when userspace hotplugs a pre-created vCPU.
 
 - Use RIP-relative address to read kvm_rebooting in the VM-Enter fault paths,
   partly as a super minor optimization, but mostly to make KVM play nice with
   position independent executable builds.
 
 - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
   CONFIG_HYPERV as a minor optimization, and to self-document the code.
 
 - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV "emulation"
   at build time.
 
 ARM64:
 
 - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB
   base granule sizes. Branch shared with the arm64 tree.
 
 - Large Fine-Grained Trap rework, bringing some sanity to the
   feature, although there is more to come. This comes with
   a prefix branch shared with the arm64 tree.
 
 - Some additional Nested Virtualization groundwork, mostly
   introducing the NV2 VNCR support and retargetting the NV
   support to that version of the architecture.
 
 - A small set of vgic fixes and associated cleanups.
 
 Loongarch:
 
 - Optimization for memslot hugepage checking
 
 - Cleanup and fix some HW/SW timer issues
 
 - Add LSX/LASX (128bit/256bit SIMD) support
 
 RISC-V:
 
 - KVM_GET_REG_LIST improvement for vector registers
 
 - Generate ISA extension reg_list using macros in get-reg-list selftest
 
 - Support for reporting steal time along with selftest
 
 s390:
 
 - Bugfixes
 
 Selftests:
 
 - Fix an annoying goof where the NX hugepage test prints out garbage
   instead of the magic token needed to run the test.
 
 - Fix build errors when a header is delete/moved due to a missing flag
   in the Makefile.
 
 - Detect if KVM bugged/killed a selftest's VM and print out a helpful
   message instead of complaining that a random ioctl() failed.
 
 - Annotate the guest printf/assert helpers with __printf(), and fix the
   various bugs that were lurking due to lack of said annotation.
 
 There are two non-KVM patches buried in the middle of guest_memfd support:
 
   fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure()
   mm: Add AS_UNMOVABLE to mark mapping as completely unmovable
 
 The first is small and mostly suggested-by Christian Brauner; the second
 a bit less so but it was written by an mm person (Vlastimil Babka).
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmWcMWkUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroO15gf/WLmmg3SET6Uzw9iEq2xo28831ZA+
 6kpILfIDGKozV5safDmMvcInlc/PTnqOFrsKyyN4kDZ+rIJiafJdg/loE0kPXBML
 wdR+2ix5kYI1FucCDaGTahskBDz8Lb/xTpwGg9BFLYFNmuUeHc74o6GoNvr1uliE
 4kLZL2K6w0cSMPybUD+HqGaET80ZqPwecv+s1JL+Ia0kYZJONJifoHnvOUJ7DpEi
 rgudVdgzt3EPjG0y1z6MjvDBXTCOLDjXajErlYuZD3Ej8N8s59Dh2TxOiDNTLdP4
 a4zjRvDmgyr6H6sz+upvwc7f4M4p+DBvf+TkWF54mbeObHUYliStqURIoA==
 =66Ws
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "Generic:

   - Use memdup_array_user() to harden against overflow.

   - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all
     architectures.

   - Clean up Kconfigs that all KVM architectures were selecting

   - New functionality around "guest_memfd", a new userspace API that
     creates an anonymous file and returns a file descriptor that refers
     to it. guest_memfd files are bound to their owning virtual machine,
     cannot be mapped, read, or written by userspace, and cannot be
     resized. guest_memfd files do however support PUNCH_HOLE, which can
     be used to switch a memory area between guest_memfd and regular
     anonymous memory.

   - New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
     per-page attributes for a given page of guest memory; right now the
     only attribute is whether the guest expects to access memory via
     guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
     TDX or ARM64 pKVM is checked by firmware or hypervisor that
     guarantees confidentiality (AMD PSP, Intel TDX module, or EL2 in
     the case of pKVM).

  x86:

   - Support for "software-protected VMs" that can use the new
     guest_memfd and page attributes infrastructure. This is mostly
     useful for testing, since there is no pKVM-like infrastructure to
     provide a meaningfully reduced TCB.

   - Fix a relatively benign off-by-one error when splitting huge pages
     during CLEAR_DIRTY_LOG.

   - Fix a bug where KVM could incorrectly test-and-clear dirty bits in
     non-leaf TDP MMU SPTEs if a racing thread replaces a huge SPTE with
     a non-huge SPTE.

   - Use more generic lockdep assertions in paths that don't actually
     care about whether the caller is a reader or a writer.

   - let Xen guests opt out of having PV clock reported as "based on a
     stable TSC", because some of them don't expect the "TSC stable" bit
     (added to the pvclock ABI by KVM, but never set by Xen) to be set.

   - Revert a bogus, made-up nested SVM consistency check for
     TLB_CONTROL.

   - Advertise flush-by-ASID support for nSVM unconditionally, as KVM
     always flushes on nested transitions, i.e. always satisfies flush
     requests. This allows running bleeding edge versions of VMware
     Workstation on top of KVM.

   - Sanity check that the CPU supports flush-by-ASID when enabling SEV
     support.

   - On AMD machines with vNMI, always rely on hardware instead of
     intercepting IRET in some cases to detect unmasking of NMIs

   - Support for virtualizing Linear Address Masking (LAM)

   - Fix a variety of vPMU bugs where KVM fail to stop/reset counters
     and other state prior to refreshing the vPMU model.

   - Fix a double-overflow PMU bug by tracking emulated counter events
     using a dedicated field instead of snapshotting the "previous"
     counter. If the hardware PMC count triggers overflow that is
     recognized in the same VM-Exit that KVM manually bumps an event
     count, KVM would pend PMIs for both the hardware-triggered overflow
     and for KVM-triggered overflow.

   - Turn off KVM_WERROR by default for all configs so that it's not
     inadvertantly enabled by non-KVM developers, which can be
     problematic for subsystems that require no regressions for W=1
     builds.

   - Advertise all of the host-supported CPUID bits that enumerate
     IA32_SPEC_CTRL "features".

   - Don't force a masterclock update when a vCPU synchronizes to the
     current TSC generation, as updating the masterclock can cause
     kvmclock's time to "jump" unexpectedly, e.g. when userspace
     hotplugs a pre-created vCPU.

   - Use RIP-relative address to read kvm_rebooting in the VM-Enter
     fault paths, partly as a super minor optimization, but mostly to
     make KVM play nice with position independent executable builds.

   - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
     CONFIG_HYPERV as a minor optimization, and to self-document the
     code.

   - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV
     "emulation" at build time.

  ARM64:

   - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base
     granule sizes. Branch shared with the arm64 tree.

   - Large Fine-Grained Trap rework, bringing some sanity to the
     feature, although there is more to come. This comes with a prefix
     branch shared with the arm64 tree.

   - Some additional Nested Virtualization groundwork, mostly
     introducing the NV2 VNCR support and retargetting the NV support to
     that version of the architecture.

   - A small set of vgic fixes and associated cleanups.

  Loongarch:

   - Optimization for memslot hugepage checking

   - Cleanup and fix some HW/SW timer issues

   - Add LSX/LASX (128bit/256bit SIMD) support

  RISC-V:

   - KVM_GET_REG_LIST improvement for vector registers

   - Generate ISA extension reg_list using macros in get-reg-list
     selftest

   - Support for reporting steal time along with selftest

  s390:

   - Bugfixes

  Selftests:

   - Fix an annoying goof where the NX hugepage test prints out garbage
     instead of the magic token needed to run the test.

   - Fix build errors when a header is delete/moved due to a missing
     flag in the Makefile.

   - Detect if KVM bugged/killed a selftest's VM and print out a helpful
     message instead of complaining that a random ioctl() failed.

   - Annotate the guest printf/assert helpers with __printf(), and fix
     the various bugs that were lurking due to lack of said annotation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (185 commits)
  x86/kvm: Do not try to disable kvmclock if it was not enabled
  KVM: x86: add missing "depends on KVM"
  KVM: fix direction of dependency on MMU notifiers
  KVM: introduce CONFIG_KVM_COMMON
  KVM: arm64: Add missing memory barriers when switching to pKVM's hyp pgd
  KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache
  RISC-V: KVM: selftests: Add get-reg-list test for STA registers
  RISC-V: KVM: selftests: Add steal_time test support
  RISC-V: KVM: selftests: Add guest_sbi_probe_extension
  RISC-V: KVM: selftests: Move sbi_ecall to processor.c
  RISC-V: KVM: Implement SBI STA extension
  RISC-V: KVM: Add support for SBI STA registers
  RISC-V: KVM: Add support for SBI extension registers
  RISC-V: KVM: Add SBI STA info to vcpu_arch
  RISC-V: KVM: Add steal-update vcpu request
  RISC-V: KVM: Add SBI STA extension skeleton
  RISC-V: paravirt: Implement steal-time support
  RISC-V: Add SBI STA extension definitions
  RISC-V: paravirt: Add skeleton for pv-time support
  RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr()
  ...
2024-01-17 13:03:37 -08:00
Ard Biesheuvel
3931261ecf arm64: fpsimd: Bring cond_yield asm macro in line with new rules
We no longer disable softirqs or preemption when doing kernel mode SIMD,
and so for fully preemptible kernels, there is no longer a need to do any
explicit yielding (and for non-preemptible kernels, yielding is not
needed either).

That leaves voluntary preemption, where only explicit yield calls may
result in a reschedule. To retain the existing behavior for such a
configuration, we should take the new situation into account, where the
preempt count will be zero rather than one, and yielding to pending
softirqs is unnecessary.

Fixes: aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240111112447.577640-2-ardb+git@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-01-12 12:48:27 +00:00
Tudor Ambarus
b95df3bd1e arm64: irq: include <linux/cpumask.h>
Sorting include files in alphabetic order in
drivers/tty/serial/samsung.c revealed the following error:

In file included from drivers/tty/serial/samsung_tty.c:24:
./arch/arm64/include/asm/irq.h:9:43: error: unknown type name ‘cpumask_t’
    9 | void arch_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu);
      |                                           ^~~~~~~~~

Include cpumask.h to avoid unknown type errors for parents of irq.h that
don't include cpumask.h.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20240110074007.4020016-1-tudor.ambarus@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
2024-01-12 12:44:18 +00:00
Linus Torvalds
c299010061 asm-generic cleanups for 6.8
A series from Baoquan He cleans up the asm-generic/io.h to remove the
 ioremap_uc() definition from everything except x86, which still needs it
 for pre-PAT systems. This series notably contains a patch from Jiaxun Yang
 that converts MIPS to use asm-generic/io.h like every other architecture
 does, enabling future cleanups.
 
 Some of my own patches fix -Wmissing-prototype warnings in architecture
 specific code across several architectures. This is now needed as the
 warning is enabled by default. There are still some remaining warnings
 in minor platforms, but the series should catch most of the widely used
 ones make them more consistent with one another.
 
 David McKay fixes a bug in __generic_cmpxchg_local() when this is used
 on 64-bit architectures. This could currently only affect parisc64
 and sparc64.
 
 Additional cleanups address from Linus Walleij, Uwe Kleine-König,
 Thomas Huth, and Kefeng Wang help reduce unnecessary inconsistencies
 between architectures.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmWeak8ACgkQYKtH/8kJ
 UidSiQ/+LL1WTO9d3Zx5HI0GGGjaIYpYs6jUNSf9Y5GPQiOrvjfEWj7CU11/4vxl
 GlQRpRyncYm8Eiz0Qu+aNxZFiiMah8Uful75yfbX8P1L4EPTbAYNDjkyNJrTjIAK
 jPK4sl8awIrapOeFUz++PsEj22R/4Is4f0mo+CqoCkL5RKlHe5oFdXzcwjmds4yK
 CvU6Ldn+M7FZ3EItMdjXaB3D3HS9uictFiO5JByZY8p+IcqgNRI/iHNnZIMsltJ+
 XjDi0DG+x4jCj6teElSchw7AofE4OcNSP3xbR1PLKv6+xBLGYaAGZhNuPTz88eV/
 Gj0loDQrrR5McGUfDBRHK9zN2Jd0O/FKnfh9kLOt1FLFyGPvC78Q/2HkpVCjbBr2
 Pr1aqhLDHA+tGNSsThsV8RUa8/tiEnxAki43tfBFS3SEKhtQsTm2g1z4miwbE3p0
 BJIrSgTqrP/SBq7a9z/thPrkzdZcNuA9FUETTbaMeUlJS51n1V9E5A1t7sOG7jaI
 vV/gbuR6FjvD49mTyQiOSCt3V4ygRqgN1Q+C4QM8WLqq2keUq0AhGodquv8F78in
 J3x2j2r27lHY7jKf8B0dua/JXAsF20u8qD6yDQ9ymkjt/MWhGXBgK0jpT7RTIuMS
 e2jmTywUVD4UohAcx3inkOojUhIJ5KDB0I4Pzv4zWcHNbyFNKcY=
 =4VQl
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic cleanups from Arnd Bergmann:
 "A series from Baoquan He cleans up the asm-generic/io.h to remove the
  ioremap_uc() definition from everything except x86, which still needs
  it for pre-PAT systems. This series notably contains a patch from
  Jiaxun Yang that converts MIPS to use asm-generic/io.h like every
  other architecture does, enabling future cleanups.

  Some of my own patches fix -Wmissing-prototype warnings in
  architecture specific code across several architectures. This is now
  needed as the warning is enabled by default. There are still some
  remaining warnings in minor platforms, but the series should catch
  most of the widely used ones make them more consistent with one
  another.

  David McKay fixes a bug in __generic_cmpxchg_local() when this is used
  on 64-bit architectures. This could currently only affect parisc64 and
  sparc64.

  Additional cleanups address from Linus Walleij, Uwe Kleine-König,
  Thomas Huth, and Kefeng Wang help reduce unnecessary inconsistencies
  between architectures"

* tag 'asm-generic-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic: Fix 32 bit __generic_cmpxchg_local
  Hexagon: Make pfn accessors statics inlines
  ARC: mm: Make virt_to_pfn() a static inline
  mips: remove extraneous asm-generic/iomap.h include
  sparc: Use $(kecho) to announce kernel images being ready
  arm64: vdso32: Define BUILD_VDSO32_64 to correct prototypes
  csky: fix arch_jump_label_transform_static override
  arch: add do_page_fault prototypes
  arch: add missing prepare_ftrace_return() prototypes
  arch: vdso: consolidate gettime prototypes
  arch: include linux/cpu.h for trap_init() prototype
  arch: fix asm-offsets.c building with -Wmissing-prototypes
  arch: consolidate arch_irq_work_raise prototypes
  hexagon: Remove CONFIG_HEXAGON_ARCH_VERSION from uapi header
  asm/io: remove unnecessary xlate_dev_mem_ptr() and unxlate_dev_mem_ptr()
  mips: io: remove duplicated codes
  arch/*/io.h: remove ioremap_uc in some architectures
  mips: add <asm-generic/io.h> including
2024-01-10 18:13:44 -08:00
Linus Torvalds
78273df7f6 header cleanups for 6.8
The goal is to get sched.h down to a type only header, so the main thing
 happening in this patchset is splitting out various _types.h headers and
 dependency fixups, as well as moving some things out of sched.h to
 better locations.
 
 This is prep work for the memory allocation profiling patchset which
 adds new sched.h interdepencencies.
 
 Testing - it's been in -next, and fixes from pretty much all
 architectures have percolated in - nothing major.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmWfBwwACgkQE6szbY3K
 bnZPwBAAmuRojXaeWxi01IPIOehSGDe68vw44PR9glEMZvxdnZuPOdvE4/+245/L
 bRKU2WBCjBUokUbV9msIShwRkFTZAmEMPNfPAAsFMA+VXeDYHKB+ZRdwTggNAQ+I
 SG6fZgh5m0HsewCDxU8oqVHkjVq4fXn0cy+aL6xLEd9gu67GoBzX2pDieS2Kvy6j
 jnyoKTxFwb+LTQgph0P4EIpq5I2umAsdLwdSR8EJ+8e9NiNvMo1pI00Lx/ntAnFZ
 JftWUJcMy3TQ5u1GkyfQN9y/yThX1bZK5GvmHS9SJ2Dkacaus5d+xaKCHtRuFS1I
 7C6b8PsNgRczUMumBXus44HdlNfNs1yU3lvVxFvBIPE1qC9pYRHrkWIXXIocXLLC
 oxTEJ6B2G3BQZVQgLIA4fOaxMVhmvKffi/aEZLi9vN9VVosd1a6XNKI6KbyRnXFp
 GSs9qDqszhn5I3GYNlDNQTc/8UsRlhPFgS6nS0By6QnvxtGi9QkU2tBRBsXvqwCy
 cLoCYIhc2tvugHvld70dz26umiJ4rnmxGlobStNoigDvIKAIUt1UmIdr1so8P8eH
 xehnL9ZcOX6xnANDL0AqMFFHV6I58CJynhFdUoXfVQf/DWLGX48mpi9LVNsYBzsI
 CAwVOAQ0UjGrpdWmJ9ueY/ABYqg9vRjzaDEXQ+MhAYO55CLaVsg=
 =3tyT
 -----END PGP SIGNATURE-----

Merge tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefs

Pull header cleanups from Kent Overstreet:
 "The goal is to get sched.h down to a type only header, so the main
  thing happening in this patchset is splitting out various _types.h
  headers and dependency fixups, as well as moving some things out of
  sched.h to better locations.

  This is prep work for the memory allocation profiling patchset which
  adds new sched.h interdepencencies"

* tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefs: (51 commits)
  Kill sched.h dependency on rcupdate.h
  kill unnecessary thread_info.h include
  Kill unnecessary kernel.h include
  preempt.h: Kill dependency on list.h
  rseq: Split out rseq.h from sched.h
  LoongArch: signal.c: add header file to fix build error
  restart_block: Trim includes
  lockdep: move held_lock to lockdep_types.h
  sem: Split out sem_types.h
  uidgid: Split out uidgid_types.h
  seccomp: Split out seccomp_types.h
  refcount: Split out refcount_types.h
  uapi/linux/resource.h: fix include
  x86/signal: kill dependency on time.h
  syscall_user_dispatch.h: split out *_types.h
  mm_types_task.h: Trim dependencies
  Split out irqflags_types.h
  ipc: Kill bogus dependency on spinlock.h
  shm: Slim down dependencies
  workqueue: Split out workqueue_types.h
  ...
2024-01-10 16:43:55 -08:00
Linus Torvalds
063a7ce32d lsm/stable-6.8 PR 20240105
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmWYKUIUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNyHw/+IKnqL1MZ5QS+/HtSzi4jCL47N9yZ
 OHLol6XswyEGHH9myKPPGnT5lVA93v98v4ty2mws7EJUSGZQQUntYBPbU9Gi40+B
 XDzYSRocoj96sdlKeOJMgaWo3NBRD9HYSoGPDNWZixy6m+bLPk/Dqhn3FabKf1lo
 2qQSmstvChFRmVNkmgaQnBCAtWVqla4EJEL0EKX6cspHbuzRNTeJdTPn6Q/zOUVL
 O2znOZuEtSVpYS7yg3uJT0hHD8H0GnIciAcDAhyPSBL5Uk5l6gwJiACcdRfLRbgp
 QM5Z4qUFdKljV5XBCzYnfhhrx1df08h1SG84El8UK8HgTTfOZfYmawByJRWNJSQE
 TdCmtyyvEbfb61CKBFVwD7Tzb9/y8WgcY5N3Un8uCQqRzFIO+6cghHri5NrVhifp
 nPFlP4klxLHh3d7ZVekLmCMHbpaacRyJKwLy+f/nwbBEID47jpPkvZFIpbalat+r
 QaKRBNWdTeV+GZ+Yu0uWsI029aQnpcO1kAnGg09fl6b/dsmxeKOVWebir25AzQ++
 a702S8HRmj80X+VnXHU9a64XeGtBH7Nq0vu0lGHQPgwhSx/9P6/qICEPwsIriRjR
 I9OulWt4OBPDtlsonHFgDs+lbnd0Z0GJUwYT8e9pjRDMxijVO9lhAXyglVRmuNR8
 to2ByKP5BO+Vh8Y=
 =Py+n
 -----END PGP SIGNATURE-----

Merge tag 'lsm-pr-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull security module updates from Paul Moore:

 - Add three new syscalls: lsm_list_modules(), lsm_get_self_attr(), and
   lsm_set_self_attr().

   The first syscall simply lists the LSMs enabled, while the second and
   third get and set the current process' LSM attributes. Yes, these
   syscalls may provide similar functionality to what can be found under
   /proc or /sys, but they were designed to support multiple,
   simultaneaous (stacked) LSMs from the start as opposed to the current
   /proc based solutions which were created at a time when only one LSM
   was allowed to be active at a given time.

   We have spent considerable time discussing ways to extend the
   existing /proc interfaces to support multiple, simultaneaous LSMs and
   even our best ideas have been far too ugly to support as a kernel
   API; after +20 years in the kernel, I felt the LSM layer had
   established itself enough to justify a handful of syscalls.

   Support amongst the individual LSM developers has been nearly
   unanimous, with a single objection coming from Tetsuo (TOMOYO) as he
   is worried that the LSM_ID_XXX token concept will make it more
   difficult for out-of-tree LSMs to survive. Several members of the LSM
   community have demonstrated the ability for out-of-tree LSMs to
   continue to exist by picking high/unused LSM_ID values as well as
   pointing out that many kernel APIs rely on integer identifiers, e.g.
   syscalls (!), but unfortunately Tetsuo's objections remain.

   My personal opinion is that while I have no interest in penalizing
   out-of-tree LSMs, I'm not going to penalize in-tree development to
   support out-of-tree development, and I view this as a necessary step
   forward to support the push for expanded LSM stacking and reduce our
   reliance on /proc and /sys which has occassionally been problematic
   for some container users. Finally, we have included the linux-api
   folks on (all?) recent revisions of the patchset and addressed all of
   their concerns.

 - Add a new security_file_ioctl_compat() LSM hook to handle the 32-bit
   ioctls on 64-bit systems problem.

   This patch includes support for all of the existing LSMs which
   provide ioctl hooks, although it turns out only SELinux actually
   cares about the individual ioctls. It is worth noting that while
   Casey (Smack) and Tetsuo (TOMOYO) did not give explicit ACKs to this
   patch, they did both indicate they are okay with the changes.

 - Fix a potential memory leak in the CALIPSO code when IPv6 is disabled
   at boot.

   While it's good that we are fixing this, I doubt this is something
   users are seeing in the wild as you need to both disable IPv6 and
   then attempt to configure IPv6 labeled networking via
   NetLabel/CALIPSO; that just doesn't make much sense.

   Normally this would go through netdev, but Jakub asked me to take
   this patch and of all the trees I maintain, the LSM tree seemed like
   the best fit.

 - Update the LSM MAINTAINERS entry with additional information about
   our process docs, patchwork, bug reporting, etc.

   I also noticed that the Lockdown LSM is missing a dedicated
   MAINTAINERS entry so I've added that to the pull request. I've been
   working with one of the major Lockdown authors/contributors to see if
   they are willing to step up and assume a Lockdown maintainer role;
   hopefully that will happen soon, but in the meantime I'll continue to
   look after it.

 - Add a handful of mailmap entries for Serge Hallyn and myself.

* tag 'lsm-pr-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (27 commits)
  lsm: new security_file_ioctl_compat() hook
  lsm: Add a __counted_by() annotation to lsm_ctx.ctx
  calipso: fix memory leak in netlbl_calipso_add_pass()
  selftests: remove the LSM_ID_IMA check in lsm/lsm_list_modules_test
  MAINTAINERS: add an entry for the lockdown LSM
  MAINTAINERS: update the LSM entry
  mailmap: add entries for Serge Hallyn's dead accounts
  mailmap: update/replace my old email addresses
  lsm: mark the lsm_id variables are marked as static
  lsm: convert security_setselfattr() to use memdup_user()
  lsm: align based on pointer length in lsm_fill_user_ctx()
  lsm: consolidate buffer size handling into lsm_fill_user_ctx()
  lsm: correct error codes in security_getselfattr()
  lsm: cleanup the size counters in security_getselfattr()
  lsm: don't yet account for IMA in LSM_CONFIG_COUNT calculation
  lsm: drop LSM_ID_IMA
  LSM: selftests for Linux Security Module syscalls
  SELinux: Add selfattr hooks
  AppArmor: Add selfattr hooks
  Smack: implement setselfattr and getselfattr hooks
  ...
2024-01-09 12:57:46 -08:00
Linus Torvalds
fb46e22a9e Many singleton patches against the MM code. The patch series which
are included in this merge do the following:
 
 - Peng Zhang has done some mapletree maintainance work in the
   series
 
 	"maple_tree: add mt_free_one() and mt_attr() helpers"
 	"Some cleanups of maple tree"
 
 - In the series "mm: use memmap_on_memory semantics for dax/kmem"
   Vishal Verma has altered the interworking between memory-hotplug
   and dax/kmem so that newly added 'device memory' can more easily
   have its memmap placed within that newly added memory.
 
 - Matthew Wilcox continues folio-related work (including a few
   fixes) in the patch series
 
 	"Add folio_zero_tail() and folio_fill_tail()"
 	"Make folio_start_writeback return void"
 	"Fix fault handler's handling of poisoned tail pages"
 	"Convert aops->error_remove_page to ->error_remove_folio"
 	"Finish two folio conversions"
 	"More swap folio conversions"
 
 - Kefeng Wang has also contributed folio-related work in the series
 
 	"mm: cleanup and use more folio in page fault"
 
 - Jim Cromie has improved the kmemleak reporting output in the
   series "tweak kmemleak report format".
 
 - In the series "stackdepot: allow evicting stack traces" Andrey
   Konovalov to permits clients (in this case KASAN) to cause
   eviction of no longer needed stack traces.
 
 - Charan Teja Kalla has fixed some accounting issues in the page
   allocator's atomic reserve calculations in the series "mm:
   page_alloc: fixes for high atomic reserve caluculations".
 
 - Dmitry Rokosov has added to the samples/ dorectory some sample
   code for a userspace memcg event listener application.  See the
   series "samples: introduce cgroup events listeners".
 
 - Some mapletree maintanance work from Liam Howlett in the series
   "maple_tree: iterator state changes".
 
 - Nhat Pham has improved zswap's approach to writeback in the
   series "workload-specific and memory pressure-driven zswap
   writeback".
 
 - DAMON/DAMOS feature and maintenance work from SeongJae Park in
   the series
 
 	"mm/damon: let users feed and tame/auto-tune DAMOS"
 	"selftests/damon: add Python-written DAMON functionality tests"
 	"mm/damon: misc updates for 6.8"
 
 - Yosry Ahmed has improved memcg's stats flushing in the series
   "mm: memcg: subtree stats flushing and thresholds".
 
 - In the series "Multi-size THP for anonymous memory" Ryan Roberts
   has added a runtime opt-in feature to transparent hugepages which
   improves performance by allocating larger chunks of memory during
   anonymous page faults.
 
 - Matthew Wilcox has also contributed some cleanup and maintenance
   work against eh buffer_head code int he series "More buffer_head
   cleanups".
 
 - Suren Baghdasaryan has done work on Andrea Arcangeli's series
   "userfaultfd move option".  UFFDIO_MOVE permits userspace heap
   compaction algorithms to move userspace's pages around rather than
   UFFDIO_COPY'a alloc/copy/free.
 
 - Stefan Roesch has developed a "KSM Advisor", in the series
   "mm/ksm: Add ksm advisor".  This is a governor which tunes KSM's
   scanning aggressiveness in response to userspace's current needs.
 
 - Chengming Zhou has optimized zswap's temporary working memory
   use in the series "mm/zswap: dstmem reuse optimizations and
   cleanups".
 
 - Matthew Wilcox has performed some maintenance work on the
   writeback code, both code and within filesystems.  The series is
   "Clean up the writeback paths".
 
 - Andrey Konovalov has optimized KASAN's handling of alloc and
   free stack traces for secondary-level allocators, in the series
   "kasan: save mempool stack traces".
 
 - Andrey also performed some KASAN maintenance work in the series
   "kasan: assorted clean-ups".
 
 - David Hildenbrand has gone to town on the rmap code.  Cleanups,
   more pte batching, folio conversions and more.  See the series
   "mm/rmap: interface overhaul".
 
 - Kinsey Ho has contributed some maintenance work on the MGLRU
   code in the series "mm/mglru: Kconfig cleanup".
 
 - Matthew Wilcox has contributed lruvec page accounting code
   cleanups in the series "Remove some lruvec page accounting
   functions".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZyF2wAKCRDdBJ7gKXxA
 jjWjAP42LHvGSjp5M+Rs2rKFL0daBQsrlvy6/jCHUequSdWjSgEAmOx7bc5fbF27
 Oa8+DxGM9C+fwqZ/7YxU2w/WuUmLPgU=
 =0NHs
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Many singleton patches against the MM code. The patch series which are
  included in this merge do the following:

   - Peng Zhang has done some mapletree maintainance work in the series

	'maple_tree: add mt_free_one() and mt_attr() helpers'
	'Some cleanups of maple tree'

   - In the series 'mm: use memmap_on_memory semantics for dax/kmem'
     Vishal Verma has altered the interworking between memory-hotplug
     and dax/kmem so that newly added 'device memory' can more easily
     have its memmap placed within that newly added memory.

   - Matthew Wilcox continues folio-related work (including a few fixes)
     in the patch series

	'Add folio_zero_tail() and folio_fill_tail()'
	'Make folio_start_writeback return void'
	'Fix fault handler's handling of poisoned tail pages'
	'Convert aops->error_remove_page to ->error_remove_folio'
	'Finish two folio conversions'
	'More swap folio conversions'

   - Kefeng Wang has also contributed folio-related work in the series

	'mm: cleanup and use more folio in page fault'

   - Jim Cromie has improved the kmemleak reporting output in the series
     'tweak kmemleak report format'.

   - In the series 'stackdepot: allow evicting stack traces' Andrey
     Konovalov to permits clients (in this case KASAN) to cause eviction
     of no longer needed stack traces.

   - Charan Teja Kalla has fixed some accounting issues in the page
     allocator's atomic reserve calculations in the series 'mm:
     page_alloc: fixes for high atomic reserve caluculations'.

   - Dmitry Rokosov has added to the samples/ dorectory some sample code
     for a userspace memcg event listener application. See the series
     'samples: introduce cgroup events listeners'.

   - Some mapletree maintanance work from Liam Howlett in the series
     'maple_tree: iterator state changes'.

   - Nhat Pham has improved zswap's approach to writeback in the series
     'workload-specific and memory pressure-driven zswap writeback'.

   - DAMON/DAMOS feature and maintenance work from SeongJae Park in the
     series

	'mm/damon: let users feed and tame/auto-tune DAMOS'
	'selftests/damon: add Python-written DAMON functionality tests'
	'mm/damon: misc updates for 6.8'

   - Yosry Ahmed has improved memcg's stats flushing in the series 'mm:
     memcg: subtree stats flushing and thresholds'.

   - In the series 'Multi-size THP for anonymous memory' Ryan Roberts
     has added a runtime opt-in feature to transparent hugepages which
     improves performance by allocating larger chunks of memory during
     anonymous page faults.

   - Matthew Wilcox has also contributed some cleanup and maintenance
     work against eh buffer_head code int he series 'More buffer_head
     cleanups'.

   - Suren Baghdasaryan has done work on Andrea Arcangeli's series
     'userfaultfd move option'. UFFDIO_MOVE permits userspace heap
     compaction algorithms to move userspace's pages around rather than
     UFFDIO_COPY'a alloc/copy/free.

   - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm:
     Add ksm advisor'. This is a governor which tunes KSM's scanning
     aggressiveness in response to userspace's current needs.

   - Chengming Zhou has optimized zswap's temporary working memory use
     in the series 'mm/zswap: dstmem reuse optimizations and cleanups'.

   - Matthew Wilcox has performed some maintenance work on the writeback
     code, both code and within filesystems. The series is 'Clean up the
     writeback paths'.

   - Andrey Konovalov has optimized KASAN's handling of alloc and free
     stack traces for secondary-level allocators, in the series 'kasan:
     save mempool stack traces'.

   - Andrey also performed some KASAN maintenance work in the series
     'kasan: assorted clean-ups'.

   - David Hildenbrand has gone to town on the rmap code. Cleanups, more
     pte batching, folio conversions and more. See the series 'mm/rmap:
     interface overhaul'.

   - Kinsey Ho has contributed some maintenance work on the MGLRU code
     in the series 'mm/mglru: Kconfig cleanup'.

   - Matthew Wilcox has contributed lruvec page accounting code cleanups
     in the series 'Remove some lruvec page accounting functions'"

* tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits)
  mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
  mm, treewide: introduce NR_PAGE_ORDERS
  selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
  selftests/mm: skip test if application doesn't has root privileges
  selftests/mm: conform test to TAP format output
  selftests: mm: hugepage-mmap: conform to TAP format output
  selftests/mm: gup_test: conform test to TAP format output
  mm/selftests: hugepage-mremap: conform test to TAP format output
  mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING
  mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large
  mm/memcontrol: remove __mod_lruvec_page_state()
  mm/khugepaged: use a folio more in collapse_file()
  slub: use a folio in __kmalloc_large_node
  slub: use folio APIs in free_large_kmalloc()
  slub: use alloc_pages_node() in alloc_slab_page()
  mm: remove inc/dec lruvec page state functions
  mm: ratelimit stat flush from workingset shrinker
  kasan: stop leaking stack trace handles
  mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE
  mm/mglru: add dummy pmd_dirty()
  ...
2024-01-09 11:18:47 -08:00
Linus Torvalds
bfe8eb3b85 Scheduler changes for v6.8:
- Energy scheduling:
 
     - Consolidate how the max compute capacity is
       used in the scheduler and how we calculate
       the frequency for a level of utilization.
 
     - Rework interface between the scheduler and
       the schedutil governor
 
     - Simplify the util_est logic
 
  - Deadline scheduler:
 
     - Work more towards reducing SCHED_DEADLINE
       starvation of low priority tasks (e.g., SCHED_OTHER)
       tasks when higher priority tasks monopolize CPU
       cycles, via the introduction of 'deadline servers'
       (nested/2-level scheduling).
       "Fair servers" to make use of this facility are
       not introduced yet.
 
  - EEVDF:
 
     - Introduce O(1) fastpath for EEVDF task selection
 
  - NUMA balancing:
 
     - Tune the NUMA-balancing vma scanning logic some more,
       to better distribute the probability
       of a particular vma getting scanned.
 
  - Plus misc fixes, cleanups and updates.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmWcASMRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jLbg/+NOwF18M6klF1/3jUaV1PU09vRzYnnA7w
 oF7Tru7JLV+/vZK+rwI1zxzj5Nj3sVBQPIyp1embEHx7Z/QH8MIaIVpcSFsDDCYY
 Q8n6ZVRB+lKWEo5+Ti6JEJftDAWuLHXwFWDa57oWPuR0Tc736+zYHUfj7jdKk0RI
 nT/lnOT6hXU8q26O4QFrBrrhvCCxc4byo7buKPQfqie0bDA70ppIWkFQoQME6mvQ
 US9jvOyUipOiPV06DPwFvPDJUQBGq2VdJNk+5zCEtcqEfLREuo/Xq1Ww1x1BWaZI
 761532EuDo73iMK4IFZrvVmj1ioz957qbje11MSSkDdKj692xxjXyvnY0NBvZuho
 Ueog/jQ4D4I2qu7pPSCF8UfnI/Hw4Q+KJ89j3pcywRm4hmCTf9k3MGpAaVLVxH7G
 e5REZ5MSsFZi4Cs+zF87Of5KCKLhTr1qSetNtShinKahg06WZ+MZ8tW4jb52qy0j
 F8PMlvfBI3f7SOtA8s2P26mDGQ21YQehN2d5P+Fbwj/U3fjIlSTOyx6NwLpFwYaS
 Vf+fctchGFV1Sh7c2JjCh+ecYfXx3ghT/pvyPOImJtxtCKSRUQ8c26ApC1OsWfOE
 FdHv4f2dPqcyswCZzIv/2fyDXc9eaS2E05EMDNqVuMCGnzidzSs81n7hBioNMrnH
 ZgHK90TmEbw=
 =wTVh
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Energy scheduling:

   - Consolidate how the max compute capacity is used in the scheduler
     and how we calculate the frequency for a level of utilization.

   - Rework interface between the scheduler and the schedutil governor

   - Simplify the util_est logic

  Deadline scheduler:

   - Work more towards reducing SCHED_DEADLINE starvation of low
     priority tasks (e.g., SCHED_OTHER) tasks when higher priority tasks
     monopolize CPU cycles, via the introduction of 'deadline servers'
     (nested/2-level scheduling).

     "Fair servers" to make use of this facility are not introduced yet.

  EEVDF:

   - Introduce O(1) fastpath for EEVDF task selection

  NUMA balancing:

   - Tune the NUMA-balancing vma scanning logic some more, to better
     distribute the probability of a particular vma getting scanned.

  Plus misc fixes, cleanups and updates"

* tag 'sched-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
  sched/fair: Fix tg->load when offlining a CPU
  sched/fair: Remove unused 'next_buddy_marked' local variable in check_preempt_wakeup_fair()
  sched/fair: Use all little CPUs for CPU-bound workloads
  sched/fair: Simplify util_est
  sched/fair: Remove SCHED_FEAT(UTIL_EST_FASTUP, true)
  arm64/amu: Use capacity_ref_freq() to set AMU ratio
  cpufreq/cppc: Set the frequency used for computing the capacity
  cpufreq/cppc: Move and rename cppc_cpufreq_{perf_to_khz|khz_to_perf}()
  energy_model: Use a fixed reference frequency
  cpufreq/schedutil: Use a fixed reference frequency
  cpufreq: Use the fixed and coherent frequency for scaling capacity
  sched/topology: Add a new arch_scale_freq_ref() method
  freezer,sched: Clean saved_state when restoring it during thaw
  sched/fair: Update min_vruntime for reweight_entity() correctly
  sched/doc: Update documentation after renames and synchronize Chinese version
  sched/cpufreq: Rework iowait boost
  sched/cpufreq: Rework schedutil governor performance estimation
  sched/pelt: Avoid underestimation of task utilization
  sched/timers: Explain why idle task schedules out on remote timer enqueue
  sched/cpuidle: Comment about timers requirements VS idle handler
  ...
2024-01-08 19:49:17 -08:00
Florian Fainelli
f0a78b3e2a arm64: Update __NR_compat_syscalls for statmount/listmount
Commit d8b0f5465012 ("wire up syscalls for statmount/listmount") added
two new system calls to arch/arm64/include/asm/unistd32.h but forgot to
update the __NR_compat_syscalls number, thus causing the following build
failures:

  arch/arm64/include/asm/unistd32.h:922:24: error: array index in initializer exceeds array bounds
    922 | #define __NR_statmount 457
        |                        ^~~
  arch/arm64/kernel/sys32.c:130:34: note: in definition of macro '__SYSCALL'
    130 | #define __SYSCALL(nr, sym)      [nr] = __arm64_##sym,
        |                                  ^~

Bump up the number by two to accomodate for the new system calls added.

Fixes: d8b0f5465012 ("wire up syscalls for statmount/listmount")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-01-08 18:09:54 -08:00
Linus Torvalds
ab5f3fcb7c arm64 updates for 6.8
* for-next/cpufeature
 
   - Remove ARM64_HAS_NO_HW_PREFETCH copy_page() optimisation for ye olde
     Thunder-X machines.
   - Avoid mapping KPTI trampoline when it is not required.
   - Make CPU capability API more robust during early initialisation.
 
 * for-next/early-idreg-overrides
 
   - Remove dependencies on core kernel helpers from the early
     command-line parsing logic in preparation for moving this code
     before the kernel is mapped.
 
 * for-next/fpsimd
 
   - Restore kernel-mode fpsimd context lazily, allowing us to run fpsimd
     code sequences in the kernel with pre-emption enabled.
 
 * for-next/kbuild
 
   - Install 'vmlinuz.efi' when CONFIG_EFI_ZBOOT=y.
   - Makefile cleanups.
 
 * for-next/lpa2-prep
 
   - Preparatory work for enabling the 'LPA2' extension, which will
     introduce 52-bit virtual and physical addressing even with 4KiB
     pages (including for KVM guests).
 
 * for-next/misc
 
   - Remove dead code and fix a typo.
 
 * for-next/mm
 
   - Pass NUMA node information for IRQ stack allocations.
 
 * for-next/perf
 
   - Add perf support for the Synopsys DesignWare PCIe PMU.
   - Add support for event counting thresholds (FEAT_PMUv3_TH) introduced
     in Armv8.8.
   - Add support for i.MX8DXL SoCs to the IMX DDR PMU driver.
   - Minor PMU driver fixes and optimisations.
 
 * for-next/rip-vpipt
 
   - Remove what support we had for the obsolete VPIPT I-cache policy.
 
 * for-next/selftests
 
   - Improvements to the SVE and SME selftests.
 
 * for-next/stacktrace
 
   - Refactor kernel unwind logic so that it can used by BPF unwinding
     and, eventually, reliable backtracing.
 
 * for-next/sysregs
 
   - Update a bunch of register definitions based on the latest XML drop
     from Arm.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmWWvKYQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNIiTB/9agZBkEhZjP2sNDGyE4UFwawweWHkt2r8h
 WyvdwP91Z/AIsYSsGYu36J0l4pOnMKp/i6t+rt031SK4j+Q8hJYhSfDt3RvVbc0/
 Pz9D18V6cLrfq+Yxycqq9ufVdjs+m+CQ5WeLaRGmNIyEzJ/Jv/qrAN+2r603EeLP
 nq08qMZhDIQd2ZzbigCnGaNrTsVSafFfBFv1GsgDvnMZAjs1G6457A6zu+NatNUc
 +TMSG+3EawutHZZ2noXl0Ra7VOfIbVZFiUssxRPenKQByHHHR+QB2c/O1blri+dm
 XLMutvqO2/WvYGIfXO5koqZqvpVeR3zXxPwmGi5hQBsmOjtXzKd+
 =U4mo
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "CPU features:

   - Remove ARM64_HAS_NO_HW_PREFETCH copy_page() optimisation for ye
     olde Thunder-X machines

   - Avoid mapping KPTI trampoline when it is not required

   - Make CPU capability API more robust during early initialisation

  Early idreg overrides:

   - Remove dependencies on core kernel helpers from the early
     command-line parsing logic in preparation for moving this code
     before the kernel is mapped

  FPsimd:

   - Restore kernel-mode fpsimd context lazily, allowing us to run
     fpsimd code sequences in the kernel with pre-emption enabled

  KBuild:

   - Install 'vmlinuz.efi' when CONFIG_EFI_ZBOOT=y

   - Makefile cleanups

  LPA2 prep:

   - Preparatory work for enabling the 'LPA2' extension, which will
     introduce 52-bit virtual and physical addressing even with 4KiB
     pages (including for KVM guests).

  Misc:

   - Remove dead code and fix a typo

  MM:

   - Pass NUMA node information for IRQ stack allocations

  Perf:

   - Add perf support for the Synopsys DesignWare PCIe PMU

   - Add support for event counting thresholds (FEAT_PMUv3_TH)
     introduced in Armv8.8

   - Add support for i.MX8DXL SoCs to the IMX DDR PMU driver.

   - Minor PMU driver fixes and optimisations

  RIP VPIPT:

   - Remove what support we had for the obsolete VPIPT I-cache policy

  Selftests:

   - Improvements to the SVE and SME selftests

  Stacktrace:

   - Refactor kernel unwind logic so that it can used by BPF unwinding
     and, eventually, reliable backtracing

  Sysregs:

   - Update a bunch of register definitions based on the latest XML drop
     from Arm"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (87 commits)
  kselftest/arm64: Don't probe the current VL for unsupported vector types
  efi/libstub: zboot: do not use $(shell ...) in cmd_copy_and_pad
  arm64: properly install vmlinuz.efi
  arm64/sysreg: Add missing system instruction definitions for FGT
  arm64/sysreg: Add missing system register definitions for FGT
  arm64/sysreg: Add missing ExtTrcBuff field definition to ID_AA64DFR0_EL1
  arm64/sysreg: Add missing Pauth_LR field definitions to ID_AA64ISAR1_EL1
  arm64: memory: remove duplicated include
  arm: perf: Fix ARCH=arm build with GCC
  arm64: Align boot cpucap handling with system cpucap handling
  arm64: Cleanup system cpucap handling
  MAINTAINERS: add maintainers for DesignWare PCIe PMU driver
  drivers/perf: add DesignWare PCIe PMU driver
  PCI: Move pci_clear_and_set_dword() helper to PCI header
  PCI: Add Alibaba Vendor ID to linux/pci_ids.h
  docs: perf: Add description for Synopsys DesignWare PCIe PMU driver
  arm64: irq: set the correct node for shadow call stack
  Revert "perf/arm_dmc620: Remove duplicate format attribute #defines"
  arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD
  arm64: fpsimd: Preserve/restore kernel mode NEON at context switch
  ...
2024-01-08 16:32:09 -08:00
Kirill A. Shutemov
5e0a760b44 mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has
changed the definition of MAX_ORDER to be inclusive.  This has caused
issues with code that was not yet upstream and depended on the previous
definition.

To draw attention to the altered meaning of the define, rename MAX_ORDER
to MAX_PAGE_ORDER.

Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-08 15:27:15 -08:00
Linus Torvalds
8c9440fea7 vfs-6.8.mount
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZU0CgAKCRCRxhvAZXjc
 osncAQDSJK0frJL+72NqXxa4YNzivrnuw6fhp5iaDAEqxdm8ygEAoJWyh7Rmkt8G
 drAXWGyGnCYqv7UgC6axLyciid7TxQg=
 =vJuv
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.8.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs mount updates from Christian Brauner:
 "This contains the work to retrieve detailed information about mounts
  via two new system calls. This is hopefully the beginning of the end
  of the saga that started with fsinfo() years ago.

  The LWN articles in [1] and [2] can serve as a summary so we can avoid
  rehashing everything here.

  At LSFMM in May 2022 we got into a room and agreed on what we want to
  do about fsinfo(). Basically, split it into pieces. This is the first
  part of that agreement. Specifically, it is concerned with retrieving
  information about mounts. So this only concerns the mount information
  retrieval, not the mount table change notification, or the extended
  filesystem specific mount option work. That is separate work.

  Currently mounts have a 32bit id. Mount ids are already in heavy use
  by libmount and other low-level userspace but they can't be relied
  upon because they're recycled very quickly. We agreed that mounts
  should carry a unique 64bit id by which they can be referenced
  directly. This is now implemented as part of this work.

  The new 64bit mount id is exposed in statx() through the new
  STATX_MNT_ID_UNIQUE flag. If the flag isn't raised the old mount id is
  returned. If it is raised and the kernel supports the new 64bit mount
  id the flag is raised in the result mask and the new 64bit mount id is
  returned. New and old mount ids do not overlap so they cannot be
  conflated.

  Two new system calls are introduced that operate on the 64bit mount
  id: statmount() and listmount(). A summary of the api and usage can be
  found on LWN as well (cf. [3]) but of course, I'll provide a summary
  here as well.

  Both system calls rely on struct mnt_id_req. Which is the request
  struct used to pass the 64bit mount id identifying the mount to
  operate on. It is extensible to allow for the addition of new
  parameters and for future use in other apis that make use of mount
  ids.

  statmount() mimicks the semantics of statx() and exposes a set flags
  that userspace may raise in mnt_id_req to request specific information
  to be retrieved. A statmount() call returns a struct statmount filled
  in with information about the requested mount. Supported requests are
  indicated by raising the request flag passed in struct mnt_id_req in
  the @mask argument in struct statmount.

  Currently we do support:

   - STATMOUNT_SB_BASIC:
     Basic filesystem info

   - STATMOUNT_MNT_BASIC
     Mount information (mount id, parent mount id, mount attributes etc)

   - STATMOUNT_PROPAGATE_FROM
     Propagation from what mount in current namespace

   - STATMOUNT_MNT_ROOT
     Path of the root of the mount (e.g., mount --bind /bla /mnt returns /bla)

   - STATMOUNT_MNT_POINT
     Path of the mount point (e.g., mount --bind /bla /mnt returns /mnt)

   - STATMOUNT_FS_TYPE
     Name of the filesystem type as the magic number isn't enough due to submounts

  The string options STATMOUNT_MNT_{ROOT,POINT} and STATMOUNT_FS_TYPE
  are appended to the end of the struct. Userspace can use the offsets
  in @fs_type, @mnt_root, and @mnt_point to reference those strings
  easily.

  The struct statmount reserves quite a bit of space currently for
  future extensibility. This isn't really a problem and if this bothers
  us we can just send a follow-up pull request during this cycle.

  listmount() is given a 64bit mount id via mnt_id_req just as
  statmount(). It takes a buffer and a size to return an array of the
  64bit ids of the child mounts of the requested mount. Userspace can
  thus choose to either retrieve child mounts for a mount in batches or
  iterate through the child mounts. For most use-cases it will be
  sufficient to just leave space for a few child mounts. But for big
  mount tables having an iterator is really helpful. Iterating through a
  mount table works by setting @param in mnt_id_req to the mount id of
  the last child mount retrieved in the previous listmount() call"

Link: https://lwn.net/Articles/934469 [1]
Link: https://lwn.net/Articles/829212 [2]
Link: https://lwn.net/Articles/950569 [3]

* tag 'vfs-6.8.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  add selftest for statmount/listmount
  fs: keep struct mnt_id_req extensible
  wire up syscalls for statmount/listmount
  add listmount(2) syscall
  statmount: simplify string option retrieval
  statmount: simplify numeric option retrieval
  add statmount(2) syscall
  namespace: extract show_path() helper
  mounts: keep list of mounts in an rbtree
  add unique mount ID
2024-01-08 10:57:34 -08:00
Paolo Bonzini
5f53d88f10 KVM/arm64 updates for Linux 6.8
- LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB
   base granule sizes. Branch shared with the arm64 tree.
 
 - Large Fine-Grained Trap rework, bringing some sanity to the
   feature, although there is more to come. This comes with
   a prefix branch shared with the arm64 tree.
 
 - Some additional Nested Virtualization groundwork, mostly
   introducing the NV2 VNCR support and retargetting the NV
   support to that version of the architecture.
 
 - A small set of vgic fixes and associated cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmWX4wUACgkQI9DQutE9
 ekM0DxAAvOJtM+m8ahv2tCSHZpwowkuKBBc7JWI75l4befHEOSvYMZwQwejrequa
 lPwLgx9t0sGjba+tRGv1JZMtnUBjV4V/lcrhX95AYTF5dfg7vbuTxUh/YFu1CaQ/
 MkuKVJ74PUWqpvDYSzwW8Jjqu6RskjW0HqVPMbFkmUWWc8cgExc8XD9M+nu0SrNT
 g5261KD53CUeyNaR0/+zkaHouq2Skeqw/u2d5OLdnY23hINMZ0qR1jYHj935suYy
 YrMTiMje1h/fs7YXWra4LmMcsg0V+3LZVQJXwRARrZdk2xkW5w+eLPIYjVqcA7aT
 VwhrtzjEzD56trrSZClOpj7MSVfQ8OjV7BgvSUpgLT5+kjVrFLIEMIOakiTOCoIJ
 weweRawTyomUoIsT1EkRmRYQkPH3Z552tcrztD/slYvqrtCB4JcHKF0O7BT88ZfM
 t2hRhlT+32KR9cOciLfFMzlZI1uKQYF8Z+CvvBA5TJ9Hv8JsIwF2E/NjYUy2ilca
 iDzF5KdZ/OLQzjwWVWDq9OlvepB2rLGQKNnw67jd1BSzd9Jj3eVuaI/9xRBrLDYR
 cBOMoIaZMy7Va+pop1zoFEhC7IbTglVHzsj2ch+4F1NB/1+Dd0zBQKbDUPqp5TR/
 OOuonTTVk9yH6RgpUULKlbRZ4oU70UoOBFBxCqnvng0cw1KBbbA=
 =Q6c+
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 6.8

- LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB
  base granule sizes. Branch shared with the arm64 tree.

- Large Fine-Grained Trap rework, bringing some sanity to the
  feature, although there is more to come. This comes with
  a prefix branch shared with the arm64 tree.

- Some additional Nested Virtualization groundwork, mostly
  introducing the NV2 VNCR support and retargetting the NV
  support to that version of the architecture.

- A small set of vgic fixes and associated cleanups.
2024-01-08 08:09:53 -05:00
Ingo Molnar
cdb3033e19 Merge branch 'sched/urgent' into sched/core, to pick up pending v6.7 fixes for the v6.8 merge window
This fix didn't make it upstream in time, pick it up
for the v6.8 merge window.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-01-08 12:57:28 +01:00
Will Deacon
db32cf8e28 Merge branch 'for-next/fixes' into for-next/core
Merge in arm64 fixes queued for 6.7 so that kpti_install_ng_mappings()
can be updated to use arm64_kernel_unmapped_at_el0() instead of checking
the ARM64_UNMAP_KERNEL_AT_EL0 CPU capability directly.

* for-next/fixes:
  arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify
  perf/arm-cmn: Fail DTC counter allocation correctly
  arm64: Avoid enabling KPTI unnecessarily
2024-01-04 12:32:33 +00:00
Will Deacon
3e8626b4ed Merge branch 'for-next/sysregs' into for-next/core
* for-next/sysregs:
  arm64/sysreg: Add missing system instruction definitions for FGT
  arm64/sysreg: Add missing system register definitions for FGT
  arm64/sysreg: Add missing ExtTrcBuff field definition to ID_AA64DFR0_EL1
  arm64/sysreg: Add missing Pauth_LR field definitions to ID_AA64ISAR1_EL1
  arm64/sysreg: Add new system registers for GCS
  arm64/sysreg: Add definition for FPMR
  arm64/sysreg: Update HCRX_EL2 definition for DDI0601 2023-09
  arm64/sysreg: Update SCTLR_EL1 for DDI0601 2023-09
  arm64/sysreg: Update ID_AA64SMFR0_EL1 definition for DDI0601 2023-09
  arm64/sysreg: Add definition for ID_AA64FPFR0_EL1
  arm64/sysreg: Add definition for ID_AA64ISAR3_EL1
  arm64/sysreg: Update ID_AA64ISAR2_EL1 defintion for DDI0601 2023-09
  arm64/sysreg: Add definition for ID_AA64PFR2_EL1
  arm64/sysreg: update CPACR_EL1 register
  arm64/sysreg: add system register POR_EL{0,1}
  arm64/sysreg: Add definition for HAFGRTR_EL2
  arm64/sysreg: Update HFGITR_EL2 definiton to DDI0601 2023-09
2024-01-04 12:28:38 +00:00
Will Deacon
41cff14b03 Merge branch 'for-next/stacktrace' into for-next/core
* for-next/stacktrace:
  arm64: stacktrace: factor out kunwind_stack_walk()
  arm64: stacktrace: factor out kernel unwind state
2024-01-04 12:28:31 +00:00
Will Deacon
30431774fe Merge branch 'for-next/rip-vpipt' into for-next/core
* for-next/rip-vpipt:
  arm64: Rename reserved values for CTR_EL0.L1Ip
  arm64: Kill detection of VPIPT i-cache policy
  KVM: arm64: Remove VPIPT I-cache handling
2024-01-04 12:28:14 +00:00
Will Deacon
65180649fa Merge branch 'for-next/misc' into for-next/core
* for-next/misc:
  arm64: memory: remove duplicated include
  arm64: Delete the zero_za macro
  Documentation/arch/arm64: Fix typo
2024-01-04 12:27:49 +00:00
Will Deacon
ccaeeec529 Merge branch 'for-next/lpa2-prep' into for-next/core
* for-next/lpa2-prep:
  arm64: mm: get rid of kimage_vaddr global variable
  arm64: mm: Take potential load offset into account when KASLR is off
  arm64: kernel: Disable latent_entropy GCC plugin in early C runtime
  arm64: Add ARM64_HAS_LPA2 CPU capability
  arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]
  arm64/mm: Update tlb invalidation routines for FEAT_LPA2
  arm64/mm: Add lpa2_is_enabled() kvm_lpa2_is_enabled() stubs
  arm64/mm: Modify range-based tlbi to decrement scale
2024-01-04 12:27:42 +00:00
Will Deacon
88619527b4 Merge branch 'for-next/kbuild' into for-next/core
* for-next/kbuild:
  efi/libstub: zboot: do not use $(shell ...) in cmd_copy_and_pad
  arm64: properly install vmlinuz.efi
  arm64: replace <asm-generic/export.h> with <linux/export.h>
  arm64: vdso32: rename 32-bit debug vdso to vdso32.so.dbg
2024-01-04 12:27:35 +00:00
Will Deacon
79eb42b269 Merge branch 'for-next/fpsimd' into for-next/core
* for-next/fpsimd:
  arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD
  arm64: fpsimd: Preserve/restore kernel mode NEON at context switch
  arm64: fpsimd: Drop unneeded 'busy' flag
2024-01-04 12:27:29 +00:00
Paolo Bonzini
9cc52627c7 KVM/riscv changes for 6.8 part #1
- KVM_GET_REG_LIST improvement for vector registers
 - Generate ISA extension reg_list using macros in get-reg-list selftest
 - Steal time account support along with selftest
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmWQ+cgACgkQrUjsVaLH
 LAckBA//R4X9L5ugfPdDunp3ntjZXmNtBS5pM2jD+UvaoFn2kOA1o5kOD5mXluuh
 0imNjVuzlrX7XoAATQ4BoeoXg0whDbnv/8TE13KqSl1PfNziH2p5YD2DuHXPST3B
 V2VHrGACZ4wN074Whztl0oYI72obGnmpcsiglifkeCvRPerghHuDu40eUaWvCmgD
 DPwT+bjBgxkDZ4IheyytUrPql6reALh1Qo1vfj0FsJAgj+MAqQeD8n6rixSPnOdE
 9XXa4jdu7ycDp675VX/DVEWsNBQGPrsRK/uCiMksO36td+wLCKpAkvX95mE0w/L8
 qFJ+dN1c+1ZkjooHdVLfq2MjxaIRwmIowk7SeJbpvGIf/zG3r7eany7eXJT0+NjO
 22j5FY2z1NqcSG6Fazx76Qp2vVBVbxHShP9h7d6VTZYS7XENjmV6IWHpTSuSF8+n
 puj8Nf5C7WuqbySirSgQndDuKawn9myqfXXEoAuSiZ+kVyYEl8QnXm2gAIcxRDHX
 x+NDPMv0DpMBRO9qa/tXeqgNue/XOTJwgbmXzAlCNff3U7hPIHJ/5aZiJ/Re5TeE
 DxiU9AmIsNN2Bh0csS/wQbdScIqkOdOiDYEwT1DXOJWpmhiyCW7vR8ltaIuMJ4vP
 DtlfuUlSe4aml957nAiqqyjQAY/7gqmpoaGwu+lmrOX1K7fdtF0=
 =FeiG
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-6.8-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 6.8 part #1

- KVM_GET_REG_LIST improvement for vector registers
- Generate ISA extension reg_list using macros in get-reg-list selftest
- Steal time account support along with selftest
2024-01-02 13:19:40 -05:00
Paolo Bonzini
136292522e LoongArch KVM changes for v6.8
1. Optimization for memslot hugepage checking.
 2. Cleanup and fix some HW/SW timer issues.
 3. Add LSX/LASX (128bit/256bit SIMD) support.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmWGu+0WHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImesO7D/wOdYP96R+mRzpLBeuTtFxU8e4A
 3n2luxOeP8v1WYtQ9H8M01Wgly+9u6cJ2pgAlv79BQHfmCfC0aWQLmpnCZmk/mYW
 wtQ75ASA3Qg6zOBWEksCkA0LUdPDHfQuaaUXT7RYZ7QtHKSNkkhsw2nMCq6fgrXU
 RnZjGctjuxgYSqQtwzfYO2AjSBAfAq1MjSzCTULJ0KkE8o5Bg0KOoGj8ijC1U+ua
 QWBnqTNzeKmYmqAFfhXoiiFYcuBUq7DEk5RtwDU7SeqqJEV3a8AbbsrWfz+wMemG
 gri95uRxvnhpPZ+6/PrVjIezqexPJmQ9+tjY6mxh/bPRnS5ICFygjV3lt050JUK8
 xIaJEFvl7g88RIz5mnTeM9tU4ibIsCLgA9zj33ps2H7QP5NazUm1dzk1YGAgqPdw
 m5hjwtTFQEujQM6cz1DLfhoi15VDNcYUonJIvGFZMhl7InitDpB3u9sI+AVGIVUG
 yKzBkqGB1L1vbJGnuWmspEqSUo7Z9iYzuVGbOnjc9LKQ/8OpLxj0brymYheA+CKG
 CIdULximQFVEHc2lbE+H+bW4hnrFP4sN9hlTng7KN7ommCIg+FltisM8Nt5NLWID
 9ywLj4Qa0Qrc5vB3FJ8+ksuDe2nD83uVLj247R7B0wxQcYw4ocyW/YU+gayF4EjY
 6azutwllW5ZB+I3hyw==
 =phol
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-kvm-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD

LoongArch KVM changes for v6.8

1. Optimization for memslot hugepage checking.
2. Cleanup and fix some HW/SW timer issues.
3. Add LSX/LASX (128bit/256bit SIMD) support.
2024-01-02 13:16:29 -05:00
Andrey Konovalov
27232ba96c kasan/arm64: improve comments for KASAN_SHADOW_START/END
Patch series "kasan: assorted clean-ups".

Code clean-ups, nothing worthy of being backported to stable.


This patch (of 11):

Unify and improve the comments for KASAN_SHADOW_START/END definitions from
include/asm/kasan.h and include/asm/memory.h.

Also put both definitions together in include/asm/memory.h.

Also clarify the related BUILD_BUG_ON checks in mm/kasan_init.c.

Link: https://lkml.kernel.org/r/cover.1703188911.git.andreyknvl@google.com
Link: https://lkml.kernel.org/r/140108ca0b164648c395a41fbeecb0601b1ae9e1.1703188911.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29 11:58:43 -08:00
Vincent Guittot
9942cb22ea sched/topology: Add a new arch_scale_freq_ref() method
Create a new method to get a unique and fixed max frequency. Currently
cpuinfo.max_freq or the highest (or last) state of performance domain are
used as the max frequency when computing the frequency for a level of
utilization, but:

  - cpuinfo_max_freq can change at runtime. boost is one example of
    such change.

  - cpuinfo.max_freq and last item of the PD can be different leading to
    different results between cpufreq and energy model.

We need to save the reference frequency that has been used when computing
the CPUs capacity and use this fixed and coherent value to convert between
frequency and CPU's capacity.

In fact, we already save the frequency that has been used when computing
the capacity of each CPU. We extend the precision to save kHz instead of
MHz currently and we modify the type to be aligned with other variables
used when converting frequency to capacity and the other way.

[ mingo: Minor edits. ]

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20231211104855.558096-2-vincent.guittot@linaro.org
2023-12-23 15:52:34 +01:00
Linus Torvalds
a4aebe9365 posix-timers: Get rid of [COMPAT_]SYS_NI() uses
Only the posix timer system calls use this (when the posix timer support
is disabled, which does not actually happen in any normal case), because
they had debug code to print out a warning about missing system calls.

Get rid of that special case, and just use the standard COND_SYSCALL
interface that creates weak system call stubs that return -ENOSYS for
when the system call does not exist.

This fixes a kCFI issue with the SYS_NI() hackery:

  CFI failure at int80_emulation+0x67/0xb0 (target: sys_ni_posix_timers+0x0/0x70; expected type: 0xb02b34d9)
  WARNING: CPU: 0 PID: 48 at int80_emulation+0x67/0xb0

Reported-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-20 21:30:27 -08:00
Kent Overstreet
04bc786d66 arm64: Fix circular header dependency
Replace linux/percpu.h include with asm/percpu.h to avoid circular
dependency.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-12-20 19:26:30 -05:00
Marc Zyngier
d016264d07 Merge branch kvm-arm64/nv-6.8-prefix into kvmarm-master/next
* kvm-arm64/nv-6.8-prefix:
  : .
  : Nested Virtualization support update, focussing on the
  : NV2 support (VNCR mapping and such).
  : .
  KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
  KVM: arm64: nv: Map VNCR-capable registers to a separate page
  KVM: arm64: nv: Add EL2_REG_VNCR()/EL2_REG_REDIR() sysreg helpers
  KVM: arm64: Introduce a bad_trap() primitive for unexpected trap handling
  KVM: arm64: nv: Add include containing the VNCR_EL2 offsets
  KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers
  KVM: arm64: nv: Drop EL12 register traps that are redirected to VNCR
  KVM: arm64: nv: Compute NV view of idregs as a one-off
  KVM: arm64: nv: Hoist vcpu_has_nv() into is_hyp_ctxt()
  arm64: cpufeatures: Restrict NV support to FEAT_NV2

Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-12-19 10:06:58 +00:00
Marc Zyngier
fedc612314 KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
KVM internally uses accessor functions when reading or writing the
guest's system registers. This takes care of accessing either the stored
copy or using the "live" EL1 system registers when the host uses VHE.

With the introduction of virtual EL2 we add a bunch of EL2 system
registers, which now must also be taken care of:

- If the guest is running in vEL2, and we access an EL1 sysreg, we must
  revert to the stored version of that, and not use the CPU's copy.

- If the guest is running in vEL1, and we access an EL2 sysreg, we must
  also use the stored version, since the CPU carries the EL1 copy.

- Some EL2 system registers are supposed to affect the current execution
  of the system, so we need to put them into their respective EL1
  counterparts. For this we need to define a mapping between the two.

- Some EL2 system registers have a different format than their EL1
  counterpart, so we need to translate them before writing them to the
  CPU. This is done using an (optional) translate function in the map.

All of these cases are now wrapped into the existing accessor functions,
so KVM users wouldn't need to care whether they access EL2 or EL1
registers and also which state the guest is in.

Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Co-developed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-12-19 10:02:53 +00:00