Commit Graph

1213381 Commits

Author SHA1 Message Date
Uros Bizjak
59bec00ace x86/percpu: Introduce %rip-relative addressing to PER_CPU_VAR()
Introduce x86_64 %rip-relative addressing to the PER_CPU_VAR() macro.
Instructions using %rip-relative address operand are one byte shorter
than their absolute address counterparts and are also compatible with
position independent executable (-fpie) builds. The patch reduces
code size of a test kernel build by 150 bytes.

The PER_CPU_VAR() macro is intended to be applied to a symbol and should
not be used with register operands. Introduce the new __percpu macro and
use it in cmpxchg{8,16}b_emu.S instead.

Also add a missing function comment to this_cpu_cmpxchg8b_emu().

No functional changes intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sean Christopherson <seanjc@google.com>
2023-10-20 12:19:51 +02:00
Uros Bizjak
aa47f90cd4 x86/percpu, xen: Correct PER_CPU_VAR() usage to include symbol and its addend
The PER_CPU_VAR() macro should be applied to a symbol and its addend.
Inconsistent usage is currently harmless, but needs to be corrected
before %rip-relative addressing is introduced to the PER_CPU_VAR() macro.

No functional changes intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: linux-kernel@vger.kernel.org
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sean Christopherson <seanjc@google.com>
2023-10-20 12:19:51 +02:00
Uros Bizjak
39d64ee59c x86/percpu: Correct PER_CPU_VAR() usage to include symbol and its addend
The PER_CPU_VAR() macro should be applied to a symbol and its addend.
Inconsistent usage is currently harmless, but needs to be corrected
before %rip-relative addressing is introduced to the PER_CPU_VAR() macro.

No functional changes intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sean Christopherson <seanjc@google.com>
2023-10-20 12:19:51 +02:00
Linus Torvalds
24b8a23638 x86/fpu: Clean up FPU switching in the middle of task switching
It happens to work, but it's very very wrong, because our 'current'
macro is magic that is supposedly loading a stable value.

It just happens to be not quite stable enough and the compilers
re-load the value enough for this code to work.  But it's wrong.

The whole

        struct fpu *prev_fpu = &prev->fpu;

thing in __switch_to() is pretty ugly. There's no reason why we
should look at that 'prev_fpu' pointer there, or pass it down.

And it only generates worse code, in how it loads 'current' when
__switch_to() has the right task pointers.

The attached patch not only cleans this up, it actually
generates better code too:

 (a) it removes one push/pop pair at entry/exit because there's one
     less register used (no 'current')

 (b) it removes that pointless load of 'current' because it just uses
     the right argument:

	-       movq    %gs:pcpu_hot(%rip), %r12
	-       testq   $16384, (%r12)
	+       testq   $16384, (%rdi)

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20231018184227.446318-1-ubizjak@gmail.com
2023-10-20 11:24:22 +02:00
Uros Bizjak
e39828d2c1 x86/percpu: Use the correct asm operand modifier in percpu_stable_op()
The "P" asm operand modifier is a x86 target-specific modifier.

When used for a constant, it drops all syntax-specific prefixes and
issues the bare constant. This modifier is not correct for address
handling, in this case a generic "a" operand modifier should be used.

The "a" asm operand modifier substitutes a memory reference, with the
actual operand treated as address.  For x86_64, when a symbol is
provided, the "a" modifier emits "sym(%rip)" instead of "sym",
enabling shorter %rip-relative addressing.

Clang allows only "i" and "r" operand constraints with an "a" modifier,
so the patch normalizes the modifier/constraint pair to "a"/"i"
which is consistent between both compilers.

The patch reduces code size of a test build by 4072 bytes:

   text            data     bss    dec             hex     filename
   25523268        4388300  808452 30720020        1d4c014 vmlinux-old.o
   25519196        4388300  808452 30715948        1d4b02c vmlinux-new.o

[ mingo: Changelog clarity. ]

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20231016200755.287403-1-ubizjak@gmail.com
2023-10-18 14:09:16 +02:00
Uros Bizjak
1d10f3aec2 x86/percpu: Use C for arch_raw_cpu_ptr(), to improve code generation
Implement arch_raw_cpu_ptr() in C to allow the compiler to perform
better optimizations, such as setting an appropriate base to compute
the address. The compiler is free to choose either MOV or ADD from
this_cpu_off address to construct the optimal final address.

There are some other issues when memory access to the percpu area is
implemented with an asm. Compilers can not eliminate asm common
subexpressions over basic block boundaries, but are extremely good
at optimizing memory access. By implementing arch_raw_cpu_ptr() in C,
the compiler can eliminate additional redundant loads from this_cpu_off,
further reducing the number of percpu offset reads from 1646 to 1631
on a test build, a -0.9% reduction.

Co-developed-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20231015202523.189168-2-ubizjak@gmail.com
2023-10-16 12:52:02 +02:00
Uros Bizjak
a048d3abae x86/percpu: Rewrite arch_raw_cpu_ptr() to be easier for compilers to optimize
Implement arch_raw_cpu_ptr() as a load from this_cpu_off and then
add the ptr value to the base. This way, the compiler can propagate
addend to the following instruction and simplify address calculation.

E.g.: address calcuation in amd_pmu_enable_virt() improves from:

    48 c7 c0 00 00 00 00	mov    $0x0,%rax
	87b7: R_X86_64_32S	cpu_hw_events

    65 48 03 05 00 00 00	add    %gs:0x0(%rip),%rax
    00
	87bf: R_X86_64_PC32	this_cpu_off-0x4

    48 c7 80 28 13 00 00	movq   $0x0,0x1328(%rax)
    00 00 00 00

to:

    65 48 8b 05 00 00 00	mov    %gs:0x0(%rip),%rax
    00
	8798: R_X86_64_PC32	this_cpu_off-0x4
    48 c7 80 00 00 00 00	movq   $0x0,0x0(%rax)
    00 00 00 00
	87a6: R_X86_64_32S	cpu_hw_events+0x1328

The compiler also eliminates additional redundant loads from
this_cpu_off, reducing the number of percpu offset reads
from 1668 to 1646 on a test build, a -1.3% reduction.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20231015202523.189168-1-ubizjak@gmail.com
2023-10-16 12:51:58 +02:00
Uros Bizjak
e29aad08b1 x86/percpu: Disable named address spaces for KASAN
-fsanitize=kernel-address (KASAN) is at the moment incompatible
with named address spaces - see GCC PR sanitizer/111736:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111736

GCC is doing a KASAN check on a percpu address which it shouldn't do,
and didn't used to do because we did the access using inline asm.

But now that GCC does the accesses as normal (albeit special address
space) memory accesses, the KASAN code triggers on them too, and it
all goes to hell in a handbasket very quickly.

Those percpu accessor functions need to disable any KASAN
checking or other sanitizer checking. Not on the percpu address,
because that's not a "real" address, it's obviously just the offset
from the segment register.

And GCC should probably not have generated such code in the first
place, so arguably this is a bug with -fsanitize=kernel-address.

The patch also removes a stale dependency on CONFIG_SMP.

Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20231009151409.53656-1-ubizjak@gmail.com

Closes: https://lore.kernel.org/oe-lkp/202310071301.a5113890-oliver.sang@intel.com
2023-10-10 23:57:35 +02:00
Uros Bizjak
ca42563486 x86/percpu: Use C for percpu read/write accessors
The percpu code mostly uses inline assembly. Using segment qualifiers
allows to use C code instead, which enables the compiler to perform
various optimizations (e.g. propagation of memory arguments). Convert
percpu read and write accessors to C code, so the memory argument can
be propagated to the instruction that uses this argument.

Some examples of propagations:

a) into sign/zero extensions:

the code improves from:

    65 8a 05 00 00 00 00    mov    %gs:0x0(%rip),%al
    0f b6 c0                movzbl %al,%eax

to:

    65 0f b6 05 00 00 00    movzbl %gs:0x0(%rip),%eax
    00

and in a similar way for:

    movzbl %gs:0x0(%rip),%edx
    movzwl %gs:0x0(%rip),%esi
    movzbl %gs:0x78(%rbx),%eax

    movslq %gs:0x0(%rip),%rdx
    movslq %gs:(%rdi),%rbx

b) into compares:

the code improves from:

    65 8b 05 00 00 00 00    mov    %gs:0x0(%rip),%eax
    a9 00 00 0f 00          test   $0xf0000,%eax

to:

    65 f7 05 00 00 00 00    testl  $0xf0000,%gs:0x0(%rip)
    00 00 0f 00

and in a similar way for:

    testl  $0xf0000,%gs:0x0(%rip)
    testb  $0x1,%gs:0x0(%rip)
    testl  $0xff00,%gs:0x0(%rip)

    cmpb   $0x0,%gs:0x0(%rip)
    cmp    %gs:0x0(%rip),%r14d
    cmpw   $0x8,%gs:0x0(%rip)
    cmpb   $0x0,%gs:(%rax)

c) into other insns:

the code improves from:

   1a355:	83 fa ff             	cmp    $0xffffffff,%edx
   1a358:	75 07                	jne    1a361 <...>
   1a35a:	65 8b 15 00 00 00 00 	mov    %gs:0x0(%rip),%edx
   1a361:

to:

   1a35a:	83 fa ff             	cmp    $0xffffffff,%edx
   1a35d:	65 0f 44 15 00 00 00 	cmove  %gs:0x0(%rip),%edx
   1a364:	00

The above propagations result in the following code size
improvements for current mainline kernel (with the default config),
compiled with:

   # gcc (GCC) 12.3.1 20230508 (Red Hat 12.3.1-1)

   text            data     bss    dec             filename
   25508862        4386540  808388 30703790        vmlinux-vanilla.o
   25500922        4386532  808388 30695842        vmlinux-new.o

Co-developed-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20231004192404.31733-1-ubizjak@gmail.com
2023-10-05 09:01:53 +02:00
Nadav Amit
9a462b9eaf x86/percpu: Use compiler segment prefix qualifier
Using a segment prefix qualifier is cleaner than using a segment prefix
in the inline assembly, and provides the compiler with more information,
telling it that __seg_gs:[addr] is different than [addr] when it
analyzes data dependencies. It also enables various optimizations that
will be implemented in the next patches.

Use segment prefix qualifiers when they are supported. Unfortunately,
gcc does not provide a way to remove segment qualifiers, which is needed
to use typeof() to create local instances of the per-CPU variable. For
this reason, do not use the segment qualifier for per-CPU variables, and
do casting using the segment qualifier instead.

Uros: Improve compiler support detection and update the patch
to the current mainline.

Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20231004145137.86537-4-ubizjak@gmail.com
2023-10-05 09:01:52 +02:00
Uros Bizjak
1ca3683cc6 x86/percpu: Enable named address spaces with known compiler version
Enable named address spaces with known compiler versions
(GCC 12.1 and later) in order to avoid possible issues with named
address spaces with older compilers. Set CC_HAS_NAMED_AS when the
compiler satisfies version requirements and set USE_X86_SEG_SUPPORT
to signal when segment qualifiers could be used.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20231004145137.86537-3-ubizjak@gmail.com
2023-10-05 09:01:52 +02:00
Zhu Wang
8ae292c66d x86/lib: Address kernel-doc warnings
Fix all kernel-doc warnings in csum-wrappers_64.c:

  arch/x86/lib/csum-wrappers_64.c:25: warning: Excess function parameter 'isum' description in 'csum_and_copy_from_user'
  arch/x86/lib/csum-wrappers_64.c:25: warning: Excess function parameter 'errp' description in 'csum_and_copy_from_user'
  arch/x86/lib/csum-wrappers_64.c:49: warning: Excess function parameter 'isum' description in 'csum_and_copy_to_user'
  arch/x86/lib/csum-wrappers_64.c:49: warning: Excess function parameter 'errp' description in 'csum_and_copy_to_user'
  arch/x86/lib/csum-wrappers_64.c:71: warning: Excess function parameter 'sum' description in 'csum_partial_copy_nocheck'

Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
2023-10-03 22:46:47 +02:00
Xin Li (Intel)
1882366217 x86/entry: Fix typos in comments
Fix 2 typos in the comments.

Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Link: https://lore.kernel.org/r/20230926061319.1929127-1-xin@zytor.com
2023-09-27 10:05:04 +02:00
Xin Li (Intel)
da4aff622a x86/entry: Remove unused argument %rsi passed to exc_nmi()
exc_nmi() only takes one argument of type struct pt_regs *, but
asm_exc_nmi() calls it with 2 arguments. The second one passed
in %rsi seems to be a leftover, so simply remove it.

Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Link: https://lore.kernel.org/r/20230926061319.1929127-1-xin@zytor.com
2023-09-27 10:04:54 +02:00
Ingo Molnar
ad42474325 x86/bitops: Remove unused __sw_hweight64() assembly implementation on x86-32
Header cleanups in the fast-headers tree highlighted that we have an
unused assembly implementation for __sw_hweight64():

    WARNING: modpost: EXPORT symbol "__sw_hweight64" [vmlinux] version ...

__arch_hweight64() on x86-32 is defined in the
arch/x86/include/asm/arch_hweight.h header as an inline, using
__arch_hweight32():

  #ifdef CONFIG_X86_32
  static inline unsigned long __arch_hweight64(__u64 w)
  {
          return  __arch_hweight32((u32)w) +
                  __arch_hweight32((u32)(w >> 32));
  }

*But* there's also a __sw_hweight64() assembly implementation:

  arch/x86/lib/hweight.S

  SYM_FUNC_START(__sw_hweight64)
  #ifdef CONFIG_X86_64
  ...
  #else /* CONFIG_X86_32 */
        /* We're getting an u64 arg in (%eax,%edx): unsigned long hweight64(__u64 w) */
        pushl   %ecx

        call    __sw_hweight32
        movl    %eax, %ecx                      # stash away result
        movl    %edx, %eax                      # second part of input
        call    __sw_hweight32
        addl    %ecx, %eax                      # result

        popl    %ecx
        ret
  #endif

But this __sw_hweight64 assembly implementation is unused - and it's
essentially doing the same thing that the inline wrapper does.

Remove the assembly version and add a comment about it.

Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
2023-09-22 09:34:50 +02:00
Uros Bizjak
7c097ca50d x86/percpu: Do not clobber %rsi in percpu_{try_,}cmpxchg{64,128}_op
The fallback alternative uses %rsi register to manually load pointer
to the percpu variable before the call to the emulation function.
This is unoptimal, because the load is hidden from the compiler.

Move the load of %rsi outside inline asm, so the compiler can
reuse the value. The code in slub.o improves from:

    55ac:	49 8b 3c 24          	mov    (%r12),%rdi
    55b0:	48 8d 4a 40          	lea    0x40(%rdx),%rcx
    55b4:	49 8b 1c 07          	mov    (%r15,%rax,1),%rbx
    55b8:	4c 89 f8             	mov    %r15,%rax
    55bb:	48 8d 37             	lea    (%rdi),%rsi
    55be:	e8 00 00 00 00       	callq  55c3 <...>
			55bf: R_X86_64_PLT32	this_cpu_cmpxchg16b_emu-0x4
    55c3:	75 a3                	jne    5568 <...>
    55c5:	...

 0000000000000000 <.altinstr_replacement>:
   5:	65 48 0f c7 0f       	cmpxchg16b %gs:(%rdi)

to:

    55ac:	49 8b 34 24          	mov    (%r12),%rsi
    55b0:	48 8d 4a 40          	lea    0x40(%rdx),%rcx
    55b4:	49 8b 1c 07          	mov    (%r15,%rax,1),%rbx
    55b8:	4c 89 f8             	mov    %r15,%rax
    55bb:	e8 00 00 00 00       	callq  55c0 <...>
			55bc: R_X86_64_PLT32	this_cpu_cmpxchg16b_emu-0x4
    55c0:	75 a6                	jne    5568 <...>
    55c2:	...

Where the alternative replacement instruction now uses %rsi:

 0000000000000000 <.altinstr_replacement>:
   5:	65 48 0f c7 0e       	cmpxchg16b %gs:(%rsi)

The instruction (effectively a reg-reg move) at 55bb: in the original
assembly is removed. Also, both the CALL and replacement CMPXCHG16B
are 5 bytes long, removing the need for NOPs in the asm code.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230918151452.62344-1-ubizjak@gmail.com
2023-09-21 09:35:50 +02:00
Uros Bizjak
b8e3dfa16e x86/percpu: Use raw_cpu_try_cmpxchg() in preempt_count_set()
Use raw_cpu_try_cmpxchg() instead of raw_cpu_cmpxchg(*ptr, old, new) == old.
x86 CMPXCHG instruction returns success in ZF flag, so this change saves a
compare after CMPXCHG (and related MOV instruction in front of CMPXCHG).

Also, raw_cpu_try_cmpxchg() implicitly assigns old *ptr value to "old" when
cmpxchg fails. There is no need to re-read the value in the loop.

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230830151623.3900-2-ubizjak@gmail.com
2023-09-15 13:19:22 +02:00
Uros Bizjak
5f863897d9 x86/percpu: Define raw_cpu_try_cmpxchg and this_cpu_try_cmpxchg()
Define target-specific raw_cpu_try_cmpxchg_N() and
this_cpu_try_cmpxchg_N() macros. These definitions override
the generic fallback definitions and enable target-specific
optimized implementations.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230830151623.3900-1-ubizjak@gmail.com
2023-09-15 13:18:23 +02:00
Uros Bizjak
54cd971c6f x86/percpu: Define {raw,this}_cpu_try_cmpxchg{64,128}
Define target-specific {raw,this}_cpu_try_cmpxchg64() and
{raw,this}_cpu_try_cmpxchg128() macros. These definitions override
the generic fallback definitions and enable target-specific
optimized implementations.

Several places in mm/slub.o improve from e.g.:

    53bc:	48 8d 4f 40          	lea    0x40(%rdi),%rcx
    53c0:	48 89 fa             	mov    %rdi,%rdx
    53c3:	49 8b 5c 05 00       	mov    0x0(%r13,%rax,1),%rbx
    53c8:	4c 89 e8             	mov    %r13,%rax
    53cb:	49 8d 30             	lea    (%r8),%rsi
    53ce:	e8 00 00 00 00       	call   53d3 <...>
			53cf: R_X86_64_PLT32	this_cpu_cmpxchg16b_emu-0x4
    53d3:	48 31 d7             	xor    %rdx,%rdi
    53d6:	4c 31 e8             	xor    %r13,%rax
    53d9:	48 09 c7             	or     %rax,%rdi
    53dc:	75 ae                	jne    538c <...>

to:

    53bc:	48 8d 4a 40          	lea    0x40(%rdx),%rcx
    53c0:	49 8b 1c 07          	mov    (%r15,%rax,1),%rbx
    53c4:	4c 89 f8             	mov    %r15,%rax
    53c7:	48 8d 37             	lea    (%rdi),%rsi
    53ca:	e8 00 00 00 00       	call   53cf <...>
			53cb: R_X86_64_PLT32	this_cpu_cmpxchg16b_emu-0x4
    53cf:	75 bb                	jne    538c <...>

reducing the size of mm/slub.o by 80 bytes:

   text    data     bss     dec     hex filename
  39758    5337    4208   49303    c097 slub-new.o
  39838    5337    4208   49383    c0e7 slub-old.o

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230906185941.53527-1-ubizjak@gmail.com
2023-09-15 13:16:35 +02:00
Nick Desaulniers
3dae5c43ba x86/asm/bitops: Use __builtin_clz{l|ll} to evaluate constant expressions
Micro-optimize the bitops code some more, similar to commits:

  fdb6649ab7 ("x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions")
  2fcff790dc ("powerpc: Use builtin functions for fls()/__fls()/fls64()")

From a recent discussion, I noticed that x86 is lacking an optimization
that appears in arch/powerpc/include/asm/bitops.h related to constant
folding.  If you add a BUILD_BUG_ON(__builtin_constant_p(param)) to
these functions, you'll find that there were cases where the use of
inline asm pessimized the compiler's ability to perform constant folding
resulting in runtime calculation of a value that could have been
computed at compile time.

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230828-x86_fls-v1-1-e6a31b9f79c3@google.com
2023-09-07 00:05:50 +02:00
Linus Torvalds
4accdb9895 Updates for clocksource/clockevent drivers:
- Remove the OXNAS driver instead of adding a new one!
 
   - A set of boring fixes, cleanups and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmT1ixwTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaIAEACUw02RVWQEbsmJCcnyTPAOoP99xphv
 IPH32UL9Iuq2rgsGOfTGDw8be2bnlUw8d2FEsUtR0lnwfsBDzQKxh+iHWUPzhosX
 VbhcN/tHC0KXzWL/dKPiCjS2CVRMlSRu39XN2+9typmf0GpK1IQzDHjKj5Z95Vdp
 Zb1tPm9N18ZSdT04dNRdNlkmdCjjMaPE9h0hpyIqALQD2TmTs9CwO7nxbf7hXJTB
 4tLGOW1MznuIAhwLK0toSHmMKv8YTNn9arwDZ6PDDFpff06Q+Gw5YitsgzTFNNIh
 TlLQdq04DOiXOh47VwiU0Ixc/JdlxN0pxscfYN0KP6Odjs+IkE+cSyhfYNFQTaFc
 qN4C4j0FpCkFmsIeLpCg9FAU0eqA/ViRR1NR+TIJW9jRYMaJOBD3s9Kehv1Xj5di
 hOCh0EvU/TFZ2v1ZgSD3RWgIcvGnijKxvw1Rrh+ZTJTX2LnkXRJqQY9PTKfIFNLM
 LJoo/oXp6aAcpnrxIMvG2splxRihqpuzUpY+Y0SnOQnbgbPeZ3/lx7PMiXrfDIr7
 y8npY/2upKLjC5TlbCX42FBTulQfeLWT8TI6yGwPOQudigdyeBDWNV3+lpLMxrJe
 s/MbARoOM9Nx2toz9qFa+yuKTcJtaSQfwmcwP1jQRei8O+MXkDe7HmfRnp0bMPF3
 a5A0B/fdRjEEwQ==
 =dGVZ
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2023-09-04-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull clocksource/clockevent driver updates from Thomas Gleixner:

 - Remove the OXNAS driver instead of adding a new one!

 - A set of boring fixes, cleanups and improvements

* tag 'timers-core-2023-09-04-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: Explicitly include correct DT includes
  clocksource/drivers/sun5i: Convert to platform device driver
  clocksource/drivers/sun5i: Remove pointless struct
  clocksource/drivers/sun5i: Remove duplication of code and data
  clocksource/drivers/loongson1: Set variable ls1x_timer_lock storage-class-specifier to static
  clocksource/drivers/arm_arch_timer: Disable timer before programming CVAL
  dt-bindings: timer: oxsemi,rps-timer: remove obsolete bindings
  clocksource/drivers/timer-oxnas-rps: Remove obsolete timer driver
2023-09-04 13:15:57 -07:00
Linus Torvalds
7a1415eebe m68knommu: updates and fixes for v6.6
Fixes include:
 . white space cleanup in dma code
 . removal of unused pcibios_setup() code
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEmsfM6tQwfNjBOxr3TiQVqaG9L4AFAmT1K6cACgkQTiQVqaG9
 L4B3WxAAn96xP54QdAh5HAXoCyFTiZX8cd+Z8mtSzHMVVht+fs93xlJKBWaBbNh0
 NWRLAXpWq8aip10PX4gPx/LQmInlBCr95TvJdiefyAXBQ0vyH4rDIXmJofu3Sf4L
 efe7z7ENGwuqHCS7zckF3OzZ2lt4OcuvjNc6xPnFtKG4aM32OSxlvmuyD0nofwxF
 uHfA/5Zq6n+RdijXu81xwPXUhpQUAd+kH6SRpsxC8+gdGlqI4dDYtVdyBZesHGFY
 dbPGbl+UwnHuVVVgOycJilXGAHZ/U+yqrJasSRPD3IvXrw4a4lWIRnCsnoiatkmp
 K6NGqEFcCHZ8VJT0GMyne4QC8YYKbcMuu7LFkeunPoZ6Viio/W6LrW74KaBV5OjA
 jOWUb+olR2Wd3yzfzakwszn7p15aDO6FoV7AHEWcQ0EeJQ6OQgDHWHfO8H5dHBS0
 ve7Tr8Z3f01ahRZAjAqxvuTdckfP0GZNMIaopWlIpHsVFNQXvuvNyaYU/N+jhKC5
 9hxe7YHoY52yNYZVDC7vk8UR8aRT+E3GmuuRa+X3jCQ7AM19pl5A8jbVZRWloM00
 Qla5fweqDyblvfwkfpjIDOxlDnAOvs/a1ap8KM5BK6dbVHaHbm3wT7QdeiqTeDID
 PBIQbl9nHGNPApsbprWE4t6eW5i0M9GMM/lzWk7cpa2LJnRDkGY=
 =erBK
 -----END PGP SIGNATURE-----

Merge tag 'm68knommu-for-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu

Pull m68knommu updates from Greg Ungerer:
 "Two changes, one a trivial white space clean up, the other removes the
  unnecessary local pcibios_setup() code"

* tag 'm68knommu-for-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68k: coldfire: dma_timer: ERROR: "foo __init bar" should be "foo __init bar"
  m68k/pci: Drop useless pcibios_setup()
2023-09-04 11:34:33 -07:00
Linus Torvalds
68d76d4e7e This pull request contains the following changes for UML:
- Drop 32-bit checksum implementation and re-use it from arch/x86
 - String function cleanup
 - Fixes for -Wmissing-variable-declarations and -Wmissing-prototypes builds
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmT0158WHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wTUvD/4yx5F1tFljUrsBipHtg8NX81Wk
 /yqidBnug71S76BhkoOHX1P+qxzl3WS+zp35zclL2/kPPew1vc5TpVXXFdZa5Xkh
 yaEPGSNKY+yFnaxYWrg/fWYYJbBS+HLDB+Am3fyYdeiXmAEV9K/5HVptZOnDqXnM
 0hJf8hKzkfmn0eaXBlvyfZaQSVofK3R9lKalDXMOwc0vp7i1gEHWuN3K1jjM6WCx
 /lPkpHKErRCDGyCtr/dY9J0FgCROo5ytSQq7KIBat+F+JBGZh7pcruEbl11fg1Ed
 w4RjbcwYITBNHOlrb2kYvxIu7/clxgzYWtcarS0GB3o7Iu8ZQXijWKXakeRV2DBx
 by6NAxliZZDp+QmRpPMDgZPPbN5BQY3GDCYY8eAQEAllrTZXE0ZJyIgBPYX6j4bF
 pOKAHd/hErpoPXi5Ec7r01+pE9aMOZUEK7ebXSM7Hqnj4btcX+1b5V3FMrAHPqEa
 LZpQmeJRtIJQZNRhjjgQx0zd2oJ2OXz0gc+Dap2fj1aSIfGkwIIo0hXvNSoETH3J
 Swsr4v+FynCGCWyNsYFEmWMEe9EWOrElvSNk+Hek4gdQc9lByo9JXxSfOxcoaYLG
 vebV5kog6NDP3DgmnfmPzAnGjpGwMhjNrMQ9UR1Ul3BUOyvZ4nljdzr4tNYVnF0R
 ydtiEOf5IjHW2Jq2Ww==
 =UmrW
 -----END PGP SIGNATURE-----

Merge tag 'uml-for-linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux

Pull UML updates from Richard Weinberger:

 - Drop 32-bit checksum implementation and re-use it from arch/x86

 - String function cleanup

 - Fixes for -Wmissing-variable-declarations and -Wmissing-prototypes
   builds

* tag 'uml-for-linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
  um: virt-pci: fix missing declaration warning
  um: Refactor deprecated strncpy to memcpy
  um: fix 3 instances of -Wmissing-prototypes
  um: port_kern: fix -Wmissing-variable-declarations
  uml: audio: fix -Wmissing-variable-declarations
  um: vector: refactor deprecated strncpy
  um: use obj-y to descend into arch/um/*/
  um: Hard-code the result of 'uname -s'
  um: Use the x86 checksum implementation on 32-bit
  asm-generic: current: Don't include thread-info.h if building asm
  um: Remove unsued extern declaration ldt_host_info()
  um: Fix hostaudio build errors
  um: Remove strlcpy usage
2023-09-04 11:32:21 -07:00
Linus Torvalds
0b90c5637d hyperv-next for v6.6
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmT0EE8THHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXg5FCACGJ6n2ikhtRHAENHIVY/mTh+HbhO07
 ERzjADfqKF43u1Nt9cslgT4MioqwLjQsAu/A0YcJgVxVSOtg7dnbDmurRAjrGT/3
 iKqcVvnaiwSV44TkF8evpeMttZSOg29ImmpyQjoZJJvDMfpxleEK53nuKB9EsjKL
 Mz/0gSPoNc79bWF+85cVhgPnGIh9nBarxHqVsuWjMhc+UFhzjf9mOtk34qqPfJ1Q
 4RsKGEjkVkeXoG6nGd6Gl/+8WoTpenOZQLchhInocY+k9FlAzW1Kr+ICLDx+Topw
 8OJ6fv2rMDOejT9aOaA3/imf7LMer0xSUKb6N0sqQAQX8KzwcOYyKtQJ
 =rC/v
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed-20230902' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv updates from Wei Liu:

 - Support for SEV-SNP guests on Hyper-V (Tianyu Lan)

 - Support for TDX guests on Hyper-V (Dexuan Cui)

 - Use SBRM API in Hyper-V balloon driver (Mitchell Levy)

 - Avoid dereferencing ACPI root object handle in VMBus driver (Maciej
   Szmigiero)

 - A few misecllaneous fixes (Jiapeng Chong, Nathan Chancellor, Saurabh
   Sengar)

* tag 'hyperv-next-signed-20230902' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (24 commits)
  x86/hyperv: Remove duplicate include
  x86/hyperv: Move the code in ivm.c around to avoid unnecessary ifdef's
  x86/hyperv: Remove hv_isolation_type_en_snp
  x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM with the paravisor
  Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the paravisor
  x86/hyperv: Introduce a global variable hyperv_paravisor_present
  Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM
  x86/hyperv: Fix serial console interrupts for fully enlightened TDX guests
  Drivers: hv: vmbus: Support fully enlightened TDX guests
  x86/hyperv: Support hypercalls for fully enlightened TDX guests
  x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests
  x86/hyperv: Fix undefined reference to isolation_type_en_snp without CONFIG_HYPERV
  x86/hyperv: Add missing 'inline' to hv_snp_boot_ap() stub
  hv: hyperv.h: Replace one-element array with flexible-array member
  Drivers: hv: vmbus: Don't dereference ACPI root object handle
  x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES
  x86/hyperv: Add smp support for SEV-SNP guest
  clocksource: hyper-v: Mark hyperv tsc page unencrypted in sev-snp enlightened guest
  x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp enlightened guest
  drivers: hv: Mark percpu hvcall input arg page unencrypted in SEV-SNP enlightened guest
  ...
2023-09-04 11:26:29 -07:00
Linus Torvalds
e4f1b8202f virtio: features
a small pull request this time around, mostly because the
 vduse network got postponed to next relase so we can be sure
 we got the security store right.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmT1BMAPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpYJUH+QHNhfn0JC/yE1IySwDwpmdgr73aaGik1LgV
 ObHi48ucRMtxB+QpXLjPWAlQhVVzZv1wBK+Up9QxW8e9USJrSeI/MWfoHtXOFnGe
 1JdmNr+XQM/uDngZ+mjI4ZUwRkA61iOcTR7gEDdfBUOr+Yl6R7Na/+kKtTDiDMfy
 O8bOCLYVyJNiny2eSMmXH0mb4oPplkne4PzW4i/+ssKNoHlBmUIcx0jqj/qUVpSR
 ozr0SpyhlXKSEQGAtNxwR4PONeMDOOdkRBhxHW5N5QgnP9P7HQ57Ar39Vz7+Kc0i
 6vO2g1gpYV1naQr9BCg8hIF9r68rjgi4IOSghmfpWWUL0yNURtU=
 =z/Df
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "A small pull request this time around, mostly because the vduse
  network got postponed to next relase so we can be sure we got the
  security store right"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_ring: fix avail_wrap_counter in virtqueue_add_packed
  virtio_vdpa: build affinity masks conditionally
  virtio_net: merge dma operations when filling mergeable buffers
  virtio_ring: introduce dma sync api for virtqueue
  virtio_ring: introduce dma map api for virtqueue
  virtio_ring: introduce virtqueue_reset()
  virtio_ring: separate the logic of reset/enable from virtqueue_resize
  virtio_ring: correct the expression of the description of virtqueue_resize()
  virtio_ring: skip unmap for premapped
  virtio_ring: introduce virtqueue_dma_dev()
  virtio_ring: support add premapped buf
  virtio_ring: introduce virtqueue_set_dma_premapped()
  virtio_ring: put mapping error check in vring_map_one_sg
  virtio_ring: check use_dma_api before unmap desc for indirect
  vdpa_sim: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
  vdpa: add get_backend_features vdpa operation
  vdpa: accept VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature
  vdpa: add VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK flag
  vdpa/mlx5: Remove unused function declarations
2023-09-04 10:43:44 -07:00
Linus Torvalds
5c5e0e8120 Three cleanup patches, no behavior changes.
tomoyo: remove unused function declaration
 tomoyo: refactor deprecated strncpy
 tomoyo: add format attributes to functions
 
  security/tomoyo/common.c |    1 +
  security/tomoyo/common.h |    6 ++----
  security/tomoyo/domain.c |    5 ++---
  3 files changed, 5 insertions(+), 7 deletions(-)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJk9FxlAAoJEEJfEo0MZPUqVKUP/REOtWjsfI9xZUCA3c9Hx3xs
 Y6f5sqNLE5hv0/xDAh56Z438Ikk+TJ5+K5rouK6WqBAgaJS1JpmUZ/fgpAKUWYbU
 AmRUSdxPJX1Ujepyc44tY2pGny7c4BK1Wt8FJ47WW3bh/HJ50b76L7iNvx0B2UCj
 sf0JmUE34C7LvlSaQMlbfJKHCaOVKL6CLhLBKuKt+eGeOfiqOuB49Wu17tezk99Y
 1ct6c+Pr5aGZt7PsArMaRYvY9z/1kywFla5O/kMBisJmset6cJQg8IoRtdw0Uhfw
 Egi/hwvpO102Q/SEqk7IYw7cOkXAl5t9bwz9PCJTE8laEB52vSO6n6b0BaZEM9jA
 V3U0fV/p+Xfca5WkOc+6wd6NDu6vCnx9q9baiOZRBiXjkK7wZidzaeuZnXzpRpAo
 MLGSv59VkABqLBQH3ZOYh9icFoZ/o+jp3XP4zAP7/dx6Sp5g+IdlYuM/UpyO8S5U
 u2LX6UTukoAbTOpWt2m2mMQCTzoJI0LN7n2wur4+ukiYKuzddZzYfWPROAbyqrbS
 utW8KcKnDIHpWy1V3M3W9izqhZAYn4EBPAso1xyhCsHLN+KyuqJCYhjrKFAepJjF
 iKJWzycG1J3gKpw3hhAxUGcCmtRVjduFpam19kkNEro667hCUDLOx274jGY6LCgu
 bjy6eMVvQTliXj6E0+dW
 =2qhs
 -----END PGP SIGNATURE-----

Merge tag 'tomoyo-pr-20230903' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1

Pull tomoyo updates from Tetsuo Handa:
 "Three cleanup patches, no behavior changes"

* tag 'tomoyo-pr-20230903' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1:
  tomoyo: remove unused function declaration
  tomoyo: refactor deprecated strncpy
  tomoyo: add format attributes to functions
2023-09-04 10:38:35 -07:00
Yuan Yao
1acfe2c122 virtio_ring: fix avail_wrap_counter in virtqueue_add_packed
In current packed virtqueue implementation, the avail_wrap_counter won't
flip, in the case when the driver supplies a descriptor chain with a
length equals to the queue size; total_sg == vq->packed.vring.num.

Let’s assume the following situation:
vq->packed.vring.num=4
vq->packed.next_avail_idx: 1
vq->packed.avail_wrap_counter: 0

Then the driver adds a descriptor chain containing 4 descriptors.

We expect the following result with avail_wrap_counter flipped:
vq->packed.next_avail_idx: 1
vq->packed.avail_wrap_counter: 1

But, the current implementation gives the following result:
vq->packed.next_avail_idx: 1
vq->packed.avail_wrap_counter: 0

To reproduce the bug, you can set a packed queue size as small as
possible, so that the driver is more likely to provide a descriptor
chain with a length equal to the packed queue size. For example, in
qemu run following commands:
sudo qemu-system-x86_64 \
-enable-kvm \
-nographic \
-kernel "path/to/kernel_image" \
-m 1G \
-drive file="path/to/rootfs",if=none,id=disk \
-device virtio-blk,drive=disk \
-drive file="path/to/disk_image",if=none,id=rwdisk \
-device virtio-blk,drive=rwdisk,packed=on,queue-size=4,\
indirect_desc=off \
-append "console=ttyS0 root=/dev/vda rw init=/bin/bash"

Inside the VM, create a directory and mount the rwdisk device on it. The
rwdisk will hang and mount operation will not complete.

This commit fixes the wrap counter error by flipping the
packed.avail_wrap_counter, when start of descriptor chain equals to the
end of descriptor chain (head == i).

Fixes: 1ce9e6055f ("virtio_ring: introduce packed ring support")
Signed-off-by: Yuan Yao <yuanyaogoog@chromium.org>
Message-Id: <20230808051110.3492693-1-yuanyaogoog@chromium.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03 18:10:24 -04:00
Jason Wang
ae15aceaa9 virtio_vdpa: build affinity masks conditionally
We try to build affinity mask via create_affinity_masks()
unconditionally which may lead several issues:

- the affinity mask is not used for parent without affinity support
  (only VDUSE support the affinity now)
- the logic of create_affinity_masks() might not work for devices
  other than block. For example it's not rare in the networking device
  where the number of queues could exceed the number of CPUs. Such
  case breaks the current affinity logic which is based on
  group_cpus_evenly() who assumes the number of CPUs are not less than
  the number of groups. This can trigger a warning[1]:

	if (ret >= 0)
		WARN_ON(nr_present + nr_others < numgrps);

Fixing this by only build the affinity masks only when

- Driver passes affinity descriptor, driver like virtio-blk can make
  sure to limit the number of queues when it exceeds the number of CPUs
- Parent support affinity setting config ops

This help to avoid the warning. More optimizations could be done on
top.

[1]
[  682.146655] WARNING: CPU: 6 PID: 1550 at lib/group_cpus.c:400 group_cpus_evenly+0x1aa/0x1c0
[  682.146668] CPU: 6 PID: 1550 Comm: vdpa Not tainted 6.5.0-rc5jason+ #79
[  682.146671] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[  682.146673] RIP: 0010:group_cpus_evenly+0x1aa/0x1c0
[  682.146676] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc e8 1b c4 74 ff 48 89 ef e8 13 ac 98 ff 4c 89 e7 45 31 e4 e8 08 ac 98 ff eb c2 <0f> 0b eb b6 e8 fd 05 c3 00 45 31 e4 eb e5 cc cc cc cc cc cc cc cc
[  682.146679] RSP: 0018:ffffc9000215f498 EFLAGS: 00010293
[  682.146682] RAX: 000000000001f1e0 RBX: 0000000000000041 RCX: 0000000000000000
[  682.146684] RDX: ffff888109922058 RSI: 0000000000000041 RDI: 0000000000000030
[  682.146686] RBP: ffff888109922058 R08: ffffc9000215f498 R09: ffffc9000215f4a0
[  682.146687] R10: 00000000000198d0 R11: 0000000000000030 R12: ffff888107e02800
[  682.146689] R13: 0000000000000030 R14: 0000000000000030 R15: 0000000000000041
[  682.146692] FS:  00007fef52315740(0000) GS:ffff888237380000(0000) knlGS:0000000000000000
[  682.146695] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  682.146696] CR2: 00007fef52509000 CR3: 0000000110dbc004 CR4: 0000000000370ee0
[  682.146698] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  682.146700] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  682.146701] Call Trace:
[  682.146703]  <TASK>
[  682.146705]  ? __warn+0x7b/0x130
[  682.146709]  ? group_cpus_evenly+0x1aa/0x1c0
[  682.146712]  ? report_bug+0x1c8/0x1e0
[  682.146717]  ? handle_bug+0x3c/0x70
[  682.146721]  ? exc_invalid_op+0x14/0x70
[  682.146723]  ? asm_exc_invalid_op+0x16/0x20
[  682.146727]  ? group_cpus_evenly+0x1aa/0x1c0
[  682.146729]  ? group_cpus_evenly+0x15c/0x1c0
[  682.146731]  create_affinity_masks+0xaf/0x1a0
[  682.146735]  virtio_vdpa_find_vqs+0x83/0x1d0
[  682.146738]  ? __pfx_default_calc_sets+0x10/0x10
[  682.146742]  virtnet_find_vqs+0x1f0/0x370
[  682.146747]  virtnet_probe+0x501/0xcd0
[  682.146749]  ? vp_modern_get_status+0x12/0x20
[  682.146751]  ? get_cap_addr.isra.0+0x10/0xc0
[  682.146754]  virtio_dev_probe+0x1af/0x260
[  682.146759]  really_probe+0x1a5/0x410

Fixes: 3dad56823b ("virtio-vdpa: Support interrupt affinity spreading mechanism")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230811091539.1359865-1-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03 18:10:24 -04:00
Xuan Zhuo
295525e29a virtio_net: merge dma operations when filling mergeable buffers
Currently, the virtio core will perform a dma operation for each
buffer. Although, the same page may be operated multiple times.

This patch, the driver does the dma operation and manages the dma
address based the feature premapped of virtio core.

This way, we can perform only one dma operation for the pages of the
alloc frag. This is beneficial for the iommu device.

kernel command line: intel_iommu=on iommu.passthrough=0

       |  strict=0  | strict=1
Before |  775496pps | 428614pps
After  | 1109316pps | 742853pps

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20230810123057.43407-13-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03 18:10:24 -04:00
Xuan Zhuo
8bd2f71054 virtio_ring: introduce dma sync api for virtqueue
These API has been introduced:

* virtqueue_dma_need_sync
* virtqueue_dma_sync_single_range_for_cpu
* virtqueue_dma_sync_single_range_for_device

These APIs can be used together with the premapped mechanism to sync the
DMA address.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20230810123057.43407-12-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03 18:10:23 -04:00
Xuan Zhuo
b6253b4e21 virtio_ring: introduce dma map api for virtqueue
Added virtqueue_dma_map_api* to map DMA addresses for virtual memory in
advance. The purpose is to keep memory mapped across multiple add/get
buf operations.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20230810123057.43407-11-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03 18:10:23 -04:00
Xuan Zhuo
ba3e0c47c0 virtio_ring: introduce virtqueue_reset()
Introduce virtqueue_reset() to release all buffer inside vq.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230810123057.43407-10-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03 18:10:23 -04:00
Xuan Zhuo
ad48d53b5b virtio_ring: separate the logic of reset/enable from virtqueue_resize
The subsequent reset function will reuse these logic.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230810123057.43407-9-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03 18:10:23 -04:00
Xuan Zhuo
4d09f24080 virtio_ring: correct the expression of the description of virtqueue_resize()
Modify the "useless" to a more accurate "unused".

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230810123057.43407-8-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03 18:10:23 -04:00
Xuan Zhuo
b319940f83 virtio_ring: skip unmap for premapped
Now we add a case where we skip dma unmap, the vq->premapped is true.

We can't just rely on use_dma_api to determine whether to skip the dma
operation. For convenience, I introduced the "do_unmap". By default, it
is the same as use_dma_api. If the driver is configured with premapped,
then do_unmap is false.

So as long as do_unmap is false, for addr of desc, we should skip dma
unmap operation.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20230810123057.43407-7-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03 18:10:23 -04:00
Xuan Zhuo
2df6475907 virtio_ring: introduce virtqueue_dma_dev()
Added virtqueue_dma_dev() to get DMA device for virtio. Then the
caller can do dma operation in advance. The purpose is to keep memory
mapped across multiple add/get buf operations.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230810123057.43407-6-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03 18:10:23 -04:00
Xuan Zhuo
d7344a2f71 virtio_ring: support add premapped buf
If the vq is the premapped mode, use the sg_dma_address() directly.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20230810123057.43407-5-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03 18:10:23 -04:00
Xuan Zhuo
8daafe9ebb virtio_ring: introduce virtqueue_set_dma_premapped()
This helper allows the driver change the dma mode to premapped mode.
Under the premapped mode, the virtio core do not do dma mapping
internally.

This just work when the use_dma_api is true. If the use_dma_api is false,
the dma options is not through the DMA APIs, that is not the standard
way of the linux kernel.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20230810123057.43407-4-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03 18:10:22 -04:00
Xuan Zhuo
0e27fa6dde virtio_ring: put mapping error check in vring_map_one_sg
This patch put the dma addr error check in vring_map_one_sg().

The benefits of doing this:

1. reduce one judgment of vq->use_dma_api.
2. make vring_map_one_sg more simple, without calling
   vring_mapping_error to check the return value. simplifies subsequent
   code

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230810123057.43407-3-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03 18:10:22 -04:00
Xuan Zhuo
610c708bf8 virtio_ring: check use_dma_api before unmap desc for indirect
Inside detach_buf_split(), if use_dma_api is false,
vring_unmap_one_split_indirect will be called many times, but actually
nothing is done. So this patch check use_dma_api firstly.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230810123057.43407-2-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03 18:10:22 -04:00
Eugenio Pérez
2c9c637116 vdpa_sim: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
Start offering the feature in the simulator.  Other parent drivers can
follow this code to offer it too.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Message-Id: <20230609092127.170673-5-eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03 18:10:22 -04:00
Eugenio Pérez
b63e5c70c3 vdpa: add get_backend_features vdpa operation
This operation allow vdpa parent to expose its own backend feature bits.

Next patches introduce a feature not compatible with all parent drivers:
the ability to enable vq after driver_ok.  Each parent must declare if
it allows it or not.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Message-Id: <20230609092127.170673-4-eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03 18:10:22 -04:00
Eugenio Pérez
9f09fd6171 vdpa: accept VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature
Accepting VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature if
userland sets it.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Message-Id: <20230609092127.170673-3-eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03 18:10:22 -04:00
Eugenio Pérez
8b59b4da9b vdpa: add VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK flag
This feature flag allows the driver enabling virtqueues both before and
after DRIVER_OK.

This is needed for software assisted live migration, so userland can
restore the device status in devices with control virtqueue before the
dataplane is enabled.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Message-Id: <20230609092127.170673-2-eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03 18:10:22 -04:00
Yue Haibing
c1081002bf vdpa/mlx5: Remove unused function declarations
Commit 29064bfdab ("vdpa/mlx5: Add support library for mlx5 VDPA implementation")
declared but never implemented these.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Message-Id: <20230803143041.23388-1-yuehaibing@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03 18:10:22 -04:00
Linus Torvalds
708283abf8 dmaengine updates for v6.6
New support:
  - Qualcomm SM6115 and QCM2290 dmaengine support
  - at_xdma support for microchip,sam9x7 controller
 
  Updates:
  - idxd updates for wq simplification and ats knob updates
  - fsl edma updates for v3 support
  - Xilinx AXI4-Stream control support
  - Yaml conversion for bcm dma binding
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmT0xHEACgkQfBQHDyUj
 g0cUNw/+IN6UpnQyQ61D6Ljv1IjjmmhJS7hVVnkQvAzlFdsRdAw9Ez5jJB2qz7kU
 dAcFKDPTVGZaLYSeQFkmbHeRF8Sk8rCOBHTWMMM085LMttATVhRjth3m2gj4Jp0Y
 XOaJ/TWXGM9DAtojhOoEV+xtY8KdASkwrS7DiPYrwx8u/BTA3p8fa6v3ggyi/ttf
 PoPXfmXR7PupZM5CLDaPMDBW5aIveLbTSsXGlixRrWbccI2dH9RG/l5KKUPFq2Se
 LEIYtsa4ZK8OOb+WpUaZjJ5C7JgXMm0ZCXK9n8tqcdMjxjPAdOoy0lasaVLS0yw9
 BiL4D5sFkUYAOf/9QTY3CSlzv7sjuSCBCGzppV5dYdliKCMx/qv34zNyBnWyhhgA
 oW6h0cWQU/kcX9AUptYn9bHWDb4/Cykd+g9fUlCa1En8DL4lMRkKojHGKieD0mWw
 A9p8W5p8K//g0FOjfPvqNhFP3iCkVCXMys1/mLZvKzofX6PEJc7lPPetY/YLYdBQ
 g02lb6X4FP6MIyPlAzrmudGnCkYuUWGzXE2K4Rs/uTJtfTEkVwICMEQrnAszSl3o
 4hYVjcDfk+5/4guFr4gHRqMq58E+cVv+zV70r2ttnF79lX6sTJ/Uvr7cJa1qVhB9
 0mPLf7GXoKVanrX0rH0w/67Lo7wrQVdd/94BmHyd0kd8lUh63FM=
 =LPWk
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine

Pull dmaengine updates from Vinod Koul:
 "New controller support and updates to drivers.

  New support:
   - Qualcomm SM6115 and QCM2290 dmaengine support
   - at_xdma support for microchip,sam9x7 controller

  Updates:
   - idxd updates for wq simplification and ats knob updates
   - fsl edma updates for v3 support
   - Xilinx AXI4-Stream control support
   - Yaml conversion for bcm dma binding"

* tag 'dmaengine-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (53 commits)
  dmaengine: fsl-edma: integrate v3 support
  dt-bindings: fsl-dma: fsl-edma: add edma3 compatible string
  dmaengine: fsl-edma: move tcd into struct fsl_dma_chan
  dmaengine: fsl-edma: refactor chan_name setup and safety
  dmaengine: fsl-edma: move clearing of register interrupt into setup_irq function
  dmaengine: fsl-edma: refactor using devm_clk_get_enabled
  dmaengine: fsl-edma: simply ATTR_DSIZE and ATTR_SSIZE by using ffs()
  dmaengine: fsl-edma: move common IRQ handler to common.c
  dmaengine: fsl-edma: Remove enum edma_version
  dmaengine: fsl-edma: transition from bool fields to bitmask flags in drvdata
  dmaengine: fsl-edma: clean up EXPORT_SYMBOL_GPL in fsl-edma-common.c
  dmaengine: fsl-edma: fix build error when arch is s390
  dmaengine: idxd: Fix issues with PRS disable sysfs knob
  dmaengine: idxd: Allow ATS disable update only for configurable devices
  dmaengine: xilinx_dma: Program interrupt delay timeout
  dmaengine: xilinx_dma: Use tasklet_hi_schedule for timing critical usecase
  dmaengine: xilinx_dma: Freeup active list based on descriptor completion bit
  dmaengine: xilinx_dma: Increase AXI DMA transaction segment count
  dmaengine: xilinx_dma: Pass AXI4-Stream control words to dma client
  dt-bindings: dmaengine: xilinx_dma: Add xlnx,irq-delay property
  ...
2023-09-03 10:49:42 -07:00
Linus Torvalds
db906f0ca6 phy-for-6.6
- New Support
   - Starfive dphy rx, JH7110 usb and pcie support
   - Rockchip rv1126 inno-dsi phy, rk3588 usb and pcie support
   - Qualcomm sa8775p PCIe support, M31 USB PHY driver
   - Samsung Exynos850 usb support
 
 - Updates
   - Mediatek dsi driver clock  updates
   - Qualcomm sm8150 combo phy with reworking of qmp pcie driver
   - Xilinx zynqmp runtime PM support
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmT0wcIACgkQfBQHDyUj
 g0c8VxAAymIO5fySiW3VF+7yZ+RBClMJPtq+Q1vWgCbz7L7MUN9cRIzBkV59ix5C
 bRJy6a+2IJZZ7mEP535asaEMxt08ypWft1z46eu2mueWhJbWg4GQKzO1hokkzvO3
 QvLb5oZd5YhWepyeDrQX5KjlUJoAMjdj00PB+XG5oRdMr5tjBzAUBrQMBSyUNBTs
 2Yg/HRFREP8cCx7baQ/PCT93bejmHiImIGxesl/1zznYRjzCW8h63eCILvnX4I29
 t6jsn33FfFUEggMGFcNPjTWuhDdNcrfqkRq7FhAOKPWIBOThAzjMNDYpoQOEV2K0
 FDidINQQ2oHL8yPTa8oW7wkaH6ntSM8c7Qac/xlKzuzww2mix74w6n/661gQIebW
 Z6RHjcaDPBGK9mgHqbdcdFGuDmEgsX/AenLxpBgkOeWO+vNGnOUEJELrFr4p0iIt
 UpKloDfP/gS06ay56Cbk4A+RJ2eIBl9t74TC/oGk2+fueuZdnCSZIzxkA/6L78AQ
 dwPx0QwORQU5K1zou6l3eb8mD05I5FK8uSaiIoTMvTg7IyvVAtla5+vBoEjZEqw2
 o50qry4+VUE7KqH8fPPvL0SQ6yaPQ55QHP6k8nkyhGS1YtbJUqhaT/UZEJiyQiHj
 0evPeunWeOG2lC8E59XXyipE6wsQuG2zdFN0JEIJqAuLqzXm/XA=
 =3Wtl
 -----END PGP SIGNATURE-----

Merge tag 'phy-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy

Pull phy updates from Vinod Koul:
 "As usual a couple of new drivers, a bunch of new device support and
  few updates to existing drivers

  New Support:
   - Starfive dphy rx, JH7110 usb and pcie support
   - Rockchip rv1126 inno-dsi phy, rk3588 usb and pcie support
   - Qualcomm sa8775p PCIe support, M31 USB PHY driver
   - Samsung Exynos850 usb support

  Updates:
   - Mediatek dsi driver clock updates
   - Qualcomm sm8150 combo phy with reworking of qmp pcie driver
   - Xilinx zynqmp runtime PM support"

* tag 'phy-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (83 commits)
  phy: exynos5-usbdrd: Add Exynos850 support
  phy: exynos5-usbdrd: Add 26MHz ref clk support
  phy: exynos5-usbdrd: Make it possible to pass custom phy ops
  dt-bindings: phy: samsung,usb3-drd-phy: Add Exynos850 support
  phy: qcom-qmp-combo: fix clock probing
  phy: qcom-qmp-pcie: support SM8150 PCIe QMP PHYs
  phy: qcom-qmp-pcie: populate offsets configuration
  phy: qcom-qmp-pcie: simplify clock handling
  phy: qcom-qmp-pcie: keep offset tables sorted
  phy: qcom-qmp-pcie: drop ln_shrd from v5_20 config
  dt-bindings: phy: qcom,qmp-pcie: describe SM8150 PCIe PHYs
  dt-bindings: phy: migrate QMP PCIe PHY bindings to qcom,sc8280xp-qmp-pcie-phy.yaml
  phy: fsl-imx8mq-usb: add dev_err_probe if getting vbus failed
  phy: qcom: Introduce M31 USB PHY driver
  dt-bindings: phy: qcom,m31: Document qcom,m31 USB phy
  phy: rockchip: inno-dsidphy: Add rv1126 support
  dt-bindings: phy: rockchip-inno-dsidphy: Document rv1126
  dt-bindings: phy: mediatek,tphy: allow simple nodename pattern
  phy: amlogic: meson-g12a-usb2: fix Wvoid-pointer-to-enum-cast warning
  phy: marvell pxa-usb: fix Wvoid-pointer-to-enum-cast warning
  ...
2023-09-03 10:38:02 -07:00
Linus Torvalds
6e32dfcccf soundwire updates for 6.6
- Core support for soundwire device number allocation
  - intel driver updates for adding hw_params for DAI ops, hybrid number
    allocation and power managemnt callback updates
  - DT header include changes for subsystem
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmT0vxoACgkQfBQHDyUj
 g0f/Lg/8CHWdjQzNJVOoJIqY+mFjIKT+QtyRZSpdaQIFYcQF4xcd6v6/rEJzLjgg
 bMlJ+Icwe19sIq2Yhq69kcecCD5/Arx3tqo2WAwcUF5NykFVQnO/FO+qZYdNM8A9
 ohI8+1ZM1qyrC0YocpiUUINOek0MVeLf4wBFFiJMZRVNNMpCH7LrOcs+vH8bLKYn
 i03dXc/sbM0QyBnfsjnu02wbaq84pFOlbkLJYn21cFoiqryMAXjNXRIpuJ35T5BX
 iXgHHhVX4vWgOszLM83BhKQnSCemNb4y24ablm9gtVBMpNg2yAyHL0qpAk9nFhGn
 vVZIup8OVpfziW6O48BBA7HWMBA0fRL8Iz80oH8ZoAAfRl32QGKal37jUaOL7ycO
 +kMNbJq85v0NDRnUiHeUAY/X5gnnRZpbCLzWxpp3ohx8W0mQctQRFoArhITV7O0Q
 38RNfe89Fq6f9poz56tj+d8d/zLx0VzLqPjiuP89HeeOboIjkGtNRfU1z/V9Wh2G
 8Fr77rwSAo+wghlFpHE8S9gkyLvMiVeEdNEHskQod7ZvAD2YrfHPLENOW7L0rpGw
 Eurl+bby1hqgsvA6gQqUQt2xeumeLYelttZYcmMGhPLo7RTYbTckwh6CO2iomNL/
 noCtG/mNbR78pOKIab4VWoLcOzFfJ3qCpdOLzYBQYphxJo2oj50=
 =4w8N
 -----END PGP SIGNATURE-----

Merge tag 'soundwire-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire

Pull soundwire updates from Vinod Koul:
 "Device numbering and intel driver changes are main features:

   - Core support for soundwire device number allocation

   - intel driver updates for adding hw_params for DAI ops, hybrid
     number allocation and power managemnt callback updates

   - DT header include changes for subsystem"

* tag 'soundwire-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
  soundwire: intel_ace2x: add DAI hw_params/prepare/hw_free callbacks
  soundwire: intel_auxdevice: add hybrid IDA-based device_number allocation
  soundwire: bus: add callbacks for device_number allocation
  soundwire: extend parameters of new_peripheral_assigned() callback
  soundWire: intel_auxdevice: resume 'sdw-master' on startup and system resume
  soundwire: intel_auxdevice: enable pm_runtime earlier on startup
  soundwire: Explicitly include correct DT includes
2023-09-03 10:20:57 -07:00
Linus Torvalds
bac8a20fa3 Core MTD changes:
* Use refcount to prevent corruption
 * Call external _get and _put in right order
 * Fix use-after-free in mtd release
 * Explicitly include correct DT includes
 * Clean refcounting with MTD_PARTITIONED_MASTER
 * mtdblock: make warning messages ratelimited
 * dt-bindings: Add SEAMA partition bindings
 
 MTD device driver changes:
 * spear_smi: Use helper function devm_clk_get_enabled()
 * maps: fix -Wvoid-pointer-to-enum-cast warning
 * docg3: Remove unnecessary (void*) conversions
 * physmap-core, spear_smi, st_spi_fsm, lpddr2_nvm, lantiq-flash, plat-ram:
   - Use devm_platform_get_and_ioremap_resource()
 
 Raw NAND core changes:
 * Fix -Wvoid-pointer-to-enum-cast warning
 * Export 'nand_exit_status_op()'
 * dt-bindings: Fix nand-controller.yaml license
 
 Raw NAND controller driver changes:
 * Omap, Omap2, Samsung, Atmel, fsl_upm, lpc32xx_slc, lpc32xx_mlc, STM32_FMC2,
   sh_ftlctl, MXC, Sunxi:
   - Use devm_platform_get_and_ioremap_resource()
 * Orion, vf610_nfc, Sunxi, STM32_FMC2, MTK, mpc5121, lpc32xx_slc, Intel,
   FSMC, Arasan:
   - Use helper function devm_clk_get_optional_enabled()
 * Brcmnand:
   - Use devm_platform_ioremap_resource_byname()
   - Propagate init error -EPROBE_DEFER up
   - Propagate error and simplify ternary operators
   - Fix mtd oobsize
   - Fix potential out-of-bounds access in oob write
   - Fix crash during the panic_write
   - Fix potential false time out warning
   - Fix ECC level field setting for v7.2 controller
 * fsmc: Handle clk prepare error in fsmc_nand_resume()
 * Marvell: Add support for AC5 SoC
 * Meson:
   - Support for 512B ECC step size
   - Fix build error
   - Use NAND core API to check status
   - dt-bindings:
     * Make ECC properties dependent
     * Support for 512B ECC step size
     * Drop unneeded quotes
 * Oxnas: Remove driver and bindings
 * Qcom:
   - Conversion to ->exec_op()
   - Removal of the legacy interface
   - Two full series of improvements/misc fixes
     * Use the BIT() macro
     * Use u8 instead of uint8_t
     * Fix alignment with open parenthesis
     * Fix the spacing
     * Fix wrong indentation
     * Fix a typo
     * Early structure initialization
     * Fix address parsing within ->exec_op()
     * Remove superfluous initialization of "ret"
     * Rename variables in qcom_op_cmd_mapping()
     * Handle unsupported opcode in qcom_op_cmd_mapping()
     * Fix the opcode check in qcom_check_op()
     * Use EOPNOTSUPP instead of ENOTSUPP
     * Wrap qcom_nand_exec_op() to 80 columns
     * Unmap sg_list and free desc within submic_descs()
     * Simplify the call to nand_prog_page_end_op()
     * Do not override the error no of submit_descs()
     * Sort includes alphabetically
     * Clear buf_count and buf_start in raw read
     * Add read/read_start ops in exec_op path
 * vf610_nfc: Do not check 0 for platform_get_irq()
 
 SPI NAND manufacturer driver changes:
 * gigadevice: Add support for GD5F1GQ{4,5}RExxH
 * esmt: Add support for F50D2G41KA
 * toshiba: Add support for T{C,H}58NYG{0,2}S3HBAI4 and TH58NYG3S0HBAI6
 
 SPI NOR core changes:
 * fix assumption on enabling quad mode in
   spi_nor_write_16bit_sr_and_check()
 * avoid setting SRWD bit in SR if WP# signal not connected as it will
   configure the SR permanently as read only. Add "no-wp" dt property.
 * clarify the need for spi-nor compatibles in dt-bindings
 
 SPI NOR manufacturer driver changes:
 * Spansion:
   - Add support for S28HS02GT
   - Switch methods to use vreg_offset from SFDP instead of hardcoding
     the register value
 * Microchip/SST:
   - Add support for sst26vf032b flash
 * Winbond:
   - Correct flags for Winbond w25q128
 * NXP spifi:
   - Use helper function devm_clk_get_enabled()
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEE9HuaYnbmDhq/XIDIJWrqGEe9VoQFAmTstY0ACgkQJWrqGEe9
 VoRpeggAmiUPLVEJRosvtOAaT+en2YTDiVZrRmQ8hjekjRc4FfY6C7DPNWNua3zx
 SaVqLEF7ScjnKH1YYwXN3XG3j4+1NPRV/VmR89yD6NVOcLs8BEJk/Ooc6LQrHAAf
 E87jVafbPLWq8MkcVcnHbdijgHVh2onMbUQtkqjFSn6WAolSmZFJotocfKT12uuY
 K9Hn5TLjRiH5e7O1rQnBcATMXjHIA1o0G1RCklm+T1MojNXIO1KN8yMYRjUoGbEJ
 afFdwczNiTFgL4MJ3qL6NhqhSGC6V6QsUcsYvEjmComepAuZBP2wGnuQMHOxKqYV
 Tl93LW8FOdyWHdCSgJdYkctoRPU6KQ==
 =uMXQ
 -----END PGP SIGNATURE-----

Merge tag 'mtd/for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux

Pull MTD updates from Miquel Raynal:
 "Core MTD changes:
   - Use refcount to prevent corruption
   - Call external _get and _put in right order
   - Fix use-after-free in mtd release
   - Explicitly include correct DT includes
   - Clean refcounting with MTD_PARTITIONED_MASTER
   - mtdblock: make warning messages ratelimited
   - dt-bindings: Add SEAMA partition bindings

  Device driver changes:
   - Use devm helper functions
   - Fix questionable cast, remove pointless ones.
   - error handling fixes
   - add support for new chip versions
   - update DT bindings
   - misc cleanups - fix typos, whitespace, indentation"

* tag 'mtd/for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (105 commits)
  dt-bindings: mtd: amlogic,meson-nand: drop unneeded quotes
  mtd: spear_smi: Use helper function devm_clk_get_enabled()
  mtd: rawnand: orion: Use helper function devm_clk_get_optional_enabled()
  mtd: rawnand: vf610_nfc: Use helper function devm_clk_get_enabled()
  mtd: rawnand: sunxi: Use helper function devm_clk_get_enabled()
  mtd: rawnand: stm32_fmc2: Use helper function devm_clk_get_enabled()
  mtd: rawnand: mtk: Use helper function devm_clk_get_enabled()
  mtd: rawnand: mpc5121: Use helper function devm_clk_get_enabled()
  mtd: rawnand: lpc32xx_slc: Use helper function devm_clk_get_enabled()
  mtd: rawnand: intel: Use helper function devm_clk_get_enabled()
  mtd: rawnand: fsmc: Use helper function devm_clk_get_enabled()
  mtd: rawnand: arasan: Use helper function devm_clk_get_enabled()
  mtd: rawnand: qcom: Add read/read_start ops in exec_op path
  mtd: rawnand: qcom: Clear buf_count and buf_start in raw read
  mtd: maps: fix -Wvoid-pointer-to-enum-cast warning
  mtd: rawnand: fix -Wvoid-pointer-to-enum-cast warning
  mtd: rawnand: fsmc: handle clk prepare error in fsmc_nand_resume()
  mtd: rawnand: Propagate error and simplify ternary operators for brcmstb_nand_wait_for_completion()
  mtd: rawnand: qcom: Sort includes alphabetically
  mtd: rawnand: qcom: Do not override the error no of submit_descs()
  ...
2023-09-03 09:59:53 -07:00
Linus Torvalds
92901222f8 f2fs update for 6.6-rc1
In this cycle, we don't have a highlighted feature enhancement, but mostly
 have fixed issues mainly in two parts: 1) zoned block device, 2) compression
 support. For zoned block device, we've tried to improve the power-off recovery
 flow as much as possible. For compression, we found some corner cases caused by
 wrong compression policy and logics. Other than them, there were some reverts
 and stat corrections.
 
 Bug fix:
  - use finish zone command when closing a zone
  - check zone type before sending async reset zone command
  - fix to assign compress_level for lz4 correctly
  - fix error path of f2fs_submit_page_read()
  - don't {,de}compress non-full cluster
  - send small discard commands during checkpoint back
  - flush inode if atomic file is aborted
  - correct to account gc/cp stats
 
 And, there are minor bug fixes, avoiding false lockdep warning, and clean-ups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmTyemoACgkQQBSofoJI
 UNLLKQ//bupYGPOqAgKbd/s7FhtULMiiRmFVy7W2eoMIc/oeeXOGrzDAF/1NifLC
 WLV4uBNVTS4PS8D1vRzxZNEZt9aqPS0vQ8hxW/3nTI9Z425NX3nz7gLSxxmwIkIe
 xj++V6tvKPcCH0BfKvfFCtcxj09PsflgdEuT8w/sIkH6p4o+VEMFs1Lc9PQsjUmh
 epznK7JGBwpAxmHqI74n1eAw2CI6W+oKx23YDTNMBD6hmXTU0fkTeBURrOlSsUHZ
 nhafPecsrCEI+OpAj03G/7e/zt+iTUKdmHx9O5ir/P00vF/c+SU2vSwB97FiHqBi
 B4UmocTM0MAsU80PQcmE6aU3zgQFI0Yun5yZ24VeWjKTu76ssZSmT2HA/4RL+LLf
 AeAW4FSyfh76pls8X5IWfilxGLWq6kTzSZA0MF7dH2q7qlj5apL5wKpm/XH6POqn
 qELY/Y9+P1QuCcNL8BiYrgA5xBqVJ7Uw/6/6U3Y77PElc+Pwl3vI8UZ7uCOBrsXL
 e0TLXy23AJA6AS2DyLLziy669nXAZRb95B8TWMfEeVZIMFvCeeqYc74N8jOFa0T8
 q6uQFZs+0cETLZA8MSZdlNhzvhJmbW6wgSIz++CEdikWSLBZMKWxBVjCPkkCY9uc
 DMh8zruSVbYPZWBTcxkMFEBJKKrU43++e7pb8ZoqTj4Pq1317b0=
 =Qa8+
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6-6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this cycle, we don't have a highlighted feature enhancement, but
  mostly have fixed issues mainly in two parts: 1) zoned block device,
  and 2) compression support.

  For zoned block device, we've tried to improve the power-off recovery
  flow as much as possible. For compression, we found some corner cases
  caused by wrong compression policy and logics. Other than them, there
  were some reverts and stat corrections.

  Bug fixes:
   - use finish zone command when closing a zone
   - check zone type before sending async reset zone command
   - fix to assign compress_level for lz4 correctly
   - fix error path of f2fs_submit_page_read()
   - don't {,de}compress non-full cluster
   - send small discard commands during checkpoint back
   - flush inode if atomic file is aborted
   - correct to account gc/cp stats

  And, there are minor bug fixes, avoiding false lockdep warning, and
  clean-ups"

* tag 'f2fs-for-6-6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (25 commits)
  f2fs: use finish zone command when closing a zone
  f2fs: compress: fix to assign compress_level for lz4 correctly
  f2fs: fix error path of f2fs_submit_page_read()
  f2fs: clean up error handling in sanity_check_{compress_,}inode()
  f2fs: avoid false alarm of circular locking
  Revert "f2fs: do not issue small discard commands during checkpoint"
  f2fs: doc: fix description of max_small_discards
  f2fs: should update REQ_TIME for direct write
  f2fs: fix to account cp stats correctly
  f2fs: fix to account gc stats correctly
  f2fs: remove unneeded check condition in __f2fs_setxattr()
  f2fs: fix to update i_ctime in __f2fs_setxattr()
  Revert "f2fs: fix to do sanity check on extent cache correctly"
  f2fs: increase usage of folio_next_index() helper
  f2fs: Only lfs mode is allowed with zoned block device feature
  f2fs: check zone type before sending async reset zone command
  f2fs: compress: don't {,de}compress non-full cluster
  f2fs: allow f2fs_ioc_{,de}compress_file to be interrupted
  f2fs: don't reopen the main block device in f2fs_scan_devices
  f2fs: fix to avoid mmap vs set_compress_option case
  ...
2023-09-02 15:37:59 -07:00