42529 Commits

Author SHA1 Message Date
Peter Xu
c2da319c2e mm/uffd: sanity check write bit for uffd-wp protected ptes
Let's add one sanity check for CONFIG_DEBUG_VM on the write bit in
whatever chance we have when walking through the pgtables.  It can bring
the error earlier even before the app notices the data was corrupted on
the snapshot.  Also it helps us to identify this is a wrong pgtable setup,
so hopefully a great information to have for debugging too.

Link: https://lkml.kernel.org/r/20221114000447.1681003-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 15:58:55 -08:00
Andrew Morton
a38358c934 Merge branch 'mm-hotfixes-stable' into mm-stable 2022-11-30 14:58:42 -08:00
Juergen Gross
4aaf269c76 mm: introduce arch_has_hw_nonleaf_pmd_young()
When running as a Xen PV guests commit eed9a328aa1a ("mm: x86: add
CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG") can cause a protection violation in
pmdp_test_and_clear_young():

 BUG: unable to handle page fault for address: ffff8880083374d0
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0003) - permissions violation
 PGD 3026067 P4D 3026067 PUD 3027067 PMD 7fee5067 PTE 8010000008337065
 Oops: 0003 [#1] PREEMPT SMP NOPTI
 CPU: 7 PID: 158 Comm: kswapd0 Not tainted 6.1.0-rc5-20221118-doflr+ #1
 RIP: e030:pmdp_test_and_clear_young+0x25/0x40

This happens because the Xen hypervisor can't emulate direct writes to
page table entries other than PTEs.

This can easily be fixed by introducing arch_has_hw_nonleaf_pmd_young()
similar to arch_has_hw_pte_young() and test that instead of
CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG.

Link: https://lkml.kernel.org/r/20221123064510.16225-1-jgross@suse.com
Fixes: eed9a328aa1a ("mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Yu Zhao <yuzhao@google.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: David Hildenbrand <david@redhat.com>	[core changes]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 14:49:41 -08:00
Juergen Gross
6617da8fb5 mm: add dummy pmd_young() for architectures not having it
In order to avoid #ifdeffery add a dummy pmd_young() implementation as a
fallback.  This is required for the later patch "mm: introduce
arch_has_hw_nonleaf_pmd_young()".

Link: https://lkml.kernel.org/r/fd3ac3cd-7349-6bbd-890a-71a9454ca0b3@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 14:49:41 -08:00
Kefeng Wang
4f20566f5c x86/sgx: use VM_ACCESS_FLAGS
Simplify VM_READ|VM_WRITE|VM_EXEC with VM_ACCESS_FLAGS.

Link: https://lkml.kernel.org/r/20221019034945.93081-3-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@gmail.com>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-08 17:37:19 -08:00
Kefeng Wang
e025ab842e mm: remove kern_addr_valid() completely
Most architectures (except arm64/x86/sparc) simply return 1 for
kern_addr_valid(), which is only used in read_kcore(), and it calls
copy_from_kernel_nofault() which could check whether the address is a
valid kernel address.  So as there is no need for kern_addr_valid(), let's
remove it.

Link: https://lkml.kernel.org/r/20221018074014.185687-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Acked-by: Heiko Carstens <hca@linux.ibm.com>		[s390]
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Helge Deller <deller@gmx.de>			[parisc]
Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
Acked-by: Guo Ren <guoren@kernel.org>			[csky]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: <aou@eecs.berkeley.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Chris Zankel <chris@zankel.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Xuerui Wang <kernel@xen0n.name>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-08 17:37:18 -08:00
Naoya Horiguchi
1fdbed657a arch/x86/mm/hugetlbpage.c: pud_huge() returns 0 when using 2-level paging
The following bug is reported to be triggered when starting X on x86-32
system with i915:

  [  225.777375] kernel BUG at mm/memory.c:2664!
  [  225.777391] invalid opcode: 0000 [#1] PREEMPT SMP
  [  225.777405] CPU: 0 PID: 2402 Comm: Xorg Not tainted 6.1.0-rc3-bdg+ #86
  [  225.777415] Hardware name:  /8I865G775-G, BIOS F1 08/29/2006
  [  225.777421] EIP: __apply_to_page_range+0x24d/0x31c
  [  225.777437] Code: ff ff 8b 55 e8 8b 45 cc e8 0a 11 ec ff 89 d8 83 c4 28 5b 5e 5f 5d c3 81 7d e0 a0 ef 96 c1 74 ad 8b 45 d0 e8 2d 83 49 00 eb a3 <0f> 0b 25 00 f0 ff ff 81 eb 00 00 00 40 01 c3 8b 45 ec 8b 00 e8 76
  [  225.777446] EAX: 00000001 EBX: c53a3b58 ECX: b5c00000 EDX: c258aa00
  [  225.777454] ESI: b5c00000 EDI: b5900000 EBP: c4b0fdb4 ESP: c4b0fd80
  [  225.777462] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010202
  [  225.777470] CR0: 80050033 CR2: b5900000 CR3: 053a3000 CR4: 000006d0
  [  225.777479] Call Trace:
  [  225.777486]  ? i915_memcpy_init_early+0x63/0x63 [i915]
  [  225.777684]  apply_to_page_range+0x21/0x27
  [  225.777694]  ? i915_memcpy_init_early+0x63/0x63 [i915]
  [  225.777870]  remap_io_mapping+0x49/0x75 [i915]
  [  225.778046]  ? i915_memcpy_init_early+0x63/0x63 [i915]
  [  225.778220]  ? mutex_unlock+0xb/0xd
  [  225.778231]  ? i915_vma_pin_fence+0x6d/0xf7 [i915]
  [  225.778420]  vm_fault_gtt+0x2a9/0x8f1 [i915]
  [  225.778644]  ? lock_is_held_type+0x56/0xe7
  [  225.778655]  ? lock_is_held_type+0x7a/0xe7
  [  225.778663]  ? 0xc1000000
  [  225.778670]  __do_fault+0x21/0x6a
  [  225.778679]  handle_mm_fault+0x708/0xb21
  [  225.778686]  ? mt_find+0x21e/0x5ae
  [  225.778696]  exc_page_fault+0x185/0x705
  [  225.778704]  ? doublefault_shim+0x127/0x127
  [  225.778715]  handle_exception+0x130/0x130
  [  225.778723] EIP: 0xb700468a

Recently pud_huge() got aware of non-present entry by commit 3a194f3f8ad0
("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present
pud entry") to handle some special states of gigantic page.  However, it's
overlooked that pud_none() always returns false when running with 2-level
paging, and as a result pud_huge() can return true pointlessly.

Introduce "#if CONFIG_PGTABLE_LEVELS > 2" to pud_huge() to deal with this.

Link: https://lkml.kernel.org/r/20221107021010.2449306-1-naoya.horiguchi@linux.dev
Fixes: 3a194f3f8ad0 ("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-08 15:57:25 -08:00
Alexander Potapenko
ba54d194f8 x86/traps: avoid KMSAN bugs originating from handle_bug()
There is a case in exc_invalid_op handler that is executed outside the
irqentry_enter()/irqentry_exit() region when an UD2 instruction is used to
encode a call to __warn().

In that case the `struct pt_regs` passed to the interrupt handler is never
unpoisoned by KMSAN (this is normally done in irqentry_enter()), which
leads to false positives inside handle_bug().

Use kmsan_unpoison_entry_regs() to explicitly unpoison those registers
before using them.

Link: https://lkml.kernel.org/r/20221102110611.1085175-5-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Marco Elver <elver@google.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-08 15:57:24 -08:00
Alexander Potapenko
11385b2612 x86/uaccess: instrument copy_from_user_nmi()
Make sure usercopy hooks from linux/instrumented.h are invoked for
copy_from_user_nmi().  This fixes KMSAN false positives reported when
dumping opcodes for a stack trace.

Link: https://lkml.kernel.org/r/20221102110611.1085175-2-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Marco Elver <elver@google.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-08 15:57:24 -08:00
Linus Torvalds
727ea09e99 - Add Cooper Lake's stepping to the PEBS guest/host events isolation
fixed microcode revisions checking quirk
 
 - Update Icelake and Sapphire Rapids events constraints
 
 - Use the standard energy unit for Sapphire Rapids in RAPL
 
 - Fix the hw_breakpoint test to fail more graciously on !SMP configs
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmNnr/4ACgkQEsHwGGHe
 VUrFRg//dyB0lnQcdvIaPd7DWn3WGop+MeZv0NZI7uYk+SqjtJ3yJ/c4ktcaIgJV
 MhTk8Q/gxHvuT+MZarC/f1QYtTqzRQ//rKD2aO/l9Gr813Hu4R0z2AEwrNKDmzyd
 BYy3O5GXGeBAiLxtmKZ2bDlS5z8a9L3dlbLCWqjq6iGIVncljWmEDmNQmA3YPury
 v8f+V8EqfSE4iWcpnNsZOdrmkMkXEzA8X5vRswQ9l2y6qMmnEeUk9Hn9mFlG+QK4
 VDyxkQEB+vZVfWL2UjD3dpEaH5LVyfCQBwOaVdFfHhMmLhoTO2VmRMLza3Qd9ejZ
 RIE1hlRibqGMqyHDTjZvnkPgnz4QQqayDf8UIIwVdaMVdIaZmxcIQwfsbQS12E5b
 9EBzbaD6TJx42E56WuQHM+ZYt6nz0ktPz0IeBFJIwbU30gqJwdi0uIz2kXNpkthC
 eX4Bq/iM9C41A58mj9+uerF9jshi/DJU74KcMGUZiJ7IeGDJgL9CfViOTueMOjr2
 OI8nvLOtwBpj8X3AO1nEVkevSt4KPoTD+NVCNpXmjVm9DNFvMRo2EUsRHHrCkLJN
 EO7iF14rTlSI7IAE+qxNgRsmXPCyuVBhB3S3/3YmCqsH1kQXqlgxT/2eOJN6kCGz
 tlaWnD3TEaifH/DQQVGmv9nNFjS0C49MSxrZ7Oe7phnmSn3vaGY=
 =midC
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Add Cooper Lake's stepping to the PEBS guest/host events isolation
   fixed microcode revisions checking quirk

 - Update Icelake and Sapphire Rapids events constraints

 - Use the standard energy unit for Sapphire Rapids in RAPL

 - Fix the hw_breakpoint test to fail more graciously on !SMP configs

* tag 'perf_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Add Cooper Lake stepping to isolation_ucodes[]
  perf/x86/intel: Fix pebs event constraints for SPR
  perf/x86/intel: Fix pebs event constraints for ICL
  perf/x86/rapl: Use standard Energy Unit for SPR Dram RAPL domain
  perf/hw_breakpoint: test: Skip the test if dependencies unmet
2022-11-06 12:41:32 -08:00
Linus Torvalds
f6f5204727 - Add new Intel CPU models
- Enforce that TDX guests are successfully loaded only on TDX hardware
 where virtualization exception (#VE) delivery on kernel memory is
 disabled because handling those in all possible cases is "essentially
 impossible"
 
 - Add the proper include to the syscall wrappers so that BTF can see the
 real pt_regs definition and not only the forward declaration
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmNnrUgACgkQEsHwGGHe
 VUoC2w//T6+5SlusY9uYIUpL/cGYj+888b/ysO0H0S37IVATUiI5m0eFAA+pcWON
 pzn81oBqk1Lstm7x/jT2mzxsZ2fIFbe6EA8hnLAexA4KY70oGhall9Q6O363CmFa
 DtUjd0LKjH6GkNH1RUcb5icJGVY3vZPCfSuxlYJUD66NBUx2pEF8l5hzZ0W20Yhq
 cHVY0i1HoCNNDRBOODrH7MEY/kWMSvhFybCYOfRMhoVd3aJhsLlq+7/7Ic5wabyy
 2mE8b0GU8or9mluU51OiCDjp+qnpB+BTFjV+88ji5jNEKLIarAXkoHDDD06xLhOK
 a2L44zZ55RAFxxCBm9L10OE0ta3kUqpq+YKQkh0gGGdDdAylUp8IF0zXRl/6jRDC
 T76jM1QOvC791HWD6kDf5XizY+PeaVD9LzAREezG6778mZbNNQwOtkECHZF0U3UP
 n/NIabDlZIncuQQbT0sSshrIyfwtkH5E+epcyLuuchYUYnDGkvNkVU31ndiwFhUG
 fW8I53XBnIlk5PunJ0jhaq4+Tugr7APipUs75y8IpFEINj6gxuoSdXyezlQVpmQ+
 tL1UXqxSlQaCoW295Fr19p3ZBBfqRKXSCS/toCluB/ekhP3ISzIZV7/cB1smmsIR
 JpgXQtcAMtXjIv9A1ZexQVlp2srk7Y6WrFocMNc47lKxmHZ78KY=
 =nqZp
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Add new Intel CPU models

 - Enforce that TDX guests are successfully loaded only on TDX hardware
   where virtualization exception (#VE) delivery on kernel memory is
   disabled because handling those in all possible cases is "essentially
   impossible"

 - Add the proper include to the syscall wrappers so that BTF can see
   the real pt_regs definition and not only the forward declaration

* tag 'x86_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add several Intel server CPU model numbers
  x86/tdx: Panic on bad configs that #VE on "private" memory access
  x86/tdx: Prepare for using "INFO" call for a second purpose
  x86/syscall: Include asm/ptrace.h in syscall_wrapper header
2022-11-06 12:36:47 -08:00
Linus Torvalds
089d1c3122 ARM:
* Fix the pKVM stage-1 walker erronously using the stage-2 accessor
 
 * Correctly convert vcpu->kvm to a hyp pointer when generating
   an exception in a nVHE+MTE configuration
 
 * Check that KVM_CAP_DIRTY_LOG_* are valid before enabling them
 
 * Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE
 
 * Document the boot requirements for FGT when entering the kernel
   at EL1
 
 x86:
 
 * Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit()
 
 * Make argument order consistent for kvcalloc()
 
 * Userspace API fixes for DEBUGCTL and LBRs
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmNncNEUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOKJQf9HhmONhrKaLQ1Ycp5R5qbwbj4zKZR
 3f78NxGaauG9MUHP96tSPWRSgLNQi36yUKI9FOFwfw/qsp79B+9KWkuqzWkYgXqj
 CagwjTtCbQsLzQvDrvBt8Zrw7IQPtGFBFQjwQfyxRipEQBHndJpip0oYr8hoze5O
 xICLmFsjMDtiHOjLwUhHJhaAh/qAg4xaoC6LsV855vkkqxd9Bhrj4z8QkcdUnjlt
 mrP2u/4iAQGubH+3YnAqdWFQUMYxmd0WsIUw3RTzdZJWei6mLjDaA+B3jAIUiXnv
 6UKrwlL56yQzUQxOt/v+d6J76FTDvjiqmUhgy7pINasJBoB5+xG4sJhOIA==
 =Gqfw
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"ARM:

   - Fix the pKVM stage-1 walker erronously using the stage-2 accessor

   - Correctly convert vcpu->kvm to a hyp pointer when generating an
     exception in a nVHE+MTE configuration

   - Check that KVM_CAP_DIRTY_LOG_* are valid before enabling them

   - Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE

   - Document the boot requirements for FGT when entering the kernel at
     EL1

  x86:

   - Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit()

   - Make argument order consistent for kvcalloc()

   - Userspace API fixes for DEBUGCTL and LBRs"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Fix a typo about the usage of kvcalloc()
  KVM: x86: Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit()
  KVM: VMX: Ignore guest CPUID for host userspace writes to DEBUGCTL
  KVM: VMX: Fold vmx_supported_debugctl() into vcpu_supported_debugctl()
  KVM: VMX: Advertise PMU LBRs if and only if perf supports LBRs
  arm64: booting: Document our requirements for fine grained traps with SME
  KVM: arm64: Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE
  KVM: Check KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL} prior to enabling them
  KVM: arm64: Fix bad dereference on MTE-enabled systems
  KVM: arm64: Use correct accessor to parse stage-1 PTEs
2022-11-06 10:46:59 -08:00
Linus Torvalds
6e8c78d32b xen: branch for v6.1-rc4
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCY2dMgQAKCRCAXGG7T9hj
 vtsjAQCajqsnrz+uzySSDRNJDUNPkh9x2vgVQFBwaQMJWSJBXgD+LbwYlCNPTg1R
 E5IzcY5bxMK/bFEkTOpJQ3wacVA0wA4=
 =64Hm
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "One fix for silencing a smatch warning, and a small cleanup patch"

* tag 'for-linus-6.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: simplify sysenter and syscall setup
  x86/xen: silence smatch warning in pmu_msr_chk_emulated()
2022-11-06 10:42:29 -08:00
Paolo Bonzini
1462014966 Merge branch 'kvm-master' into HEAD
x86:
* Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit()

* Make argument order consistent for kvcalloc()

* Userspace API fixes for DEBUGCTL and LBRs
2022-11-06 03:30:38 -05:00
Tony Luck
7beade0dd4 x86/cpu: Add several Intel server CPU model numbers
These servers are all on the public versions of the roadmap. The model
numbers for Grand Ridge, Granite Rapids, and Sierra Forest were included
in the September 2022 edition of the Instruction Set Extensions document.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20221103203310.5058-1-tony.luck@intel.com
2022-11-04 21:12:22 +01:00
Liao Chang
8670866b23 KVM: x86: Fix a typo about the usage of kvcalloc()
Swap the 1st and 2nd arguments to be consistent with the usage of
kvcalloc().

Fixes: c9b8fecddb5b ("KVM: use kvcalloc for array allocations")
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Message-Id: <20221103011749.139262-1-liaochang1@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-03 09:39:29 -04:00
Ben Gardon
074c008007 KVM: x86: Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit()
kvm_zap_gfn_range() must be called in an SRCU read-critical section, but
there is no SRCU annotation in __kvm_set_or_clear_apicv_inhibit(). This
can lead to the following warning via
kvm_arch_vcpu_ioctl_set_guest_debug() if a Shadow MMU is in use (TDP
MMU disabled or nesting):

[ 1416.659809] =============================
[ 1416.659810] WARNING: suspicious RCU usage
[ 1416.659839] 6.1.0-dbg-DEV #1 Tainted: G S        I
[ 1416.659853] -----------------------------
[ 1416.659854] include/linux/kvm_host.h:954 suspicious rcu_dereference_check() usage!
[ 1416.659856]
...
[ 1416.659904]  dump_stack_lvl+0x84/0xaa
[ 1416.659910]  dump_stack+0x10/0x15
[ 1416.659913]  lockdep_rcu_suspicious+0x11e/0x130
[ 1416.659919]  kvm_zap_gfn_range+0x226/0x5e0
[ 1416.659926]  ? kvm_make_all_cpus_request_except+0x18b/0x1e0
[ 1416.659935]  __kvm_set_or_clear_apicv_inhibit+0xcc/0x100
[ 1416.659940]  kvm_arch_vcpu_ioctl_set_guest_debug+0x350/0x390
[ 1416.659946]  kvm_vcpu_ioctl+0x2fc/0x620
[ 1416.659955]  __se_sys_ioctl+0x77/0xc0
[ 1416.659962]  __x64_sys_ioctl+0x1d/0x20
[ 1416.659965]  do_syscall_64+0x3d/0x80
[ 1416.659969]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Always take the KVM SRCU read lock in __kvm_set_or_clear_apicv_inhibit()
to protect the GFN to memslot translation. The SRCU read lock is not
technically required when no Shadow MMUs are in use, since the TDP MMU
walks the paging structures from the roots and does not need to look up
GFN translations in the memslots, but make the SRCU locking
unconditional for simplicty.

In most cases, the SRCU locking is taken care of in the vCPU run loop,
but when called through other ioctls (such as KVM_SET_GUEST_DEBUG)
there is no srcu_read_lock.

Tested: ran tools/testing/selftests/kvm/x86_64/debug_regs on a DBG
	build. This patch causes the suspicious RCU warning to disappear.
	Note that the warning is hit in __kvm_zap_rmaps(), so
	kvm_memslots_have_rmaps() must return true in order for this to
	repro (i.e. the TDP MMU must be off or nesting in use.)

Reported-by: Greg Thelen <gthelen@google.com>
Fixes: 36222b117e36 ("KVM: x86: don't disable APICv memslot when inhibited")
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20221102205359.1260980-1-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-03 09:34:22 -04:00
Juergen Gross
4bff677b30 x86/xen: simplify sysenter and syscall setup
xen_enable_sysenter() and xen_enable_syscall() can be simplified a lot.

While at it, switch to use cpu_feature_enabled() instead of
boot_cpu_has().

Signed-off-by: Juergen Gross <jgross@suse.com>
2022-11-03 10:39:55 +01:00
Juergen Gross
354d8a4b16 x86/xen: silence smatch warning in pmu_msr_chk_emulated()
Commit 8714f7bcd3c2 ("xen/pv: add fault recovery control to pmu msr
accesses") introduced code resulting in a warning issued by the smatch
static checker, claiming to use an uninitialized variable.

This is a false positive, but work around the warning nevertheless.

Fixes: 8714f7bcd3c2 ("xen/pv: add fault recovery control to pmu msr accesses")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-11-03 10:23:26 +01:00
Sean Christopherson
b333b8ebb8 KVM: VMX: Ignore guest CPUID for host userspace writes to DEBUGCTL
Ignore guest CPUID for host userspace writes to the DEBUGCTL MSR, KVM's
ABI is that setting CPUID vs. state can be done in any order, i.e. KVM
allows userspace to stuff MSRs prior to setting the guest's CPUID that
makes the new MSR "legal".

Keep the vmx_get_perf_capabilities() check for guest writes, even though
it's technically unnecessary since the vCPU's PERF_CAPABILITIES is
consulted when refreshing LBR support.  A future patch will clean up
vmx_get_perf_capabilities() to avoid the RDMSR on every call, at which
point the paranoia will incur no meaningful overhead.

Note, prior to vmx_get_perf_capabilities() checking that the host fully
supports LBRs via x86_perf_get_lbr(), KVM effectively relied on
intel_pmu_lbr_is_enabled() to guard against host userspace enabling LBRs
on platforms without full support.

Fixes: c646236344e9 ("KVM: vmx/pmu: Add PMU_CAP_LBR_FMT check when guest LBR is enabled")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221006000314.73240-5-seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-02 13:18:44 -04:00
Sean Christopherson
18e897d213 KVM: VMX: Fold vmx_supported_debugctl() into vcpu_supported_debugctl()
Fold vmx_supported_debugctl() into vcpu_supported_debugctl(), its only
caller.  Setting bits only to clear them a few instructions later is
rather silly, and splitting the logic makes things seem more complicated
than they actually are.

Opportunistically drop DEBUGCTLMSR_LBR_MASK now that there's a single
reference to the pair of bits.  The extra layer of indirection provides
no meaningful value and makes it unnecessarily tedious to understand
what KVM is doing.

No functional change.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221006000314.73240-4-seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-02 13:18:18 -04:00
Sean Christopherson
145dfad998 KVM: VMX: Advertise PMU LBRs if and only if perf supports LBRs
Advertise LBR support to userspace via MSR_IA32_PERF_CAPABILITIES if and
only if perf fully supports LBRs.  Perf may disable LBRs (by zeroing the
number of LBRs) even on platforms the allegedly support LBRs, e.g. if
probing any LBR MSRs during setup fails.

Fixes: be635e34c284 ("KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES")
Reported-by: Like Xu <like.xu.linux@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221006000314.73240-3-seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-02 13:17:58 -04:00
Kan Liang
6f8faf4714 perf/x86/intel: Add Cooper Lake stepping to isolation_ucodes[]
The intel_pebs_isolation quirk checks both model number and stepping.
Cooper Lake has a different stepping (11) than the other Skylake Xeon.
It cannot benefit from the optimization in commit 9b545c04abd4f
("perf/x86/kvm: Avoid unnecessary work in guest filtering").

Add the stepping of Cooper Lake into the isolation_ucodes[] table.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20221031154550.571663-1-kan.liang@linux.intel.com
2022-11-02 12:22:07 +01:00
Kan Liang
0916886bb9 perf/x86/intel: Fix pebs event constraints for SPR
According to the latest event list, update the MEM_INST_RETIRED events
which support the DataLA facility for SPR.

Fixes: 61b985e3e775 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20221031154119.571386-2-kan.liang@linux.intel.com
2022-11-02 12:22:06 +01:00
Kan Liang
acc5568b90 perf/x86/intel: Fix pebs event constraints for ICL
According to the latest event list, update the MEM_INST_RETIRED events
which support the DataLA facility.

Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support")
Reported-by: Jannis Klinkenberg <jannis.klinkenberg@rwth-aachen.de>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20221031154119.571386-1-kan.liang@linux.intel.com
2022-11-02 12:22:06 +01:00
Zhang Rui
80275ca9e5 perf/x86/rapl: Use standard Energy Unit for SPR Dram RAPL domain
Intel Xeon servers used to use a fixed energy resolution (15.3uj) for
Dram RAPL domain. But on SPR, Dram RAPL domain follows the standard
energy resolution as described in MSR_RAPL_POWER_UNIT.

Remove the SPR Dram energy unit quirk.

Fixes: bcfd218b6679 ("perf/x86/rapl: Add support for Intel SPR platform")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Link: https://lkml.kernel.org/r/20220924054738.12076-3-rui.zhang@intel.com
2022-11-02 12:22:05 +01:00
Kirill A. Shutemov
373e715e31 x86/tdx: Panic on bad configs that #VE on "private" memory access
All normal kernel memory is "TDX private memory".  This includes
everything from kernel stacks to kernel text.  Handling
exceptions on arbitrary accesses to kernel memory is essentially
impossible because they can happen in horribly nasty places like
kernel entry/exit.  But, TDX hardware can theoretically _deliver_
a virtualization exception (#VE) on any access to private memory.

But, it's not as bad as it sounds.  TDX can be configured to never
deliver these exceptions on private memory with a "TD attribute"
called ATTR_SEPT_VE_DISABLE.  The guest has no way to *set* this
attribute, but it can check it.

Ensure ATTR_SEPT_VE_DISABLE is set in early boot.  panic() if it
is unset.  There is no sane way for Linux to run with this
attribute clear so a panic() is appropriate.

There's small window during boot before the check where kernel
has an early #VE handler. But the handler is only for port I/O
and will also panic() as soon as it sees any other #VE, such as
a one generated by a private memory access.

[ dhansen: Rewrite changelog and rebase on new tdx_parse_tdinfo().
	   Add Kirill's tested-by because I made changes since
	   he wrote this. ]

Fixes: 9a22bf6debbf ("x86/traps: Add #VE support for TDX guest")
Reported-by: ruogui.ygr@alibaba-inc.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20221028141220.29217-3-kirill.shutemov%40linux.intel.com
2022-11-01 16:02:40 -07:00
Linus Torvalds
f526d6a822 x86:
- fix lock initialization race in gfn-to-pfn cache (+selftests)
 
 - fix two refcounting errors
 
 - emulator fixes
 
 - mask off reserved bits in CPUID
 
 - fix bug with disabling SGX
 
 RISC-V:
 
 - update MAINTAINERS
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmNcYawUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPMzwf/Uh1lO2Op6IJ3/no2gPfShc9bdqgM
 LiCkzfJp9PQAuDl/hs44CiQHUlPEfeIsI/ns0euNj37TlnB3zKmm46mtiWhEefIH
 rwcm/ngKgw3283pZEf8FeMTDfNexOaBg2ZNoODR7JQsU50tbToY4TNE2nNRgbdL5
 SNmzOwox1rZIQHxEa2r/k2B/HdRbeCFUU82EjwFqaNzH1yhzBXMcokdSCmGCBMsE
 3xfCzQ7uMkXw/rlkkG0be65+5dTNmhfiKQYGAQe4s7PycVPMD79D2EhCfbpvbK7t
 EmgOXStmvtW6+ukqPATHbRVCDwW0VmiQv5IWOGbLB1Qdy5/REynJ5ObC8g==
 =Hvro
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "x86:

   - fix lock initialization race in gfn-to-pfn cache (+selftests)

   - fix two refcounting errors

   - emulator fixes

   - mask off reserved bits in CPUID

   - fix bug with disabling SGX

  RISC-V:

   - update MAINTAINERS"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/xen: Fix eventfd error handling in kvm_xen_eventfd_assign()
  KVM: x86: smm: number of GPRs in the SMRAM image depends on the image format
  KVM: x86: emulator: update the emulation mode after CR0 write
  KVM: x86: emulator: update the emulation mode after rsm
  KVM: x86: emulator: introduce emulator_recalc_and_set_mode
  KVM: x86: emulator: em_sysexit should update ctxt->mode
  KVM: selftests: Mark "guest_saw_irq" as volatile in xen_shinfo_test
  KVM: selftests: Add tests in xen_shinfo_test to detect lock races
  KVM: Reject attempts to consume or refresh inactive gfn_to_pfn_cache
  KVM: Initialize gfn_to_pfn_cache locks in dedicated helper
  KVM: VMX: fully disable SGX if SECONDARY_EXEC_ENCLS_EXITING unavailable
  KVM: x86: Exempt pending triple fault from event injection sanity check
  MAINTAINERS: git://github -> https://github.com for kvm-riscv
  KVM: debugfs: Return retval of simple_attr_open() if it fails
  KVM: x86: Reduce refcount if single_open() fails in kvm_mmu_rmaps_stat_open()
  KVM: x86: Mask off reserved bits in CPUID.8000001FH
  KVM: x86: Mask off reserved bits in CPUID.8000001AH
  KVM: x86: Mask off reserved bits in CPUID.80000008H
  KVM: x86: Mask off reserved bits in CPUID.80000006H
  KVM: x86: Mask off reserved bits in CPUID.80000001H
2022-11-01 12:28:52 -07:00
Dave Hansen
a6dd6f3900 x86/tdx: Prepare for using "INFO" call for a second purpose
The TDG.VP.INFO TDCALL provides the guest with various details about
the TDX system that the guest needs to run.  Only one field is currently
used: 'gpa_width' which tells the guest which PTE bits mark pages shared
or private.

A second field is now needed: the guest "TD attributes" to tell if
virtualization exceptions are configured in a way that can harm the guest.

Make the naming and calling convention more generic and discrete from the
mask-centric one.

Thanks to Sathya for the inspiration here, but there's no code, comments
or changelogs left from where he started.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: stable@vger.kernel.org
2022-11-01 10:07:15 -07:00
Linus Torvalds
434766058e - Rename a perf memory level event define to denote it is of CXL type
- Add Alder and Raptor Lakes support to RAPL
 
 - Make sure raw sample data is output with tracepoints
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmNeRnEACgkQEsHwGGHe
 VUpmTRAAmLvQhTN15L4qr6BSIUhlOk1xmM4pKtUXfpzX9Nki+bhPvH8sczaUXg1N
 90u6pD8+uOIFsd2s+bUVyR/h3cWnjpy9Or1oSYlNTTPxwlqC1XsLqsWjy7/AA91d
 YAUZNfmIsBNTUDtjygslnZ2yZIIPWXGI5utvrkS3W2cbfZtQhuDVTo5KAnx3+0fC
 inKfiO+lAEouNu9l/+GdqPhgiDVB+oK12ROMosAr9++Ewuf61Jnk0nVEynNVoGT0
 OLxbNT6xU3TlOm/n2zwmWnM95ZJ9sM5SEJg+c55VZ9biTAgayd+7Hw8H3CAqIhdD
 utFoxkQpblp7Lq6IporcfjpGISA4WdbaiJaMN56azucGcZsk6VXUzNk6AimXvqjP
 d8z7nVYDGDxYoIWyoSfO7XuIhqek38KRTEbl3qvyRZoF/FRjaWCvZeir9W32mRbx
 bVKPTQ8FgSUtkBLhGZrldHP8PRsw1nf60wJb19p8s5aWNMzimgUN0As0kf78k6l+
 fapTvhuU84EDVjiUS7BTrMq1r3ieaZiN2Ofi4EAG8c4R3S4C3hHKFQH3suIOp3vf
 UpCyYi+29LfdTgiuNX+efklUSu5T2EccJXke07CJQM5BBppvuPqeG7YJET5/YDuz
 tSPIbTZ9lxomeNWJSu9cyuynqPIS0f/j6FtpodA7MY1f/AX7OHM=
 =zS3P
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v6.1_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Rename a perf memory level event define to denote it is of CXL type

 - Add Alder and Raptor Lakes support to RAPL

 - Make sure raw sample data is output with tracepoints

* tag 'perf_urgent_for_v6.1_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXL
  perf/x86/rapl: Add support for Intel Raptor Lake
  perf/x86/rapl: Add support for Intel AlderLake-N
  perf: Fix missing raw data on tracepoint events
2022-10-30 09:49:18 -07:00
Linus Torvalds
3c339dbd13 23 hotfixes.
Eight fix pre-6.0 bugs and the remainder address issues which were
 introduced in the 6.1-rc merge cycle, or address issues which aren't
 considered sufficiently serious to warrant a -stable backport.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY1w/LAAKCRDdBJ7gKXxA
 jovHAQDqY3TGAVQsvCBKdUqkp5nakZ7o7kK+mUGvsZ8Cgp5fwQD/Upsu93RZsTgm
 oJfYW4W6eSVEKPu7oAY20xVwLvK6iQ0=
 =z0Fn
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2022-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc hotfixes from Andrew Morton:
 "Eight fix pre-6.0 bugs and the remainder address issues which were
  introduced in the 6.1-rc merge cycle, or address issues which aren't
  considered sufficiently serious to warrant a -stable backport"

* tag 'mm-hotfixes-stable-2022-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
  mm: multi-gen LRU: move lru_gen_add_mm() out of IRQ-off region
  lib: maple_tree: remove unneeded initialization in mtree_range_walk()
  mmap: fix remap_file_pages() regression
  mm/shmem: ensure proper fallback if page faults
  mm/userfaultfd: replace kmap/kmap_atomic() with kmap_local_page()
  x86: fortify: kmsan: fix KMSAN fortify builds
  x86: asm: make sure __put_user_size() evaluates pointer once
  Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default
  x86/purgatory: disable KMSAN instrumentation
  mm: kmsan: export kmsan_copy_page_meta()
  mm: migrate: fix return value if all subpages of THPs are migrated successfully
  mm/uffd: fix vma check on userfault for wp
  mm: prep_compound_tail() clear page->private
  mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs
  mm/page_isolation: fix clang deadcode warning
  fs/ext4/super.c: remove unused `deprecated_msg'
  ipc/msg.c: fix percpu_counter use after free
  memory tier, sysfs: rename attribute "nodes" to "nodelist"
  MAINTAINERS: git://github.com -> https://github.com for nilfs2
  mm/kmemleak: prevent soft lockup in kmemleak_scan()'s object iteration loops
  ...
2022-10-29 17:49:33 -07:00
Alexander Potapenko
78a498c3a2 x86: fortify: kmsan: fix KMSAN fortify builds
Ensure that KMSAN builds replace memset/memcpy/memmove calls with the
respective __msan_XXX functions, and that none of the macros are redefined
twice.  This should allow building kernel with both CONFIG_KMSAN and
CONFIG_FORTIFY_SOURCE.

Link: https://lkml.kernel.org/r/20221024212144.2852069-5-glider@google.com
Link: https://github.com/google/kmsan/issues/89
Signed-off-by: Alexander Potapenko <glider@google.com>
Reported-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28 13:37:23 -07:00
Alexander Potapenko
59c8a02e24 x86: asm: make sure __put_user_size() evaluates pointer once
User access macros must ensure their arguments are evaluated only once if
they are used more than once in the macro body.  Adding
instrument_put_user() to __put_user_size() resulted in double evaluation
of the `ptr` argument, which led to correctness issues when performing
e.g.  unsafe_put_user(..., p++, ...).

To fix those issues, evaluate the `ptr` argument of __put_user_size() at
the beginning of the macro.

Link: https://lkml.kernel.org/r/20221024212144.2852069-4-glider@google.com
Fixes: 888f84a6da4d ("x86: asm: instrument usercopy in get_user() and put_user()")
Signed-off-by: Alexander Potapenko <glider@google.com>
Reported-by: youling257 <youling257@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28 13:37:23 -07:00
Alexander Potapenko
42855f588e x86/purgatory: disable KMSAN instrumentation
The stand-alone purgatory.ro does not contain the KMSAN runtime, therefore
it can't be built with KMSAN compiler instrumentation.

Link: https://lkml.kernel.org/r/20221024212144.2852069-2-glider@google.com
Link: https://github.com/google/kmsan/issues/89
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28 13:37:23 -07:00
Linus Torvalds
05c31d25cc This push fixes an alignment crash in x86/polyval.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmNSgrAACgkQxycdCkmx
 i6e9lhAAk2pXbkjVx05aeeNY9hD8WhveNCuEgUEmuX3nn4/0u3gdRNMLlyADV+RN
 kUEhFZsgxszB/65od378A/PA/fOHKzO8QlNzFgcaZbOm1TnzqxfG2q4CIBdm//mx
 5MdogwSg5uLDzEiUFJb94cPv2YizuE2+Zer0goDesWAm7eXEf3PMb7WfLjqdWiM/
 rHnfE74l/cro8RIH7wTgCCNDIuIAI3wOrXULohhhJbzbQ89prOKgy1h5mL/b8N4x
 4MVXd5tTPL2se5JqgofLuddGIB+B7fNCHRyWDO35cI+xh5o9G0K2DslYW6azKWyZ
 LJc86UGSV6I7kNSSyyfMDH6N0u1eXjvQiTaOZubO7ql38Ob6hRuCE2UG8lZlWVfQ
 2j71+QzhF0wrQxIBrQmlvdOKZiL4NsBxz9lr27tj8gdJlPWbgsLFssBQGB921uSA
 0M4vKDQoK95M+TsoJjWYNHcGb+Uu6iFOgOrzCND/0hd73rRi1TSTtc3Q4SnAUthz
 eCyBVXtuI1k3VCWoelUKOSoh1jILaAjKcXz+5dpIBZgtbm17f55eAW3L6xmOGl/u
 /VbHpFDPai8A1gbk3gzUac+KWe96+A2jM/j8R/wWikhrLMhtAvToT2cDrAg9v9wk
 UpuSflSSOQcL9CBv4eJzJEdX/49qqjY4jeJ+i/V5PZzC3QSYWX8=
 =0tfg
 -----END PGP SIGNATURE-----

Merge tag 'v6.1-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
 "Fix an alignment crash in x86/polyval"

* tag 'v6.1-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: x86/polyval - Fix crashes when keys are not 16-byte aligned
2022-10-28 09:53:30 -07:00
Eiichi Tsukata
7353633814 KVM: x86/xen: Fix eventfd error handling in kvm_xen_eventfd_assign()
Should not call eventfd_ctx_put() in case of error.

Fixes: 2fd6df2f2b47 ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
Reported-by: syzbot+6f0c896c5a9449a10ded@syzkaller.appspotmail.com
Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
Message-Id: <20221028092631.117438-1-eiichi.tsukata@nutanix.com>
[Introduce new goto target instead. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-28 06:47:26 -04:00
Maxim Levitsky
696db303e5 KVM: x86: smm: number of GPRs in the SMRAM image depends on the image format
On 64 bit host, if the guest doesn't have X86_FEATURE_LM, KVM will
access 16 gprs to 32-bit smram image, causing out-ouf-bound ram
access.

On 32 bit host, the rsm_load_state_64/enter_smm_save_state_64
is compiled out, thus access overflow can't happen.

Fixes: b443183a25ab61 ("KVM: x86: Reduce the number of emulator GPRs to '8' for 32-bit KVM")

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221025124741.228045-15-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-28 06:10:30 -04:00
Maxim Levitsky
ad8f9e6994 KVM: x86: emulator: update the emulation mode after CR0 write
Update the emulation mode when handling writes to CR0, because
toggling CR0.PE switches between Real and Protected Mode, and toggling
CR0.PG when EFER.LME=1 switches between Long and Protected Mode.

This is likely a benign bug because there is no writeback of state,
other than the RIP increment, and when toggling CR0.PE, the CPU has
to execute code from a very low memory address.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221025124741.228045-14-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-28 06:10:30 -04:00
Maxim Levitsky
055f37f84e KVM: x86: emulator: update the emulation mode after rsm
Update the emulation mode after RSM so that RIP will be correctly
written back, because the RSM instruction can switch the CPU mode from
32 bit (or less) to 64 bit.

This fixes a guest crash in case the #SMI is received while the guest
runs a code from an address > 32 bit.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221025124741.228045-13-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-28 06:10:29 -04:00
Maxim Levitsky
d087e0f79f KVM: x86: emulator: introduce emulator_recalc_and_set_mode
Some instructions update the cpu execution mode, which needs to update the
emulation mode.

Extract this code, and make assign_eip_far use it.

assign_eip_far now reads CS, instead of getting it via a parameter,
which is ok, because callers always assign CS to the same value
before calling this function.

No functional change is intended.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221025124741.228045-12-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-28 06:10:29 -04:00
Maxim Levitsky
5015bb89b5 KVM: x86: emulator: em_sysexit should update ctxt->mode
SYSEXIT is one of the instructions that can change the
processor mode, thus ctxt->mode should be updated after it.

Note that this is likely a benign bug, because the only problematic
mode change is from 32 bit to 64 bit which can lead to truncation of RIP,
and it is not possible to do with sysexit,
since sysexit running in 32 bit mode will be limited to 32 bit version.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221025124741.228045-11-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-28 06:10:28 -04:00
Michal Luczaj
52491a38b2 KVM: Initialize gfn_to_pfn_cache locks in dedicated helper
Move the gfn_to_pfn_cache lock initialization to another helper and
call the new helper during VM/vCPU creation.  There are race
conditions possible due to kvm_gfn_to_pfn_cache_init()'s
ability to re-initialize the cache's locks.

For example: a race between ioctl(KVM_XEN_HVM_EVTCHN_SEND) and
kvm_gfn_to_pfn_cache_init() leads to a corrupted shinfo gpc lock.

                (thread 1)                |           (thread 2)
                                          |
 kvm_xen_set_evtchn_fast                  |
  read_lock_irqsave(&gpc->lock, ...)      |
                                          | kvm_gfn_to_pfn_cache_init
                                          |  rwlock_init(&gpc->lock)
  read_unlock_irqrestore(&gpc->lock, ...) |

Rename "cache_init" and "cache_destroy" to activate+deactivate to
avoid implying that the cache really is destroyed/freed.

Note, there more races in the newly named kvm_gpc_activate() that will
be addressed separately.

Fixes: 982ed0de4753 ("KVM: Reinstate gfn_to_pfn_cache with invalidation support")
Cc: stable@vger.kernel.org
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
[sean: call out that this is a bug fix]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221013211234.1318131-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-27 06:47:53 -04:00
Emanuele Giuseppe Esposito
1c1a41497a KVM: VMX: fully disable SGX if SECONDARY_EXEC_ENCLS_EXITING unavailable
Clear enable_sgx if ENCLS-exiting is not supported, i.e. if SGX cannot be
virtualized.  When KVM is loaded, adjust_vmx_controls checks that the
bit is available before enabling the feature; however, other parts of the
code check enable_sgx and not clearing the variable caused two different
bugs, mostly affecting nested virtualization scenarios.

First, because enable_sgx remained true, SECONDARY_EXEC_ENCLS_EXITING
would be marked available in the capability MSR that are accessed by a
nested hypervisor.  KVM would then propagate the control from vmcs12
to vmcs02 even if it isn't supported by the processor, thus causing an
unexpected VM-Fail (exit code 0x7) in L1.

Second, vmx_set_cpu_caps() would not clear the SGX bits when hardware
support is unavailable.  This is a much less problematic bug as it only
happens if SGX is soft-disabled (available in the processor but hidden
in CPUID) or if SGX is supported for bare metal but not in the VMCS
(will never happen when running on bare metal, but can theoertically
happen when running in a VM).

Last but not least, this ensures that module params in sysfs reflect
KVM's actual configuration.

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=2127128
Fixes: 72add915fbd5 ("KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC")
Cc: stable@vger.kernel.org
Suggested-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20221025123749.2201649-1-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-27 05:47:27 -04:00
Sean Christopherson
dea0d5a2fd KVM: x86: Exempt pending triple fault from event injection sanity check
Exempt pending triple faults, a.k.a. KVM_REQ_TRIPLE_FAULT, when asserting
that KVM didn't attempt to queue a new exception during event injection.
KVM needs to emulate the injection itself when emulating Real Mode due to
lack of unrestricted guest support (VMX) and will queue a triple fault if
that emulation fails.

Ideally the assertion would more precisely filter out the emulated Real
Mode triple fault case, but rmode.vm86_active is buried in vcpu_vmx and
can't be queried without a new kvm_x86_ops.  And unlike "regular"
exceptions, triple fault cannot put the vCPU into an infinite loop; the
triple fault will force either an exit to userspace or a nested VM-Exit,
and triple fault after nested VM-Exit will force an exit to userspace.
I.e. there is no functional issue, so just suppress the warning for
triple faults.

Opportunistically convert the warning to a one-time thing, when it
fires, it fires _a lot_, and is usually user triggerable, i.e. can be
used to spam the kernel log.

Fixes: 7055fb113116 ("KVM: x86: Treat pending TRIPLE_FAULT requests as pending exceptions")
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/r/202209301338.aca913c3-yujie.liu@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220930230008.1636044-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-27 05:22:01 -04:00
Hou Wenlong
5aa0236677 KVM: x86: Reduce refcount if single_open() fails in kvm_mmu_rmaps_stat_open()
Refcount is increased before calling single_open() in
kvm_mmu_rmaps_stat_open(), If single_open() fails, refcount should be
restored, otherwise the vm couldn't be destroyed.

Fixes: 3bcd0662d66fd ("KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs file")
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Message-Id: <a75900413bb8b1e556be690e9588a0f92e946a30.1665733883.git.houwenlong.hwl@antgroup.com>
[Preserved return value of single_open. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-27 04:41:54 -04:00
Jim Mattson
86c4f0d547 KVM: x86: Mask off reserved bits in CPUID.8000001FH
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM
actually supports. CPUID.8000001FH:EBX[31:16] are reserved bits and
should be masked off.

Fixes: 8765d75329a3 ("KVM: X86: Extend CPUID range to include new leaf")
Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220929225203.2234702-6-jmattson@google.com>
Cc: stable@vger.kernel.org
[Clear NumVMPL too. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-27 04:41:54 -04:00
Ravi Bangoria
cb6c18b5a4 perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXL
PERF_MEM_LVLNUM_EXTN_MEM was introduced to cover CXL devices but it's
bit ambiguous name and also not generic enough to cover cxl.cache and
cxl.io devices. Rename it to PERF_MEM_LVLNUM_CXL to be more specific.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/f6268268-b4e9-9ed6-0453-65792644d953@amd.com
2022-10-27 10:27:32 +02:00
Zhang Rui
eff98a7421 perf/x86/rapl: Add support for Intel Raptor Lake
Raptor Lake RAPL support is the same as previous Sky Lake.
Add Raptor Lake model for RAPL.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Link: https://lkml.kernel.org/r/20221023125120.2727-2-rui.zhang@intel.com
2022-10-27 10:27:31 +02:00
Zhang Rui
1ab28f17ee perf/x86/rapl: Add support for Intel AlderLake-N
AlderLake-N RAPL support is the same as previous Sky Lake.
Add AlderLake-N model for RAPL.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Link: https://lkml.kernel.org/r/20221023125120.2727-1-rui.zhang@intel.com
2022-10-27 10:27:31 +02:00
Steven Rostedt (Google)
a970174d7a x86/mm: Do not verify W^X at boot up
Adding on the kernel command line "ftrace=function" triggered:

  CPA detected W^X violation: 8000000000000063 -> 0000000000000063 range: 0xffffffffc0013000 - 0xffffffffc0013fff PFN 10031b
  WARNING: CPU: 0 PID: 0 at arch/x86/mm/pat/set_memory.c:609
  verify_rwx+0x61/0x6d
  Call Trace:
     __change_page_attr_set_clr+0x146/0x8a6
     change_page_attr_set_clr+0x135/0x268
     change_page_attr_clear.constprop.0+0x16/0x1c
     set_memory_x+0x2c/0x32
     arch_ftrace_update_trampoline+0x218/0x2db
     ftrace_update_trampoline+0x16/0xa1
     __register_ftrace_function+0x93/0xb2
     ftrace_startup+0x21/0xf0
     register_ftrace_function_nolock+0x26/0x40
     register_ftrace_function+0x4e/0x143
     function_trace_init+0x7d/0xc3
     tracer_init+0x23/0x2c
     tracing_set_tracer+0x1d5/0x206
     register_tracer+0x1c0/0x1e4
     init_function_trace+0x90/0x96
     early_trace_init+0x25c/0x352
     start_kernel+0x424/0x6e4
     x86_64_start_reservations+0x24/0x2a
     x86_64_start_kernel+0x8c/0x95
     secondary_startup_64_no_verify+0xe0/0xeb

This is because at boot up, kernel text is writable, and there's no
reason to do tricks to updated it.  But the verifier does not
distinguish updates at boot up and at run time, and causes a warning at
time of boot.

Add a check for system_state == SYSTEM_BOOTING and allow it if that is
the case.

[ These SYSTEM_BOOTING special cases are all pretty horrid, but the x86
  text_poke() code does some odd things at bootup, forcing this for now
    - Linus ]

Link: https://lore.kernel.org/r/20221024112730.180916b3@gandalf.local.home
Fixes: 652c5bf380ad0 ("x86/mm: Refuse W^X violations")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-10-24 18:05:27 -07:00