3417 Commits

Author SHA1 Message Date
Oscar Salvador
8d400913c2 x86/vmemmap: handle unpopulated sub-pmd ranges
When sizeof(struct page) is not a power of 2, sections do not span a PMD
anymore and so when populating them some parts of the PMD will remain
unused.

Because of this, PMDs will be left behind when depopulating sections since
remove_pmd_table() thinks that those unused parts are still in use.

Fix this by marking the unused parts with PAGE_UNUSED, so memchr_inv()
will do the right thing and will let us free the PMD when the last user of
it is gone.

This patch is based on a similar patch by David Hildenbrand:

https://lore.kernel.org/linux-mm/20200722094558.9828-9-david@redhat.com/

[osalvador@suse.de: go back to the ifdef version]
  Link: https://lkml.kernel.org/r/YGy++mSft7K4u+88@localhost.localdomain

Link: https://lkml.kernel.org/r/20210309214050.4674-4-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:38 -07:00
Oscar Salvador
69ccfe74e1 x86/vmemmap: drop handling of 1GB vmemmap ranges
There is no code to allocate 1GB pages when mapping the vmemmap range as
this might waste some memory and requires more complexity which is not
really worth.

Drop the dead code both for the aligned and unaligned cases and leave only
the direct map handling.

Link: https://lkml.kernel.org/r/20210309214050.4674-3-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:38 -07:00
Oscar Salvador
8e2df191ae x86/vmemmap: drop handling of 4K unaligned vmemmap range
Patch series "Cleanup and fixups for vmemmap handling", v6.

This series contains cleanups to remove dead code that handles unaligned
cases for 4K and 1GB pages (patch#1 and patch#2) when removing the vemmmap
range, and a fix (patch#3) to handle the case when two vmemmap ranges
intersect the same PMD.

This patch (of 4):

remove_pte_table() is prepared to handle the case where either the start
or the end of the range is not PAGE aligned.  This cannot actually happen:

__populate_section_memmap enforces the range to be PMD aligned, so as long
as the size of the struct page remains multiple of 8, the vmemmap range
will be aligned to PAGE_SIZE.

Drop the dead code and place a VM_BUG_ON in vmemmap_{populate,free} to
catch nasty cases.  Note that the VM_BUG_ON is placed in there because
vmemmap_{populate,free= } is the gate of all removing and freeing page
tables logic.

Link: https://lkml.kernel.org/r/20210309214050.4674-1-osalvador@suse.de
Link: https://lkml.kernel.org/r/20210309214050.4674-2-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:38 -07:00
Linus Torvalds
635de956a7 The x86 MM changes in this cycle were:
- Implement concurrent TLB flushes, which overlaps the local TLB flush with the
    remote TLB flush. In testing this improved sysbench performance measurably by
    a couple of percentage points, especially if TLB-heavy security mitigations
    are active.
 
  - Further micro-optimizations to improve the performance of TLB flushes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmCKbNcRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hjYBAAsyNUa/gOu0g6/Cx8R86w9HtHHmm5vso/
 6nJjWj2fd2qJ9JShlddxvXEMeXtPTYabVWQkiiriFMuofk6JeKnlHm1Jzl6keABX
 OQFwjIFeNASPRcdXvuuYPOVWAJJdr2oL9QUr6OOK1ccQJTz/Cd0zA+VQ5YqcsCon
 yaWbkxELwKXpgql+qt66eAZ6Q2Y1TKXyrTW7ZgxQi0yeeWqMaEOub0/oyS7Ax1Rg
 qEJMwm1prb76NPzeqR/G3e4KTrDZfQ/B/KnSsz36GTJpl4eye6XqWDUgm1nAGNIc
 5dbc4Vx7JtZsUOuC0AmzWb3hsDyzVcN/lQvijdZ2RsYR3gvuYGaBhKqExqV0XH6P
 oqaWOKWCz+LqWbsgJmxCpqkt1LZl5+VUOcfJ97WkIS7DyIPtSHTzQXbBMZqKLeat
 mn5UcKYB2Gi7wsUPv6VC2ChKbDqN0VT8G86XbYylGo4BE46KoZKPUNY/QWKLUPd6
 0UKcVeNM2HFyf1C73p/tO/z7hzu3qLuMMnsphP6/c2pKLpdgawEXgbnVKNId1B/c
 NrzyhTvVaMt+Um28bBRhHONIlzPJwWcnZbdY7NqMnu+LBKQ68cL/h4FOIV/RDLNb
 GJLgfAr8fIw/zIpqYuFHiiMNo9wWqVtZko1MvXhGceXUL69QuzTra2XR/6aDxkPf
 6gQVesetTvo=
 =3Cyp
 -----END PGP SIGNATURE-----

Merge tag 'x86-mm-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 tlb updates from Ingo Molnar:
 "The x86 MM changes in this cycle were:

   - Implement concurrent TLB flushes, which overlaps the local TLB
     flush with the remote TLB flush.

     In testing this improved sysbench performance measurably by a
     couple of percentage points, especially if TLB-heavy security
     mitigations are active.

   - Further micro-optimizations to improve the performance of TLB
     flushes"

* tag 'x86-mm-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  smp: Micro-optimize smp_call_function_many_cond()
  smp: Inline on_each_cpu_cond() and on_each_cpu()
  x86/mm/tlb: Remove unnecessary uses of the inline keyword
  cpumask: Mark functions as pure
  x86/mm/tlb: Do not make is_lazy dirty for no reason
  x86/mm/tlb: Privatize cpu_tlbstate
  x86/mm/tlb: Flush remote and local TLBs concurrently
  x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()
  x86/mm/tlb: Unify flush_tlb_func_local() and flush_tlb_func_remote()
  smp: Run functions concurrently in smp_call_function_many_cond()
2021-04-29 11:41:43 -07:00
Linus Torvalds
ea5bc7b977 Trivial cleanups and fixes all over the place.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmCGmYIACgkQEsHwGGHe
 VUr45w/8CSXr7MXaFBj4To0hTWJXSZyF6YGqlZOSJXFcFh4cWTNwfVOoFaV47aDo
 +HsCNTkGENcKhLrDUWDRiG/Uo46jxtOtl1vhq7U4pGemSYH871XWOKfb5k5XNMwn
 /uhaHMI4aEfd6bUFnF518NeyRIsD0BdqFj4tB7RbAiyFwdETDX9Tkj/uBKnQ4zon
 4tEDoXgThuK5YKK9zVQg5pa7aFp2zg1CAdX/WzBkS8BHVBPXSV0CF97AJYQOM/V+
 lUHv+BN3wp97GYHPQMPsbkNr8IuFoe2mIvikwjxg8iOFpzEU1G1u09XV9R+PXByX
 LclFTRqK/2uU5hJlcsBiKfUuidyErYMRYImbMAOREt2w0ogWVu2zQ7HkjVve25h1
 sQPwPudbAt6STbqRxvpmB3yoV4TCYwnF91FcWgEy+rcEK2BDsHCnScA45TsK5I1C
 kGR1K17pHXprgMZFPveH+LgxewB6smDv+HllxQdSG67LhMJXcs2Epz0TsN8VsXw8
 dlD3lGReK+5qy9FTgO7mY0xhiXGz1IbEdAPU4eRBgih13puu03+jqgMaMabvBWKD
 wax+BWJUrPtetwD5fBPhlS/XdJDnd8Mkv2xsf//+wT0s4p+g++l1APYxeB8QEehm
 Pd7Mvxm4GvQkfE13QEVIPYQRIXCMH/e9qixtY5SHUZDBVkUyFM0=
 =bO1i
 -----END PGP SIGNATURE-----

Merge tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 cleanups from Borislav Petkov:
 "Trivial cleanups and fixes all over the place"

* tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS: Remove me from IDE/ATAPI section
  x86/pat: Do not compile stubbed functions when X86_PAT is off
  x86/asm: Ensure asm/proto.h can be included stand-alone
  x86/platform/intel/quark: Fix incorrect kernel-doc comment syntax in files
  x86/msr: Make locally used functions static
  x86/cacheinfo: Remove unneeded dead-store initialization
  x86/process/64: Move cpu_current_top_of_stack out of TSS
  tools/turbostat: Unmark non-kernel-doc comment
  x86/syscalls: Fix -Wmissing-prototypes warnings from COND_SYSCALL()
  x86/fpu/math-emu: Fix function cast warning
  x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes
  x86: Fix various typos in comments, take #2
  x86: Remove unusual Unicode characters from comments
  x86/kaslr: Return boolean values from a function returning bool
  x86: Fix various typos in comments
  x86/setup: Remove unused RESERVE_BRK_ARRAY()
  stacktrace: Move documentation for arch_stack_walk_reliable() to header
  x86: Remove duplicate TSC DEADLINE MSR definitions
2021-04-26 09:25:47 -07:00
Linus Torvalds
26a4ef7e48 Add support for SEV-ES guests booting through the 32-bit boot path, along with
cleanups, fixes and improvements.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmCGkbQACgkQEsHwGGHe
 VUrJ4w/+M9TbCppcILNvtaHn0mgpcDVmKvRSDdLl/MWcW1kuzczcdFYAK+OFFD0E
 TYKSEmkJUz3Tm0YBkO9PSPTBk+hnduPunXRk2Mzse1Uv3LxPuWEN3q6ZAfP1rOZ1
 3nlEnzHCWZdf4d7uz49qCXj96bfv98+zU2DaCoVoNUImp8jzo6hMtTPI4N31Tply
 Rb0b0acIkdmy0eaADilMciimZevs9EF3KgiwSd0AUAJE1aRtRpPKtv2F1OraJPkH
 T7AunJvoO8Sb2vpHfaW8iZrx2HKE8KZ4QOfM+dAXurjadlPVBLN34MC8FIw4tIS+
 m2dc/CMaVy1QpyHKOTZqY9ZsCndunrMJXsolhCyBjA6fAZ1aFZswxRWUeGrOkCJ2
 ZGJetB0tADi0gIRZerwyPXOKLiJBo8BSmIr8FzHq8CYYoxKH9D1dqEZVj9kBcGLJ
 SYbgUIKNuw54RzE00S8i2s625RG5A7qn6GrRMvnkVyJnKoD01na0trND2AbufBJz
 oDhBXfvP5SwswEt4YYZ1rn3JO1nRZzn4WGfiUQ4ElOEFYuUEZOJtcw1LHwDJ0LcQ
 bfOs0mmDFajFH1DyILyHfji4rdqHGWIpGIHfmYs98Njtfa8dtximU/csr69by/xV
 dcycXbPaw5psDe4Acw2vb7DM7h7T9fHNG+VgRJb25gXeywGutac=
 =AUGR
 -----END PGP SIGNATURE-----

Merge tag 'x86_seves_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 AMD secure virtualization (SEV-ES) updates from Borislav Petkov:
 "Add support for SEV-ES guests booting through the 32-bit boot path,
  along with cleanups, fixes and improvements"

* tag 'x86_seves_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev-es: Optimize __sev_es_ist_enter() for better readability
  x86/sev-es: Replace open-coded hlt-loops with sev_es_terminate()
  x86/boot/compressed/64: Check SEV encryption in the 32-bit boot-path
  x86/boot/compressed/64: Add CPUID sanity check to 32-bit boot-path
  x86/boot/compressed/64: Add 32-bit boot #VC handler
  x86/boot/compressed/64: Setup IDT in startup_32 boot path
  x86/boot/compressed/64: Reload CS in startup_32
  x86/sev: Do not require Hypervisor CPUID bit for SEV guests
  x86/boot/compressed/64: Cleanup exception handling before booting kernel
  x86/virtio: Have SEV guests enforce restricted virtio memory access
  x86/sev-es: Remove subtraction of res variable
2021-04-26 09:11:10 -07:00
Jan Kiszka
16854b567d x86/pat: Do not compile stubbed functions when X86_PAT is off
Those are already provided by linux/io.h as stubs.

The conflict remains invisible until someone would pull linux/io.h into
memtype.c. This fixes a build error when this file is used outside of
the kernel tree.

  [ bp: Massage commit message. ]

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/a9351615-7a0d-9d47-af65-d9e2fffe8192@siemens.com
2021-04-14 08:21:41 +02:00
Lai Jiangshan
1591584e2e x86/process/64: Move cpu_current_top_of_stack out of TSS
cpu_current_top_of_stack is currently stored in TSS.sp1. TSS is exposed
through the cpu_entry_area which is visible with user CR3 when PTI is
enabled and active.

This makes it a coveted fruit for attackers.  An attacker can fetch the
kernel stack top from it and continue next steps of actions based on the
kernel stack.

But it is actualy not necessary to be stored in the TSS.  It is only
accessed after the entry code switched to kernel CR3 and kernel GS_BASE
which means it can be in any regular percpu variable.

The reason why it is in TSS is historical (pre PTI) because TSS is also
used as scratch space in SYSCALL_64 and therefore cache hot.

A syscall also needs the per CPU variable current_task and eventually
__preempt_count, so placing cpu_current_top_of_stack next to them makes it
likely that they end up in the same cache line which should avoid
performance regressions. This is not enforced as the compiler is free to
place these variables, so these entry relevant variables should move into
a data structure to make this enforceable.

The seccomp_benchmark doesn't show any performance loss in the "getpid
native" test result.  Actually, the result changes from 93ns before to 92ns
with this change when KPTI is disabled. The test is very stable and
although the test doesn't show a higher degree of precision it gives enough
confidence that moving cpu_current_top_of_stack does not cause a
regression.

[ tglx: Removed unneeded export. Massaged changelog ]

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210125173444.22696-2-jiangshanlai@gmail.com
2021-03-28 22:40:10 +02:00
Isaku Yamahata
8249d17d31 x86/mem_encrypt: Correct physical address calculation in __set_clr_pte_enc()
The pfn variable contains the page frame number as returned by the
pXX_pfn() functions, shifted to the right by PAGE_SHIFT to remove the
page bits. After page protection computations are done to it, it gets
shifted back to the physical address using page_level_shift().

That is wrong, of course, because that function determines the shift
length based on the level of the page in the page table but in all the
cases, it was shifted by PAGE_SHIFT before.

Therefore, shift it back using PAGE_SHIFT to get the correct physical
address.

 [ bp: Rewrite commit message. ]

Fixes: dfaaec9033b8 ("x86: Add support for changing memory encryption attribute in early boot")
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/81abbae1657053eccc535c16151f63cd049dcb97.1616098294.git.isaku.yamahata@intel.com
2021-03-23 11:59:45 +01:00
Ingo Molnar
163b099146 x86: Fix various typos in comments, take #2
Fix another ~42 single-word typos in arch/x86/ code comments,
missed a few in the first pass, in particular in .S files.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org
2021-03-21 23:50:28 +01:00
Joerg Roedel
eab696d8e8 x86/sev: Do not require Hypervisor CPUID bit for SEV guests
A malicious hypervisor could disable the CPUID intercept for an SEV or
SEV-ES guest and trick it into the no-SEV boot path, where it could
potentially reveal secrets. This is not an issue for SEV-SNP guests,
as the CPUID intercept can't be disabled for those.

Remove the Hypervisor CPUID bit check from the SEV detection code to
protect against this kind of attack and add a Hypervisor bit equals zero
check to the SME detection path to prevent non-encrypted guests from
trying to enable SME.

This handles the following cases:

	1) SEV(-ES) guest where CPUID intercept is disabled. The guest
	   will still see leaf 0x8000001f and the SEV bit. It can
	   retrieve the C-bit and boot normally.

	2) Non-encrypted guests with intercepted CPUID will check
	   the SEV_STATUS MSR and find it 0 and will try to enable SME.
	   This will fail when the guest finds MSR_K8_SYSCFG to be zero,
	   as it is emulated by KVM. But we can't rely on that, as there
	   might be other hypervisors which return this MSR with bit
	   23 set. The Hypervisor bit check will prevent that the guest
	   tries to enable SME in this case.

	3) Non-encrypted guests on SEV capable hosts with CPUID intercept
	   disabled (by a malicious hypervisor) will try to boot into
	   the SME path. This will fail, but it is also not considered
	   a problem because non-encrypted guests have no protection
	   against the hypervisor anyway.

 [ bp: s/non-SEV/non-encrypted/g ]

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20210312123824.306-3-joro@8bytes.org
2021-03-18 16:44:40 +01:00
Ingo Molnar
d9f6e12fb0 x86: Fix various typos in comments
Fix ~144 single-word typos in arch/x86/ code comments.

Doing this in a single commit should reduce the churn.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org
2021-03-18 15:31:53 +01:00
Tom Lendacky
229164175f x86/virtio: Have SEV guests enforce restricted virtio memory access
An SEV guest requires that virtio devices use the DMA API to allow the
hypervisor to successfully access guest memory as needed.

The VIRTIO_F_VERSION_1 and VIRTIO_F_ACCESS_PLATFORM features tell virtio
to use the DMA API. Add arch_has_restricted_virtio_memory_access() for
x86, to fail the device probe if these features have not been set for the
device when running as an SEV guest.

 [ bp: Fix -Wmissing-prototypes warning
   Reported-by: kernel test robot <lkp@intel.com> ]

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/b46e0211f77ca1831f11132f969d470a6ffc9267.1614897610.git.thomas.lendacky@amd.com
2021-03-08 20:41:33 +01:00
Nadav Amit
1608e4cf31 x86/mm/tlb: Remove unnecessary uses of the inline keyword
The compiler is smart enough without these hints.

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20210220231712.2475218-9-namit@vmware.com
2021-03-06 12:59:10 +01:00
Nadav Amit
09c5272e48 x86/mm/tlb: Do not make is_lazy dirty for no reason
Blindly writing to is_lazy for no reason, when the written value is
identical to the old value, makes the cacheline dirty for no reason.
Avoid making such writes to prevent cache coherency traffic for no
reason.

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20210220231712.2475218-7-namit@vmware.com
2021-03-06 12:59:10 +01:00
Nadav Amit
2f4305b19f x86/mm/tlb: Privatize cpu_tlbstate
cpu_tlbstate is mostly private and only the variable is_lazy is shared.
This causes some false-sharing when TLB flushes are performed.

Break cpu_tlbstate intro cpu_tlbstate and cpu_tlbstate_shared, and mark
each one accordingly.

Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20210220231712.2475218-6-namit@vmware.com
2021-03-06 12:59:10 +01:00
Nadav Amit
4ce94eabac x86/mm/tlb: Flush remote and local TLBs concurrently
To improve TLB shootdown performance, flush the remote and local TLBs
concurrently. Introduce flush_tlb_multi() that does so. Introduce
paravirtual versions of flush_tlb_multi() for KVM, Xen and hyper-v (Xen
and hyper-v are only compile-tested).

While the updated smp infrastructure is capable of running a function on
a single local core, it is not optimized for this case. The multiple
function calls and the indirect branch introduce some overhead, and
might make local TLB flushes slower than they were before the recent
changes.

Before calling the SMP infrastructure, check if only a local TLB flush
is needed to restore the lost performance in this common case. This
requires to check mm_cpumask() one more time, but unless this mask is
updated very frequently, this should impact performance negatively.

Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com> # Hyper-v parts
Reviewed-by: Juergen Gross <jgross@suse.com> # Xen and paravirt parts
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20210220231712.2475218-5-namit@vmware.com
2021-03-06 12:59:10 +01:00
Nadav Amit
6035152d8e x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()
Open-code on_each_cpu_cond_mask() in native_flush_tlb_others() to
optimize the code. Open-coding eliminates the need for the indirect branch
that is used to call is_lazy(), and in CPUs that are vulnerable to
Spectre v2, it eliminates the retpoline. In addition, it allows to use a
preallocated cpumask to compute the CPUs that should be.

This would later allow us not to adapt on_each_cpu_cond_mask() to
support local and remote functions.

Note that calling tlb_is_not_lazy() for every CPU that needs to be
flushed, as done in native_flush_tlb_multi() might look ugly, but it is
equivalent to what is currently done in on_each_cpu_cond_mask().
Actually, native_flush_tlb_multi() does it more efficiently since it
avoids using an indirect branch for the matter.

Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20210220231712.2475218-4-namit@vmware.com
2021-03-06 12:59:09 +01:00
Nadav Amit
4c1ba3923e x86/mm/tlb: Unify flush_tlb_func_local() and flush_tlb_func_remote()
The unification of these two functions allows to use them in the updated
SMP infrastrucutre.

To do so, remove the reason argument from flush_tlb_func_local(), add
a member to struct tlb_flush_info that says which CPU initiated the
flush and act accordingly. Optimize the size of flush_tlb_info while we
are at it.

Unfortunately, this prevents us from using a constant tlb_flush_info for
arch_tlbbatch_flush(), but in a later stage we may be able to inline
tlb_flush_info into the IPI data, so it should not have an impact
eventually.

Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20210220231712.2475218-3-namit@vmware.com
2021-03-06 12:59:09 +01:00
NeilBrown
3d2fc4c082 x86: fix seq_file iteration for pat/memtype.c
The memtype seq_file iterator allocates a buffer in the ->start and ->next
functions and frees it in the ->show function.  The preferred handling for
such resources is to free them in the subsequent ->next or ->stop function
call.

Since Commit 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration
code and interface") there is no guarantee that ->show will be called
after ->next, so this function can now leak memory.

So move the freeing of the buffer to ->next and ->stop.

Link: https://lkml.kernel.org/r/161248539022.21478.13874455485854739066.stgit@noble1
Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface")
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 09:41:05 -08:00
Marco Elver
bc8fbc5f30 kfence: add test suite
Add KFENCE test suite, testing various error detection scenarios. Makes
use of KUnit for test organization. Since KFENCE's interface to obtain
error reports is via the console, the test verifies that KFENCE outputs
expected reports to the console.

[elver@google.com: fix typo in test]
  Link: https://lkml.kernel.org/r/X9lHQExmHGvETxY4@elver.google.com
[elver@google.com: show access type in report]
  Link: https://lkml.kernel.org/r/20210111091544.3287013-2-elver@google.com

Link: https://lkml.kernel.org/r/20201103175841.3495947-9-elver@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Co-developed-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 09:41:02 -08:00
Marco Elver
d438fabce7 kfence: use pt_regs to generate stack trace on faults
Instead of removing the fault handling portion of the stack trace based on
the fault handler's name, just use struct pt_regs directly.

Change kfence_handle_page_fault() to take a struct pt_regs, and plumb it
through to kfence_report_error() for out-of-bounds, use-after-free, or
invalid access errors, where pt_regs is used to generate the stack trace.

If the kernel is a DEBUG_KERNEL, also show registers for more information.

Link: https://lkml.kernel.org/r/20201105092133.2075331-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 09:41:02 -08:00
Alexander Potapenko
1dc0da6e9e x86, kfence: enable KFENCE for x86
Add architecture specific implementation details for KFENCE and enable
KFENCE for the x86 architecture. In particular, this implements the
required interface in <asm/kfence.h> for setting up the pool and
providing helper functions for protecting and unprotecting pages.

For x86, we need to ensure that the pool uses 4K pages, which is done
using the set_memory_4k() helper function.

[elver@google.com: add missing copyright and description header]
  Link: https://lkml.kernel.org/r/20210118092159.145934-2-elver@google.com

Link: https://lkml.kernel.org/r/20201103175841.3495947-3-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Co-developed-by: Marco Elver <elver@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 09:41:02 -08:00
Linus Torvalds
ae821d2107 - PTRACE_GETREGS/PTRACE_PUTREGS regset selection cleanup
- Another initial cleanup - more to follow - to the fault handling code.
 
 - Other minor cleanups and corrections.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmAqU0oACgkQEsHwGGHe
 VUruWw//VA+/K7Ykd8tjZdmJPWdfsdqBtOrolh4hiajM6iYckTip/FdwHpeEQwM9
 ff0iNMrxICG3gbQxCX6WNzPeJatYsnjtF67whfat2SEzNHSDtZDb1Bm20s2/1fbY
 OurRBTEBzuYMolpEJ2XABpu7LQ+6TV3LJ6yUBungILMOjP7KvrCK0SUrWj253VDU
 XljK5XBZnmYlEjPU6dlhn64Wsl/GD7AWCAeZGq47EgjH2cR6gxNmu9kYAArGbdiJ
 WjF8MWE7qVwCPUTiCBv+P1CjsQawvlcUY54wtG65dBYAZvpjmN82T2ypguzAt8KT
 12A38vFlBuEUAWC0rUymNouh8Q20AElpdw/odLElHkpNxbHhf/7RyZ1E00LjsFtn
 MF9Gp9aSIQbfYWK+Hin9oRvqXckV08u3KtzUNeyMbdCmpyqHh6prj8JEZaxKZZUp
 zCaX8Qasn+Q9zL0DO51WI9EPOwpvSpifUYHmd5RHGbQDW9DjYK4mkBCHhjVfYXd/
 NcxRO5rrMLmMG+XuNPg9vuHMi2HJnClJ6odD6b80xGvBodTZxZnqnYO9tUImbYnW
 pdmt73YDvakei8XY7cAdNWcsTi0kQYZGfInna6z43Ri2l+I1TZaoKGDqn7TbzNbb
 9RB0lrD0tfW0PvvDbVwco0Q+8/ykIbvPkHPvjQGWioxHi6yI49s=
 =uVEk
 -----END PGP SIGNATURE-----

Merge tag 'x86_mm_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mm cleanups from Borislav Petkov:

 - PTRACE_GETREGS/PTRACE_PUTREGS regset selection cleanup

 - Another initial cleanup - more to follow - to the fault handling
   code.

 - Other minor cleanups and corrections.

* tag 'x86_mm_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/{fault,efi}: Fix and rename efi_recover_from_page_fault()
  x86/fault: Don't run fixups for SMAP violations
  x86/fault: Don't look for extable entries for SMEP violations
  x86/fault: Rename no_context() to kernelmode_fixup_or_oops()
  x86/fault: Bypass no_context() for implicit kernel faults from usermode
  x86/fault: Split the OOPS code out from no_context()
  x86/fault: Improve kernel-executing-user-memory handling
  x86/fault: Correct a few user vs kernel checks wrt WRUSS
  x86/fault: Document the locking in the fault_signal_pending() path
  x86/fault/32: Move is_f00f_bug() to do_kern_addr_fault()
  x86/fault: Fold mm_fault_error() into do_user_addr_fault()
  x86/fault: Skip the AMD erratum #91 workaround on unaffected CPUs
  x86/fault: Fix AMD erratum #91 errata fixup for user code
  x86/Kconfig: Remove HPET_EMULATE_RTC depends on RTC
  x86/asm: Fixup TASK_SIZE_MAX comment
  x86/ptrace: Clean up PTRACE_GETREGS/PTRACE_PUTREGS regset selection
  x86/vm86/32: Remove VM86_SCREEN_BITMAP support
  x86: Remove definition of DEBUG
  x86/entry: Remove now unused do_IRQ() declaration
  x86/mm: Remove duplicate definition of _PAGE_PAT_LARGE
  ...
2021-02-20 19:34:09 -08:00
Linus Torvalds
b9cdab6820 Do not unroll string I/O for SEV-ES guests because they support it.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmAqT+UACgkQEsHwGGHe
 VUr1hQ//d4lgbpkXAq17f/43kMZK5GJKMocdBZw+6QmUr4TYWS2PPTfXEr4FCjM1
 NqxR2wo97bjbg0oHZOd0JzjxzhC6fHXwiyyFGRv3sS7WNfVPlJRZ47dNa1qPxuZX
 QjFwjtWtR2YzKue5Xq7O87gf2YEWFgojWRd5cCyqNU2a0SAHSOhNij8vSswXaBZA
 tD2ApIlce1de4HJxXXFBN9xAows7EqKH4OTqRld66aUqDsUfLQCdPB0aKqnTpItA
 pzW6TRM392K97j2jvMjEJsHcRUVdsYvE8ICz36vk2vmpP1V3stPxOW8lfC4JV5dL
 NXmhuol+CzW/3FeyW5wTtmhJRj3I5MvPl3ddf30Ex9tXSx1RT/7VU7p/zDcv81WZ
 32bQFYnPeNI0SHXH+CW7p13GWDE+yTBCUy838Olmkx69K+2OBgt5B6W2BcOkiKPN
 RH1NAkwxhjP53eR7CIjirvKK77/p/48/Jg2TyLVWKHcdr7QPZdFG0+FpYbQ2nKoj
 q2Gy7l/uxTciu7/RTRHM3E/STVqEmrWSjIuy+dhTOJaHqjMXPyiWXWdYgH2QymUX
 pkTi72uaHy5HF0DyOo5pONkJrODUBasr1iJxUKuWkR++wLJihxN5+lbTxd9Vvcop
 KN5HNhLoyyQ8jYuc8OLwoHIDEskJgMqLDfNx9sLqCXfR380FfAc=
 =anHE
 -----END PGP SIGNATURE-----

Merge tag 'x86_seves_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SEV-ES fix from Borislav Petkov:
 "Do not unroll string I/O for SEV-ES guests because they support it"

* tag 'x86_seves_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev-es: Do not unroll string I/O for SEV-ES guests
2021-02-20 19:16:02 -08:00
Ingo Molnar
40c1fa52cd Merge branch 'x86/cleanups' into x86/mm
Merge recent cleanups to the x86 MM code to resolve a conflict.

Conflicts:
	arch/x86/mm/fault.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-02-12 13:40:02 +01:00
Andy Lutomirski
c46f52231e x86/{fault,efi}: Fix and rename efi_recover_from_page_fault()
efi_recover_from_page_fault() doesn't recover -- it does a special EFI
mini-oops.  Rename it to make it clear that it crashes.

While renaming it, I noticed a blatant bug: a page fault oops in a
different thread happening concurrently with an EFI runtime service call
would be misinterpreted as an EFI page fault.  Fix that.

This isn't quite exact. The situation could be improved by using a
special CS for calls into EFI.

 [ bp: Massage commit message and simplify in interrupt check. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/f43b1e80830dc78ed60ed8b0826f4f189254570c.1612924255.git.luto@kernel.org
2021-02-10 18:39:23 +01:00
Andy Lutomirski
ca24728378 x86/fault: Don't run fixups for SMAP violations
A SMAP-violating kernel access is not a recoverable condition.  Imagine
kernel code that, outside of a uaccess region, dereferences a pointer to
the user range by accident.  If SMAP is on, this will reliably generate
as an intentional user access.  This makes it easy for bugs to be
overlooked if code is inadequately tested both with and without SMAP.

This was discovered because BPF can generate invalid accesses to user
memory, but those warnings only got printed if SMAP was off. Make it so
that this type of error will be discovered with SMAP on as well.

 [ bp: Massage commit message. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/66a02343624b1ff46f02a838c497fc05c1a871b3.1612924255.git.luto@kernel.org
2021-02-10 16:27:57 +01:00
Andy Lutomirski
66fcd98883 x86/fault: Don't look for extable entries for SMEP violations
If the kernel gets a SMEP violation or a fault that would have been a
SMEP violation if it had SMEP support, it shouldn't run fixups. Just
OOPS.

 [ bp: Massage commit message. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/46160d8babce2abf1d6daa052146002efa24ac56.1612924255.git.luto@kernel.org
2021-02-10 14:45:39 +01:00
Andy Lutomirski
6456a2a69e x86/fault: Rename no_context() to kernelmode_fixup_or_oops()
The name no_context() has never been very clear.  It's only called for
faults from kernel mode, so rename it and change the no-longer-useful
user_mode(regs) check to a WARN_ON_ONCE.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/c21940efe676024bb4bc721f7d70c29c420e127e.1612924255.git.luto@kernel.org
2021-02-10 14:41:19 +01:00
Andy Lutomirski
5042d40a26 x86/fault: Bypass no_context() for implicit kernel faults from usermode
Drop an indentation level and remove the last user_mode(regs) == true
caller of no_context() by directly OOPSing for implicit kernel faults
from usermode.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/6e3d1129494a8de1e59d28012286e3a292a2296e.1612924255.git.luto@kernel.org
2021-02-10 14:39:52 +01:00
Andy Lutomirski
2cc624b0a7 x86/fault: Split the OOPS code out from no_context()
Not all callers of no_context() want to run exception fixups.
Separate the OOPS code out from the fixup code in no_context().

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/450f8d8eabafb83a5df349108c8e5ea83a2f939d.1612924255.git.luto@kernel.org
2021-02-10 14:33:36 +01:00
Andy Lutomirski
03c81ea333 x86/fault: Improve kernel-executing-user-memory handling
Right now, the case of the kernel trying to execute from user memory
is treated more or less just like the kernel getting a page fault on a
user access. In the failure path, it checks for erratum #93, tries to
otherwise fix up the error, and then oopses.

If it manages to jump to the user address space, with or without SMEP,
it should not try to resolve the page fault. This is an error, pure and
simple. Rearrange the code so that this case is caught early, check for
erratum #93, and bail out.

 [ bp: Massage commit message. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/ab8719c7afb8bd501c4eee0e36493150fbbe5f6a.1612924255.git.luto@kernel.org
2021-02-10 14:20:54 +01:00
Andy Lutomirski
56e62cd28a x86/fault: Correct a few user vs kernel checks wrt WRUSS
In general, page fault errors for WRUSS should be just like get_user(),
etc.  Fix three bugs in this area:

There is a comment that says that, if the kernel can't handle a page fault
on a user address due to OOM, the OOM-kill-and-retry logic would be
skipped.  The code checked kernel *privilege*, not kernel mode, so it
missed WRUSS.  This means that the kernel would malfunction if it got OOM
on a WRUSS fault -- this would be a kernel-mode, user-privilege fault, and
the OOM killer would be invoked and the handler would retry the faulting
instruction.

A failed user access from kernel while a fatal signal is pending should
fail even if the instruction in question was WRUSS.

do_sigbus() should not send SIGBUS for WRUSS -- it should handle it like
any other kernel mode failure.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/a7b7bcea730bd4069e6b7e629236bb2cf526c2fb.1612924255.git.luto@kernel.org
2021-02-10 14:13:32 +01:00
Andy Lutomirski
ef2544fb3f x86/fault: Document the locking in the fault_signal_pending() path
If fault_signal_pending() returns true, then the core mm has unlocked the
mm for us.  Add a comment to help future readers of this code.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/c56de3d103f40e6304437b150aa7b215530d23f7.1612924255.git.luto@kernel.org
2021-02-10 14:12:07 +01:00
Andy Lutomirski
f42a40fd53 x86/fault/32: Move is_f00f_bug() to do_kern_addr_fault()
bad_area() and its relatives are called from many places in fault.c, and
exactly one of them wants the F00F workaround.

__bad_area_nosemaphore() no longer contains any kernel fault code, which
prepares for further cleanups.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/e9668729a48ce6754022b0a4415631e8ebdd00e7.1612924255.git.luto@kernel.org
2021-02-10 14:11:07 +01:00
Andy Lutomirski
ec352711ce x86/fault: Fold mm_fault_error() into do_user_addr_fault()
mm_fault_error() is logically just the end of do_user_addr_fault().
Combine the functions.  This makes the code easier to read.

Most of the churn here is from renaming hw_error_code to error_code in
do_user_addr_fault().

This makes no difference at all to the generated code (objdump -dr) as
compared to changing noinline to __always_inline in the definition of
mm_fault_error().

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/dedc4d9c9b047e51ce38b991bd23971a28af4e7b.1612924255.git.luto@kernel.org
2021-02-10 14:10:07 +01:00
Andy Lutomirski
d24df8ecf9 x86/fault: Skip the AMD erratum #91 workaround on unaffected CPUs
According to the Revision Guide for AMD Athlon™ 64 and AMD Opteron™
Processors, only early revisions of family 0xF are affected. This will
avoid unnecessarily fetching instruction bytes before sending SIGSEGV to
user programs.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/477173b7784bc28afb3e53d76ae5ef143917e8dd.1612924255.git.luto@kernel.org
2021-02-10 13:38:12 +01:00
Andy Lutomirski
35f1c89b0c x86/fault: Fix AMD erratum #91 errata fixup for user code
The recent rework of probe_kernel_address() and its conversion to
get_kernel_nofault() inadvertently broke is_prefetch(). Before this
change, probe_kernel_address() was used as a sloppy "read user or
kernel memory" helper, but it doesn't do that any more. The new
get_kernel_nofault() reads *kernel* memory only, which completely broke
is_prefetch() for user access.

Adjust the code to the correct accessor based on access mode. The
manual address bounds check is no longer necessary, since the accessor
helpers (get_user() / get_kernel_nofault()) do the right thing all by
themselves. As a bonus, by using the correct accessor, the open-coded
address bounds check is not needed anymore.

 [ bp: Massage commit message. ]

Fixes: eab0c6089b68 ("maccess: unify the probe kernel arch hooks")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/b91f7f92f3367d2d3a88eec3b09c6aab1b2dc8ef.1612924255.git.luto@kernel.org
2021-02-10 13:11:41 +01:00
Sean Christopherson
ccd85d90ce KVM: SVM: Treat SVM as unsupported when running as an SEV guest
Don't let KVM load when running as an SEV guest, regardless of what
CPUID says.  Memory is encrypted with a key that is not accessible to
the host (L0), thus it's impossible for L0 to emulate SVM, e.g. it'll
see garbage when reading the VMCB.

Technically, KVM could decrypt all memory that needs to be accessible to
the L0 and use shadow paging so that L0 does not need to shadow NPT, but
exposing such information to L0 largely defeats the purpose of running as
an SEV guest.  This can always be revisited if someone comes up with a
use case for running VMs inside SEV guests.

Note, VMLOAD, VMRUN, etc... will also #GP on GPAs with C-bit set, i.e. KVM
is doomed even if the SEV guest is debuggable and the hypervisor is willing
to decrypt the VMCB.  This may or may not be fixed on CPUs that have the
SVME_ADDR_CHK fix.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210202212017.2486595-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-03 04:30:37 -05:00
Tom Lendacky
62a08a7193 x86/sev-es: Do not unroll string I/O for SEV-ES guests
Under the GHCB specification, SEV-ES guests can support string I/O.
The current #VC handler contains this support, so remove the need to
unroll kernel string I/O operations. This will reduce the number of #VC
exceptions generated as well as the number VM exits for the guest.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/3de04b5b638546ac75d42ba52307fe1a922173d3.1612203987.git.thomas.lendacky@amd.com
2021-02-02 16:25:05 +01:00
Andy Lutomirski
8ece53ef7f x86/vm86/32: Remove VM86_SCREEN_BITMAP support
The implementation was rather buggy.  It unconditionally marked PTEs
read-only, even for VM_SHARED mappings.  I'm not sure whether this is
actually a problem, but it certainly seems unwise.  More importantly, it
released the mmap lock before flushing the TLB, which could allow a racing
CoW operation to falsely believe that the underlying memory was not
writable.

I can't find any users at all of this mechanism, so just remove it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Stas Sergeev <stsp2@yandex.ru>
Link: https://lkml.kernel.org/r/f3086de0babcab36f69949b5780bde851f719bc8.1611078018.git.luto@kernel.org
2021-01-21 20:08:53 +01:00
Tom Rix
b86cb29287 x86: Remove definition of DEBUG
Defining DEBUG should only be done in development. So remove it.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20210114212827.47584-1-trix@redhat.com
2021-01-15 08:23:10 +01:00
Dan Williams
d1c5246e08 x86/mm: Fix leak of pmd ptlock
Commit

  28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")

introduced a new location where a pmd was released, but neglected to
run the pmd page destructor. In fact, this happened previously for a
different pmd release path and was fixed by commit:

  c283610e44ec ("x86, mm: do not leak page->ptl for pmd page tables").

This issue was hidden until recently because the failure mode is silent,
but commit:

  b2b29d6d0119 ("mm: account PMD tables like PTE tables")

turns the failure mode into this signature:

 BUG: Bad page state in process lt-pmem-ns  pfn:15943d
 page:000000007262ed7b refcount:0 mapcount:-1024 mapping:0000000000000000 index:0x0 pfn:0x15943d
 flags: 0xaffff800000000()
 raw: 00affff800000000 dead000000000100 0000000000000000 0000000000000000
 raw: 0000000000000000 ffff913a029bcc08 00000000fffffbff 0000000000000000
 page dumped because: nonzero mapcount
 [..]
  dump_stack+0x8b/0xb0
  bad_page.cold+0x63/0x94
  free_pcp_prepare+0x224/0x270
  free_unref_page+0x18/0xd0
  pud_free_pmd_page+0x146/0x160
  ioremap_pud_range+0xe3/0x350
  ioremap_page_range+0x108/0x160
  __ioremap_caller.constprop.0+0x174/0x2b0
  ? memremap+0x7a/0x110
  memremap+0x7a/0x110
  devm_memremap+0x53/0xa0
  pmem_attach_disk+0x4ed/0x530 [nd_pmem]
  ? __devm_release_region+0x52/0x80
  nvdimm_bus_probe+0x85/0x210 [libnvdimm]

Given this is a repeat occurrence it seemed prudent to look for other
places where this destructor might be missing and whether a better
helper is needed. try_to_free_pmd_page() looks like a candidate, but
testing with setting up and tearing down pmd mappings via the dax unit
tests is thus far not triggering the failure.

As for a better helper pmd_free() is close, but it is a messy fit
due to requiring an @mm arg. Also, ___pmd_free_tlb() wants to call
paravirt_tlb_remove_table() instead of free_page(), so open-coded
pgtable_pmd_page_dtor() seems the best way forward for now.

Debugged together with Matthew Wilcox <willy@infradead.org>.

Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/160697689204.605323.17629854984697045602.stgit@dwillia2-desk3.amr.corp.intel.com
2021-01-05 11:40:23 +01:00
Lorenzo Stoakes
167dcfc08b x86/mm: Increase pgt_buf size for 5-level page tables
pgt_buf is used to allocate page tables on initial direct page mapping
which bootstraps the kernel into being able to allocate these before the
direct mapping makes further pages available.

INIT_PGD_PAGE_COUNT is set to 6 pages (doubled for KASLR) - 3 (PUD, PMD,
PTE) for the 1 MiB ISA mapping and 3 more for the first direct mapping
assignment in each case providing 2 MiB of address space.

This has not been updated for 5-level page tables which has an
additional P4D page table level above PUD.

In most instances, this will not have a material impact as the first
4 page levels allocated for the ISA mapping will provide sufficient
address space to encompass all further address mappings.

If the first direct mapping is within 512 GiB of the ISA mapping, only
a PMD and PTE needs to be added in the instance the kernel is using 4
KiB page tables (e.g. CONFIG_DEBUG_PAGEALLOC is enabled) and only a PMD
if the kernel can use 2 MiB pages (the first allocation is limited to
PMD_SIZE so a GiB page cannot be used there).

However, if the machine has more than 512 GiB of RAM and the kernel is
allocating 4 KiB page size, 3 further page tables are required.

If the machine has more than 256 TiB of RAM at 4 KiB or 2 MiB page size,
further 3 or 4 page tables are required respectively.

Update INIT_PGD_PAGE_COUNT to reflect this.

 [ bp: Sanitize text into passive voice without ambiguous personal pronouns. ]

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/20201215205641.34096-1-lstoakes@gmail.com
2021-01-04 18:07:50 +01:00
Linus Torvalds
007c74e16c Merge branch 'stable/for-linus-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb update from Konrad Rzeszutek Wilk:
 "A generic (but for right now engaged only with AMD SEV) mechanism to
  adjust a larger size SWIOTLB based on the total memory of the SEV
  guests which right now require the bounce buffer for interacting with
  the outside world.

  Normal knobs (swiotlb=XYZ) still work"

* 'stable/for-linus-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
  x86,swiotlb: Adjust SWIOTLB bounce buffer size for SEV guests
2020-12-16 13:51:34 -08:00
Linus Torvalds
ac73e3dc8a Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:

 - a few random little subsystems

 - almost all of the MM patches which are staged ahead of linux-next
   material. I'll trickle to post-linux-next work in as the dependents
   get merged up.

Subsystems affected by this patch series: kthread, kbuild, ide, ntfs,
ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache,
gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation,
kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction,
oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc,
uaccess, zram, and cleanups).

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (200 commits)
  mm: cleanup kstrto*() usage
  mm: fix fall-through warnings for Clang
  mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at
  mm: shmem: convert shmem_enabled_show to use sysfs_emit_at
  mm:backing-dev: use sysfs_emit in macro defining functions
  mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening
  mm: use sysfs_emit for struct kobject * uses
  mm: fix kernel-doc markups
  zram: break the strict dependency from lzo
  zram: add stat to gather incompressible pages since zram set up
  zram: support page writeback
  mm/process_vm_access: remove redundant initialization of iov_r
  mm/zsmalloc.c: rework the list_add code in insert_zspage()
  mm/zswap: move to use crypto_acomp API for hardware acceleration
  mm/zswap: fix passing zero to 'PTR_ERR' warning
  mm/zswap: make struct kernel_param_ops definitions const
  userfaultfd/selftests: hint the test runner on required privilege
  userfaultfd/selftests: fix retval check for userfaultfd_open()
  userfaultfd/selftests: always dump something in modes
  userfaultfd: selftests: make __{s,u}64 format specifiers portable
  ...
2020-12-15 12:53:37 -08:00
Mike Rapoport
32a0de886e arch, mm: make kernel_page_present() always available
For architectures that enable ARCH_HAS_SET_MEMORY having the ability to
verify that a page is mapped in the kernel direct map can be useful
regardless of hibernation.

Add RISC-V implementation of kernel_page_present(), update its forward
declarations and stubs to be a part of set_memory API and remove ugly
ifdefery in inlcude/linux/mm.h around current declarations of
kernel_page_present().

Link: https://lkml.kernel.org/r/20201109192128.960-5-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:43 -08:00
Mike Rapoport
5d6ad668f3 arch, mm: restore dependency of __kernel_map_pages() on DEBUG_PAGEALLOC
The design of DEBUG_PAGEALLOC presumes that __kernel_map_pages() must
never fail.  With this assumption is wouldn't be safe to allow general
usage of this function.

Moreover, some architectures that implement __kernel_map_pages() have this
function guarded by #ifdef DEBUG_PAGEALLOC and some refuse to map/unmap
pages when page allocation debugging is disabled at runtime.

As all the users of __kernel_map_pages() were converted to use
debug_pagealloc_map_pages() it is safe to make it available only when
DEBUG_PAGEALLOC is set.

Link: https://lkml.kernel.org/r/20201109192128.960-4-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:43 -08:00
Linus Torvalds
edd7ab7684 The new preemtible kmap_local() implementation:
- Consolidate all kmap_atomic() internals into a generic implementation
     which builds the base for the kmap_local() API and make the
     kmap_atomic() interface wrappers which handle the disabling/enabling of
     preemption and pagefaults.
 
   - Switch the storage from per-CPU to per task and provide scheduler
     support for clearing mapping when scheduling out and restoring them
     when scheduling back in.
 
   - Merge the migrate_disable/enable() code, which is also part of the
     scheduler pull request. This was required to make the kmap_local()
     interface available which does not disable preemption when a mapping
     is established. It has to disable migration instead to guarantee that
     the virtual address of the mapped slot is the same accross preemption.
 
   - Provide better debug facilities: guard pages and enforced utilization
     of the mapping mechanics on 64bit systems when the architecture allows
     it.
 
   - Provide the new kmap_local() API which can now be used to cleanup the
     kmap_atomic() usage sites all over the place. Most of the usage sites
     do not require the implicit disabling of preemption and pagefaults so
     the penalty on 64bit and 32bit non-highmem systems is removed and quite
     some of the code can be simplified. A wholesale conversion is not
     possible because some usage depends on the implicit side effects and
     some need to be cleaned up because they work around these side effects.
 
     The migrate disable side effect is only effective on highmem systems
     and when enforced debugging is enabled. On 64bit and 32bit non-highmem
     systems the overhead is completely avoided.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/XyQwTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoUolD/9+R+BX96fGir+I8rG9dc3cbLw5meSi
 0I/Nq3PToZMs2Iqv50DsoaPYHHz/M6fcAO9LRIgsE9jRbnY93GnsBM0wU9Y8yQaT
 4wUzOG5WHaLDfqIkx/CN9coUl458oEiwOEbn79A2FmPXFzr7IpkufnV3ybGDwzwP
 p73bjMJMPPFrsa9ig87YiYfV/5IAZHi82PN8Cq1v4yNzgXRP3Tg6QoAuCO84ZnWF
 RYlrfKjcJ2xPdn+RuYyXolPtxr1hJQ0bOUpe4xu/UfeZjxZ7i1wtwLN9kWZe8CKH
 +x4Lz8HZZ5QMTQ9sCHOLtKzu2MceMcpISzoQH4/aFQCNMgLn1zLbS790XkYiQCuR
 ne9Cua+IqgYfGMG8cq8+bkU9HCNKaXqIBgPEKE/iHYVmqzCOqhW5Cogu4KFekf6V
 Wi7pyyUdX2en8BAWpk5NHc8de9cGcc+HXMq2NIcgXjVWvPaqRP6DeITERTZLJOmz
 XPxq5oPLGl7wdm7z+ICIaNApy8zuxpzb6sPLNcn7l5OeorViORlUu08AN8587wAj
 FiVjp6ZYomg+gyMkiNkDqFOGDH5TMENpOFoB0hNNEyJwwS0xh6CgWuwZcv+N8aPO
 HuS/P+tNANbD8ggT4UparXYce7YCtgOf3IG4GA3JJYvYmJ6pU+AZOWRoDScWq4o+
 +jlfoJhMbtx5Gg==
 =n71I
 -----END PGP SIGNATURE-----

Merge tag 'core-mm-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull kmap updates from Thomas Gleixner:
 "The new preemtible kmap_local() implementation:

   - Consolidate all kmap_atomic() internals into a generic
     implementation which builds the base for the kmap_local() API and
     make the kmap_atomic() interface wrappers which handle the
     disabling/enabling of preemption and pagefaults.

   - Switch the storage from per-CPU to per task and provide scheduler
     support for clearing mapping when scheduling out and restoring them
     when scheduling back in.

   - Merge the migrate_disable/enable() code, which is also part of the
     scheduler pull request. This was required to make the kmap_local()
     interface available which does not disable preemption when a
     mapping is established. It has to disable migration instead to
     guarantee that the virtual address of the mapped slot is the same
     across preemption.

   - Provide better debug facilities: guard pages and enforced
     utilization of the mapping mechanics on 64bit systems when the
     architecture allows it.

   - Provide the new kmap_local() API which can now be used to cleanup
     the kmap_atomic() usage sites all over the place. Most of the usage
     sites do not require the implicit disabling of preemption and
     pagefaults so the penalty on 64bit and 32bit non-highmem systems is
     removed and quite some of the code can be simplified. A wholesale
     conversion is not possible because some usage depends on the
     implicit side effects and some need to be cleaned up because they
     work around these side effects.

     The migrate disable side effect is only effective on highmem
     systems and when enforced debugging is enabled. On 64bit and 32bit
     non-highmem systems the overhead is completely avoided"

* tag 'core-mm-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  ARM: highmem: Fix cache_is_vivt() reference
  x86/crashdump/32: Simplify copy_oldmem_page()
  io-mapping: Provide iomap_local variant
  mm/highmem: Provide kmap_local*
  sched: highmem: Store local kmaps in task struct
  x86: Support kmap_local() forced debugging
  mm/highmem: Provide CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
  mm/highmem: Provide and use CONFIG_DEBUG_KMAP_LOCAL
  microblaze/mm/highmem: Add dropped #ifdef back
  xtensa/mm/highmem: Make generic kmap_atomic() work correctly
  mm/highmem: Take kmap_high_get() properly into account
  highmem: High implementation details and document API
  Documentation/io-mapping: Remove outdated blurb
  io-mapping: Cleanup atomic iomap
  mm/highmem: Remove the old kmap_atomic cruft
  highmem: Get rid of kmap_types.h
  xtensa/mm/highmem: Switch to generic kmap atomic
  sparc/mm/highmem: Switch to generic kmap atomic
  powerpc/mm/highmem: Switch to generic kmap atomic
  nds32/mm/highmem: Switch to generic kmap atomic
  ...
2020-12-14 18:35:53 -08:00