950078 Commits

Author SHA1 Message Date
Marc Zyngier
816c347f3a Merge remote-tracking branch 'arm64/for-next/ghostbusters' into kvm-arm64/hyp-pcpu
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-09-30 09:48:30 +01:00
David Brazdil
a3bb9c3a00 kvm: arm64: Remove unnecessary hyp mappings
With all nVHE per-CPU variables being part of the hyp per-CPU region,
mapping them individual is not necessary any longer. They are mapped to hyp
as part of the overall per-CPU region.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Andrew Scull <ascull@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200922204910.7265-11-dbrazdil@google.com
2020-09-30 08:37:57 +01:00
David Brazdil
30c953911c kvm: arm64: Set up hyp percpu data for nVHE
Add hyp percpu section to linker script and rename the corresponding ELF
sections of hyp/nvhe object files. This moves all nVHE-specific percpu
variables to the new hyp percpu section.

Allocate sufficient amount of memory for all percpu hyp regions at global KVM
init time and create corresponding hyp mappings.

The base addresses of hyp percpu regions are kept in a dynamically allocated
array in the kernel.

Add NULL checks in PMU event-reset code as it may run before KVM memory is
initialized.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200922204910.7265-10-dbrazdil@google.com
2020-09-30 08:37:14 +01:00
David Brazdil
2a1198c9b4 kvm: arm64: Create separate instances of kvm_host_data for VHE/nVHE
Host CPU context is stored in a global per-cpu variable `kvm_host_data`.
In preparation for introducing independent per-CPU region for nVHE hyp,
create two separate instances of `kvm_host_data`, one for VHE and one
for nVHE.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200922204910.7265-9-dbrazdil@google.com
2020-09-30 08:37:13 +01:00
David Brazdil
df4c8214a1 kvm: arm64: Duplicate arm64_ssbd_callback_required for nVHE hyp
Hyp keeps track of which cores require SSBD callback by accessing a
kernel-proper global variable. Create an nVHE symbol of the same name
and copy the value from kernel proper to nVHE as KVM is being enabled
on a core.

Done in preparation for separating percpu memory owned by kernel
proper and nVHE.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200922204910.7265-8-dbrazdil@google.com
2020-09-30 08:37:13 +01:00
David Brazdil
572494995b kvm: arm64: Add helpers for accessing nVHE hyp per-cpu vars
Defining a per-CPU variable in hyp/nvhe will result in its name being
prefixed with __kvm_nvhe_. Add helpers for declaring these variables
in kernel proper and accessing them with this_cpu_ptr and per_cpu_ptr.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200922204910.7265-7-dbrazdil@google.com
2020-09-30 08:37:13 +01:00
David Brazdil
ea391027d3 kvm: arm64: Remove hyp_adr/ldr_this_cpu
The hyp_adr/ldr_this_cpu helpers were introduced for use in hyp code
because they always needed to use TPIDR_EL2 for base, while
adr/ldr_this_cpu from kernel proper would select between TPIDR_EL2 and
_EL1 based on VHE/nVHE.

Simplify this now that the hyp mode case can be handled using the
__KVM_VHE/NVHE_HYPERVISOR__ macros.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Andrew Scull <ascull@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200922204910.7265-6-dbrazdil@google.com
2020-09-30 08:37:07 +01:00
David Brazdil
717cf94adb kvm: arm64: Remove __hyp_this_cpu_read
this_cpu_ptr is meant for use in kernel proper because it selects between
TPIDR_EL1/2 based on nVHE/VHE. __hyp_this_cpu_ptr was used in hyp to always
select TPIDR_EL2. Unify all users behind this_cpu_ptr and friends by
selecting _EL2 register under __KVM_NVHE_HYPERVISOR__. VHE continues
selecting the register using alternatives.

Under CONFIG_DEBUG_PREEMPT, the kernel helpers perform a preemption check
which is omitted by the hyp helpers. Preserve the behavior for nVHE by
overriding the corresponding macros under __KVM_NVHE_HYPERVISOR__. Extend
the checks into VHE hyp code.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Andrew Scull <ascull@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200922204910.7265-5-dbrazdil@google.com
2020-09-30 08:33:52 +01:00
David Brazdil
3471ee06e3 kvm: arm64: Only define __kvm_ex_table for CONFIG_KVM
Minor cleanup that only creates __kvm_ex_table ELF section and
related symbols if CONFIG_KVM is enabled. Also useful as more
hyp-specific sections will be added.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200922204910.7265-4-dbrazdil@google.com
2020-09-30 08:33:52 +01:00
David Brazdil
ce492a16ff kvm: arm64: Move nVHE hyp namespace macros to hyp_image.h
Minor cleanup to move all macros related to prefixing nVHE hyp section
and symbol names into one place: hyp_image.h.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200922204910.7265-3-dbrazdil@google.com
2020-09-30 08:33:52 +01:00
David Brazdil
ab25464bda kvm: arm64: Partially link nVHE hyp code, simplify HYPCOPY
Relying on objcopy to prefix the ELF section names of the nVHE hyp code
is brittle and prevents us from using wildcards to match specific
section names.

Improve the build rules by partially linking all '.nvhe.o' files and
prefixing their ELF section names using a linker script. Continue using
objcopy for prefixing ELF symbol names.

One immediate advantage of this approach is that all subsections
matching a pattern can be merged into a single prefixed section, eg.
.text and .text.* can be linked into a single '.hyp.text'. This removes
the need for -fno-reorder-functions on GCC and will be useful in the
future too: LTO builds use .text subsections, compilers routinely
generate .rodata subsections, etc.

Partially linking all hyp code into a single object file also makes it
easier to analyze.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200922204910.7265-2-dbrazdil@google.com
2020-09-30 08:33:52 +01:00
Will Deacon
780c083a8f arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option
The PR_SPEC_DISABLE_NOEXEC option to the PR_SPEC_STORE_BYPASS prctl()
allows the SSB mitigation to be enabled only until the next execve(),
at which point the state will revert back to PR_SPEC_ENABLE and the
mitigation will be disabled.

Add support for PR_SPEC_DISABLE_NOEXEC on arm64.

Reported-by: Anthony Steinhauser <asteinhauser@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:17 +01:00
Will Deacon
5c8b0cbd9d arm64: Pull in task_stack_page() to Spectre-v4 mitigation code
The kbuild robot reports that we're relying on an implicit inclusion to
get a definition of task_stack_page() in the Spectre-v4 mitigation code,
which is not always in place for some configurations:

  | arch/arm64/kernel/proton-pack.c:329:2: error: implicit declaration of function 'task_stack_page' [-Werror,-Wimplicit-function-declaration]
  |         task_pt_regs(task)->pstate |= val;
  |         ^
  | arch/arm64/include/asm/processor.h:268:36: note: expanded from macro 'task_pt_regs'
  |         ((struct pt_regs *)(THREAD_SIZE + task_stack_page(p)) - 1)
  |                                           ^
  | arch/arm64/kernel/proton-pack.c:329:2: note: did you mean 'task_spread_page'?

Add the missing include to fix the build error.

Fixes: a44acf477220 ("arm64: Move SSBD prctl() handler alongside other spectre mitigation code")
Reported-by: Anthony Steinhauser <asteinhauser@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/202009260013.Ul7AD29w%lkp@intel.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:17 +01:00
Will Deacon
9ef2b48be9 KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled
Patching the EL2 exception vectors is integral to the Spectre-v2
workaround, where it can be necessary to execute CPU-specific sequences
to nobble the branch predictor before running the hypervisor text proper.

Remove the dependency on CONFIG_RANDOMIZE_BASE and allow the EL2 vectors
to be patched even when KASLR is not enabled.

Fixes: 7a132017e7a5 ("KVM: arm64: Replace CONFIG_KVM_INDIRECT_VECTORS with CONFIG_RANDOMIZE_BASE")
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/202009221053.Jv1XsQUZ%lkp@intel.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:17 +01:00
Marc Zyngier
31c84d6c9c arm64: Get rid of arm64_ssbd_state
Out with the old ghost, in with the new...

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:17 +01:00
Marc Zyngier
d63d975a71 KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state()
Convert the KVM WA2 code to using the Spectre infrastructure,
making the code much more readable. It also allows us to
take SSBS into account for the mitigation.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:17 +01:00
Marc Zyngier
7311467702 KVM: arm64: Get rid of kvm_arm_have_ssbd()
kvm_arm_have_ssbd() is now completely unused, get rid of it.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:17 +01:00
Marc Zyngier
29e8910a56 KVM: arm64: Simplify handling of ARCH_WORKAROUND_2
Owing to the fact that the host kernel is always mitigated, we can
drastically simplify the WA2 handling by keeping the mitigation
state ON when entering the guest. This means the guest is either
unaffected or not mitigated.

This results in a nice simplification of the mitigation space,
and the removal of a lot of code that was never really used anyway.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:16 +01:00
Will Deacon
c28762070c arm64: Rewrite Spectre-v4 mitigation code
Rewrite the Spectre-v4 mitigation handling code to follow the same
approach as that taken by Spectre-v2.

For now, report to KVM that the system is vulnerable (by forcing
'ssbd_state' to ARM64_SSBD_UNKNOWN), as this will be cleared up in
subsequent steps.

Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:16 +01:00
Will Deacon
9e78b659b4 arm64: Move SSBD prctl() handler alongside other spectre mitigation code
As part of the spectre consolidation effort to shift all of the ghosts
into their own proton pack, move all of the horrible SSBD prctl() code
out of its own 'ssbd.c' file.

Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:16 +01:00
Will Deacon
9b0955baa4 arm64: Rename ARM64_SSBD to ARM64_SPECTRE_V4
In a similar manner to the renaming of ARM64_HARDEN_BRANCH_PREDICTOR
to ARM64_SPECTRE_V2, rename ARM64_SSBD to ARM64_SPECTRE_V4. This isn't
_entirely_ accurate, as we also need to take into account the interaction
with SSBS, but that will be taken care of in subsequent patches.

Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:16 +01:00
Will Deacon
532d581583 arm64: Treat SSBS as a non-strict system feature
If all CPUs discovered during boot have SSBS, then spectre-v4 will be
considered to be "mitigated". However, we still allow late CPUs without
SSBS to be onlined, albeit with a "SANITY CHECK" warning. This is
problematic for userspace because it means that the system can quietly
transition to "Vulnerable" at runtime.

Avoid this by treating SSBS as a non-strict system feature: if all of
the CPUs discovered during boot have SSBS, then late arriving secondaries
better have it as well.

Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:16 +01:00
Will Deacon
a8de949893 arm64: Group start_thread() functions together
The is_ttbrX_addr() functions have somehow ended up in the middle of
the start_thread() functions, so move them out of the way to keep the
code readable.

Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:16 +01:00
Marc Zyngier
e1026237f9 KVM: arm64: Set CSV2 for guests on hardware unaffected by Spectre-v2
If the system is not affected by Spectre-v2, then advertise to the KVM
guest that it is not affected, without the need for a safelist in the
guest.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:16 +01:00
Will Deacon
d4647f0a2a arm64: Rewrite Spectre-v2 mitigation code
The Spectre-v2 mitigation code is pretty unwieldy and hard to maintain.
This is largely due to it being written hastily, without much clue as to
how things would pan out, and also because it ends up mixing policy and
state in such a way that it is very difficult to figure out what's going
on.

Rewrite the Spectre-v2 mitigation so that it clearly separates state from
policy and follows a more structured approach to handling the mitigation.

Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:15 +01:00
Will Deacon
455697adef arm64: Introduce separate file for spectre mitigations and reporting
The spectre mitigation code is spread over a few different files, which
makes it both hard to follow, but also hard to remove it should we want
to do that in future.

Introduce a new file for housing the spectre mitigations, and populate
it with the spectre-v1 reporting code to start with.

Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:15 +01:00
Will Deacon
688f1e4b6d arm64: Rename ARM64_HARDEN_BRANCH_PREDICTOR to ARM64_SPECTRE_V2
For better or worse, the world knows about "Spectre" and not about
"Branch predictor hardening". Rename ARM64_HARDEN_BRANCH_PREDICTOR to
ARM64_SPECTRE_V2 as part of moving all of the Spectre mitigations into
their own little corner.

Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:15 +01:00
Will Deacon
b181048f41 KVM: arm64: Simplify install_bp_hardening_cb()
Use is_hyp_mode_available() to detect whether or not we need to patch
the KVM vectors for branch hardening, which avoids the need to take the
vector pointers as parameters.

Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:15 +01:00
Will Deacon
5359a87d5b KVM: arm64: Replace CONFIG_KVM_INDIRECT_VECTORS with CONFIG_RANDOMIZE_BASE
The removal of CONFIG_HARDEN_BRANCH_PREDICTOR means that
CONFIG_KVM_INDIRECT_VECTORS is synonymous with CONFIG_RANDOMIZE_BASE,
so replace it.

Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:15 +01:00
Will Deacon
6e5f092784 arm64: Remove Spectre-related CONFIG_* options
The spectre mitigations are too configurable for their own good, leading
to confusing logic trying to figure out when we should mitigate and when
we shouldn't. Although the plethora of command-line options need to stick
around for backwards compatibility, the default-on CONFIG options that
depend on EXPERT can be dropped, as the mitigations only do anything if
the system is vulnerable, a mitigation is available and the command-line
hasn't disabled it.

Remove CONFIG_HARDEN_BRANCH_PREDICTOR and CONFIG_ARM64_SSBD in favour of
enabling this code unconditionally.

Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:15 +01:00
Marc Zyngier
39533e1206 arm64: Run ARCH_WORKAROUND_2 enabling code on all CPUs
Commit 606f8e7b27bf ("arm64: capabilities: Use linear array for
detection and verification") changed the way we deal with per-CPU errata
by only calling the .matches() callback until one CPU is found to be
affected. At this point, .matches() stop being called, and .cpu_enable()
will be called on all CPUs.

This breaks the ARCH_WORKAROUND_2 handling, as only a single CPU will be
mitigated.

In order to address this, forcefully call the .matches() callback from a
.cpu_enable() callback, which brings us back to the original behaviour.

Fixes: 606f8e7b27bf ("arm64: capabilities: Use linear array for detection and verification")
Cc: <stable@vger.kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-09-29 16:08:03 +01:00
Marc Zyngier
18fce56134 arm64: Run ARCH_WORKAROUND_1 enabling code on all CPUs
Commit 73f381660959 ("arm64: Advertise mitigation of Spectre-v2, or lack
thereof") changed the way we deal with ARCH_WORKAROUND_1, by moving most
of the enabling code to the .matches() callback.

This has the unfortunate effect that the workaround gets only enabled on
the first affected CPU, and no other.

In order to address this, forcefully call the .matches() callback from a
.cpu_enable() callback, which brings us back to the original behaviour.

Fixes: 73f381660959 ("arm64: Advertise mitigation of Spectre-v2, or lack thereof")
Cc: <stable@vger.kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-09-21 18:31:09 +01:00
Marc Zyngier
b11483ef5a arm64: Make use of ARCH_WORKAROUND_1 even when KVM is not enabled
We seem to be pretending that we don't have any firmware mitigation
when KVM is not compiled in, which is not quite expected.

Bring back the mitigation in this case.

Fixes: 4db61fef16a1 ("arm64: kvm: Modernize __smccc_workaround_1_smc_start annotations")
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-09-21 18:31:09 +01:00
Linus Torvalds
f4d51dffc6 Linux 5.9-rc4 v5.9-rc4 2020-09-06 17:11:40 -07:00
Linus Torvalds
a8205e3100 io_uring-5.9-2020-09-06
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl9U/MMQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpg4BEAC6TQ6ctc1yNTTWwiz4UrdIKMAWP7S7wepu
 k00A9+JjLLJBVdkz9rZ2Y/SyGe12qBM+riiRSn/gNTbkd4qq2rCY2d8U1vXXKyP5
 VDXPo12zsUD5WpqdEMfXUB4+DOQs0a3DAyDLT0+K/Qw0rpOQVZA26Ovn6GOh9+Kq
 KJCllynYkwUQ/7CXsdI13ktMI1HADFOx3149SlGkbuPOggwNQrGiLJAxyUOn/i+E
 uiHy8b8o3B2nun61+Y98q+MDISLf0xXYEbHeAsvETEy52ya2iadnMo1lwS5zHzM+
 p6jOBybHaM/wz5t1V44VTBvfog1KAUtp8K0gsxcB6Ezf7LhYfTVjtvfZqpwVz1sl
 txkxhnGEBURHpdr0aCtC15cIpbDGM85ymjP4RD0YlV7oyT0+Ufx7r9jJjLf7IhZO
 FMyEFmVwSPD/NdJE9ZNx0I2v/qdIzYwfeAix4Z4bXoe51BtPrf1uBgsGWVuqNVz/
 dKVf1vK1tPF8PgIiIW/o8GI4iF3RRVQcLwJGiAWMzBS11iniJLf9mUPRVl5Bxpeb
 YRd2chm+ppHC/IgtK44x7Ce6415hnbKehE2KUr43PHB7nIMNWcRsurxLizM7ZGei
 Gv2/K9PM3+4O/b1k20xLakz4Vw3Isk4W6/Flj9LoxheBo+CUG4Rsx5UyzFEB6apA
 uXxiF6ORLg==
 =v5QQ
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.9-2020-09-06' of git://git.kernel.dk/linux-block

Pull more io_uring fixes from Jens Axboe:
 "Two followup fixes. One is fixing a regression from this merge window,
  the other is two commits fixing cancelation of deferred requests.

  Both have gone through full testing, and both spawned a few new
  regression test additions to liburing.

   - Don't play games with const, properly store the output iovec and
     assign it as needed.

   - Deferred request cancelation fix (Pavel)"

* tag 'io_uring-5.9-2020-09-06' of git://git.kernel.dk/linux-block:
  io_uring: fix linked deferred ->files cancellation
  io_uring: fix cancel of deferred reqs with ->files
  io_uring: fix explicit async read/write mapping for large segments
2020-09-06 12:10:27 -07:00
Linus Torvalds
2ccdd9f8b2 IOMMU Fixes for Linux v5.9-rc3
Including:
 
 	- Three Intel VT-d fixes to fix address handling on 32bit, fix a
 	  NULL pointer dereference bug and serialize a hardware register
 	  access as required by the VT-d spec.
 
 	- Two patches for AMD IOMMU to force AMD GPUs into translation mode
 	  when memory encryption is active and disallow using IOMMUv2
 	  functionality. This makes the AMDGPU driver working when
 	  memory encryption is active.
 
 	- Two more fixes for AMD IOMMU to fix updating the Interrupt
 	  Remapping Table Entries.
 
 	- MAINTAINERS file update for the Qualcom IOMMU driver.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl9U+Q4ACgkQK/BELZcB
 GuOkexAA14LsYG8QWd1lwgf3Yh6/xGwhw/vDTHCMCNqFCMUHTLjte6uLZ9QH/Pxy
 fvwyX73ZOjsrKXOwkKvEFD9fCJDyMsPsfH3ssYNgOZfTFculcGdNc6NhLWhoKCc8
 pmM0tVikRKxhAMxWgc9EX4dwu9WRe8vobvZj0cAPFlMAkY2ZY3lDtl8bniX9cygW
 Oo2hmhud1CeLE3ffUWhWgT9T7GfTL1i6KFF8wUqrNFE53+b7pS4LbJaCWuHKjtKu
 MaBC0eE0rG0qvSng6ArdZx22a47o2j57zc0mG6IeHtGe7eUAgwNkRLQmL1RU35xW
 cDniCJXosQ7DJ0Hel12ahk1HlUQFFqMlZVZbNpYP04v8xqHNEsWvWHw2NS6xZg2T
 FozBWdnhw8HzaXpWmg14WQ+e9+rnsK/FndRxBjXP9H9bKQQNKQV8PaiUEQCZin0s
 QJohAgqyjyPLB3a/B9q1jzG5HHI2pK2s2/4ry9XBdijZ7MhADq9pAvAe9k8baU/5
 gEy6pa4PmZEMibbrfUNjIoeYz80hDaHIxIbW/fFgl5n1JJJoz+qsfZlYaddrN6nc
 XFWDg9dDgQjDaYDjK9uG2vJkbMFiqE0kRxtJdWfqZNGgSWxqVIMtf42ZvYlqH+Wb
 Zc9nBEuH27o4NZ//2mXzynwBooBTmOsLRw/kqSPOuBVMki08fqk=
 =2uqs
 -----END PGP SIGNATURE-----

Merge tag 'iommu-fixes-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

 - three Intel VT-d fixes to fix address handling on 32bit, fix a NULL
   pointer dereference bug and serialize a hardware register access as
   required by the VT-d spec.

 - two patches for AMD IOMMU to force AMD GPUs into translation mode
   when memory encryption is active and disallow using IOMMUv2
   functionality.  This makes the AMDGPU driver work when memory
   encryption is active.

 - two more fixes for AMD IOMMU to fix updating the Interrupt Remapping
   Table Entries.

 - MAINTAINERS file update for the Qualcom IOMMU driver.

* tag 'iommu-fixes-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Handle 36bit addressing for x86-32
  iommu/amd: Do not use IOMMUv2 functionality when SME is active
  iommu/amd: Do not force direct mapping when SME is active
  iommu/amd: Use cmpxchg_double() when updating 128-bit IRTE
  iommu/amd: Restore IRTE.RemapEn bit after programming IRTE
  iommu/vt-d: Fix NULL pointer dereference in dev_iommu_priv_set()
  iommu/vt-d: Serialize IOMMU GCMD register modifications
  MAINTAINERS: Update QUALCOMM IOMMU after Arm SMMU drivers move
2020-09-06 11:58:15 -07:00
Linus Torvalds
015b3155c4 Misc fixes:
- Fix more generic entry code ABI fallout
  - Fix debug register handling bugs
  - Fix vmalloc mappings on 32-bit kernels
  - Fix kprobes instrumentation output on 32-bit kernels
  - Fix over-eager WARN_ON_ONCE() on !SMAP hardware
  - Fix NUMA debugging
  - Fix Clang related crash on !RETPOLINE kernels
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl9UljIRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g1yA//VecoyJOw4jb43LdkeKDGtUjCsPVZlt4w
 fw55nT4taqqbgl9mQjrJQlh8thtk7LvAqcsrEGk/SH+1fp/hDvBG0i3etyI1mPJ2
 t97MCVtD1bz2zyLpOtGN48tgiRxSazr4S9nZPCLTec+c75I3pmJssj44m/eJi/Z2
 hoj/syiO4J0BPa7a1ou++Jeyag6J+PgXdJTOMyjuqi99vqai1aTVKo8GdWMInext
 +fJNYd0ZQRj1FxVdMusDfzxOk7N7b8nAzvd30iJN67R6QwoEazO12K1F4IYQmHSq
 0rhHrwe0lTLtjmYdp/ef14kfzD7DRFN6Nv2gk/zyZsH+tjGflxTZConkFPnfoJEc
 33cNHfigh0V9TSVNDDhHnkRyy6dzCHkYHEf33KFuX3amC236TgrCEL7+oWE2rcNp
 9PJbPGlXCqNb2feNy2de4cY+KiZ2a1N/T4VcdMK6DEdENFh5T03EZgIChQEd0S99
 LNBYHqTWJdQEKfkzfAXlR4Bd2hX1LWLMM6rNcXxInrH7rWDXUCS0X9m3gLZR9DIs
 7/nXoK4OkaJdgH/D2CToDgwMNT5hlIiTGtVtB3H6Qz8eQQ4+fwTyboQDqpeG4Upy
 LfOH2h5Fo33FCgqnrua8IsgUKLwW2yJGdghJpcd9d0qfVUDEJuXGo6xe6SEHdSu/
 VEiQtFUf50U=
 =EhRy
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

 - more generic entry code ABI fallout

 - debug register handling bugfixes

 - fix vmalloc mappings on 32-bit kernels

 - kprobes instrumentation output fix on 32-bit kernels

 - fix over-eager WARN_ON_ONCE() on !SMAP hardware

 - NUMA debugging fix

 - fix Clang related crash on !RETPOLINE kernels

* tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry: Unbreak 32bit fast syscall
  x86/debug: Allow a single level of #DB recursion
  x86/entry: Fix AC assertion
  tracing/kprobes, x86/ptrace: Fix regs argument order for i386
  x86, fakenuma: Fix invalid starting node ID
  x86/mm/32: Bring back vmalloc faulting on x86_32
  x86/cmdline: Disable jump tables for cmdline.c
2020-09-06 10:28:00 -07:00
Linus Torvalds
68beef5710 xen: branch for v5.9-rc4
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCX1Rn1wAKCRCAXGG7T9hj
 vlEjAQC/KGC3wYw5TweWcY48xVzgvued3JLAQ6pcDlOe6osd6AEAzZcZKgL948cx
 oY0T98dxb/U+lUhbIzhpBr/30g8JbAQ=
 =Xcxp
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:
 "A small series for fixing a problem with Xen PVH guests when running
  as backends (e.g. as dom0).

  Mapping other guests' memory is now working via ZONE_DEVICE, thus not
  requiring to abuse the memory hotplug functionality for that purpose"

* tag 'for-linus-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: add helpers to allocate unpopulated memory
  memremap: rename MEMORY_DEVICE_DEVDAX to MEMORY_DEVICE_GENERIC
  xen/balloon: add header guard
2020-09-06 09:59:27 -07:00
Pavel Begunkov
c127a2a1b7 io_uring: fix linked deferred ->files cancellation
While looking for ->files in ->defer_list, consider that requests there
may actually be links.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-05 16:02:42 -06:00
Pavel Begunkov
b7ddce3cbf io_uring: fix cancel of deferred reqs with ->files
While trying to cancel requests with ->files, it also should look for
requests in ->defer_list, otherwise it might end up hanging a thread.

Cancel all requests in ->defer_list up to the last request there with
matching ->files, that's needed to follow drain ordering semantics.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-05 15:59:51 -06:00
Linus Torvalds
dd9fb9bb33 A trivial patch for auxdisplay:
- Replace HTTP links with HTTPS ones (Alexander A. Klimov)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAl9TdpwACgkQGXyLc2ht
 IW0MMxAAkh9gkR1ygJik6Ii9t63jbpG/NG3jJjlxCllh6OZrJ64SJBM7sF80NC1Y
 MAq+j16yPcyvfI1M378hfwD71c6OYuwfH4z1By8WbAVQ0d7S/Qd9UJMhn8f6U/aA
 /KLwvq/+YiCT5328MAZQHmmpsGqKBVnARMAf65fNviiYDwjeSd+9G9RtFF2KD2Q6
 m+OsOqAr7WvhqZ/1lMOuAXxkuee11WbEi/ZBijJl8ZGglld3eQcW1+aMKHsD/W/Q
 QXEi5QhdeLv60tu3HSQu6QyNYSPArDv7nZt1Yrxu05Sb+sryMzXgtuy03q6c0iUm
 zZmFDijcfneWJ5mJda+9301wJOC1NsQnEhJBi0RgjquRw8ydB175rKAzGBIKzaWp
 toQ9cN/LG/dJU/a4o1lrugIdqv7FJHUdvW2fsaL2twy/ly0sV8yv+dM/e3kItkv4
 hzCvGe9RKW2VF2JwhCFDtEkknXkwnaOX+yYB5si1NYSvODvPrln/MADRIB7TONw+
 eI8C9B52JQ+66YDb/Yg3OBhncobufYCOgs30taRv7q3sBd1P394RT598mW8JTRIN
 CAWKYsBfE4aTSL0PZeSfbQrNYxj+BxMrDjg9VQ9lGVzH4lw2vJZ51nSLtxgyuSr6
 fwSC6R3Y3WxKKj5M2E0tAzDF0C0V/kVvDzUetFgTHSJN/Kuht6o=
 =w1IL
 -----END PGP SIGNATURE-----
mergetag object 4e4bb894467cd2a08fc3116407b13e6bc23a7119
 type commit
 tag clang-format-for-linus-v5.9-rc4
 tagger Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> 1599306139 +0200
 
 The usual trivial update:
 
  - Update with the latest for_each macro list (Miguel Ojeda)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAl9TecAACgkQGXyLc2ht
 IW0D6A/9Guy8Q+fQoZ+jVynb7zRklfq8+D7rp4LrI+TXshGE1kUWzpwrPQX+QDcD
 H1wce3TMteZxOJe1KiWF5UwsFD2sGKpI1O8NJLn8Af9oImbIwhoJKxtdT0NOJ0cI
 08bzCDfonBAMn5DA2abDkmuRpkeyRXn1AN2mHcxnZ33lTZttWf/SCyrrynhHP4Co
 AAyeOuJtD5duMFvVtnXU/3aRGUinZia6roVG9wDbWoWYD50H3OeVecVRbnM0Vonf
 DEUbU2yjQRKVHd8P/mhf8tRhiVCnAt6b+c4wwgzgpRcFhu69b0fjJPmZETRrt4Dc
 aQZT4ltY9H0R6Wz82Pt8peZieZEvq8cgX+YgCkpLuHNiv8+/Fce+Nllpaokg8lYb
 rirS+u4t5zX3AaVdKk1riPaYinAOuC5PirGr8JDap4T7kg2Zw+xUSSZgH0sqTK0N
 /UwvoqZK5wOnwotsn16vhQvxOtzbdgh4lkkkkBPbXMUXtN8PfkHiZ/4S5fSDdrTu
 hz2F4nHkZlvhkZYKd7tp8UytnN1XEJQF70Z17LTEnVyVTjSoAA2LfSoJ0NCIwdhe
 A12/H7DB1FlPjcoxdjJka5teoGnlV5JkSAklp6Q+FlC/Roytxq5hEpmm77YShW4a
 /KuaKlohInKCqA6/ZJq3H1jzz+dT14OHekwdSXk5ZsaMkXejEMA=
 =Lcl0
 -----END PGP SIGNATURE-----
mergetag object e5fc436f06eef54ef512ea55a9db8eb9f2e76959
 type commit
 tag compiler-attributes-for-linus-v5.9-rc4
 tagger Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> 1599306822 +0200
 
 Two trivial changes and a fix for sparse:
 
  - sparse: use static inline for __chk_{user,io}_ptr() (Luc Van Oostenryck)
 
  - Compiler Attributes: fix comment concerning GCC 4.6 (Luc Van Oostenryck)
 
  - Compiler Attributes: remove comment about sparse not supporting __has_attribute (Luc Van Oostenryck)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAl9TfFAACgkQGXyLc2ht
 IW0vfQ//dtC2zaqJo3L0RMyLqAWwJbbML1eyy7tT8qPnw635arJXSqyjPliVoRwA
 3O3iQOjnss/vfL5bs9OSFEN5yjALP4wm9YZl2EponWZ8WiIKfC/YPSSH2iM2m2kW
 AK6bG6HPfoSy3Kx04CuxeqGN2F9/hZGLneLfF+y6gdfeAVBRTf1o+SvuCV/phLyW
 ek2UQK00+00BOVa62ZLZzqCKCHjTBSXtjDrrwg39g4MJ+gsGcUDoF13InRdgSWTY
 rgUD09OqOTjwaB2GeTiV7mXNn0yydyH0IoXz1EsfqRJV5mik3LMEpo/RWTBX95K1
 82N2mrL4ejOPX8gj+iKvcDjC8b1Fiwnb4pw6SsAOTJ5KG/euMVV2ZqAIS4SVVvSx
 VcGdErNcDBFrpM2py6rM3UhHBOafdlOYAtLKb596hMh4H8Kjt3LFIZAuyLVAJQfz
 q3xuBhkKSPw/j3ZVJ4iBFJSMIkxKRLUkOtF4dxAUknHOuYktnsIH0T/NcN0sHUqY
 c/3bcJFG6jQhzrA3DLlwcKWcgeihEjx5JpJFreN97heSEWsn7G1GAMMWUJ1y4KqJ
 kheczTljl5RhxUqUH2WBMj4MgdmUhwtXqW6Ht9HqP6nKDDotUjinjrSGOP5alej7
 3JYNz11c2pWwQNIzItHspmSnaiSUXbcn7WJoRTrO09kcPfsMw7E=
 =RrSY
 -----END PGP SIGNATURE-----

Merge tags 'auxdisplay-for-linus-v5.9-rc4', 'clang-format-for-linus-v5.9-rc4' and 'compiler-attributes-for-linus-v5.9-rc4' of git://github.com/ojeda/linux

Pull misc fixes from Miguel Ojeda:
 "A trivial patch for auxdisplay:

   - Replace HTTP links with HTTPS ones (Alexander A. Klimov)

  The usual clang-format trivial update:

   - Update with the latest for_each macro list (Miguel Ojeda)

  And Luc requested me to pick a sparse fix on my queue, so here it goes
  along with other two trivial Compiler Attributes ones (also from Luc).

   - sparse: use static inline for __chk_{user,io}_ptr() (Luc Van
     Oostenryck)

   - Compiler Attributes: fix comment concerning GCC 4.6 (Luc Van
     Oostenryck)

   - Compiler Attributes: remove comment about sparse not supporting
     __has_attribute (Luc Van Oostenryck)"

* tag 'auxdisplay-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
  auxdisplay: Replace HTTP links with HTTPS ones

* tag 'clang-format-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
  clang-format: Update with the latest for_each macro list

* tag 'compiler-attributes-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
  sparse: use static inline for __chk_{user,io}_ptr()
  Compiler Attributes: fix comment concerning GCC 4.6
  Compiler Attributes: remove comment about sparse not supporting __has_attribute
2020-09-05 14:22:46 -07:00
Linus Torvalds
70187f7727 ARC fixes for 5.9-rc4
- HSDK-4xd Dev system: perf driver updates for sampling interrupt
 
  - HSDK* Dev System : Ethernet broken	[Evgeniy Didin]
 
  - HIGHMEM broken (2 memory banks)	[Mike Rapoport]
 
  - show_regs() rewrite once and for all
 
  - Other minor fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEOXpuCuR6hedrdLCJadfx3eKKwl4FAl9S5E4ACgkQadfx3eKK
 wl7y5xAAyExBJlAWH8tyJjKYnRnKUwP5xZxzSLt9k5sUF8+aVMBpSr6PCF9kf1ZP
 HTFJ3T9NLFOh4cI7Lk9QFkLXbOvuvjgkfqlSmbHjnlTYcuWCqCQfU93bLMhRvrvi
 oRnWzEXLXwRuYoYz4TVLzxw9SuQr/EtCvR+a4rAyMpUlg+9pWHTokpiv69nDeYSl
 t3ckCVkdKbqPsUW7fXclyzo5qtsdGgLhYh6pb8ZE/x/Qdk+IjaN4WEmjC5DCy10B
 VTKEKdK7VKfRGuhIJbPq/UDs0TwMMxic+5sCdQtM54vrQPuVTmpsUopOHQ4/2Y5D
 IVWsZ3Uy/jnZxS32FP4d4r4Rv+BT9r1c7o148glT9iI0poOZ13BZHNZbLuSxYi4l
 1UQ2r/o+E+XKOWzAOc1WIgEJGdxcDu9seJNSSmxs1kBveOIg8S8JjoDyS6ZyTWD+
 fbOttxNe3kqGWCdKkgtUGqrg2w0qr3kzJd5aF4ju4fE+KMSVQRylRSWBpM1hcovP
 rxhcVvxyd1xSS3krrXjBTsjgTlkPDAaLyku/G5VuwpG91LyTiWfSlg+IOfLO+zFm
 LAwMTlV0gXfV2GH/AlEDJXtvT7GPhXN9sqmqqqOpUmOcKbPE30RmUkUxTJJbGfPN
 s7ioUV8CJVzRAkxyo79IDQQjT20p13cbJrIeCfgaSKWBf1M64BU=
 =vOZp
 -----END PGP SIGNATURE-----

Merge tag 'arc-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:

 - HSDK-4xd Dev system: perf driver updates for sampling interrupt

 - HSDK* Dev System: Ethernet broken [Evgeniy Didin]

 - HIGHMEM broken (2 memory banks) [Mike Rapoport]

 - show_regs() rewrite once and for all

 - Other minor fixes

* tag 'arc-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: [plat-hsdk]: Switch ethernet phy-mode to rgmii-id
  arc: fix memory initialization for systems with two memory banks
  irqchip/eznps: Fix build error for !ARC700 builds
  ARC: show_regs: fix r12 printing and simplify
  ARC: HSDK: wireup perf irq
  ARC: perf: don't bail setup if pct irq missing in device-tree
  ARC: pgalloc.h: delete a duplicated word + other fixes
2020-09-05 13:46:14 -07:00
Linus Torvalds
7514c0362f Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "19 patches.

  Subsystems affected by this patch series: MAINTAINERS, ipc, fork,
  checkpatch, lib, and mm (memcg, slub, pagemap, madvise, migration,
  hugetlb)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  include/linux/log2.h: add missing () around n in roundup_pow_of_two()
  mm/khugepaged.c: fix khugepaged's request size in collapse_file
  mm/hugetlb: fix a race between hugetlb sysctl handlers
  mm/hugetlb: try preferred node first when alloc gigantic page from cma
  mm/migrate: preserve soft dirty in remove_migration_pte()
  mm/migrate: remove unnecessary is_zone_device_page() check
  mm/rmap: fixup copying of soft dirty and uffd ptes
  mm/migrate: fixup setting UFFD_WP flag
  mm: madvise: fix vma user-after-free
  checkpatch: fix the usage of capture group ( ... )
  fork: adjust sysctl_max_threads definition to match prototype
  ipc: adjust proc_ipc_sem_dointvec definition to match prototype
  mm: track page table modifications in __apply_to_page_range()
  MAINTAINERS: IA64: mark Status as Odd Fixes only
  MAINTAINERS: add LLVM maintainers
  MAINTAINERS: update Cavium/Marvell entries
  mm: slub: fix conversion of freelist_corrupted()
  mm: memcg: fix memcg reclaim soft lockup
  memcg: fix use-after-free in uncharge_batch
2020-09-05 13:28:40 -07:00
Jason Gunthorpe
428fc0aff4 include/linux/log2.h: add missing () around n in roundup_pow_of_two()
Otherwise gcc generates warnings if the expression is complicated.

Fixes: 312a0c170945 ("[PATCH] LOG2: Alter roundup_pow_of_two() so that it can use a ilog2() on a constant")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/0-v1-8a2697e3c003+41165-log_brackets_jgg@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-05 12:14:30 -07:00
David Howells
e5a59d308f mm/khugepaged.c: fix khugepaged's request size in collapse_file
collapse_file() in khugepaged passes PAGE_SIZE as the number of pages to
be read to page_cache_sync_readahead().  The intent was probably to read
a single page.  Fix it to use the number of pages to the end of the
window instead.

Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Yang Shi <shy828301@gmail.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Eric Biggers <ebiggers@google.com>
Link: https://lkml.kernel.org/r/20200903140844.14194-2-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-05 12:14:30 -07:00
Muchun Song
17743798d8 mm/hugetlb: fix a race between hugetlb sysctl handlers
There is a race between the assignment of `table->data` and write value
to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
the other thread.

  CPU0:                                 CPU1:
                                        proc_sys_write
  hugetlb_sysctl_handler                  proc_sys_call_handler
  hugetlb_sysctl_handler_common             hugetlb_sysctl_handler
    table->data = &tmp;                       hugetlb_sysctl_handler_common
                                                table->data = &tmp;
      proc_doulongvec_minmax
        do_proc_doulongvec_minmax           sysctl_head_finish
          __do_proc_doulongvec_minmax         unuse_table
            i = table->data;
            *i = val;  // corrupt CPU1's stack

Fix this by duplicating the `table`, and only update the duplicate of
it.  And introduce a helper of proc_hugetlb_doulongvec_minmax() to
simplify the code.

The following oops was seen:

    BUG: kernel NULL pointer dereference, address: 0000000000000000
    #PF: supervisor instruction fetch in kernel mode
    #PF: error_code(0x0010) - not-present page
    Code: Bad RIP value.
    ...
    Call Trace:
     ? set_max_huge_pages+0x3da/0x4f0
     ? alloc_pool_huge_page+0x150/0x150
     ? proc_doulongvec_minmax+0x46/0x60
     ? hugetlb_sysctl_handler_common+0x1c7/0x200
     ? nr_hugepages_store+0x20/0x20
     ? copy_fd_bitmaps+0x170/0x170
     ? hugetlb_sysctl_handler+0x1e/0x20
     ? proc_sys_call_handler+0x2f1/0x300
     ? unregister_sysctl_table+0xb0/0xb0
     ? __fd_install+0x78/0x100
     ? proc_sys_write+0x14/0x20
     ? __vfs_write+0x4d/0x90
     ? vfs_write+0xef/0x240
     ? ksys_write+0xc0/0x160
     ? __ia32_sys_read+0x50/0x50
     ? __close_fd+0x129/0x150
     ? __x64_sys_write+0x43/0x50
     ? do_syscall_64+0x6c/0x200
     ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: e5ff215941d5 ("hugetlb: multiple hstates for multiple page sizes")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20200828031146.43035-1-songmuchun@bytedance.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-05 12:14:30 -07:00
Li Xinhai
953f064aa6 mm/hugetlb: try preferred node first when alloc gigantic page from cma
Since commit cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic
hugepages using cma"), the gigantic page would be allocated from node
which is not the preferred node, although there are pages available from
that node.  The reason is that the nid parameter has been ignored in
alloc_gigantic_page().

Besides, the __GFP_THISNODE also need be checked if user required to
alloc only from the preferred node.

After this patch, the preferred node is tried first before other allowed
nodes, and don't try to allocate from other nodes if __GFP_THISNODE is
specified.  If user don't specify the preferred node, the current node
will be used as preferred node, which makes sure consistent behavior of
allocating gigantic and non-gigantic hugetlb page.

Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Link: https://lkml.kernel.org/r/20200902025016.697260-1-lixinhai.lxh@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-05 12:14:30 -07:00
Ralph Campbell
3d321bf82c mm/migrate: preserve soft dirty in remove_migration_pte()
The code to remove a migration PTE and replace it with a device private
PTE was not copying the soft dirty bit from the migration entry.  This
could lead to page contents not being marked dirty when faulting the page
back from device private memory.

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Link: https://lkml.kernel.org/r/20200831212222.22409-3-rcampbell@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-05 12:14:30 -07:00
Ralph Campbell
6128763fc3 mm/migrate: remove unnecessary is_zone_device_page() check
Patch series "mm/migrate: preserve soft dirty in remove_migration_pte()".

I happened to notice this from code inspection after seeing Alistair
Popple's patch ("mm/rmap: Fixup copying of soft dirty and uffd ptes").

This patch (of 2):

The check for is_zone_device_page() and is_device_private_page() is
unnecessary since the latter is sufficient to determine if the page is a
device private page.  Simplify the code for easier reading.

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Link: https://lkml.kernel.org/r/20200831212222.22409-1-rcampbell@nvidia.com
Link: https://lkml.kernel.org/r/20200831212222.22409-2-rcampbell@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-05 12:14:30 -07:00
Alistair Popple
ad7df764b7 mm/rmap: fixup copying of soft dirty and uffd ptes
During memory migration a pte is temporarily replaced with a migration
swap pte.  Some pte bits from the existing mapping such as the soft-dirty
and uffd write-protect bits are preserved by copying these to the
temporary migration swap pte.

However these bits are not stored at the same location for swap and
non-swap ptes.  Therefore testing these bits requires using the
appropriate helper function for the given pte type.

Unfortunately several code locations were found where the wrong helper
function is being used to test soft_dirty and uffd_wp bits which leads to
them getting incorrectly set or cleared during page-migration.

Fix these by using the correct tests based on pte type.

Fixes: a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Alistair Popple <alistair@popple.id.au>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200825064232.10023-2-alistair@popple.id.au
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-05 12:14:30 -07:00