IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
* kvm-arm64/nv-timer-improvements:
: Timer emulation improvements, courtesy of Marc Zyngier.
:
: - Avoid re-arming an hrtimer for a guest timer that is already pending
:
: - Only reload the affected timer context when emulating a sysreg access
: instead of both the virtual/physical timers.
KVM: arm64: timers: Don't BUG() on unhandled timer trap
KVM: arm64: Reduce overhead of trapped timer sysreg accesses
KVM: arm64: Don't arm a hrtimer for an already pending timer
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/MAINTAINERS:
: KVM/arm64 MAINTAINERS updates
:
: - Drop the old columbia.edu mailing list (you will be missed!)
:
: - Move Oliver up to co-maintainer w/ Marc
KVM: arm64: Drop Columbia-hosted mailing list
MAINTAINERS: Add Oliver Upton as co-maintainer of KVM/arm64
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/parallel-access-faults:
: Parallel stage-2 access fault handling
:
: The parallel faults changes that went in to 6.2 covered most stage-2
: aborts, with the exception of stage-2 access faults. Building on top of
: the new infrastructure, this series adds support for handling access
: faults (i.e. updating the access flag) in parallel.
:
: This is expected to provide a performance uplift for cores that do not
: implement FEAT_HAFDBS, such as those from the fruit company.
KVM: arm64: Condition HW AF updates on config option
KVM: arm64: Handle access faults behind the read lock
KVM: arm64: Don't serialize if the access flag isn't set
KVM: arm64: Return EAGAIN for invalid PTE in attr walker
KVM: arm64: Ignore EAGAIN for walks outside of a fault
KVM: arm64: Use KVM's pte type/helpers in handle_access_fault()
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/virtual-cache-geometry:
: Virtualized cache geometry for KVM guests, courtesy of Akihiko Odaki.
:
: KVM/arm64 has always exposed the host cache geometry directly to the
: guest, even though non-secure software should never perform CMOs by
: Set/Way. This was slightly wrong, as the cache geometry was derived from
: the PE on which the vCPU thread was running and not a sanitized value.
:
: All together this leads to issues migrating VMs on heterogeneous
: systems, as the cache geometry saved/restored could be inconsistent.
:
: KVM/arm64 now presents 1 level of cache with 1 set and 1 way. The cache
: geometry is entirely controlled by userspace, such that migrations from
: older kernels continue to work.
KVM: arm64: Mark some VM-scoped allocations as __GFP_ACCOUNT
KVM: arm64: Normalize cache configuration
KVM: arm64: Mask FEAT_CCIDX
KVM: arm64: Always set HCR_TID2
arm64/cache: Move CLIDR macro definitions
arm64/sysreg: Add CCSIDR2_EL1
arm64/sysreg: Convert CCSIDR_EL1 to automatic generation
arm64: Allow the definition of UNKNOWN system register fields
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Generally speaking, any memory allocations that can be associated with a
particular VM should be charged to the cgroup of its process.
Nonetheless, there are a couple spots in KVM/arm64 that aren't currently
accounted:
- the ccsidr array containing the virtualized cache hierarchy
- the cpumask of supported cpus, for use of the vPMU on heterogeneous
systems
Go ahead and set __GFP_ACCOUNT for these allocations.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Link: https://lore.kernel.org/r/20230206235229.4174711-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
When checking for ID_AA64SMFR0_EL1.SMEver, __check_override assumes
that the ID_AA64SMFR0_EL1 value is in x1, and the intent of the code
is to reuse value read a few lines above.
However, as the comment says at the beginning of the macro, x1 will
be clobbered, and the checks always fails.
The easiest fix is just to reload the id register before checking it.
Fixes: f122576f3533 ("arm64/sme: Enable host kernel to access ZT0")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The newly added zt-test program copied the pattern from the other FP
stress test programs of having a redundant _start label which is
rejected by clang, as we did in a parallel series for the other tests
remove the label so we can build with clang.
No functional change.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230130-arm64-fix-sme2-clang-v1-1-3ce81d99ea8f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Although not handling a trap is a pretty bad situation to be in,
panicing the kernel isn't useful and provides no valuable
information to help debugging the situation.
Instead, dump the encoding of the unhandled sysreg, and inject
an UNDEF in the guest. At least, this gives a user an opportunity
to report the issue with some information to help debugging it.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230112123829.458912-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Each read/write to a trapped timer system register results
in a whole kvm_timer_vcpu_put/load() cycle which affects all
of the timers, and a bit more.
There is no need for such a thing, and we can limit the impact
to the timer being affected, and only this one.
This drastically simplifies the emulated case, and limits the
damage for trapped accesses. This also brings some performance
back for NV.
Whilst we're at it, fix a comment that didn't quite capture why
we always set CNTVOFF_EL2 to 0 when disabling the virtual timer.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230112123829.458912-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
When fully emulating a timer, we back it with a hrtimer that is
armver on vcpu_load(). However, we do this even if the timer is
already pending.
This causes spurious interrupts to be taken, though the guest
doesn't observe them (the interrupt is already pending).
Although this is a waste of precious cycles, this isn't the
end of the world with the current state of KVM. However, this
can lead to a situation where a guest doesn't make forward
progress anymore with NV.
Fix it by checking that if the timer is already pending
before arming a new hrtimer. Also drop the hrtimer cancelling,
which is useless, by construction.
Reported-by: D Scott Phillips <scott@os.amperecomputing.com>
Fixes: bee038a67487 ("KVM: arm/arm64: Rework the timer code to use a timer_map")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230112123829.458912-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
After many years of awesome service, the kvmarm mailing list hosted by
Columbia is being decommissioned, and replaced by kvmarm@lists.linux.dev.
Many thanks to Columbia for having hosted us for so long, and to the
kernel.org folks for giving us a new home.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230113132809.1979119-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Going forward I intend to help Marc with maintaining KVM/arm64. We've
spoken about this quite a bit and he has been a tremendous help in
ramping up to the task (thank you!). We haven't worked out the exact
details of how the process will work, but the goal is to even out the
maintenance responsibilities to give us both ample time for development.
To that end, updating the maintainers entry to reflect the change.
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230123210256.2728218-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Before this change, the cache configuration of the physical CPU was
exposed to vcpus. This is problematic because the cache configuration a
vcpu sees varies when it migrates between vcpus with different cache
configurations.
Fabricate cache configuration from the sanitized value, which holds the
CTR_EL0 value the userspace sees regardless of which physical CPU it
resides on.
CLIDR_EL1 and CCSIDR_EL1 are now writable from the userspace so that
the VMM can restore the values saved with the old kernel.
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Link: https://lore.kernel.org/r/20230112023852.42012-8-akihiko.odaki@daynix.com
[ Oliver: Squash Marc's fix for CCSIDR_EL1.LineSize when set from userspace ]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
We should have a ZT register frame with an expected size when ZA is enabled
and have no ZT frame when ZA is disabled. Since we don't load any data into
ZT we expect the data to all be zeros since the architecture guarantees it
will be set to 0 as ZA is enabled.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-18-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Following the pattern for the other register sets add a stress test program
for ZT0 which continually loads and verifies patterns in the register in
an effort to discover context switching problems.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-14-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Implement support for a new note type NT_ARM64_ZT providing access to
ZT0 when implemented. Since ZT0 is a register with constant size this is
much simpler than for other SME state.
As ZT0 is only accessible when PSTATE.ZA is set writes to ZT0 cause
PSTATE.ZA to be set, the main alternative would be to return -EBUSY in
this case but this seemed more constructive. Practical users are also
going to be working with ZA anyway and have some understanding of the
state.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-12-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add a new signal context type for ZT which is present in the signal frame
when ZA is enabled and ZT is supported by the system. In order to account
for the possible addition of further ZT registers in the future we make the
number of registers variable in the ABI, though currently the only possible
number is 1. We could just use a bare list head for the context since the
number of registers can be inferred from the size of the context but for
usability and future extensibility we define a header with the number of
registers and some reserved fields in it.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-11-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When the system supports SME2 the ZT0 register must be context switched as
part of the floating point state. This register is stored immediately
after ZA in memory and is only accessible when PSTATE.ZA is set so we
handle it in the same functions we use to save and restore ZA.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-10-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When the system supports SME2 there is an additional register ZT0 which
we must store when the task is using SME. Since ZT0 is accessible only
when PSTATE.ZA is set just like ZA we allocate storage for it along with
ZA, increasing the allocation size for the memory region where we store
ZA and storing the data for ZT after that for ZA.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-9-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The new register ZT0 introduced by SME2 comes with a new trap, disable it
for the host kernel so that we can implement support for it.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-7-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order to avoid unrealistic toolchain requirements we manually encode the
instructions for loading and storing ZT0.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-6-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As well as a number of simple features which only add new instructions and
require corresponding hwcaps SME2 introduces a new register ZT0 for which
we must define ABI. Fortunately this is a fixed size 512 bits and therefore
much more straightforward than the base SME state, the only wrinkle is that
it is only accessible when ZA is accessible.
While there is only a single register the architecture is written with a
view to exensibility, including a number in the name, so follow this in the
ABI.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-4-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
FEAT_SME2 and FEAT_SME2P1 introduce several new SME features which can
be enumerated via ID_AA64SMFR0_EL1 and a new register ZT0 access to
which is controlled via SMCR_ELn, add the relevant register description.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-3-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
SME 2 introduces the new ZT0 register, we require that access to this
reigster is not trapped when we identify that the feature is supported.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-2-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In preparation for adding support for storage for ZT0 to the thread_struct
rename za_state to sme_state. Since ZT0 is accessible when PSTATE.ZA is
set just like ZA itself we will extend the allocation done for ZA to
cover it, avoiding the need to further expand task_struct for non-SME
tasks.
No functional changes.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-1-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- Fixes for two resctrl races when moving a task or creating a new monitoring
group
- Fix SEV-SNP guests running under HyperV where MTRRs are disabled to not return
a UC- type mapping type on memremap() and thus cause a serious slowdown
- Fix insn mnemonics in bioscall.S now that binutils is starting to fix
confusing insn suffixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPD5xsACgkQEsHwGGHe
VUr65g/8CkfKQKIQ/kPn1B+M/PI4S8DBmz7CdufQTbB66GSfDwRpGnxIKJKZM1UG
pyOmP1kHVXGGCsvFQimalxnBtFx6t3wFS+R+c/l5pRtgU63bwQpNjJ+vBcBpb4xs
J7f06VgF6jB8qk0NqXBKNJt4kauZA50wiErYm8PU/yf0tmHratjrvDfKQmus9pI4
AMcQEudRhDDo8xqn8fvIC/S7xm9/TgNzKxYP+HciPa74HZm5vrjS0hIyZkIsh5d7
Q4MsP4VaBH12W6MbpF9TBw5VTDomwY8oS4xTNfkhyOTcHi8uLrMjA66VmaJwWXpe
EZjOk4+KhtxYigaI5oQO9M5e9IINK6ZfbzU65P74wTvP3orszsBPG7QvodIzLc76
YI1GEea/bXgxYgPuMR6spZoGMS58K7BXgViVMVRwZ9bZk17+5pfbLkzKqZsfw/s/
nj3n8z4ayoc+ffDPllh0471Dn16ugKMf+EvW2Su1Q1QoaA7icNNkZrKaM6eSuoFr
bClsrHeglQFadwy4kmY0fi7BUfiKTp+c5Ur7lM7VojrjH0XcoZreThVg0tNrk3ZR
IAyBjtCdtg1rzrRfb7qBf6B+WNGdxYWGuDgOPiH1Xh+/plvyCqQ2/3AGWQAbu/qP
to+IYmZg09mRpVBDYpCY4z6IzRbMJ9rRho4rNXCQUeAzSVMCrZE=
=W2zR
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Make sure the poking PGD is pinned for Xen PV as it requires it this
way
- Fixes for two resctrl races when moving a task or creating a new
monitoring group
- Fix SEV-SNP guests running under HyperV where MTRRs are disabled to
not return a UC- type mapping type on memremap() and thus cause a
serious slowdown
- Fix insn mnemonics in bioscall.S now that binutils is starting to fix
confusing insn suffixes
* tag 'x86_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: fix poking_init() for Xen PV guests
x86/resctrl: Fix event counts regression in reused RMIDs
x86/resctrl: Fix task CLOSID/RMID update race
x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case
x86/boot: Avoid using Intel mnemonics in AT&T syntax asm
- Fix a memory leak in highbank's probing function
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPD41kACgkQEsHwGGHe
VUqYOQ//bsKMgAbPDbvgSm8OkFvtLd+r38RE+dcg84QrBaeQ7z2x269i1bUhqkQV
baatecdz+VoNQK3zmrIRshwgCGuwDEpRWAuPKJxoCfSU1eNzEipxHCoctNcp2q+Y
7nP65JtkQ6y1nEnBYh5h1snwbM3yuLIvmmuhYhPpd204k00L16Tzsu1uGiz7nDsZ
9ohBtS6laHztJnVd9gKIHOcgBCylPjNYpRecsgZk2COK/O9uTo8drQ46gKxwHQ4l
P4RsvDligoaFO8rRLAzg2sfsOt9O2runkdWhYnVKSLdXOOA+wQ+eacOuzWs5dNpQ
BQfWZB3GwxQ3+G1c2WOVrq/15YNrxpqMA9/060vPzGqW9VDCnOnoS3w5nHCDw5Ep
YB+3OVn31Z4Hw+nP/WCR/0QYKDSBmqvGd4wKmHcUnL22rRZSl6UZTSqnBw3yrhV9
YzmZ+r3B9b7AtvDs3qWMjQwFYnxmeU7DSXsxjbUpDcMDuy/Y8/P9CSzE2ruC26L1
TdK/Vy6igoDCThzv0RMFEUfmpojIYoTtGNuyMrA90ZNKwXKVOcf9mM3TQm4vLH7m
P3Q9UvBlApoaKbXSZaU0cV+klnba2prAnu4LTVeBWoyjPlmFZkv9ZxDHId8n2rie
dTe/KZJRlScfwVtp83xh428ixdYWNs03R6P5tQAMKoZNLnixqXQ=
=TiMH
-----END PGP SIGNATURE-----
Merge tag 'edac_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
- Fix the EDAC device's confusion in the polling setting units
- Fix a memory leak in highbank's probing function
* tag 'edac_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/highbank: Fix memory leak in highbank_mc_probe()
EDAC/device: Fix period calculation in edac_device_reset_delay_period()
- Fix a build failure with some versions of ld that have an odd version string.
- Fix incorrect use of mutex in the IMC PMU driver.
Thanks to: Kajol Jain, Michael Petlan, Ojaswin Mujoo, Peter Zijlstra, Yang Yingliang.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmPD2ZoTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgIJTD/9HeqFTpviUPd0QCW2HWVv2jwLA97Qy
BGsMVz6eSk+f3YBFDFJ0n5KqYR4Bry5mgwPm1nQe86cKr+ZCdtnF+10vPL0TEdK5
M4a+u9PA+xqq9ukIN96ZP7QGi80YY0RRIrfkOwK6iQJVLSxT+TUcl1ko/iPalbI7
/5c2BCnkIoDbwU1ux3/6+uJCDFE3oLy5v1nt/mbzuekFTRvPGRCt+DWBywrYKYi8
XfBBVTz6F+PoBZ7vZa6Kv5IQANiuU4COTjpm9AjVvjp0oKfYskBmtZvHQhr3v6v4
HZsms49w0r3D7sZgEB2hmKFM2/QijSDeyBxnmt/hHeNYMMiFg+0lhoxoNoykzcN9
UfFU7NID3uqruYMkhAxiCIvyul9Vzcr+pe3GgooY+AtuokhuMUEUXJjEDIyXtoci
2VnEsdbl0/gihdHedfhLRXlEn8xz6fQvxDcpYZClSIDeS/nL7cuPd/9+JHn1hilq
aJ0MX6VUKnwkSBb9Gkd1bt09jS6lqDUQS5+88IMvoJo53xHVlHF+5kqXAJRkpt1w
XsDMBuKqT/aEC+rI5GyHXNglGuBYqMvEmbdEGtIVFCbUkcI35Xe8RXKmqDbO2U7N
qqfYj53dgaptJF/sllcGuCTj6JrdJvKEVnuzsIwoE+XyXzr1e9UBmyZ+I//VjTjL
Mhy9rXp7tJYnkw==
=BaFo
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix a build failure with some versions of ld that have an odd version
string
- Fix incorrect use of mutex in the IMC PMU driver
Thanks to Kajol Jain, Michael Petlan, Ojaswin Mujoo, Peter Zijlstra, and
Yang Yingliang.
* tag 'powerpc-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s/hash: Make stress_hpt_timer_fn() static
powerpc/imc-pmu: Fix use of mutex in IRQs disabled section
powerpc/boot: Fix incorrect version calculation issue in ld_version
If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
only releases pages to the buddy allocator if they are not in the
deferred range. This is correct for free pages (as defined by
for_each_free_mem_pfn_range_in_zone()) because free pages in the
deferred range will be initialized and released as part of the deferred
init process. memblock_free_pages() is called by memblock_free_late(),
which is used to free reserved ranges after memblock_free_all() has
run. All pages in reserved ranges have been initialized at that point,
and accordingly, those pages are not touched by the deferred init
process. This means that currently, if the pages that
memblock_free_late() intends to release are in the deferred range, they
will never be released to the buddy allocator. They will forever be
reserved.
In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
which is also correct for free pages but is not correct for reserved
pages. KMSAN metadata for reserved pages is initialized by
kmsan_init_shadow(), which runs shortly before memblock_free_all().
For both of these reasons, memblock_free_pages() should only be called
for free pages, and memblock_free_late() should call __free_pages_core()
directly instead.
One case where this issue can occur in the wild is EFI boot on
x86_64. The x86 EFI code reserves all EFI boot services memory ranges
via memblock_reserve() and frees them later via memblock_free_late()
(efi_reserve_boot_services() and efi_free_boot_services(),
respectively). If any of those ranges happens to fall within the
deferred init range, the pages will not be released and that memory will
be unavailable.
For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI:
v6.2-rc2:
Node 0, zone DMA
spanned 4095
present 3999
managed 3840
Node 0, zone DMA32
spanned 246652
present 245868
managed 178867
v6.2-rc2 + patch:
Node 0, zone DMA
spanned 4095
present 3999
managed 3840
Node 0, zone DMA32
spanned 246652
present 245868
managed 222816 # +43,949 pages
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmPCrI8QHHJwcHRAa2Vy
bmVsLm9yZwAKCRA5A4Ymyw79kT1lB/wPbLpePLzZfDGyV/NR9gi4FuJiaRfhlklV
rbxnJce050GERbSQoF/r4zrxn2pzvIWGMh1xWZBGi/q8mT2rOIYtVqUahY9YuL/Z
7+xqdCOALIxEj+cXqYocqp8/NFgUWLGuMoomc9lWvEkUs+zOvkD8Z/bRecfPYvOa
BftPALmtXgx46Ecce0gZvvh4YULpVLNdDPPiwZTabV+47Cl8+cJ0Y+iEHsUfOesU
hQG0unWJH77O3IU4QxiirLekLP/6a5O5f0W7u3PZmNNv7N+UdwE+De+QF0aamfgA
LZDO1qOakflegFZvK0JchCzS4hc6dtRKqIvNM3cCBMXLvV4REHKP
=geNh
-----END PGP SIGNATURE-----
Merge tag 'fixes-2023-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock fix from Mike Rapoport:
"memblock: always release pages to the buddy allocator in
memblock_free_late()
If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
only releases pages to the buddy allocator if they are not in the
deferred range. This is correct for free pages (as defined by
for_each_free_mem_pfn_range_in_zone()) because free pages in the
deferred range will be initialized and released as part of the
deferred init process.
memblock_free_pages() is called by memblock_free_late(), which is used
to free reserved ranges after memblock_free_all() has run. All pages
in reserved ranges have been initialized at that point, and
accordingly, those pages are not touched by the deferred init process.
This means that currently, if the pages that memblock_free_late()
intends to release are in the deferred range, they will never be
released to the buddy allocator. They will forever be reserved.
In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
which is also correct for free pages but is not correct for reserved
pages. KMSAN metadata for reserved pages is initialized by
kmsan_init_shadow(), which runs shortly before memblock_free_all().
For both of these reasons, memblock_free_pages() should only be called
for free pages, and memblock_free_late() should call
__free_pages_core() directly instead.
One case where this issue can occur in the wild is EFI boot on x86_64.
The x86 EFI code reserves all EFI boot services memory ranges via
memblock_reserve() and frees them later via memblock_free_late()
(efi_reserve_boot_services() and efi_free_boot_services(),
respectively).
If any of those ranges happens to fall within the deferred init range,
the pages will not be released and that memory will be unavailable.
For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI:
v6.2-rc2:
Node 0, zone DMA
spanned 4095
present 3999
managed 3840
Node 0, zone DMA32
spanned 246652
present 245868
managed 178867
v6.2-rc2 + patch:
Node 0, zone DMA
spanned 4095
present 3999
managed 3840
Node 0, zone DMA32
spanned 246652
present 245868
managed 222816 # +43,949 pages"
* tag 'fixes-2023-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
mm: Always release pages to the buddy allocator in memblock_free_late().
Just one fix for modules by Nick.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmPB5S4SHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoin0OUQALx+uch7cZvl3aznH4CAkF1U2mKuESLT
unQsnSlxIF0BETTAdPbhcCVEtKi102Xq4R6ffCSilB33b0IGM6qY6ZUQb/GMGhg5
qIur7EDiFksjCBmBuE1iNaAknkiKeirkJTEC4lEOl8OLXprmnuMeeR+E0BDi5sH3
YIxexLuhyKeF9Ke4bjAJ7A4WKvh2R7yamISCUcP6GL3w7U9urSjo9FHKcUmFH9nr
qCX/1zE0fz7iJzb9YtBNVhdgKNOYNOxa7TDNXQCVuLcZQfmQqDMBVgKdOkj6TAWX
6L2CHT4N9IjPt9+DrlEUf6bSSwP4N4aFdyMAo6UlVXcgvEbTS/kdoMSqFOAQSAdb
G/lzsvS3fD76VcCdZkwAFYnEhUTJ4xWTS0oaI++tu0EFX5lvRuQv2DRt0WULlD/u
L0paUwmjtVajcSIATxRZkjoMiVD4btDRz30kaIUU/xoc1Gg/EADrSLHESaZ9eZVL
EJ40aqLLIRBXGZrVEzvf97HIzuQiKfaPzywNvbMpxG3m0tV2pn3Z4ts/A8aO7c+O
mBDnTURiZN6pT+xsnJBvqWrlXwPRUGwI+NjRcdPZhUyfgj5MHpEI8PAcUWy6TTUn
H2P6x2iC3/nypqhnwjoixSptjaUWcak3R6UgwVS2YqfjePCqaq0wg9DAFDQH0Yx3
awAOoum0Ubin
=aa3U
-----END PGP SIGNATURE-----
Merge tag 'modules-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull module fix from Luis Chamberlain:
"Just one fix for modules by Nick"
* tag 'modules-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
kallsyms: Fix scheduling with interrupts disabled in self-test
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmPB/t0ACgkQiiy9cAdy
T1Gvbwv+LIXF5dHNGHDuezecbD8T9sRF2v15Mh7i6SSg7BWeXXebYY4dyrSQ5SHu
KqsN2Y3P3A0ZQ3ApzFryImM6BwSpOeyCsVRhl7VCWnMgXcroqc/O6F6/YRFVkUAi
iWWZLXM7WFBQGXUbPXaiPc+wCRARrnul9p+48Teyy0CJWiWormQmkznVxeihErDX
/pWdQdvJeFcUrIj1H3e4cyJF2hVzRiUGI/eZmBGlDyaK192vYgGYO2AhHnTfd7fU
dUJ+/trVw0koyC5/86veHRqCcXzFD44ORkAB46NaCic1K8t+RPhmxgtriiNLcCrQ
kEmeub6ayPkuniV88NBEPXaDy0S/cEYHr7GEuZTn4sq+hw//y5KbKN3sa6aLHsMH
46BIHcyTXU59eNJ4lWOjoqD2NiqP2GYFY4PZftB85H1EW8Fchwcsw4WzW0ENpzmi
qcWslXDKYjJIZ6NHnBiR/FYI1VdsmoGbDzMhrHreCrWCuIuVaqfrFPMnvE8f5dqY
fkfnkqT4
=SdZA
-----END PGP SIGNATURE-----
Merge tag '6.2-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
- memory leak and double free fix
- two symlink fixes
- minor cleanup fix
- two smb1 fixes
* tag '6.2-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix uninitialized memory read for smb311 posix symlink create
cifs: fix potential memory leaks in session setup
cifs: do not query ifaces on smb1 mounts
cifs: fix double free on failed kerberos auth
cifs: remove redundant assignment to the variable match
cifs: fix file info setting in cifs_open_file()
cifs: fix file info setting in cifs_query_path_info()
Two minor fixes in the hisi_sas driver which only impact enterprise
style multi-expander and shared disk situations and no core changes.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCY8KxpSYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishRAlAP9cQmiq
M8WGimymqulRFEpkWpDM1R06eqHDI2K0h5/rfAD+KlgS0cOVvx0nenWyhmprYvDX
2Z239J+WXOjSJq8UJL4=
=gMQX
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two minor fixes in the hisi_sas driver which only impact enterprise
style multi-expander and shared disk situations and no core changes"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: hisi_sas: Set a port invalid only if there are no devices attached when refreshing port id
scsi: hisi_sas: Use abort task set to reset SAS disks when discovered
A single fix for rc4 to prevent building the pata_cs5535 driver with
user mode linux as it uses msr operations that are not defined with UML.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCY8HnowAKCRDdoc3SxdoY
dmDWAPwKMUiDzFSkeD7zfKGwCd72HWUa1298yL+XnD8Y7vLBtgEAqhYi9fAnVnN0
dUHm12rEwFOay+lVwxWuQowFaVmzGAs=
=RXSo
-----END PGP SIGNATURE-----
Merge tag 'ata-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ATA fix from Damien Le Moal:
"A single fix to prevent building the pata_cs5535 driver with user mode
linux as it uses msr operations that are not defined with UML"
* tag 'ata-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: pata_cs5535: Don't build on UML
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmPBsFAQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpqgnEAC0OqxnMsOPNbkLO7k6FsSrG7ZoENkOIMCt
Grk3D1cPkM13I0xc+WiaOBezMriPzfdXvt5AGDn9fd53Ih47qpSY4eU6pCqoCk5y
HWdn8KXZvhJGZsSy0Nz+cfPDW/8diJON8YBpJwWM/DfDdP8XibtjlIMTVTtJab6h
aGWjmy3leNfghOJ0cZ1wjL6maWFoowQASs52PZfajSc0mQ5X0i8BgQb1WOHNu89C
vEir9PYlTmdMnYlAKLsyEL3KoGUPm++zSLtJeyWYavlCMGK5WTyNkzmeXqsQhAGf
b1LjovQASe//1t2wvCzQviRf4cae0pE9JhiaYt2oxoDdHrfQj/WPndVS4yE9c+0O
BnLVTCFHNv86TRXNCbEUzI+Ftj6m9qt4MrHz8YpstX7FxGxYC+T5RqTwYClWZQ0j
llBuJUHj+kkAv6kBMJCHTyat6pxIDgcb52QMJr5mFWuEaTloraBIJC70hMtxBQV/
j5mrBYqCngCHVs+hAl9UQ4zqQVSvkeT11QFvwFolxIfs7qtfLqeGzYxvaeomqO3V
sA+H5NY50OEuPfFFmCpcNUJXeUKg7wP39iNHdz6P5cCDBCfUwbNbgKKKNmBovaC+
KhPd8Xo1MmzDuF+cylvTcjOBDte4425GN7PBj4vP1xbuHYcjg6AEFLawgqE9Y4XX
xyNlgJXPOg==
=ujiw
-----END PGP SIGNATURE-----
Merge tag 'block-6.2-2023-01-13' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"Nothing major in here, just a collection of NVMe fixes and dropping a
wrong might_sleep() that static checkers tripped over but which isn't
valid"
* tag 'block-6.2-2023-01-13' of git://git.kernel.dk/linux:
MAINTAINERS: stop nvme matching for nvmem files
nvme: don't allow unprivileged passthrough on partitions
nvme: replace the "bool vec" arguments with flags in the ioctl path
nvme: remove __nvme_ioctl
nvme-pci: fix error handling in nvme_pci_enable()
nvme-pci: add NVME_QUIRK_IDENTIFY_CNS quirk to Apple T2 controllers
nvme-apple: add NVME_QUIRK_IDENTIFY_CNS quirk to fix regression
block: Drop spurious might_sleep() from blk_put_queue()
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmPBsL8QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgplLXD/9kmbwSseR8sm8s6rB3mZcfgN4vvkRDa5Kg
r5bj32sOl2o37szObGI53MneFcCpdOszA2R3AHqUoj7ZQsyPlt59OIpKtdBlp6xv
ACX1weBnxMTtdMZ90Mp/GxGLZuvSifbfL4z5YgbRRKnGorz3prmfRkKuO51DcJes
mhoPGTDUsAmxoU261LzuHI7DtD69We8yBzj58p21NF8DvBI3NtuWWhOI+EGs2CDr
aqilG5LxOYBKuEY660KhWKMAXwVbVkyLak4eIh4iax0R0Um/SKXnRMIRI3L1QZoP
eqGtF4zGJGM+47RUnl6GF59/IsyRZR04GJlZ5ma8aqqZNh4oj8E6A+hpgczJ0025
QM4hG+NwqXeNpMjN0PwDypo3WoYjyIhaMoEyCT7dmVzj62y3pm3DDaZDnzUlQF5w
8YvoSPwyCzWHWFDGEZp6WRYn2YfoiE+L48euknADmTL5FwUOs1J5OUK0v8BcAYLO
tqJ8Q8emx15ZTzHeI+Z9lDYIuYKBU9XuO8ugfbss/Xx2tQMDeHe6BIaujhwam6c4
BWyAIViXgWKpIIDB3Emsu3lStc3PJ1WLbBdw4ja0nwhCRB7IeclzZf1IZHTT4xsQ
eg5EK2QORLrlY9keCoFhqfT4guSYQptBlwPuxhHM2gcrxXegR6294hBoPVSd1be5
g5NK0rrSWA==
=9C0J
-----END PGP SIGNATURE-----
Merge tag 'io_uring-6.2-2023-01-13' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
"A fix for a regression that happened last week, rest is fixes that
will be headed to stable as well. In detail:
- Fix for a regression added with the leak fix from last week (me)
- In writing a test case for that leak, inadvertently discovered a
case where we a poll request can race. So fix that up and mark it
for stable, and also ensure that fdinfo covers both the poll tables
that we have. The latter was an oversight when the split poll table
were added (me)
- Fix for a lockdep reported issue with IOPOLL (Pavel)"
* tag 'io_uring-6.2-2023-01-13' of git://git.kernel.dk/linux:
io_uring: lock overflowing for IOPOLL
io_uring/poll: attempt request issue after racy poll wakeup
io_uring/fdinfo: include locked hash table in fdinfo output
io_uring/poll: add hash if ready poll request can't complete inline
io_uring/io-wq: only free worker if it was allocated for creation
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmPBniAUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vwOjRAAhjyRAgyiZV2rWS4pyvpQpqcpZWD9
796ZSqnzLJjVYCymGvUTX23FEA48n59+bCM/WpfEGUPrBf8LZQxC9YOCm6ltuM8+
FoSBykW/tHPq5IWaLzgrWpHeDOgEnZu/WFGGvrV3tl1mLpM1SJT8bGDsjHXlo+FM
qkTEiA3nUEKQs5x9r2TTLCeUWGPNTIHNd2VfuxOqM3qC/nVCOfTTxU8nm6Lk7Eix
nboAugAIADJIjs/+ZGekLBuzZYPkLYuDTyMYJ5hdo1p7wWCLc9gArEqvXKwVgmD3
ptenZeOlQi9Ay45HmkfIgfgKeeQ7REJj3dx04vf67neAianyUrB0EZDqDjR7LmgM
ozlNt0XjyoeEhu6AQS0s1LZtbDiED1R/00P6Gb+YEjUCVipW2lEYYwP0v9dsnNoh
6wblgnkQoxLFM+5CAXRmCmpaoQn0Uam7okfVeohtsz8/kNQF2St0hjzr4Dmws+O3
k9PUqnnUl4ByElzpEDesVGZMJ3pxFVH15ufu8VnRqN60pLTvNrsPyU4cVnG176Rc
3RSDN3zMtPxnHJVy4r3bTNEZsX/7RUrOb4xScOXMmRDBMUc8QdscF8Oj1ucKlj5j
mp7vB/7+VjU96uRarRyqUxGeQc77DCTcvOa1IGh/cuYom8ZJ6vpSCpKy6f6SFGuf
i8iTTUcQKCdqVW4=
=Fv2v
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci fixes from Bjorn Helgaas:
- Work around apparent firmware issue that made Linux reject MMCONFIG
space, which broke PCI extended config space (Bjorn Helgaas)
- Fix CONFIG_PCIE_BT1 dependency due to mid-air collision between a
PCI_MSI_IRQ_DOMAIN -> PCI_MSI change and addition of PCIE_BT1 (Lukas
Bulwahn)
* tag 'pci-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
x86/pci: Simplify is_mmconf_reserved() messages
PCI: dwc: Adjust to recent removal of PCI_MSI_IRQ_DOMAIN