IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Pull KCSAN updates from Paul McKenney:
"This contains initialization fixups, testing improvements, addition of
instruction pointer to data-race reports, and scoped data-race checks"
* tag 'kcsan.2021.11.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
kcsan: selftest: Cleanup and add missing __init
kcsan: Move ctx to start of argument list
kcsan: Support reporting scoped read-write access type
kcsan: Start stack trace with explicit location if provided
kcsan: Save instruction pointer for scoped accesses
kcsan: Add ability to pass instruction pointer of access to reporting
kcsan: test: Fix flaky test case
kcsan: test: Use kunit_skip() to skip tests
kcsan: test: Defer kcsan_test_init() after kunit initialization
Pull apparmor updates from John Johansen:
"Features
- use per file locks for transactional queries
- update policy management capability checks to work with LSM stacking
Bug Fixes:
- check/put label on apparmor_sk_clone_security()
- fix error check on update of label hname
- fix introspection of of task mode for unconfined tasks
Cleanups:
- avoid -Wempty-body warning
- remove duplicated 'Returns:' comments
- fix doc warning
- remove unneeded one-line hook wrappers
- use struct_size() helper in kzalloc()
- fix zero-length compiler warning in AA_BUG()
- file.h: delete duplicated word
- delete repeated words in comments
- remove repeated declaration"
* tag 'apparmor-pr-2021-11-10' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: remove duplicated 'Returns:' comments
apparmor: remove unneeded one-line hook wrappers
apparmor: Use struct_size() helper in kzalloc()
apparmor: fix zero-length compiler warning in AA_BUG()
apparmor: use per file locks for transactional queries
apparmor: fix doc warning
apparmor: Remove the repeated declaration
apparmor: avoid -Wempty-body warning
apparmor: Fix internal policy capable check for policy management
apparmor: fix error check
security: apparmor: delete repeated words in comments
security: apparmor: file.h: delete duplicated word
apparmor: switch to apparmor to internal capable check for policy management
apparmor: update policy capable checks to use a label
apparmor: fix introspection of of task mode for unconfined tasks
apparmor: check/put label on apparmor_sk_clone_security()
Merge more updates from Andrew Morton:
"The post-linux-next material.
7 patches.
Subsystems affected by this patch series (all mm): debug,
slab-generic, migration, memcg, and kasan"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kasan: add kasan mode messages when kasan init
mm: unexport {,un}lock_page_memcg
mm: unexport folio_memcg_{,un}lock
mm/migrate.c: remove MIGRATE_PFN_LOCKED
mm: migrate: simplify the file-backed pages validation when migrating its mapping
mm: allow only SLUB on PREEMPT_RT
mm/page_owner.c: modify the type of argument "order" in some functions
Pull m68knommu updates from Greg Ungerer:
"Only two changes.
One removes the now unused CONFIG_MCPU32 symbol. The other sets a
default for the CONFIG_MEMORY_RESERVE config symbol (this aids
scripting and other automation) so you don't interactively get asked
for a value at configure time.
Summary:
- remove unused CONFIG_MCPU32 symbol
- default CONFIG_MEMORY_RESERVE value (for scripting)"
* tag 'm68knommu-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68knommu: Remove MCPU32 config symbol
m68k: set a default value for MEMORY_RESERVE
Although unlikely for it to be possible for rsp to be null here,
the check is safer to add, and quiets a Coverity warning.
Addresses-Coverity: 1443909 ("Explicit Null dereference")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
This reverts commit 2a4d9408c9.
Robert reported a NULL pointer dereference caused by the PCI core
(local_pci_probe()) calling the i2c_designware_pci driver's
.runtime_resume() method before the .probe() method. i2c_dw_pci_resume()
depends on initialization done by i2c_dw_pci_probe().
Prior to 2a4d9408c9 ("PCI: Use to_pci_driver() instead of
pci_dev->driver"), pci_pm_runtime_resume() avoided calling the
.runtime_resume() method because pci_dev->driver had not been set yet.
2a4d9408c9 and b5f9c644eb ("PCI: Remove struct pci_dev->driver"),
removed pci_dev->driver, replacing it by device->driver, which *has* been
set by this time, so pci_pm_runtime_resume() called the .runtime_resume()
method when it previously had not.
Fixes: 2a4d9408c9 ("PCI: Use to_pci_driver() instead of pci_dev->driver")
Link: https://lore.kernel.org/linux-i2c/CAP145pgdrdiMAT7=-iB1DMgA7t_bMqTcJL4N0=6u8kNY3EU0dw@mail.gmail.com/
Reported-by: Robert Święcki <robert@swiecki.net>
Tested-by: Robert Święcki <robert@swiecki.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When BLKRESETZONE ioctl and data read race, the data read leaves stale
page cache. The commit e511350590 ("block: Discard page cache of zone
reset target range") added page cache truncation to avoid stale page
cache after the ioctl. However, the stale page cache still can be read
during the reset zone operation for the ioctl. To avoid the stale page
cache completely, hold invalidate_lock of the block device file mapping.
Fixes: e511350590 ("block: Discard page cache of zone reset target range")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: stable@vger.kernel.org # v5.15
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211111085238.942492-1-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk_mq_sched_bio_merge is only called from blk-mq.c:blk_attempt_bio_merge(),
which is called when queue usage counter is grabbed already:
1) blk_mq_get_new_requests()
2) blk_mq_get_request()
- cached request in current plug owns one queue usage counter
So don't grab ->q_usage_counter in blk_mq_sched_bio_merge(), and more
importantly this nest way causes hang in blk_mq_freeze_queue_wait().
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211111085134.345235-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The naming got changed as part of a revision of the patchset, but the
kerneldoc apparently never got updated. Fix it.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: a2247f19ee ("block: Add independent access ranges support")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull tracing fixes from Steven Rostedt:
"Two locking fixes:
- Add mutex protection to ring_buffer_reset()
- Fix deadlock in modify_ftrace_direct_multi()"
* tag 'trace-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace/direct: Fix lockup in modify_ftrace_direct_multi
ring-buffer: Protect ring_buffer_reset() from reentrancy
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, can and netfilter.
Current release - regressions:
- bpf: do not reject when the stack read size is different from the
tracked scalar size
- net: fix premature exit from NAPI state polling in napi_disable()
- riscv, bpf: fix RV32 broken build, and silence RV64 warning
Current release - new code bugs:
- net: fix possible NULL deref in sock_reserve_memory
- amt: fix error return code in amt_init(); fix stopping the
workqueue
- ax88796c: use the correct ioctl callback
Previous releases - always broken:
- bpf: stop caching subprog index in the bpf_pseudo_func insn
- security: fixups for the security hooks in sctp
- nfc: add necessary privilege flags in netlink layer, limit
operations to admin only
- vsock: prevent unnecessary refcnt inc for non-blocking connect
- net/smc: fix sk_refcnt underflow on link down and fallback
- nfnetlink_queue: fix OOB when mac header was cleared
- can: j1939: ignore invalid messages per standard
- bpf, sockmap:
- fix race in ingress receive verdict with redirect to self
- fix incorrect sk_skb data_end access when src_reg = dst_reg
- strparser, and tls are reusing qdisc_skb_cb and colliding
- ethtool: fix ethtool msg len calculation for pause stats
- vlan: fix a UAF in vlan_dev_real_dev() when ref-holder tries to
access an unregistering real_dev
- udp6: make encap_rcv() bump the v6 not v4 stats
- drv: prestera: add explicit padding to fix m68k build
- drv: felix: fix broken VLAN-tagged PTP under VLAN-aware bridge
- drv: mvpp2: fix wrong SerDes reconfiguration order
Misc & small latecomers:
- ipvs: auto-load ipvs on genl access
- mctp: sanity check the struct sockaddr_mctp padding fields
- libfs: support RENAME_EXCHANGE in simple_rename()
- avoid double accounting for pure zerocopy skbs"
* tag 'net-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (123 commits)
selftests/net: udpgso_bench_rx: fix port argument
net: wwan: iosm: fix compilation warning
cxgb4: fix eeprom len when diagnostics not implemented
net: fix premature exit from NAPI state polling in napi_disable()
net/smc: fix sk_refcnt underflow on linkdown and fallback
net/mlx5: Lag, fix a potential Oops with mlx5_lag_create_definer()
gve: fix unmatched u64_stats_update_end()
net: ethernet: lantiq_etop: Fix compilation error
selftests: forwarding: Fix packet matching in mirroring selftests
vsock: prevent unnecessary refcnt inc for nonblocking connect
net: marvell: mvpp2: Fix wrong SerDes reconfiguration order
net: ethernet: ti: cpsw_ale: Fix access to un-initialized memory
net: stmmac: allow a tc-taprio base-time of zero
selftests: net: test_vxlan_under_vrf: fix HV connectivity test
net: hns3: allow configure ETS bandwidth of all TCs
net: hns3: remove check VF uc mac exist when set by PF
net: hns3: fix some mac statistics is always 0 in device version V2
net: hns3: fix kernel crash when unload VF while it is being reset
net: hns3: sync rx ring head in echo common pull
net: hns3: fix pfc packet number incorrect after querying pfc parameters
...
Pull char/misc fix from Greg KH:
"Here is a single fix for 5.16-rc1 to resolve a build problem that came
in through the coresight tree (and as such came in through the
char/misc tree merge in the 5.16-rc1 merge window).
It resolves a build problem with 'allmodconfig' on arm64 and is acked
by the proper subsystem maintainers. It has been in linux-next all
week with no reported problems"
* tag 'char-misc-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
arm64: cpufeature: Export this_cpu_has_cap helper
Pull USB fixes from Greg KH:
"Here are some small reverts and fixes for USB drivers for issues that
came up during the 5.16-rc1 merge window.
These include:
- two reverts of xhci and USB core patches that are causing problems
in many systems.
- xhci 3.1 enumeration delay fix for systems that were having
problems.
All three of these have been in linux-next all week with no reported
issues"
* tag 'usb-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: Fix USB 3.1 enumeration issues by increasing roothub power-on-good delay
Revert "usb: core: hcd: Add support for deferring roothub registration"
Revert "xhci: Set HCD flag to defer primary roothub registration"
Memory allocators may disable interrupts or preemption as part of the
allocation and freeing process. For PREEMPT_RT it is important that
these sections remain deterministic and short and therefore don't depend
on the size of the memory to allocate/ free or the inner state of the
algorithm.
Until v3.12-RT the SLAB allocator was an option but involved several
changes to meet all the requirements. The SLUB design fits better with
PREEMPT_RT model and so the SLAB patches were dropped in the 3.12-RT
patchset. Comparing the two allocator, SLUB outperformed SLAB in both
throughput (time needed to allocate and free memory) and the maximal
latency of the system measured with cyclictest during hackbench.
SLOB was never evaluated since it was unlikely that it preforms better
than SLAB. During a quick test, the kernel crashed with SLOB enabled
during boot.
Disable SLAB and SLOB on PREEMPT_RT.
[bigeasy@linutronix.de: commit description]
Link: https://lkml.kernel.org/r/20211015210336.gen3tib33ig5q2md@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sync this one last bit of discrepancy between kernel and userspace
libxfs.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
* Fix misuse of gfn-to-pfn cache when recording guest steal time / preempted status
* Fix selftests on APICv machines
* Fix sparse warnings
* Fix detection of KVM features in CPUID
* Cleanups for bogus writes to MSR_KVM_PV_EOI_EN
* Fixes and cleanups for MSR bitmap handling
* Cleanups for INVPCID
* Make x86 KVM_SOFT_MAX_VCPUS consistent with other architectures
Add support for AMD SEV and SEV-ES intra-host migration support. Intra
host migration provides a low-cost mechanism for userspace VMM upgrades.
In the common case for intra host migration, we can rely on the normal
ioctls for passing data from one VMM to the next. SEV, SEV-ES, and other
confidential compute environments make most of this information opaque, and
render KVM ioctls such as "KVM_GET_REGS" irrelevant. As a result, we need
the ability to pass this opaque metadata from one VMM to the next. The
easiest way to do this is to leave this data in the kernel, and transfer
ownership of the metadata from one KVM VM (or vCPU) to the next. In-kernel
hand off makes it possible to move any data that would be
unsafe/impossible for the kernel to hand directly to userspace, and
cannot be reproduced using data that can be handed to userspace.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM_CAP_NR_VCPUS is used to get the "recommended" maximum number of
VCPUs and arm64/mips/riscv report num_online_cpus(). Powerpc reports
either num_online_cpus() or num_present_cpus(), s390 has multiple
constants depending on hardware features. On x86, KVM reports an
arbitrary value of '710' which is supposed to be the maximum tested
value but it's possible to test all KVM_MAX_VCPUS even when there are
less physical CPUs available.
Drop the arbitrary '710' value and return num_online_cpus() on x86 as
well. The recommendation will match other architectures and will mean
'no CPU overcommit'.
For reference, QEMU only queries KVM_CAP_NR_VCPUS to print a warning
when the requested vCPU number exceeds it. The static limit of '710'
is quite weird as smaller systems with just a few physical CPUs should
certainly "recommend" less.
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211111134733.86601-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Handle #GP on INVPCID due to an invalid type in the common switch
statement instead of relying on the callers (VMX and SVM) to manually
validate the type.
Unlike INVVPID and INVEPT, INVPCID is not explicitly documented to check
the type before reading the operand from memory, so deferring the
type validity check until after that point is architecturally allowed.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109174426.2350547-3-vipinsh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
handle_invept(), handle_invvpid(), handle_invpcid() read the same reg2
field in vmcs.VMX_INSTRUCTION_INFO to get the index of the GPR that
holds the invalidation type. Add a helper to retrieve reg2 from VMX
instruction info to consolidate and document the shift+mask magic.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109174426.2350547-2-vipinsh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Clean up the x2APIC MSR bitmap intereption code for L2, which is the last
holdout of open coded bitmap manipulations. Freshen up the SDM/PRM
comment, rename the function to make it abundantly clear the funky
behavior is x2APIC specific, and explain _why_ vmcs01's bitmap is ignored
(the previous comment was flat out wrong for x2APIC behavior).
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109013047.2041518-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add builder macros to generate the MSR bitmap helpers to reduce the
amount of copy-paste code, especially with respect to all the magic
numbers needed to calc the correct bit location.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109013047.2041518-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Always check vmcs01's MSR bitmap when merging L0 and L1 bitmaps for L2,
and always update the relevant bits in vmcs02. This fixes two distinct,
but intertwined bugs related to dynamic MSR bitmap modifications.
The first issue is that KVM fails to enable MSR interception in vmcs02
for the FS/GS base MSRs if L1 first runs L2 with interception disabled,
and later enables interception.
The second issue is that KVM fails to honor userspace MSR filtering when
preparing vmcs02.
Fix both issues simultaneous as fixing only one of the issues (doesn't
matter which) would create a mess that no one should have to bisect.
Fixing only the first bug would exacerbate the MSR filtering issue as
userspace would see inconsistent behavior depending on the whims of L1.
Fixing only the second bug (MSR filtering) effectively requires fixing
the first, as the nVMX code only knows how to transition vmcs02's
bitmap from 1->0.
Move the various accessor/mutators that are currently buried in vmx.c
into vmx.h so that they can be shared by the nested code.
Fixes: 1a155254ff ("KVM: x86: Introduce MSR filtering")
Fixes: d69129b4e4 ("KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible")
Cc: stable@vger.kernel.org
Cc: Alexander Graf <graf@amazon.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109013047.2041518-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Check the current VMCS controls to determine if an MSR write will be
intercepted due to MSR bitmaps being disabled. In the nested VMX case,
KVM will disable MSR bitmaps in vmcs02 if they're disabled in vmcs12 or
if KVM can't map L1's bitmaps for whatever reason.
Note, the bad behavior is relatively benign in the current code base as
KVM sets all bits in vmcs02's MSR bitmap by default, clears bits if and
only if L0 KVM also disables interception of an MSR, and only uses the
buggy helper for MSR_IA32_SPEC_CTRL. Because KVM explicitly tests WRMSR
before disabling interception of MSR_IA32_SPEC_CTRL, the flawed check
will only result in KVM reading MSR_IA32_SPEC_CTRL from hardware when it
isn't strictly necessary.
Tag the fix for stable in case a future fix wants to use
msr_write_intercepted(), in which case a buggy implementation in older
kernels could prove subtly problematic.
Fixes: d28b387fb7 ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109013047.2041518-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When kvm_gfn_to_hva_cache_init() call from kvm_lapic_set_pv_eoi() fails,
MSR write to MSR_KVM_PV_EOI_EN results in #GP so it is reasonable to
expect that the value we keep internally in KVM wasn't updated.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211108152819.12485-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently when kvm_update_cpuid_runtime() runs, it assumes that the
KVM_CPUID_FEATURES leaf is located at 0x40000001. This is not true,
however, if Hyper-V support is enabled. In this case the KVM leaves will
be offset.
This patch introdues as new 'kvm_cpuid_base' field into struct
kvm_vcpu_arch to track the location of the KVM leaves and function
kvm_update_kvm_cpuid_base() (called from kvm_set_cpuid()) to locate the
leaves using the 'KVMKVMKVM\0\0\0' signature (which is now given a
definition in kvm_para.h). Adjustment of KVM_CPUID_FEATURES will hence now
target the correct leaf.
NOTE: A new for_each_possible_hypervisor_cpuid_base() macro is intoduced
into processor.h to avoid having duplicate code for the iteration
over possible hypervisor base leaves.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Message-Id: <20211105095101.5384-3-pdurrant@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the core logic of SET_CPUID and SET_CPUID2 to a common helper, the
only difference between the two ioctls() is the format of the userspace
struct. A future fix will add yet more code to the core logic.
No functional change intended.
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211105095101.5384-2-pdurrant@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The fast page fault path bails out on write faults to huge pages in
order to accommodate dirty logging. This change adds a check to do that
only when dirty logging is actually enabled, so that access tracking for
huge pages can still use the fast path for write faults in the common
case.
Signed-off-by: Junaid Shahid <junaids@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211104003359.2201967-1-junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Wrap the read of iter->sptep in tdp_mmu_map_handle_target_level() with
rcu_dereference(). Shadow pages in the TDP MMU, and thus their SPTEs,
are protected by rcu.
This fixes a Sparse warning at tdp_mmu.c:900:51:
warning: incorrect type in argument 1 (different address spaces)
expected unsigned long long [usertype] *sptep
got unsigned long long [noderef] [usertype] __rcu *[usertype] sptep
Fixes: 7158bee4b4 ("KVM: MMU: pass kvm_mmu_page struct to make_spte")
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211103161833.3769487-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM_GUESTDBG_BLOCKIRQ relies on interrupts being injected using
standard kvm's inject_pending_event, and not via APICv/AVIC.
Since this is a debug feature, just inhibit APICv/AVIC while
KVM_GUESTDBG_BLOCKIRQ is in use on at least one vCPU.
Fixes: 61e5f69ef0 ("KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ")
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211108090245.166408-1-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These function names sound like predicates, and they have siblings,
*is_valid_msr(), which _are_ predicates. Moreover, there are comments
that essentially warn that these functions behave unexpectedly.
Flip the polarity of the return values, so that they become
predicates, and convert the boolean result to a success/failure code
at the outer call site.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211105202058.1048757-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In commit b043138246 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is
not missed") we switched to using a gfn_to_pfn_cache for accessing the
guest steal time structure in order to allow for an atomic xchg of the
preempted field. This has a couple of problems.
Firstly, kvm_map_gfn() doesn't work at all for IOMEM pages when the
atomic flag is set, which it is in kvm_steal_time_set_preempted(). So a
guest vCPU using an IOMEM page for its steal time would never have its
preempted field set.
Secondly, the gfn_to_pfn_cache is not invalidated in all cases where it
should have been. There are two stages to the GFN->PFN conversion;
first the GFN is converted to a userspace HVA, and then that HVA is
looked up in the process page tables to find the underlying host PFN.
Correct invalidation of the latter would require being hooked up to the
MMU notifiers, but that doesn't happen---so it just keeps mapping and
unmapping the *wrong* PFN after the userspace page tables change.
In the !IOMEM case at least the stale page *is* pinned all the time it's
cached, so it won't be freed and reused by anyone else while still
receiving the steal time updates. The map/unmap dance only takes care
of the KVM administrivia such as marking the page dirty.
Until the gfn_to_pfn cache handles the remapping automatically by
integrating with the MMU notifiers, we might as well not get a
kernel mapping of it, and use the perfectly serviceable userspace HVA
that we already have. We just need to implement the atomic xchg on
the userspace address with appropriate exception handling, which is
fairly trivial.
Cc: stable@vger.kernel.org
Fixes: b043138246 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <3645b9b889dac6438394194bb5586a46b68d581f.camel@infradead.org>
[I didn't entirely agree with David's assessment of the
usefulness of the gfn_to_pfn cache, and integrated the outcome
of the discussion in the above commit message. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Avoid code duplication across all callers of misc_cg_try_charge and
misc_cg_uncharge. The resource type for KVM is always derived from
sev->es_active, and the quantity is always 1.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Generalize KVM_REQ_VM_BUGGED so that it can be called even in cases
where it is by design that the VM cannot be operated upon. In this
case any KVM_BUG_ON should still warn, so introduce a new flag
kvm->vm_dead that is separate from kvm->vm_bugged.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add guest api and guest kernel support for SEV live migration.
Introduces a new hypercall to notify the host of changes to the page
encryption status. If the page is encrypted then it must be migrated
through the SEV firmware or a helper VM sharing the key. If page is
not encrypted then it can be migrated normally by userspace. This new
hypercall is invoked using paravirt_ops.
Conflicts: sev_active() replaced by cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT).