Commit Graph

967212 Commits

Author SHA1 Message Date
Andrew Jones
22f232d134 KVM: selftests: x86: Set supported CPUIDs on default VM
Almost all tests do this anyway and the ones that don't don't
appear to care. Only vmx_set_nested_state_test assumes that
a feature (VMX) is disabled until later setting the supported
CPUIDs. It's better to disable that explicitly anyway.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20201111122636.73346-11-drjones@redhat.com>
[Restore CPUID_VMX, or vmx_set_nested_state breaks. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-16 13:14:20 -05:00
Andrew Jones
08d3e27718 KVM: selftests: Make test skipping consistent
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20201111122636.73346-12-drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-16 13:14:20 -05:00
Andrew Jones
87c5f35e5c KVM: selftests: Also build dirty_log_perf_test on AArch64
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20201111122636.73346-10-drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:20 -05:00
Andrew Jones
0aa9ec45d4 KVM: selftests: Introduce vm_create_[default_]_with_vcpus
Introduce new vm_create variants that also takes a number of vcpus,
an amount of per-vcpu pages, and optionally a list of vcpuids. These
variants will create default VMs with enough additional pages to
cover the vcpu stacks, per-vcpu pages, and pagetable pages for all.
The new 'default' variant uses VM_MODE_DEFAULT, whereas the other
new variant accepts the mode as a parameter.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20201111122636.73346-6-drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:20 -05:00
Andrew Jones
ec2f18bb47 KVM: selftests: Make vm_create_default common
The code is almost 100% the same anyway. Just move it to common
and add a few arch-specific macros.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20201111122636.73346-5-drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:19 -05:00
Paolo Bonzini
f63f0b68c8 KVM: selftests: always use manual clear in dirty_log_perf_test
Nothing sets USE_CLEAR_DIRTY_LOG anymore, so anything it surrounds
is dead code.

However, it is the recommended way to use the dirty page bitmap
for new enough kernel, so use it whenever KVM has the
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 capability.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:19 -05:00
Jim Mattson
2259c17f01 kvm: x86: Sink cpuid update into vendor-specific set_cr4 functions
On emulated VM-entry and VM-exit, update the CPUID bits that reflect
CR4.OSXSAVE and CR4.PKE.

This fixes a bug where the CPUID bits could continue to reflect L2 CR4
values after emulated VM-exit to L1. It also fixes a related bug where
the CPUID bits could continue to reflect L1 CR4 values after emulated
VM-entry to L2. The latter bug is mainly relevant to SVM, wherein
CPUID is not a required intercept. However, it could also be relevant
to VMX, because the code to conditionally update these CPUID bits
assumes that the guest CPUID and the guest CR4 are always in sync.

Fixes: 8eb3f87d90 ("KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit")
Fixes: 2acf923e38 ("KVM: VMX: Enable XSAVE/XRSTOR for guest")
Fixes: b9baba8614 ("KVM, pkeys: expose CPUID/CR4 to guest")
Reported-by: Abhiroop Dabral <adabral@paloaltonetworks.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Cc: Haozhong Zhang <haozhong.zhang@intel.com>
Cc: Dexuan Cui <dexuan.cui@intel.com>
Cc: Huaitong Han <huaitong.han@intel.com>
Message-Id: <20201029170648.483210-1-jmattson@google.com>
2020-11-15 09:49:18 -05:00
Paolo Bonzini
8aa426e854 selftests: kvm: keep .gitignore add to date
Add tsc_msrs_test, remove clear_dirty_log_test and alphabetize
everything.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:18 -05:00
Peter Xu
edd3de6fc3 KVM: selftests: Add "-c" parameter to dirty log test
It's only used to override the existing dirty ring size/count.  If
with a bigger ring count, we test async of dirty ring.  If with a
smaller ring count, we test ring full code path.  Async is default.

It has no use for non-dirty-ring tests.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201001012241.6208-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:18 -05:00
Peter Xu
019d321a68 KVM: selftests: Run dirty ring test asynchronously
Previously the dirty ring test was working in synchronous way, because
only with a vmexit (with that it was the ring full event) we'll know
the hardware dirty bits will be flushed to the dirty ring.

With this patch we first introduce a vcpu kick mechanism using SIGUSR1,
which guarantees a vmexit and also therefore the flushing of hardware
dirty bits.  Once this is in place, we can keep the vcpu dirty work
asynchronous of the whole collection procedure now.  Still, we need
to be very careful that when reaching the ring buffer soft limit
(KVM_EXIT_DIRTY_RING_FULL) we must collect the dirty bits before
continuing the vcpu.

Further increase the dirty ring size to current maximum to make sure
we torture more on the no-ring-full case, which should be the major
scenario when the hypervisors like QEMU would like to use this feature.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201001012239.6159-1-peterx@redhat.com>
[Use KVM_SET_SIGNAL_MASK+sigwait instead of a signal handler. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:17 -05:00
Peter Xu
84292e5659 KVM: selftests: Add dirty ring buffer test
Add the initial dirty ring buffer test.

The current test implements the userspace dirty ring collection, by
only reaping the dirty ring when the ring is full.

So it's still running synchronously like this:

            vcpu                             main thread

  1. vcpu dirties pages
  2. vcpu gets dirty ring full
     (userspace exit)

                                       3. main thread waits until full
                                          (so hardware buffers flushed)
                                       4. main thread collects
                                       5. main thread continues vcpu

  6. vcpu continues, goes back to 1

We can't directly collects dirty bits during vcpu execution because
otherwise we can't guarantee the hardware dirty bits were flushed when
we collect and we're very strict on the dirty bits so otherwise we can
fail the future verify procedure.  A follow up patch will make this
test to support async just like the existing dirty log test, by adding
a vcpu kick mechanism.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201001012237.6111-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:17 -05:00
Peter Xu
60f644fb51 KVM: selftests: Introduce after_vcpu_run hook for dirty log test
Provide a hook for the checks after vcpu_run() completes.  Preparation
for the dirty ring test because we'll need to take care of another
exit reason.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201001012235.6063-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:16 -05:00
Peter Xu
044c59c409 KVM: Don't allocate dirty bitmap if dirty ring is enabled
Because kvm dirty rings and kvm dirty log is used in an exclusive way,
Let's avoid creating the dirty_bitmap when kvm dirty ring is enabled.
At the meantime, since the dirty_bitmap will be conditionally created
now, we can't use it as a sign of "whether this memory slot enabled
dirty tracking".  Change users like that to check against the kvm
memory slot flags.

Note that there still can be chances where the kvm memory slot got its
dirty_bitmap allocated, _if_ the memory slots are created before
enabling of the dirty rings and at the same time with the dirty
tracking capability enabled, they'll still with the dirty_bitmap.
However it should not hurt much (e.g., the bitmaps will always be
freed if they are there), and the real users normally won't trigger
this because dirty bit tracking flag should in most cases only be
applied to kvm slots only before migration starts, that should be far
latter than kvm initializes (VM starts).

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201001012226.5868-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:16 -05:00
Peter Xu
b2cc64c4f3 KVM: Make dirty ring exclusive to dirty bitmap log
There's no good reason to use both the dirty bitmap logging and the
new dirty ring buffer to track dirty bits.  We should be able to even
support both of them at the same time, but it could complicate things
which could actually help little.  Let's simply make it the rule
before we enable dirty ring on any arch, that we don't allow these two
interfaces to be used together.

The big world switch would be KVM_CAP_DIRTY_LOG_RING capability
enablement.  That's where we'll switch from the default dirty logging
way to the dirty ring way.  As long as kvm->dirty_ring_size is setup
correctly, we'll once and for all switch to the dirty ring buffer mode
for the current virtual machine.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201001012224.5818-1-peterx@redhat.com>
[Change errno from EINVAL to ENXIO. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:16 -05:00
Peter Xu
fb04a1eddb KVM: X86: Implement ring-based dirty memory tracking
This patch is heavily based on previous work from Lei Cao
<lei.cao@stratus.com> and Paolo Bonzini <pbonzini@redhat.com>. [1]

KVM currently uses large bitmaps to track dirty memory.  These bitmaps
are copied to userspace when userspace queries KVM for its dirty page
information.  The use of bitmaps is mostly sufficient for live
migration, as large parts of memory are be dirtied from one log-dirty
pass to another.  However, in a checkpointing system, the number of
dirty pages is small and in fact it is often bounded---the VM is
paused when it has dirtied a pre-defined number of pages. Traversing a
large, sparsely populated bitmap to find set bits is time-consuming,
as is copying the bitmap to user-space.

A similar issue will be there for live migration when the guest memory
is huge while the page dirty procedure is trivial.  In that case for
each dirty sync we need to pull the whole dirty bitmap to userspace
and analyse every bit even if it's mostly zeros.

The preferred data structure for above scenarios is a dense list of
guest frame numbers (GFN).  This patch series stores the dirty list in
kernel memory that can be memory mapped into userspace to allow speedy
harvesting.

This patch enables dirty ring for X86 only.  However it should be
easily extended to other archs as well.

[1] https://patchwork.kernel.org/patch/10471409/

Signed-off-by: Lei Cao <lei.cao@stratus.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201001012222.5767-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:15 -05:00
Peter Xu
28bd726aa4 KVM: Pass in kvm pointer into mark_page_dirty_in_slot()
The context will be needed to implement the kvm dirty ring.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201001012044.5151-5-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:13 -05:00
Paolo Bonzini
2f5414423e KVM: remove kvm_clear_guest_page
kvm_clear_guest_page is not used anymore after "KVM: X86: Don't track dirty
for KVM_SET_[TSS_ADDR|IDENTITY_MAP_ADDR]", except from kvm_clear_guest.
We can just inline it in its sole user.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:13 -05:00
Peter Xu
ff5a983cbb KVM: X86: Don't track dirty for KVM_SET_[TSS_ADDR|IDENTITY_MAP_ADDR]
Originally, we have three code paths that can dirty a page without
vcpu context for X86:

  - init_rmode_identity_map
  - init_rmode_tss
  - kvmgt_rw_gpa

init_rmode_identity_map and init_rmode_tss will be setup on
destination VM no matter what (and the guest cannot even see them), so
it does not make sense to track them at all.

To do this, allow __x86_set_memory_region() to return the userspace
address that just allocated to the caller.  Then in both of the
functions we directly write to the userspace address instead of
calling kvm_write_*() APIs.

Another trivial change is that we don't need to explicitly clear the
identity page table root in init_rmode_identity_map() because no
matter what we'll write to the whole page with 4M huge page entries.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201001012044.5151-4-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:12 -05:00
Vitaly Kuznetsov
8b460692fe KVM: selftests: test KVM_GET_SUPPORTED_HV_CPUID as a system ioctl
KVM_GET_SUPPORTED_HV_CPUID is now supported as both vCPU and VM ioctl,
test that.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200929150944.1235688-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:12 -05:00
Vitaly Kuznetsov
c21d54f030 KVM: x86: hyper-v: allow KVM_GET_SUPPORTED_HV_CPUID as a system ioctl
KVM_GET_SUPPORTED_HV_CPUID is a vCPU ioctl but its output is now
independent from vCPU and in some cases VMMs may want to use it as a system
ioctl instead. In particular, QEMU doesn CPU feature expansion before any
vCPU gets created so KVM_GET_SUPPORTED_HV_CPUID can't be used.

Convert KVM_GET_SUPPORTED_HV_CPUID to 'dual' system/vCPU ioctl with the
same meaning.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200929150944.1235688-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:11 -05:00
David Woodhouse
b59e00dd8c kvm/eventfd: Drain events from eventfd in irqfd_wakeup()
Don't allow the events to accumulate in the eventfd counter, drain them
as they are handled.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20201027135523.646811-4-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:11 -05:00
David Woodhouse
b1b397aeef vfio/virqfd: Drain events from eventfd in virqfd_wakeup()
Don't allow the events to accumulate in the eventfd counter, drain them
as they are handled.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20201027135523.646811-3-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-15 09:49:10 -05:00
David Woodhouse
28f1326710 eventfd: Export eventfd_ctx_do_read()
Where events are consumed in the kernel, for example by KVM's
irqfd_wakeup() and VFIO's virqfd_wakeup(), they currently lack a
mechanism to drain the eventfd's counter.

Since the wait queue is already locked while the wakeup functions are
invoked, all they really need to do is call eventfd_ctx_do_read().

Add a check for the lock, and export it for them.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20201027135523.646811-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:10 -05:00
David Woodhouse
e8dbf19508 kvm/eventfd: Use priority waitqueue to catch events before userspace
As far as I can tell, when we use posted interrupts we silently cut off
the events from userspace, if it's listening on the same eventfd that
feeds the irqfd.

I like that behaviour. Let's do it all the time, even without posted
interrupts. It makes it much easier to handle IRQ remapping invalidation
without having to constantly add/remove the fd from the userspace poll
set. We can just leave userspace polling on it, and the bypass will...
well... bypass it.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20201026175325.585623-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:10 -05:00
David Woodhouse
c4d51a52c6 sched/wait: Add add_wait_queue_priority()
This allows an exclusive wait_queue_entry to be added at the head of the
queue, instead of the tail as normal. Thus, it gets to consume events
first without allowing non-exclusive waiters to be woken at all.

The (first) intended use is for KVM IRQFD, which currently has
inconsistent behaviour depending on whether posted interrupts are
available or not. If they are, KVM will bypass the eventfd completely
and deliver interrupts directly to the appropriate vCPU. If not, events
are delivered through the eventfd and userspace will receive them when
polling on the eventfd.

By using add_wait_queue_priority(), KVM will be able to consistently
consume events within the kernel without accidentally exposing them
to userspace when they're supposed to be bypassed. This, in turn, means
that userspace doesn't have to jump through hoops to avoid listening
on the erroneously noisy eventfd and injecting duplicate interrupts.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20201027143944.648769-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:09 -05:00
Yadong Qi
bf0cd88ce3 KVM: x86: emulate wait-for-SIPI and SIPI-VMExit
Background: We have a lightweight HV, it needs INIT-VMExit and
SIPI-VMExit to wake-up APs for guests since it do not monitor
the Local APIC. But currently virtual wait-for-SIPI(WFS) state
is not supported in nVMX, so when running on top of KVM, the L1
HV cannot receive the INIT-VMExit and SIPI-VMExit which cause
the L2 guest cannot wake up the APs.

According to Intel SDM Chapter 25.2 Other Causes of VM Exits,
SIPIs cause VM exits when a logical processor is in
wait-for-SIPI state.

In this patch:
    1. introduce SIPI exit reason,
    2. introduce wait-for-SIPI state for nVMX,
    3. advertise wait-for-SIPI support to guest.

When L1 hypervisor is not monitoring Local APIC, L0 need to emulate
INIT-VMExit and SIPI-VMExit to L1 to emulate INIT-SIPI-SIPI for
L2. L2 LAPIC write would be traped by L0 Hypervisor(KVM), L0 should
emulate the INIT/SIPI vmexit to L1 hypervisor to set proper state
for L2's vcpu state.

Handle procdure:
Source vCPU:
    L2 write LAPIC.ICR(INIT).
    L0 trap LAPIC.ICR write(INIT): inject a latched INIT event to target
       vCPU.
Target vCPU:
    L0 emulate an INIT VMExit to L1 if is guest mode.
    L1 set guest VMCS, guest_activity_state=WAIT_SIPI, vmresume.
    L0 set vcpu.mp_state to INIT_RECEIVED if (vmcs12.guest_activity_state
       == WAIT_SIPI).

Source vCPU:
    L2 write LAPIC.ICR(SIPI).
    L0 trap LAPIC.ICR write(INIT): inject a latched SIPI event to traget
       vCPU.
Target vCPU:
    L0 emulate an SIPI VMExit to L1 if (vcpu.mp_state == INIT_RECEIVED).
    L1 set CS:IP, guest_activity_state=ACTIVE, vmresume.
    L0 resume to L2.
    L2 start-up.

Signed-off-by: Yadong Qi <yadong.qi@intel.com>
Message-Id: <20200922052343.84388-1-yadong.qi@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20201106065122.403183-1-yadong.qi@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:09 -05:00
Paolo Bonzini
1c96dcceae KVM: x86: fix apic_accept_events vs check_nested_events
vmx_apic_init_signal_blocked is buggy in that it returns true
even in VMX non-root mode.  In non-root mode, however, INITs
are not latched, they just cause a vmexit.  Previously,
KVM was waiting for them to be processed when kvm_apic_accept_events
and in the meanwhile it ate the SIPIs that the processor received.

However, in order to implement the wait-for-SIPI activity state,
KVM will have to process KVM_APIC_SIPI in vmx_check_nested_events,
and it will not be possible anymore to disregard SIPIs in non-root
mode as the code is currently doing.

By calling kvm_x86_ops.nested_ops->check_events, we can force a vmexit
(with the side-effect of latching INITs) before incorrectly injecting
an INIT or SIPI in a guest, and therefore vmx_apic_init_signal_blocked
can do the right thing.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:08 -05:00
Sean Christopherson
7a873e4555 KVM: selftests: Verify supported CR4 bits can be set before KVM_SET_CPUID2
Extend the KVM_SET_SREGS test to verify that all supported CR4 bits, as
enumerated by KVM, can be set before KVM_SET_CPUID2, i.e. without first
defining the vCPU model.  KVM is supposed to skip guest CPUID checks
when host userspace is stuffing guest state.

Check the inverse as well, i.e. that KVM rejects KVM_SET_REGS if CR4
has one or more unsupported bits set.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-7-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:08 -05:00
Sean Christopherson
ee69c92bac KVM: x86: Return bool instead of int for CR4 and SREGS validity checks
Rework the common CR4 and SREGS checks to return a bool instead of an
int, i.e. true/false instead of 0/-EINVAL, and add "is" to the name to
clarify the polarity of the return value (which is effectively inverted
by this change).

No functional changed intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-6-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:08 -05:00
Sean Christopherson
c2fe3cd460 KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook
Split out VMX's checks on CR4.VMXE to a dedicated hook, .is_valid_cr4(),
and invoke the new hook from kvm_valid_cr4().  This fixes an issue where
KVM_SET_SREGS would return success while failing to actually set CR4.

Fixing the issue by explicitly checking kvm_x86_ops.set_cr4()'s return
in __set_sregs() is not a viable option as KVM has already stuffed a
variety of vCPU state.

Note, kvm_valid_cr4() and is_valid_cr4() have different return types and
inverted semantics.  This will be remedied in a future patch.

Fixes: 5e1746d620 ("KVM: nVMX: Allow setting the VMXE bit in CR4")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:07 -05:00
Sean Christopherson
311a06593b KVM: SVM: Drop VMXE check from svm_set_cr4()
Drop svm_set_cr4()'s explicit check CR4.VMXE now that common x86 handles
the check by incorporating VMXE into the CR4 reserved bits, via
kvm_cpu_caps.  SVM obviously does not set X86_FEATURE_VMX.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:07 -05:00
Sean Christopherson
a447e38a7f KVM: VMX: Drop explicit 'nested' check from vmx_set_cr4()
Drop vmx_set_cr4()'s explicit check on the 'nested' module param now
that common x86 handles the check by incorporating VMXE into the CR4
reserved bits, via kvm_cpu_caps.  X86_FEATURE_VMX is set in kvm_cpu_caps
(by vmx_set_cpu_caps()), if and only if 'nested' is true.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:06 -05:00
Sean Christopherson
d3a9e4146a KVM: VMX: Drop guest CPUID check for VMXE in vmx_set_cr4()
Drop vmx_set_cr4()'s somewhat hidden guest_cpuid_has() check on VMXE now
that common x86 handles the check by incorporating VMXE into the CR4
reserved bits, i.e. in cr4_guest_rsvd_bits.  This fixes a bug where KVM
incorrectly rejects KVM_SET_SREGS with CR4.VMXE=1 if it's executed
before KVM_SET_CPUID{,2}.

Fixes: 5e1746d620 ("KVM: nVMX: Allow setting the VMXE bit in CR4")
Reported-by: Stas Sergeev <stsp@users.sourceforge.net>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:06 -05:00
Paolo Bonzini
c887c9b9ca kvm: mmu: fix is_tdp_mmu_check when the TDP MMU is not in use
In some cases where shadow paging is in use, the root page will
be either mmu->pae_root or vcpu->arch.mmu->lm_root.  Then it will
not have an associated struct kvm_mmu_page, because it is allocated
with alloc_page instead of kvm_mmu_alloc_page.

Just return false quickly from is_tdp_mmu_root if the TDP MMU is
not in use, which also includes the case where shadow paging is
enabled.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 08:55:43 -05:00
Babu Moger
96308b0661 KVM: SVM: Update cr3_lm_rsvd_bits for AMD SEV guests
For AMD SEV guests, update the cr3_lm_rsvd_bits to mask
the memory encryption bit in reserved bits.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <160521948301.32054.5783800787423231162.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-13 06:31:14 -05:00
Babu Moger
0107973a80 KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch
SEV guests fail to boot on a system that supports the PCID feature.

While emulating the RSM instruction, KVM reads the guest CR3
and calls kvm_set_cr3(). If the vCPU is in the long mode,
kvm_set_cr3() does a sanity check for the CR3 value. In this case,
it validates whether the value has any reserved bits set. The
reserved bit range is 63:cpuid_maxphysaddr(). When AMD memory
encryption is enabled, the memory encryption bit is set in the CR3
value. The memory encryption bit may fall within the KVM reserved
bit range, causing the KVM emulation failure.

Introduce a new field cr3_lm_rsvd_bits in kvm_vcpu_arch which will
cache the reserved bits in the CR3 value. This will be initialized
to rsvd_bits(cpuid_maxphyaddr(vcpu), 63).

If the architecture has any special bits(like AMD SEV encryption bit)
that needs to be masked from the reserved bits, should be cleared
in vendor specific kvm_x86_ops.vcpu_after_set_cpuid handler.

Fixes: a780a3ea62 ("KVM: X86: Fix reserved bits check for MOV to CR3")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <160521947657.32054.3264016688005356563.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-13 06:28:37 -05:00
David Edmondson
51b958e5ae KVM: x86: clflushopt should be treated as a no-op by emulation
The instruction emulator ignores clflush instructions, yet fails to
support clflushopt. Treat both similarly.

Fixes: 13e457e0ee ("KVM: x86: Emulator does not decode clflush well")
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20201103120400.240882-1-david.edmondson@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-13 06:28:33 -05:00
Paolo Bonzini
2c38234c42 KVM/arm64 fixes for v5.10, take #3
- Allow userspace to downgrade ID_AA64PFR0_EL1.CSV2
 - Inject UNDEF on SCXTNUM_ELx access
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl+tsAQPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDieIP/06lrDbhKUv1BX5oOlNFKifsaxmrCiP2A9Ql
 1RiT1wI4Ba+QcgtnyUOI/SQgNx4Z+LkUFghkqP3TvtPEj3Y3zhCFiyz3wn/H0YJA
 eZ5kI5XkG+9NOdzpyhNKiN2ZOVz0/RpHnIyHWU1SFD3Ky58xHsI1w5boNcTYJDXE
 IVVAQ05HzNMOnqEnfS3Z2Oe99jiYXS1C80Rf2WvQuQQW6Nwu3J0W5VZztw/E9VG0
 wbivuOaFzk2Zee30oTXxkJfFDS7m3fZ2dXvHSUB9Luv3GMAFp/sK2ZmEg7ZUiAl1
 zBPW35jHv1bahU88IQ7LhvTa+Tg6aEGnCrjHO9JiCx4z0VLnEz86AzejItaGvRu7
 SGf7taj4xRfUVxlJsW1i5Nel7hpmk8ip59hWUq5jTu7bPQvnEFpSfWANgobQrGF4
 pAtYUyaJcU5hRml4NUOy/gGkBzZSDloe1ClDUsdVZrbMKSjnATD8/0Z2oxHthVI1
 vvzovTXOQ7LK81Qm9GZ6Xlj0vXJh2V91wMTxy82lK5PAmKuVWvgqOWbH7e8YX+2T
 VlY5jkIyjwj9vwyMQHmaR5f01eZotYVTM+YKZcjx6O+1MGkrSxZkVptf0g8Bj0X3
 VmCYHyA5LIil8bx58kLfoZhAtjOaAFf+j5XCTjP0zCB4mVHcrCk0rLBPyvPsZB73
 I3WFpQPq
 =eZCZ
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for v5.10, take #3

- Allow userspace to downgrade ID_AA64PFR0_EL1.CSV2
- Inject UNDEF on SCXTNUM_ELx access
2020-11-13 06:28:23 -05:00
Linus Torvalds
585e5b17b9 another fscrypt fix for 5.10-rc4
Fix a regression where new files weren't using inline encryption when
 they should be.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCX63AIhQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOKzlsAP9/m9XfxW3SwG4D1dnajXQPNZgsaby2
 AxkqJyjxq3kBvQEAo8fPe8uURAzYBA9C5tcP0+QCB3jqZkHu0HVCeQKvXwI=
 =zldW
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt

Pull fscrypt fix from Eric Biggers:
 "Fix a regression where new files weren't using inline encryption when
  they should be"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
  fscrypt: fix inline encryption not used on new files
2020-11-12 16:39:58 -08:00
Linus Torvalds
20ca21dfcc Fix jdata data corruption and glock reference leak
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAl+ttO4UHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTqAdg//Xssi+p1N13BfAdyYdPoqEQ9JKSZH
 vEwth53ASsEy+WXz7TMulmrwJXWyQXfEibPvGLrHZZ0zgdw3DyLqGCnDCJOqmdOQ
 /e08MmJ9LdGz06IHnlzhWj3Lm4KpJaFzil4HBYE8Jlu4PimLVKcIQoDRRMV2DURl
 arPSm/dJtqUFVDj9+bawq5mRmxA0gXPFspf857wnDzNB+hGlll2wvK70vapNlg39
 hNVjDMfhb04CVNsJoVZS4wRI3TlwvOjlxB9WKvNvyRDF1jQT8bSX8UJyq5Qupf+K
 /HJvZFrm0gv6W0Z9UFMy5JXIQ5NTriBzrdu5rgzhKPA/r8oV0/8i0pq/macXHOQF
 shPNZQXdsN6uCK57JNugSW6C96l2K4kP7rOSjTE6N0WFo9y/u2bCAbo0+hh0lvns
 2sSKLX2ZtVwvyy0LVcAJa0Q1ZU9CK5F4J8F2Zy3DFOJqV1GuRCh0LBgyWoL4jJEQ
 J3JLJdUevP7E7dvXIwzsDNWOUiRy0xAqFQOIcdvt4WhMsH7QsHIiYjodHFLKx2qq
 Xk9YSTua7A+UjpLDyt2iMbumplMomQqx72NLUM5Kv9r+kSId4Ird/Q8HYHvWezzY
 xUjIvOAMXI/ZKsrJRwE0V2xI+q2wmHPFYrBjl0CymWsCksc9kB1wzkqQYp/Tcxxm
 hhp3iPPY/mEXr4s=
 =bTtI
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v5.10-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fixes from Andreas Gruenbacher:
 "Fix jdata data corruption and glock reference leak"

* tag 'gfs2-v5.10-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix case in which ail writes are done to jdata holes
  Revert "gfs2: Ignore journal log writes for jdata holes"
  gfs2: fix possible reference leak in gfs2_check_blk_type
2020-11-12 16:37:14 -08:00
Linus Torvalds
db7c953555 Networking fixes for 5.10-rc4, including fixes from the bpf subtree.
Current release - regressions:
 
  - arm64: dts: fsl-ls1028a-kontron-sl28: specify in-band mode for ENETC
 
 Current release - bugs in new features:
 
  - mptcp: provide rmem[0] limit offset to fix oops
 
 Previous release - regressions:
 
  - IPv6: Set SIT tunnel hard_header_len to zero to fix path MTU
    calculations
 
  - lan743x: correctly handle chips with internal PHY
 
  - bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE
 
  - mlx5e: Fix VXLAN port table synchronization after function reload
 
 Previous release - always broken:
 
  - bpf: Zero-fill re-used per-cpu map element
 
  - net: udp: fix out-of-order packets when forwarding with UDP GSO
              fraglists turned on
    - fix UDP header access on Fast/frag0 UDP GRO
    - fix IP header access and skb lookup on Fast/frag0 UDP GRO
 
  - ethtool: netlink: add missing netdev_features_change() call
 
  - net: Update window_clamp if SOCK_RCVBUF is set
 
  - igc: Fix returning wrong statistics
 
  - ch_ktls: fix multiple leaks and corner cases in Chelsio TLS offload
 
  - tunnels: Fix off-by-one in lower MTU bounds for ICMP/ICMPv6 replies
 
  - r8169: disable hw csum for short packets on all chip versions
 
  - vrf: Fix fast path output packet handling with async Netfilter rules
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+thbIACgkQMUZtbf5S
 Irsy0RAAhYIYDNMSkQhcVcQPMxbtStwgTtKrWxg/D2zh3Kg+B4oRgoNZnt9kmlHX
 Su/aRWbTWBkDIMxIWBfRsO3z5zSQm4yLG1FTlfsOcWzOJcsntCO8SzikyxtnEZK8
 Bpi7dOoKB6KF0V2YjM9AHh5fbXvS7KJfp/PjZ7Kpn5BEbFV8rKtIyiJxwXXZUr6O
 ddM9Om4i0zf+dmsY1HVEyowPQMVB3vbn8F3dPk3ZrD8NVa53NtvMRxHKSsourRbZ
 yp4LKZV+POKHPFglO4jhLymhyeiwb1qgA8wssk7EKu0bwPeOcER4Tpewh1ib4C/C
 sRRzj0Wlw6dyPCkyNKx23D7dF/DrnLmXLUBhGS2mu2htSlWOH6w6rFQoVSNGGy9T
 DKUlUVUPG80mgYdME6NLJ27GOGQzxoAvzWgpcL6dJs9jz8nQqABJeXvdjw/vc/XH
 AOaKy4VwE3qf0W106JpUb+a/q0RJf7w3o4c1vLc/AZwpshNBOsrJBqrTk2E5Nrhd
 mcQykaF++DbLPIyTqhHl0GpKapohThESyMvfc4WRBFBaCwgFdOY/t0Gz3GA2N8Jc
 fuq9NOB1bfouaFGfzdkZ7RZJi3lFqZfv/XiJCh/knp1/lHAQPo4TuADcFDsjeEc9
 yr48SRDnCqahAQ7bUP0b5i31SZzwAYb/HnwYuvf4LWFvHl9XG5A=
 =AKM7
 -----END PGP SIGNATURE-----

Merge tag 'net-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Current release - regressions:

   - arm64: dts: fsl-ls1028a-kontron-sl28: specify in-band mode for
     ENETC

  Current release - bugs in new features:

   - mptcp: provide rmem[0] limit offset to fix oops

  Previous release - regressions:

   - IPv6: Set SIT tunnel hard_header_len to zero to fix path MTU
     calculations

   - lan743x: correctly handle chips with internal PHY

   - bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE

   - mlx5e: Fix VXLAN port table synchronization after function reload

  Previous release - always broken:

   - bpf: Zero-fill re-used per-cpu map element

   - fix out-of-order UDP packets when forwarding with UDP GSO fraglists
     turned on:
       - fix UDP header access on Fast/frag0 UDP GRO
       - fix IP header access and skb lookup on Fast/frag0 UDP GRO

   - ethtool: netlink: add missing netdev_features_change() call

   - net: Update window_clamp if SOCK_RCVBUF is set

   - igc: Fix returning wrong statistics

   - ch_ktls: fix multiple leaks and corner cases in Chelsio TLS offload

   - tunnels: Fix off-by-one in lower MTU bounds for ICMP/ICMPv6 replies

   - r8169: disable hw csum for short packets on all chip versions

   - vrf: Fix fast path output packet handling with async Netfilter
     rules"

* tag 'net-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
  lan743x: fix use of uninitialized variable
  net: udp: fix IP header access and skb lookup on Fast/frag0 UDP GRO
  net: udp: fix UDP header access on Fast/frag0 UDP GRO
  devlink: Avoid overwriting port attributes of registered port
  vrf: Fix fast path output packet handling with async Netfilter rules
  cosa: Add missing kfree in error path of cosa_write
  net: switch to the kernel.org patchwork instance
  ch_ktls: stop the txq if reaches threshold
  ch_ktls: tcb update fails sometimes
  ch_ktls/cxgb4: handle partial tag alone SKBs
  ch_ktls: don't free skb before sending FIN
  ch_ktls: packet handling prior to start marker
  ch_ktls: Correction in middle record handling
  ch_ktls: missing handling of header alone
  ch_ktls: Correction in trimmed_len calculation
  cxgb4/ch_ktls: creating skbs causes panic
  ch_ktls: Update cheksum information
  ch_ktls: Correction in finding correct length
  cxgb4/ch_ktls: decrypted bit is not enough
  net/x25: Fix null-ptr-deref in x25_connect
  ...
2020-11-12 14:02:04 -08:00
Linus Torvalds
200f9d21aa NFS Client Bugfixes for Linux 5.10-rc4
- Stable fixes:
   - Fix failure to unregister shrinker
 
 - Other fixes:
   - Fix unnecessary locking to clear up some contention
   - Fix listxattr receive buffer size
   - Fix default mount options for nfsroot
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl+tbGIACgkQ18tUv7Cl
 QOvRyxAA4YXD1dlnO2Xbqo7ZyrgoZkVn08rb9yloeCuCNJDZPDSXt2QHAKdbmMU+
 8dxpcWN/8RUEUJK3cccNf2+XV/AWqqaFnFXylcfXLUnjZx0f30ou+HO+BRZFInVd
 OgG3njO94jV1B3RK38J7jyVRqx3hd0Vkq9Ja4LVF2l/x9ueGrj+pOdNauWr1JhFo
 6l4Fk2PKakLKJGsxLXmKlBb7p+EEwKa1qRov8SED33uTZkSnbFOmbxtEp1bu7sQx
 UKBTLADny9FClA1sjM45XN2nLS99/uUl/CaRKm/GB5nP4WKG4J3HgziAAvVglHcP
 yrUIiwLaUGZvteiO5O6NJqZpk6NyzWnBo4ZDt/TZcQ5nvK7uD6buUbDFFn++lbKm
 qwVWCnsme7sx3zVLLS4pY2GXnNNkGozjyrQOV0NQx1QphfalKsXHxeXikY+dkXr5
 FZwKodWxiKlsZj8cyOVjrm9q3+EsBnW8FyitgVQH4QIvcU9Z9zdB5QFyy7KsG4bw
 3iKsbz4HsJ0K10m7ykNEcR5R6XQBnFVWGxAHkQ3qbxzw9hYvhEebP/N2P7x3DC1X
 3gVPDto03Vc5PsuGoXm50kqXpRD3w+fnpf+HMZFmRbqjanqBHvgyYu58Zy0fXEnQ
 VigUcvsjAJhmoneahO3va8HF3a70PPqhzTTVKtfORBNg9uHmS1M=
 =7a8T
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.10-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client bugfixes from Anna Schumaker:
 "Stable fixes:
  - Fix failure to unregister shrinker

  Other fixes:
  - Fix unnecessary locking to clear up some contention
  - Fix listxattr receive buffer size
  - Fix default mount options for nfsroot"

* tag 'nfs-for-5.10-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: Remove unnecessary inode lock in nfs_fsync_dir()
  NFS: Remove unnecessary inode locking in nfs_llseek_dir()
  NFS: Fix listxattr receive buffer size
  NFSv4.2: fix failure to unregister shrinker
  nfsroot: Default mount option should ask for built-in NFS version
2020-11-12 13:49:12 -08:00
Marc Zyngier
ed4ffaf49b KVM: arm64: Handle SCXTNUM_ELx traps
As the kernel never sets HCR_EL2.EnSCXT, accesses to SCXTNUM_ELx
will trap to EL2. Let's handle that as gracefully as possible
by injecting an UNDEF exception into the guest. This is consistent
with the guest's view of ID_AA64PFR0_EL1.CSV2 being at most 1.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201110141308.451654-4-maz@kernel.org
2020-11-12 21:22:46 +00:00
Marc Zyngier
338b17933a KVM: arm64: Unify trap handlers injecting an UNDEF
A large number of system register trap handlers only inject an
UNDEF exeption, and yet each class of sysreg seems to provide its
own, identical function.

Let's unify them all, saving us introducing yet another one later.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201110141308.451654-3-maz@kernel.org
2020-11-12 21:22:45 +00:00
Marc Zyngier
23711a5e66 KVM: arm64: Allow setting of ID_AA64PFR0_EL1.CSV2 from userspace
We now expose ID_AA64PFR0_EL1.CSV2=1 to guests running on hosts
that are immune to Spectre-v2, but that don't have this field set,
most likely because they predate the specification.

However, this prevents the migration of guests that have started on
a host the doesn't fake this CSV2 setting to one that does, as KVM
rejects the write to ID_AA64PFR0_EL2 on the grounds that it isn't
what is already there.

In order to fix this, allow userspace to set this field as long as
this doesn't result in a promising more than what is already there
(setting CSV2 to 0 is acceptable, but setting it to 1 when it is
already set to 0 isn't).

Fixes: e1026237f9 ("KVM: arm64: Set CSV2 for guests on hardware unaffected by Spectre-v2")
Reported-by: Peng Liang <liangpeng10@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201110141308.451654-2-maz@kernel.org
2020-11-12 21:22:22 +00:00
Marc Zyngier
4f6b838c37 Linux 5.10-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl+V+LMeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGGQoH/1FIf6373lekuQf0
 pSq+2PPeILjL6+BppjNGJdwTKTEFEaz7xBpDwZURW2dt0M5jib2sn/0VJ/lh0Ln3
 880hXPjVyziU7/p1vTiPFYwKxav/ZE5cHrEW+nKimucyYPgkDxikFRuvrPQ1M0Sc
 vLZMmwjQlBD1kTsh9WR5lQ9Z8KqUtOazW47AbWE5QTTCQPmIXIdqByqLXlqS46Ok
 gW8tqaCI+FpBLP3fJn0EX5UTYH1Tsj9TmIFE8jqm5lGa/+VDM5KNyczEosKv86Xk
 0hBEUbAAZWdwieySJwBH7Njqu9g1o7bRUIJJsbXm0Fcnu+Ft619r3mJkkkXaaWKN
 mk7M/Uk=
 =1dE8
 -----END PGP SIGNATURE-----

Merge tag 'v5.10-rc1' into kvmarm-master/next

Linux 5.10-rc1

Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-12 21:20:43 +00:00
Linus Torvalds
af5043c89a ACPI updates for 5.10-rc4.
- Fix documentation regarding GPIO properties (Andy Shevchenko).
 
  - Fix spelling mistakes in ACPI documentation (Flavio Suligoi).
 
  - Fix white space inconsistencies in ACPI code (Maximilian Luz).
 
  - Fix string formatting in the ACPI Generic Event Device (GED)
    driver (Nick Desaulniers).
 
  - Add Intel Alder Lake device IDs to the ACPI drivers used by the
    Dynamic Platform and Thermal Framework (Srinivas Pandruvada).
 
  - Add lid-related DMI quirk for Medion Akoya E2228T to the ACPI
    button driver (Hans de Goede).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl+tVekSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxoAQQAJOvpgaXEwAm64wLVCuJRllGWcMmufh5
 EdUb1JMZ4IKhnPLi6ZWvmOKDNkWyIqG5DgT0FILl5b5LgWOGtvqsZ5aTqOKDTJvJ
 57cMVXQHBna5+Zp9nL51XeQfDZukmVYaTxckdgaeltsal8/6Gfy/V6mkLlSl3a5L
 PkxxrDVa9M1SVg/aRsx//HKw3M4O/aGURR3kv6ao8DetMRNORbuY1pv2znWRSda/
 eMcNZXEyEwgekL34VKBJhxUD/pSjunV6qcUPin3lA8viaSjbaLkvdTteOVrlwu/S
 EE8wXfwDODPJT1PBvckobGjsQfHix0COK8MatkxUMEyLBG2LdHnHhV8fObQtAEuM
 wOf2Yz7LtCrSWVC9VOEMUKfIXbIpj4VHqOj7Oby+ymIrq5OaXxOmixwjaQh2HLgM
 XCCSicP9kk+UxiVK15gGF1veVqld7CA6SRm9cGHc94QJuTsvrl3p5E32UHz0CjkM
 l+CBIhOUE7cDq1AQ0ikJJmfdr152NzFILIbMAa+xjFgFmWZabOJszYGSlKl7FNnG
 xbYI4cR8uDsYR1Mjb66yhpdncSxThq3HkuX0zgvhEpclyfWm3Ocg+4ZhIhn9VHug
 Wj/dDjBQozNgGYvtUj085FzDCnVgarR4wjZ3QtubUEvMia1m7ssTrPSys9aE5Gwt
 RWqs7x9Feqw/
 =tUOl
 -----END PGP SIGNATURE-----

Merge tag 'acpi-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These are mostly docmentation fixes and janitorial changes plus some
  new device IDs and a new quirk.

  Specifics:

   - Fix documentation regarding GPIO properties (Andy Shevchenko)

   - Fix spelling mistakes in ACPI documentation (Flavio Suligoi)

   - Fix white space inconsistencies in ACPI code (Maximilian Luz)

   - Fix string formatting in the ACPI Generic Event Device (GED) driver
     (Nick Desaulniers)

   - Add Intel Alder Lake device IDs to the ACPI drivers used by the
     Dynamic Platform and Thermal Framework (Srinivas Pandruvada)

   - Add lid-related DMI quirk for Medion Akoya E2228T to the ACPI
     button driver (Hans de Goede)"

* tag 'acpi-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: DPTF: Support Alder Lake
  Documentation: ACPI: fix spelling mistakes
  ACPI: button: Add DMI quirk for Medion Akoya E2228T
  ACPI: GED: fix -Wformat
  ACPI: Fix whitespace inconsistencies
  ACPI: scan: Fix acpi_dma_configure_id() kerneldoc name
  Documentation: firmware-guide: gpio-properties: Clarify initial output state
  Documentation: firmware-guide: gpio-properties: active_low only for GpioIo()
  Documentation: firmware-guide: gpio-properties: Fix factual mistakes
2020-11-12 11:06:53 -08:00
Linus Torvalds
fcfb67918c Power management fixes for 5.10-rc4.
Make the intel_pstate driver behave as expected when it operates in
 the passive mode with HWP enabled and the "powersave" governor on
 top of it.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl+tVWUSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxNIwQALNj2uV+CJL4DCvcMPFqvtB7bwwsk1mS
 fqRk/wEhz5noE/x3uhD1DKlL3VPj1sDUVRmHKSdIgqrwSFX2zbw+cf2y6E94WdDz
 /7x0Khj4mT5cfGHacItNBnkglCrxVXxSdU4DcPTgINlWM8iv6W8D3uK5OpFYDtKr
 5shqf45U8/+fh7hGCtNnofAZEVU+YTDzY0jHnnIxD8FKXFLaDFj6jVGjgdXgBD5s
 /XgsKz867SybzLuTW9O0SKDughMhmhaqXnHwtu9jvlw/3i1Wn16r2LeCWyIkoxWy
 MRNXg4rOerJvK084gxJW9BWmCuA6NnKBtKvXNlqHzl14ept3Cf3dYtaQ50x7eYQB
 osMWbBDdRjV1fo7SptMXQmn8sKxZgrjc0pSYicbiMOH3BkpIn5ed4+MPWfWN8pyb
 piRkx17sFwPE7jI5Rkuv+EucisG8tNvWImE9gFENxtelF1rj7njV1xMYlevrD/9u
 aYTNYUeRAc6DA2AF/mzXtqwXpDqoxa7X0UBl8JFmkLvcvORtR3XY6HlcYMgp820a
 /Sh0rZmuUlssUnrhd1Kr6QRiMIrCihTnbhTXsY0oZH4QSYYJCS89qijngAtqobEt
 K+eqsHHoGmVK3Ch+O+YpFo+GpH5Avk0b/DisX3Zu20hGEX4fvv4q7ZTuNSnHhgOL
 ERjLBUQZUFaf
 =zh3K
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "Make the intel_pstate driver behave as expected when it operates in
  the passive mode with HWP enabled and the 'powersave' governor on top
  of it"

* tag 'pm-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: intel_pstate: Take CPUFREQ_GOV_STRICT_TARGET into account
  cpufreq: Add strict_target to struct cpufreq_policy
  cpufreq: Introduce CPUFREQ_GOV_STRICT_TARGET
  cpufreq: Introduce governor flags
2020-11-12 11:03:38 -08:00
Sven Van Asbroeck
edbc21113b lan743x: fix use of uninitialized variable
When no devicetree is present, the driver will use an
uninitialized variable.

Fix by initializing this variable.

Fixes: 902a66e08c ("lan743x: correctly handle chips with internal PHY")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Link: https://lore.kernel.org/r/20201112152513.1941-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12 10:03:16 -08:00
Jakub Kicinski
5861c8cb1c Merge branch 'net-udp-fix-fast-frag0-udp-gro'
Alexander Lobakin says:

====================
net: udp: fix Fast/frag0 UDP GRO

While testing UDP GSO fraglists forwarding through driver that uses
Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order
iperf packets:

[ ID] Interval           Transfer     Bitrate         Jitter
[SUM]  0.0-40.0 sec  12106 datagrams received out-of-order

Simple switch to napi_gro_receive() or any other method without frag0
shortcut completely resolved them.

I've found two incorrect header accesses in GRO receive callback(s):
 - udp_hdr() (instead of udp_gro_udphdr()) that always points to junk
   in "fast" mode and could probably do this in "regular".
   This was the actual bug that caused all out-of-order delivers;
 - udp{4,6}_lib_lookup_skb() -> ip{,v6}_hdr() (instead of
   skb_gro_network_header()) that potentionally might return odd
   pointers in both modes.

Each patch addresses one of these two issues.

This doesn't cover a support for nested tunnels as it's out of the
subject and requires more invasive changes. It will be handled
separately in net-next series.

Credits:
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>

Since v4 [0]:
 - split the fix into two logical ones (Willem);
 - replace ternaries with plain ifs to beautify the code (Jakub);
 - drop p->data part to reintroduce it later in abovementioned set.

Since v3 [1]:
 - restore the original {,__}udp{4,6}_lib_lookup_skb() and use
   private versions of them inside GRO code (Willem).

Since v2 [2]:
 - dropped redundant check introduced in v2 as it's performed right
   before (thanks to Eric);
 - udp_hdr() switched to data + off for skbs from list (also Eric);
 - fixed possible malfunction of {,__}udp{4,6}_lib_lookup_skb() with
   Fast/frag0 due to ip{,v6}_hdr() usage (Willem).

Since v1 [3]:
 - added a NULL pointer check for "uh" as suggested by Willem.

[0] https://lore.kernel.org/netdev/Ha2hou5eJPcblo4abjAqxZRzIl1RaLs2Hy0oOAgFs@cp4-web-036.plabs.ch
[1] https://lore.kernel.org/netdev/MgZce9htmEtCtHg7pmWxXXfdhmQ6AHrnltXC41zOoo@cp7-web-042.plabs.ch
[2] https://lore.kernel.org/netdev/0eaG8xtbtKY1dEKCTKUBubGiC9QawGgB3tVZtNqVdY@cp4-web-030.plabs.ch
[3] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch
====================

Link: https://lore.kernel.org/r/hjGOh0iCOYyo1FPiZh6TMXcx3YCgNs1T1eGKLrDz8@cp4-web-037.plabs.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12 09:55:59 -08:00