linux/arch/x86
Vitaly Kuznetsov 51500b71d5 x86/hyperv: Properly deal with empty cpumasks in hyperv_flush_tlb_multi()
KASAN detected the following issue:

 BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_multi+0xf88/0x1060
 Read of size 4 at addr ffff8880011ccbc0 by task kcompactd0/33

 CPU: 1 PID: 33 Comm: kcompactd0 Not tainted 5.14.0-39.el9.x86_64+debug #1
 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine,
     BIOS Hyper-V UEFI Release v4.0 12/17/2019
 Call Trace:
  dump_stack_lvl+0x57/0x7d
  print_address_description.constprop.0+0x1f/0x140
  ? hyperv_flush_tlb_multi+0xf88/0x1060
  __kasan_report.cold+0x7f/0x11e
  ? hyperv_flush_tlb_multi+0xf88/0x1060
  kasan_report+0x38/0x50
  hyperv_flush_tlb_multi+0xf88/0x1060
  flush_tlb_mm_range+0x1b1/0x200
  ptep_clear_flush+0x10e/0x150
...
 Allocated by task 0:
  kasan_save_stack+0x1b/0x40
  __kasan_kmalloc+0x7c/0x90
  hv_common_init+0xae/0x115
  hyperv_init+0x97/0x501
  apic_intr_mode_init+0xb3/0x1e0
  x86_late_time_init+0x92/0xa2
  start_kernel+0x338/0x3eb
  secondary_startup_64_no_verify+0xc2/0xcb

 The buggy address belongs to the object at ffff8880011cc800
  which belongs to the cache kmalloc-1k of size 1024
 The buggy address is located 960 bytes inside of
  1024-byte region [ffff8880011cc800, ffff8880011ccc00)

'hyperv_flush_tlb_multi+0xf88/0x1060' points to
hv_cpu_number_to_vp_number() and '960 bytes' means we're trying to get
VP_INDEX for CPU#240. 'nr_cpus' here is exactly 240 so we're trying to
access past hv_vp_index's last element. This can (and will) happen
when 'cpus' mask is empty and cpumask_last() will return '>=nr_cpus'.

Commit ad0a6bad44 ("x86/hyperv: check cpu mask after interrupt has
been disabled") tried to deal with empty cpumask situation but
apparently didn't fully fix the issue.

'cpus' cpumask which is passed to hyperv_flush_tlb_multi() is
'mm_cpumask(mm)' (which is '&mm->cpu_bitmap'). This mask changes every
time the particular mm is scheduled/unscheduled on some CPU (see
switch_mm_irqs_off()), disabling IRQs on the CPU which is performing remote
TLB flush has zero influence on whether the particular process can get
scheduled/unscheduled on _other_ CPUs so e.g. in the case where the mm was
scheduled on one other CPU and got unscheduled during
hyperv_flush_tlb_multi()'s execution will lead to cpumask becoming empty.

It doesn't seem that there's a good way to protect 'mm_cpumask(mm)'
from changing during hyperv_flush_tlb_multi()'s execution. It would be
possible to copy it in the very beginning of the function but this is a
waste. It seems we can deal with changing cpumask just fine.

When 'cpus' cpumask changes during hyperv_flush_tlb_multi()'s
execution, there are two possible issues:
- 'Under-flushing': we will not flush TLB on a CPU which got added to
the mask while hyperv_flush_tlb_multi() was already running. This is
not a problem as this is equal to mm getting scheduled on that CPU
right after TLB flush.
- 'Over-flushing': we may flush TLB on a CPU which is already cleared
from the mask. First, extra TLB flush preserves correctness. Second,
Hyper-V's TLB flush hypercall takes 'mm->pgd' argument so Hyper-V may
avoid the flush if CR3 doesn't match.

Fix the immediate issue with cpumask_last()/hv_cpu_number_to_vp_number()
and remove the pointless cpumask_empty() check from the beginning of the
function as it really doesn't protect anything. Also, avoid the hypercall
altogether when 'flush->processor_mask' ends up being empty.

Fixes: ad0a6bad44 ("x86/hyperv: check cpu mask after interrupt has been disabled")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220106094611.1404218-1-vkuznets@redhat.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-01-10 11:50:20 +00:00
..
boot - Do not #GP on userspace use of CLI/STI but pretend it was a NOP to 2021-11-02 07:56:47 -07:00
configs configs: remove the obsolete CONFIG_INPUT_POLLDEV 2021-09-08 11:50:28 -07:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2021-11-01 21:24:02 -07:00
entry x86/xen: Add xenpv_restore_regs_and_return_to_usermode() 2021-12-03 19:21:15 +01:00
events x86/perf: Fix snapshot_branch_stack warning in VM 2021-11-17 14:48:43 +01:00
hyperv x86/hyperv: Properly deal with empty cpumasks in hyperv_flush_tlb_multi() 2022-01-10 11:50:20 +00:00
ia32 audit/stable-5.16 PR 20211101 2021-11-01 21:17:39 -07:00
include x86/hyperv: Fix definition of hv_ghcb_pg variable 2021-12-28 14:18:47 +00:00
kernel hyper-v: Enable swiotlb bounce buffer for Isolation VM 2021-12-20 18:01:09 +00:00
kvm KVM: x86: Don't WARN if userspace mucks with RCX during string I/O exit 2021-12-10 09:38:02 -05:00
lib - Do not #GP on userspace use of CLI/STI but pretend it was a NOP to 2021-11-02 07:56:47 -07:00
math-emu x86/math-emu: Convert to fpstate 2021-10-20 23:57:54 +02:00
mm Merge branch 'kvm-guest-sev-migration' into kvm-master 2021-11-11 07:40:26 -05:00
net Core: 2021-11-02 06:20:58 -07:00
pci xen: branch for v5.16-rc1 2021-11-10 11:14:21 -08:00
platform x86/sme: Explicitly map new EFI memmap table as encrypted 2021-12-05 16:44:52 +01:00
power x86/fpu: Replace the includes of fpu/internal.h 2021-10-20 15:27:29 +02:00
purgatory kernel.h: split out panic and oops helpers 2021-07-01 11:06:04 -07:00
ras
realmode x86/64/mm: Map all kernel memory into trampoline_pgd 2021-12-03 09:11:43 +01:00
tools Driver core changes for 5.16-rc1 2021-11-04 08:32:38 -07:00
um um: fix stub location calculation 2021-08-26 22:28:03 +02:00
video
xen x86/xen: Add xenpv_restore_regs_and_return_to_usermode() 2021-12-03 19:21:15 +01:00
.gitignore
Kbuild kbuild: use more subdir- for visiting subdirectories while cleaning 2021-10-24 13:49:46 +09:00
Kconfig EFI fix for v5.16 2021-12-06 10:09:00 -08:00
Kconfig.assembler
Kconfig.cpu x86/CPU: Add support for Vortex CPUs 2021-10-21 15:49:07 +02:00
Kconfig.debug tracing: Refactor TRACE_IRQFLAGS_SUPPORT in Kconfig 2021-08-16 11:37:21 -04:00
Makefile Kbuild updates for v5.16 2021-11-08 09:15:45 -08:00
Makefile_32.cpu x86/build: Do not add -falign flags unconditionally for clang 2021-09-19 10:35:53 +09:00
Makefile.um um: allow not setting extra rpaths in the linux binary 2021-06-17 21:54:15 +02:00