1121504 Commits

Author SHA1 Message Date
Xuan Zhuo
04ca0b0b16 virtio_pci: support VIRTIO_F_RING_RESET
This patch implements virtio pci support for QUEUE RESET.

Performing reset on a queue is divided into these steps:

 1. notify the device to reset the queue
 2. recycle the buffer submitted
 3. reset the vring (may re-alloc)
 4. mmap vring to device, and enable the queue

This patch implements virtio_reset_vq(), virtio_enable_resetq() in the
pci scenario.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-33-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:40 -04:00
Xuan Zhuo
56bdc06139 virtio_pci: extract the logic of active vq for modern pci
Introduce vp_active_vq() to configure vring to backend after vq attach
vring. And configure vq vector if necessary.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-32-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:40 -04:00
Xuan Zhuo
0b50cece0b virtio_pci: introduce helper to get/set queue reset
Introduce new helpers to implement queue reset and get queue reset
status.

 https://github.com/oasis-tcs/virtio-spec/issues/124
 https://github.com/oasis-tcs/virtio-spec/issues/139

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-31-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:40 -04:00
Xuan Zhuo
0cdd450e70 virtio_pci: struct virtio_pci_common_cfg add queue_reset
Add queue_reset in virtio_pci_modern_common_cfg.

 https://github.com/oasis-tcs/virtio-spec/issues/124
 https://github.com/oasis-tcs/virtio-spec/issues/139

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-30-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:40 -04:00
Xuan Zhuo
4913e85441 virtio_ring: struct virtqueue introduce reset
Introduce a new member reset to the structure virtqueue to determine
whether the current vq is in the reset state. Subsequent patches will
use it.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-29-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:40 -04:00
Xuan Zhuo
d94587b5bb virtio: queue_reset: add VIRTIO_F_RING_RESET
Added VIRTIO_F_RING_RESET, it came from here

https://github.com/oasis-tcs/virtio-spec/issues/124
https://github.com/oasis-tcs/virtio-spec/issues/139

This feature indicates that the driver can reset a queue individually.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-28-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:40 -04:00
Xuan Zhuo
3251063155 virtio: allow to unbreak/break virtqueue individually
This patch allows the new introduced
__virtqueue_break()/__virtqueue_unbreak() to break/unbreak the
virtqueue.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-27-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:40 -04:00
Xuan Zhuo
ea024594b1 virtio_pci: struct virtio_pci_common_cfg add queue_notify_data
Add queue_notify_data in struct virtio_pci_common_cfg, which comes from
here https://github.com/oasis-tcs/virtio-spec/issues/89

In order not to affect the API, add a dedicated structure struct
virtio_pci_modern_common_cfg to virtio_pci_modern.h.

Since I want to add queue_reset after queue_notify_data, I submitted
this patch first.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-26-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:40 -04:00
Xuan Zhuo
c790e8e181 virtio_ring: introduce virtqueue_resize()
Introduce virtqueue_resize() to implement the resize of vring.
Based on these, the driver can dynamically adjust the size of the vring.
For example: ethtool -G.

virtqueue_resize() implements resize based on the vq reset function. In
case of failure to allocate a new vring, it will give up resize and use
the original vring.

During this process, if the re-enable reset vq fails, the vq can no
longer be used. Although the probability of this situation is not high.

The parameter recycle is used to recycle the buffer that is no longer
used.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-25-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:40 -04:00
Xuan Zhuo
947f9fcf67 virtio_ring: packed: introduce virtqueue_resize_packed()
virtio ring packed supports resize.

Only after the new vring is successfully allocated based on the new num,
we will release the old vring. In any case, an error is returned,
indicating that the vring still points to the old vring.

In the case of an error, re-initialize(by virtqueue_reinit_packed()) the
virtqueue to ensure that the vring can be used.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-24-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:40 -04:00
Xuan Zhuo
56775e141b virtio_ring: packed: introduce virtqueue_reinit_packed()
Introduce a function to initialize vq without allocating new ring,
desc_state, desc_extra.

Subsequent patches will call this function after reset vq to
reinitialize vq.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-23-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:40 -04:00
Xuan Zhuo
51d649f14a virtio_ring: packed: extract the logic of attach vring
Separate the logic of attach vring, the subsequent patch will call it
separately.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-22-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:39 -04:00
Xuan Zhuo
1a107c87eb virtio_ring: packed: extract the logic of vring init
Separate the logic of initializing vring, and subsequent patches will
call it separately.

This function completes the variable initialization of packed vring. It
together with the logic of atatch constitutes the initialization of
vring.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-21-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:39 -04:00
Xuan Zhuo
ef3167cfd5 virtio_ring: packed: extract the logic of alloc state and extra
Separate the logic for alloc desc_state and desc_extra, which will
be called separately by subsequent patches.

Use struct vring_packed to pass desc_state, desc_extra.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-20-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:39 -04:00
Xuan Zhuo
6b60b9c008 virtio_ring: packed: extract the logic of alloc queue
Separate the logic of packed to create vring queue.

This feature is required for subsequent virtuqueue reset vring.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-19-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:39 -04:00
Xuan Zhuo
6356f8bb01 virtio_ring: packed: introduce vring_free_packed
Free the structure struct vring_vritqueue_packed.

Subsequent patches require it.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-18-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:39 -04:00
Xuan Zhuo
6fea20e561 virtio_ring: split: introduce virtqueue_resize_split()
virtio ring split supports resize.

Only after the new vring is successfully allocated based on the new num,
we will release the old vring. In any case, an error is returned,
indicating that the vring still points to the old vring.

In the case of an error, re-initialize(virtqueue_reinit_split()) the
virtqueue to ensure that the vring can be used.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-17-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:39 -04:00
Xuan Zhuo
af36b16f6c virtio_ring: split: reserve vring_align, may_reduce_num
In vring_alloc_queue_split() save vring_align, may_reduce_num to
structure vring_virtqueue_split. Used to create a new vring when
implementing resize.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-16-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:39 -04:00
Xuan Zhuo
e5175b419a virtio_ring: split: introduce virtqueue_reinit_split()
Introduce a function to initialize vq without allocating new ring,
desc_state, desc_extra.

Subsequent patches will call this function after reset vq to
reinitialize vq.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-15-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:39 -04:00
Xuan Zhuo
e1d6a423ea virtio_ring: split: extract the logic of attach vring
Separate the logic of attach vring, subsequent patches will call it
separately.

virtqueue_vring_init_split() completes the initialization of other
variables of vring split. We can directly use
vq->split = *vring_split to complete attach.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-14-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:39 -04:00
Xuan Zhuo
198fa7be96 virtio_ring: split: extract the logic of vring init
Separate the logic of initializing vring, and subsequent patches will
call it separately.

This function completes the variable initialization of split vring. It
together with the logic of atatch constitutes the initialization of
vring.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-13-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:39 -04:00
Xuan Zhuo
a2b36c8d7d virtio_ring: split: extract the logic of alloc state and extra
Separate the logic of creating desc_state, desc_extra, and subsequent
patches will call it independently.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-12-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:39 -04:00
Xuan Zhuo
c2d87fe68d virtio_ring: split: extract the logic of alloc queue
Separate the logic of split to create vring queue.

This feature is required for subsequent virtuqueue reset vring.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-11-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:39 -04:00
Xuan Zhuo
89f05d94a3 virtio_ring: split: introduce vring_free_split()
Free the structure struct vring_vritqueue_split.

Subsequent patches require it.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-10-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:39 -04:00
Xuan Zhuo
cd4c812acb virtio_ring: split: __vring_new_virtqueue() accept struct vring_virtqueue_split
__vring_new_virtqueue() instead accepts struct vring_virtqueue_split.

The purpose of this is to pass more information into
__vring_new_virtqueue() to make the code simpler and the structure
cleaner.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-9-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:38 -04:00
Xuan Zhuo
07d9629d49 virtio_ring: split: stop __vring_new_virtqueue as export symbol
There is currently only one place to reference __vring_new_virtqueue()
directly from the outside of virtio core. And here vring_new_virtqueue()
can be used instead.

Subsequent patches will modify __vring_new_virtqueue, so stop it as an
export symbol for now.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-8-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:38 -04:00
Xuan Zhuo
3a897128d3 virtio_ring: introduce virtqueue_init()
Separate the logic of virtqueue initialization. These variables should
be reset during reset.

This logic can be called independently when implementing resize/reset
later.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-7-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:38 -04:00
Xuan Zhuo
d76136e434 virtio_ring: split vring_virtqueue
Separate the two inline structures(split and packed) from the structure
vring_virtqueue.

In this way, we can use these two structures later to pass parameters
and retain temporary variables.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-6-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:38 -04:00
Xuan Zhuo
3ea19e3265 virtio_ring: extract the logic of freeing vring
Introduce vring_free() to free the vring of vq.

Subsequent patches will use vring_free() alone.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-5-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:38 -04:00
Xuan Zhuo
a62eecb3a9 virtio_ring: update the document of the virtqueue_detach_unused_buf for queue reset
Added documentation for virtqueue_detach_unused_buf, allowing it to be
called on queue reset.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-4-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:38 -04:00
Xuan Zhuo
3086e9fc91 virtio: struct virtio_config_ops add callbacks for queue_reset
reset can be divided into the following four steps (example):
 1. transport: notify the device to reset the queue
 2. vring:     recycle the buffer submitted
 3. vring:     reset/resize the vring (may re-alloc)
 4. transport: mmap vring to device, and enable the queue

In order to support queue reset, add two callbacks in struct
virtio_config_ops to implement steps 1 and 4.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-3-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:38 -04:00
Xuan Zhuo
da80296183 virtio: record the maximum queue num supported by the device.
virtio-net can display the maximum (supported by hardware) ring size in
ethtool -g eth0.

When the subsequent patch implements vring reset, it can judge whether
the ring size passed by the driver is legal based on this.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-2-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:38 -04:00
David Hildenbrand
0b6fd46ec5 drivers/virtio: Clarify CONFIG_VIRTIO_MEM for unsupported architectures
Let's make it clearer that simply unlocking CONFIG_VIRTIO_MEM on an
architecture is most probably not sufficient to have it working as
expected.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Gavin Shan <gshan@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20220610094737.65254-1-david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:38 -04:00
Minghao Xue
02213273f7 virtio_mmio: add support to set IRQ of a virtio device as wakeup source
According to virtio_mmio wakeup flag in device trees, set its IRQ
as wakeup source in virtqueue initialization.

Signed-off-by: Minghao Xue <quic_mingxue@quicinc.com>
Message-Id: <1654851507-13891-3-git-send-email-quic_mingxue@quicinc.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:38 -04:00
Minghao Xue
51ded7cdf2 dt-bindings: virtio: mmio: add optional wakeup-source property
Some systems want to set the interrupt of virtio_mmio device
as a wakeup source. On such systems, we'll use the existence
of the "wakeup-source" property as a signal of requirement.

Signed-off-by: Minghao Xue <quic_mingxue@quicinc.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Message-Id: <1654851507-13891-2-git-send-email-quic_mingxue@quicinc.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:38 -04:00
Robin Murphy
366958a7fd vdpa: Use device_iommu_capable()
Use the new interface to check the capability for our device
specifically.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Message-Id: <548e316fa282ce513fabb991a4c4d92258062eb5.1654688822.git.robin.murphy@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11 04:06:38 -04:00
Michael S. Tsirkin
ebe797f25f virtio: VIRTIO_HARDEN_NOTIFICATION is broken
This option doesn't really work and breaks too many drivers.
Not yet sure what's the right thing to do, for now
let's make sure randconfig isn't broken by this.

Fixes: c346dae4f3fb ("virtio: disable notification hardening by default")
Cc: "Jason Wang" <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11 04:06:37 -04:00
Jason Wang
5d66322b2b virtio_pmem: set device ready in probe()
The NVDIMM region could be available before the virtio_device_ready()
that is called by virtio_dev_probe(). This means the driver tries to
use device before DRIVER_OK which violates the spec, fixing this by
set device ready before the nvdimm_pmem_region_create().

Note that this means the virtio_pmem_host_ack() could be triggered
before the creation of the nd region, this is safe since the pmem_lock
has been initialized and whether or not any available buffer is added
before is validated by virtio_pmem_host_ack().

Fixes 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
Acked-by: Pankaj Gupta <pankaj.gupta@amd.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220628083430.61856-2-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:37 -04:00
Jason Wang
020e1aede7 virtio_pmem: initialize provider_data through nd_region_desc
We used to initialize the provider_data manually after
nvdimm_pemm_region_create(). This seems to be racy if the flush is
issued before the initialization of provider_data[1]. Fixing this by
initializing the provider_data through nd_region_desc to make sure the
provider_data is ready after the pmem is created.

[1]:

[   80.152281] nd_pmem namespace0.0: unable to guarantee persistence of writes
[   92.393956] BUG: kernel NULL pointer dereference, address: 0000000000000318
[   92.394551] #PF: supervisor read access in kernel mode
[   92.394955] #PF: error_code(0x0000) - not-present page
[   92.395365] PGD 0 P4D 0
[   92.395566] Oops: 0000 [#1] PREEMPT SMP PTI
[   92.395867] CPU: 2 PID: 506 Comm: mkfs.ext4 Not tainted 5.19.0-rc1+ #453
[   92.396365] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[   92.397178] RIP: 0010:virtio_pmem_flush+0x2f/0x1f0
[   92.397521] Code: 55 41 54 55 53 48 81 ec a0 00 00 00 65 48 8b 04
25 28 00 00 00 48 89 84 24 98 00 00 00 31 c0 48 8b 87 78 03 00 00 48
89 04 24 <48> 8b 98 18 03 00 00 e8 85 bf 6b 00 ba 58 00 00 00 be c0 0c
00 00
[   92.398982] RSP: 0018:ffff9a7380aefc88 EFLAGS: 00010246
[   92.399349] RAX: 0000000000000000 RBX: ffff8e77c3f86f00 RCX: 0000000000000000
[   92.399833] RDX: ffffffffad4ea720 RSI: ffff8e77c41e39c0 RDI: ffff8e77c41c5c00
[   92.400388] RBP: ffff8e77c41e39c0 R08: ffff8e77c19f0600 R09: 0000000000000000
[   92.400874] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8e77c0814e28
[   92.401364] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8e77c41e39c0
[   92.401849] FS:  00007f3cd75b2780(0000) GS:ffff8e7937d00000(0000)
knlGS:0000000000000000
[   92.402423] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   92.402821] CR2: 0000000000000318 CR3: 0000000103c80002 CR4: 0000000000370ee0
[   92.403307] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   92.403793] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   92.404278] Call Trace:
[   92.404481]  <TASK>
[   92.404654]  ? mempool_alloc+0x5d/0x160
[   92.404939]  ? terminate_walk+0x5f/0xf0
[   92.405226]  ? bio_alloc_bioset+0xbb/0x3f0
[   92.405525]  async_pmem_flush+0x17/0x80
[   92.405806]  nvdimm_flush+0x11/0x30
[   92.406067]  pmem_submit_bio+0x1e9/0x200
[   92.406354]  __submit_bio+0x80/0x120
[   92.406621]  submit_bio_noacct_nocheck+0xdc/0x2a0
[   92.406958]  submit_bio_wait+0x4e/0x80
[   92.407234]  blkdev_issue_flush+0x31/0x50
[   92.407526]  ? punt_bios_to_rescuer+0x230/0x230
[   92.407852]  blkdev_fsync+0x1e/0x30
[   92.408112]  do_fsync+0x33/0x70
[   92.408354]  __x64_sys_fsync+0xb/0x10
[   92.408625]  do_syscall_64+0x43/0x90
[   92.408895]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[   92.409257] RIP: 0033:0x7f3cd76c6c44

Fixes 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
Acked-by: Pankaj Gupta <pankaj.gupta@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220628083430.61856-1-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:37 -04:00
Stefano Garzarella
309bba39c9 vringh: iterate on iotlb_translate to handle large translations
iotlb_translate() can return -ENOBUFS if the bio_vec is not big enough
to contain all the ranges for translation.
This can happen for example if the VMM maps a large bounce buffer,
without using hugepages, that requires more than 16 ranges to translate
the addresses.

To handle this case, let's extend iotlb_translate() to also return the
number of bytes successfully translated.
In copy_from_iotlb()/copy_to_iotlb() loops by calling iotlb_translate()
several times until we complete the translation.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20220624075656.13997-1-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:37 -04:00
Xuan Zhuo
96ef18a24b virtio_ring: remove the arg vq of vring_alloc_desc_extra()
The parameter vq of vring_alloc_desc_extra() is useless. This patch
removes this parameter.

Subsequent patches will call this function to avoid passing useless
arguments.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220624025621.128843-6-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:37 -04:00
Xuan Zhuo
c2a052a4a9 remoteproc: rename len of rpoc_vring to num
Rename the member len in the structure rpoc_vring to num. And remove 'in
bytes' from the comment of it. This is misleading. Because this actually
refers to the size of the virtio vring to be created. The unit is not
bytes.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20220624025621.128843-2-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11 04:06:37 -04:00
Alexei Starovoitov
4e4588f1c4 bpf: Shut up kern_sys_bpf warning.
Shut up this warning:
kernel/bpf/syscall.c:5089:5: warning: no previous prototype for function 'kern_sys_bpf' [-Wmissing-prototypes]
int kern_sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)

Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-10 23:58:13 -07:00
Bagas Sanjaya
19a7cc817a KVM: x86/MMU: properly format KVM_CAP_VM_DISABLE_NX_HUGE_PAGES capability table
There is unexpected warning on KVM_CAP_VM_DISABLE_NX_HUGE_PAGES capability
table, which cause the table to be rendered as paragraph text instead.

The warning is due to missing colon at capability name and returns keyword,
as well as improper alignment on multi-line returns field.

Fix the warning by adding missing colons and aligning the field.

Link: https://lore.kernel.org/lkml/20220627181937.3be67263@canb.auug.org.au/
Fixes: 084cc29f8bbb03 ("KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Matlack <dmatlack@google.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-next@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Message-Id: <20220627095151.19339-3-bagasdotme@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-11 02:35:37 -04:00
Bagas Sanjaya
b4aed4d85f Documentation: KVM: extend KVM_CAP_VM_DISABLE_NX_HUGE_PAGES heading underline
Extend heading underline for KVM_CAP_VM_DISABLE_NX_HUGE_PAGE to match
the heading text length.

Link: https://lore.kernel.org/lkml/20220627181937.3be67263@canb.auug.org.au/
Fixes: 084cc29f8bbb03 ("KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Matlack <dmatlack@google.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-next@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Message-Id: <20220627095151.19339-2-bagasdotme@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-11 02:35:36 -04:00
Maxim Mikityanskiy
94ce3b64c6 net/tls: Use RCU API to access tls_ctx->netdev
Currently, tls_device_down synchronizes with tls_device_resync_rx using
RCU, however, the pointer to netdev is stored using WRITE_ONCE and
loaded using READ_ONCE.

Although such approach is technically correct (rcu_dereference is
essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store
NULL), using special RCU helpers for pointers is more valid, as it
includes additional checks and might change the implementation
transparently to the callers.

Mark the netdev pointer as __rcu and use the correct RCU helpers to
access it. For non-concurrent access pass the right conditions that
guarantee safe access (locks taken, refcount value). Also use the
correct helper in mlx5e, where even READ_ONCE was missing.

The transition to RCU exposes existing issues, fixed by this commit:

1. bond_tls_device_xmit could read netdev twice, and it could become
NULL the second time, after the NULL check passed.

2. Drivers shouldn't stop processing the last packet if tls_device_down
just set netdev to NULL, before tls_dev_del was called. This prevents a
possible packet drop when transitioning to the fallback software mode.

Fixes: 89df6a810470 ("net/bonding: Implement TLS TX device offload")
Fixes: c55dcdd435aa ("net/tls: Fix use-after-free after the TLS device goes down and up")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Link: https://lore.kernel.org/r/20220810081602.1435800-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10 22:58:43 -07:00
Jakub Kicinski
d800a7b357 tls: rx: device: don't try to copy too much on detach
Another device offload bug, we use the length of the output
skb as an indication of how much data to copy. But that skb
is sized to offset + record length, and we start from offset.
So we end up double-counting the offset which leads to
skb_copy_bits() returning -EFAULT.

Reported-by: Tariq Toukan <tariqt@nvidia.com>
Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Tested-by: Ran Rozenstein <ranro@nvidia.com>
Link: https://lore.kernel.org/r/20220809175544.354343-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10 22:53:25 -07:00
Jakub Kicinski
86b259f6f8 tls: rx: device: bound the frag walk
We can't do skb_walk_frags() on the input skbs, because
the input skbs is really just a pointer to the tcp read
queue. We need to bound the "is decrypted" check by the
amount of data in the message.

Note that the walk in tls_device_reencrypt() is after a
CoW so the skb there is safe to walk. Actually in the
current implementation it can't have frags at all, but
whatever, maybe one day it will.

Reported-by: Tariq Toukan <tariqt@nvidia.com>
Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Tested-by: Ran Rozenstein <ranro@nvidia.com>
Link: https://lore.kernel.org/r/20220809175544.354343-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10 22:53:25 -07:00
Thadeu Lima de Souza Cascardo
9ad36309e2 net_sched: cls_route: remove from list when handle is 0
When a route filter is replaced and the old filter has a 0 handle, the old
one won't be removed from the hashtable, while it will still be freed.

The test was there since before commit 1109c00547fc ("net: sched: RCU
cls_route"), when a new filter was not allocated when there was an old one.
The old filter was reused and the reinserting would only be necessary if an
old filter was replaced. That was still wrong for the same case where the
old handle was 0.

Remove the old filter from the list independently from its handle value.

This fixes CVE-2022-2588, also reported as ZDI-CAN-17440.

Reported-by: Zhenpeng Lin <zplin@u.northwestern.edu>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Kamal Mostafa <kamal@canonical.com>
Cc: <stable@vger.kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20220809170518.164662-1-cascardo@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10 22:53:11 -07:00
Mohan Kumar
636aa8807b ALSA: hda: Fix crash due to jack poll in suspend
With jackpoll_in_suspend flag set, there is a possibility that
jack poll worker thread will run even after system suspend was
completed. Any register access after system pm callback flow
will result in kernel crash as still jack poll worker thread
tries to access registers.

To fix the crash issue during system flow, cancel the jack poll
worker thread during system pm prepare callback and cancel the
worker thread at start of runtime suspend callback and re-schedule
at last to avoid any unwarranted access of register by worker thread
during suspend flow.

Signed-off-by: Mohan Kumar <mkumard@nvidia.com>
Fixes: b33115bd05af ("ALSA: hda: Jack detection poll in suspend state")
Link: https://lore.kernel.org/r/20220811052704.2944-1-mkumard@nvidia.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-08-11 07:44:59 +02:00