1023253 Commits

Author SHA1 Message Date
David Hildenbrand
c740bb97cc virtio-mem: prioritize unplug from ZONE_MOVABLE in Sub Block Mode
Until now, memory provided by a single virtio-mem device was usually
either onlined completely to ZONE_MOVABLE (online_movable) or to
ZONE_NORMAL (online_kernel); however, that will change in the future.

There are two reasons why we want to track which zone a memory block
belongs to and prioritize ZONE_MOVABLE blocks:

1) Memory managed by ZONE_MOVABLE can more likely get unplugged, therefore,
   resulting in a faster memory hotunplug process. Further, we can more
   reliably unplug and remove complete memory blocks, removing metadata
   allocated for the whole memory block.

2) We want to avoid corner cases where unplugging with the current scheme
   (highest to lowest address) could result in accidental zone imbalances,
   whereby we remove too much ZONE_NORMAL memory for ZONE_MOVABLE memory
   of the same device.

Let's track the zone via memory block states and try to unplug from
ZONE_MOVABLE first. Rename VIRTIO_MEM_SBM_MB_ONLINE* to
VIRTIO_MEM_SBM_MB_KERNEL* to avoid even longer state names.
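
As a rough illustration of the selection order described above, here is a
minimal sketch. The state names, the helper names and the choice to try
offline blocks before onlined ones are assumptions made for the example;
this is not the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified block states; the driver's real set is larger. */
enum mb_state {
	MB_OFFLINE,	/* plugged but not onlined */
	MB_KERNEL,	/* onlined to a kernel zone such as ZONE_NORMAL */
	MB_MOVABLE,	/* onlined to ZONE_MOVABLE */
};

/* Lower value means "try to unplug from this kind of block earlier". */
static int unplug_priority(enum mb_state s)
{
	switch (s) {
	case MB_OFFLINE:
		return 0;	/* assumption: offline blocks come first */
	case MB_MOVABLE:
		return 1;	/* then ZONE_MOVABLE blocks */
	case MB_KERNEL:
		return 2;	/* kernel-zone blocks last */
	}
	return 3;
}

/*
 * Return the index of the block to unplug from next, or -1 if there is
 * none. Blocks are scanned from the highest to the lowest index, so the
 * highest index wins among blocks of equal priority.
 */
static int pick_unplug_block(const enum mb_state *states, size_t n)
{
	int best = -1;

	assert(states || n == 0);
	for (size_t i = n; i-- > 0; ) {
		if (best < 0 ||
		    unplug_priority(states[i]) < unplug_priority(states[best]))
			best = (int)i;
	}
	return best;
}
```

With a mix of kernel and movable blocks, the sketch picks the highest
movable block even when a kernel block sits at a higher address.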

In commit 27f852795a06 ("virtio-mem: don't special-case ZONE_MOVABLE"),
we removed slightly similar tracking for fully plugged memory blocks to
support unplugging from ZONE_MOVABLE at all -- as we didn't allow partially
plugged memory blocks in ZONE_MOVABLE before that. That commit already
mentioned "In the future, we might want to remember the zone again and use
the information when (un)plugging memory."

Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20210602185720.31821-6-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:02 -04:00
David Hildenbrand
5304ca3dd7 virtio-mem: simplify high-level unplug handling in Sub Block Mode
Let's simplify by introducing a new virtio_mem_sbm_unplug_any_sb(),
similar to virtio_mem_sbm_plug_any_sb(), to simplify high-level memory
block selection when unplugging in Sub Block Mode.

Rename existing virtio_mem_sbm_unplug_any_sb() to
virtio_mem_sbm_unplug_any_sb_raw().

The only change is that we now temporarily unlock the hotplug mutex around
cond_resched() when processing offline memory blocks, which doesn't
make a real difference as we already have to temporarily unlock in
virtio_mem_sbm_unplug_any_sb_offline() when removing a memory block.

Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20210602185720.31821-5-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:02 -04:00
David Hildenbrand
f4cf803dff virtio-mem: simplify high-level plug handling in Sub Block Mode
Let's simplify high-level memory block selection when plugging in Sub
Block Mode.

No need for two separate loops when selecting memory blocks for plugging
memory. Avoid passing the "online" state by simply obtaining the state
in virtio_mem_sbm_plug_any_sb().

Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20210602185720.31821-4-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:02 -04:00
David Hildenbrand
49d42872d5 virtio-mem: use page_zonenum() in virtio_mem_fake_offline()
Let's use page_zonenum() instead of zone_idx(page_zone()).

Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20210602185720.31821-3-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:02 -04:00
David Hildenbrand
500817bf5e virtio-mem: don't read big block size in Sub Block Mode
We are reading a Big Block Mode value while in Sub Block Mode
when initializing. Fortunately, vm->bbm.bb_size maps to some counter
in the vm->sbm.mb_count array, which is 0 at that point in time.

No harm done; still, this was unintended and is not future-proof.

Fixes: 4ba50cd3355d ("virtio-mem: Big Block Mode (BBM) memory hotplug")
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20210602185720.31821-2-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:02 -04:00
Eli Cohen
efa08cb468 virtio/vdpa: clear the virtqueue state during probe
Clear the available index as part of the initialization process to
clear any values that might be left from previous usage of the device.
For example, if the device was previously used by vhost_vdpa and is now
probed by virtio_vdpa, you want to start with clean indices.

Fixes: c043b4a8cf3b ("virtio: introduce a vDPA based transport")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210602021536.39525-5-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
2021-07-08 07:49:02 -04:00
Jason Wang
1225c216d9 vp_vdpa: allow set vq state to initial state after reset
We used to fail the set_vq_state() since it was not supported yet by
the virtio spec. But if the bus tries to set the state which is equal
to the device initial state after reset, we can let it go.

This is a must for virtio_vdpa() to set vq state during probe which is
required for some vDPA parents.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210602021536.39525-4-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
2021-07-08 07:49:02 -04:00
Jason Wang
0140b3d076 virtio-pci library: introduce vp_modern_get_driver_features()
This patch introduces a helper to get driver/guest features from the
device.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210602021536.39525-3-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
2021-07-08 07:49:01 -04:00
Jason Wang
530a5678bc vdpa: support packed virtqueue for set/get_vq_state()
This patch extends the vdpa_vq_state to support packed virtqueue
state which is basically the device/driver ring wrap counters and the
avail and used index. This will be used for the virtio-vdpa support
for the packed virtqueue and the future vhost/vhost-vdpa support for
the packed virtqueue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210602021536.39525-2-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
2021-07-08 07:49:01 -04:00
Jason Wang
72b5e89587 virtio-ring: store DMA metadata in desc_extra for split virtqueue
For split virtqueue, we used to depend on the address, length and
flags stored in the descriptor ring for DMA unmapping. This is unsafe
since the device can manipulate the behavior of the virtio driver,
IOMMU drivers and swiotlb.

For safety, maintain the DMA address, DMA length, descriptor flags and
next field of the non-indirect descriptors in vring_desc_state_extra
when DMA API is used for virtio as we did for packed virtqueue and use
those metadata for performing DMA operations. Indirect descriptors
should be safe since they are using streaming mappings.

With this the descriptor ring is write-only from the view of the
driver.
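
The idea can be sketched as follows. The struct and function names here
are hypothetical stand-ins rather than the kernel's definitions; the
point is only that the unmap path reads the driver-private copy and
never the device-visible ring.

```c
#include <assert.h>
#include <stdint.h>

/* Shared with the device: it may rewrite these fields at any time. */
struct ring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Driver-private shadow copy, never exposed to the device. */
struct desc_extra {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

struct vq {
	struct ring_desc *ring;		/* device-visible descriptor table */
	struct desc_extra *extra;	/* private metadata, same indexing */
	uint16_t num;			/* number of descriptors */
};

/* Publish a descriptor and remember what was mapped. */
static void add_desc(struct vq *vq, uint16_t i, uint64_t addr,
		     uint32_t len, uint16_t flags, uint16_t next)
{
	assert(i < vq->num);
	vq->ring[i] = (struct ring_desc){ addr, len, flags, next };
	vq->extra[i] = (struct desc_extra){ addr, len, flags, next };
}

/* Return the mapping to undo; only the private copy is consulted. */
static struct desc_extra unmap_info(const struct vq *vq, uint16_t i)
{
	assert(i < vq->num);
	return vq->extra[i];
}
```

Even if the device overwrites ring[i] after add_desc(), unmap_info()
still returns the address and length that were originally mapped.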

This slightly increases the footprint of the driver, but the difference
is not noticeable in pktgen (64B) and netperf tests with virtio-net.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210604055350.58753-8-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:01 -04:00
Jason Wang
5bc72234f7 virtio: use err label in __vring_new_virtqueue()
Use an error label for unwinding in __vring_new_virtqueue(). This is
useful for future refactoring.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210604055350.58753-7-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:01 -04:00
Jason Wang
fe4c3862df virtio_ring: introduce virtqueue_desc_add_split()
This patch introduces a helper for storing descriptor in the
descriptor table for split virtqueue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210604055350.58753-6-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:01 -04:00
Jason Wang
44593865b7 virtio_ring: secure handling of mapping errors
We should not depend on the DMA address, length and flag of descriptor
table since they could be written with arbitrary values by the device. So
this patch switches to use the stored one in desc_extra.

Note that the indirect descriptors are fine since they are read-only
streaming mappings.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210604055350.58753-5-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:01 -04:00
Jason Wang
5a22242160 virtio-ring: factor out desc_extra allocation
A helper is introduced for the logic of allocating the descriptor
extra data. This will be reused by split virtqueue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210604055350.58753-4-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:01 -04:00
Jason Wang
1f28750f2e virtio_ring: rename vring_desc_extra_packed
Rename vring_desc_extra_packed to vring_desc_extra since the structure
is pretty generic and could be reused by split virtqueue as well.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210604055350.58753-3-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:01 -04:00
Jason Wang
aeef9b4733 virtio-ring: maintain next in extra state for packed virtqueue
This patch moves next from vring_desc_state_packed to
vring_desc_extra_packed. This makes it simpler for the extra state
to be reused by split virtqueue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210604055350.58753-2-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:01 -04:00
Eli Cohen
e3aadf2e16 vdpa/mlx5: Clear vq ready indication upon device reset
After device reset, the virtqueues are not ready so clear the ready
field.

Failing to do so can result in virtio_vdpa failing to load if the device
was previously used by vhost_vdpa and the old values are ready.
virtio_vdpa expects to find VQs in "not ready" state.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210606053128.170399-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2021-07-08 07:49:01 -04:00
Eli Cohen
b57c46cb3c vdpa/mlx5: Add support for doorbell bypassing
Implement mlx5_get_vq_notification() to return the doorbell address.
Since the notification area is mapped to userspace, make sure that the
BAR size is at least PAGE_SIZE large.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210603081153.5750-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2021-07-08 07:49:01 -04:00
Michael S. Tsirkin
a7766ef18b virtio_net: disable cb aggressively
There are currently two cases where we poll TX vq not in response to a
callback: start xmit and rx napi.  We currently do this with callbacks
enabled, which can cause extra interrupts from the card.  This used not to
be a big issue as we ran with interrupts disabled, but that is no longer
the case, and in some cases the rate of spurious interrupts is so high
that Linux detects this and actually kills the interrupt.

Fix up by disabling the callbacks before polling the tx vq.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:01 -04:00
Michael S. Tsirkin
8d622d21d2 virtio: fix up virtio_disable_cb
virtio_disable_cb is currently a nop for split ring with event index.
This is because it used to be always called from a callback when we know
device won't trigger more events until we update the index.  However,
now that we run with interrupts enabled a lot we also poll without a
callback so that is different: disabling callbacks will help reduce the
number of spurious interrupts.
Further, if using event index with a packed ring, and if being called
from a callback, we actually do disable interrupts which is unnecessary.

Fix both issues by tracking whenever we get a callback. If that is
the case disabling interrupts with event index can be a nop.
If not the case disable interrupts. Note: with a split ring
there's no explicit "no interrupts" value. For now we write
a fixed value so our chance of triggering an interrupt
is 1/ring size. It's probably better to write something
related to the last used index there to reduce the chance
even further. For now I'm keeping it simple.
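
To illustrate why a fixed event index rarely triggers an interrupt, here
is a small sketch of the event-index check, written after the
vring_need_event() helper from the virtio specification. The fixed value
0 and the counting helper are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Event-index check: the device should notify only if the used index
 * moved past event_idx while going from old_idx to new_idx. All
 * arithmetic wraps at 16 bits.
 */
static int need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) <
	       (uint16_t)(new_idx - old_idx);
}

/* Assumption for this sketch: the driver parks the event index at 0. */
#define FIXED_EVENT_IDX 0u

/* Count how many single-step index updates would raise an interrupt. */
static unsigned count_interrupts(uint16_t start, unsigned updates)
{
	unsigned hits = 0;
	uint16_t idx = start;

	for (unsigned i = 0; i < updates; i++) {
		uint16_t next = (uint16_t)(idx + 1);

		hits += need_event(FIXED_EVENT_IDX, next, idx);
		idx = next;
	}
	return hits;
}
```

In this sketch only the update that steps past the parked value
qualifies; every other update leaves the check false.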

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:51:18 -04:00
Michael S. Tsirkin
22bc63c58e virtio_net: move txq wakeups under tx q lock
We currently check num_free outside tx q lock
which is unsafe: new packets can arrive meanwhile
and there won't be space in the queue.
The result is a spurious queue wakeup, causing overhead
and even packet drops.

Move the check under the lock to fix that.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:51:17 -04:00
Michael S. Tsirkin
5a2f966d0f virtio_net: move tx vq operation under tx queue lock
It's unsafe to operate a vq from multiple threads.
Unfortunately this is exactly what we do when invoking
clean tx poll from rx napi.
Same happens with napi-tx even without the
opportunistic cleaning from the receive interrupt: that races
with processing the vq in start_xmit.

As a fix move everything that deals with the vq to under tx lock.

Fixes: b92f1e6751a6 ("virtio-net: transmit napi")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:51:17 -04:00
Eli Cohen
6f5312f801 vdpa/mlx5: Add support for running with virtio_vdpa
In order to support running vdpa using the virtio_vdpa driver, we need to
create a different kind of MR, one that has 1:1 mapping, since the
addresses referring to virtqueues are dma addresses.

We create the 1:1 MR in mlx5_vdpa_dev_add() only in case firmware
supports the general capability umem_uid_0. The reason for that is that
1:1 MRs must be created with uid == 0 while virtqueue objects can be
created with uid == 0 only when the firmware capability is on.

If the set_map() callback is called with new translations provided
through iotlb, the driver will destroy the 1:1 MR and create a regular
one.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210602085854.62690-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2021-07-03 04:51:17 -04:00
Eli Cohen
7d23dcdf21 vdp/mlx5: Fix setting the correct dma_device
Before SF support was introduced, the DMA device was equal to
mdev->device which was in essence equal to pdev->dev.

With SF introduction this is no longer true. It has already been
handled for vhost_vdpa since the reference to the dma device comes from
within mlx5_vdpa. With virtio_vdpa this broke. To fix this we set the
real dma device when initializing the device.

In addition, for the sake of consistency, previous references in the
code to the dma device are changed to vdev->dma_dev.

Fixes: d13a15d544ce5 ("vdpa/mlx5: Use the correct dma device when registering memory")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210606053150.170489-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2021-07-03 04:50:57 -04:00
Eli Cohen
e13cd45d35 vdpa/mlx5: Support creating resources with uid == 0
Currently all resources must be created with uid != 0 which is essential
when userspace processes are allocating virtqueue resources. Since this
is a kernel implementation, it is perfectly legal to open resources with
uid == 0.

If the firmware supports it, avoid allocating a user context.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210531160404.31368-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2021-07-03 04:50:56 -04:00
Eli Cohen
71ab6a7cfb vdpa/mlx5: Fix possible failure in umem size calculation
umem size is a 32 bit unsigned value so assigning it to an int could
cause false failures. Set the calculated value inside the function and
modify function name to reflect the fact it updates the size.
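
The pattern can be sketched like this: keep the status in the return
value and hand the 32-bit size back through a pointer, so a large but
valid size is never squeezed into a signed int. The function name, the
parameters and the formula are hypothetical, not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute a * queue_size + b and store it in
 * *size_out. Returns 0 on success and -1 if the result does not fit
 * in 32 bits. The size never travels through a signed int, so values
 * above INT32_MAX are not mistaken for errors.
 */
static int set_umem_size(uint32_t a, uint32_t b, uint32_t queue_size,
			 uint32_t *size_out)
{
	uint64_t size = (uint64_t)a * queue_size + b;

	assert(size_out);
	if (size > UINT32_MAX)
		return -1;
	*size_out = (uint32_t)size;
	return 0;
}
```

A size such as 0x80000000 is reported as success here, whereas a helper
returning the size as an int would make it look like a negative error.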

This bug was found during code review but never had real impact to this
date.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210530090349.8360-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2021-07-03 04:50:56 -04:00
Eli Cohen
e3011776af vdpa/mlx5: Fix umem sizes assignments on VQ create
Fix copy paste bug assigning umem1 size to umem2 and umem3. The issue
was discovered when trying to use a 1:1 MR that covers the entire
address space where firmware complained that provided sizes are not
large enough. 1:1 MRs are required to support virtio_vdpa.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210530090317.8284-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2021-07-03 04:50:56 -04:00
Yang Li
31c11db6bd virtio_ring: Fix kernel-doc
Fix function name in virtio_ring.c kernel-doc comment
to remove a warning found by clang_w1.

drivers/virtio/virtio_ring.c:1903: warning: expecting prototype for
virtqueue_get_buf(). Prototype was for virtqueue_get_buf_ctx() instead

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/1621998731-17445-1-git-send-email-yang.lee@linux.alibaba.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:55 -04:00
Mike Christie
d8f35f41e2 vhost: fix up vhost_work coding style
Switch from a mix of tabs and spaces to just tabs.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20210525174733.6212-6-michael.christie@oracle.com
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:55 -04:00
Mike Christie
efb18e1e50 vhost: fix poll coding style
We use 3 coding styles in this struct. Switch to just tabs.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210525174733.6212-5-michael.christie@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:55 -04:00
Mike Christie
d60146c161 vhost-scsi: reduce flushes during endpoint clearing
vhost_scsi_flush will flush everything, so we can clear the backends then
flush, then destroy. We don't need to flush before each vq destruction
because after the flush we will have made sure there can be no new cmds
started and there are no running cmds.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20210525174733.6212-4-michael.christie@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:54 -04:00
Mike Christie
31fbea3ab9 vhost-scsi: remove extra flushes
The vhost work flush function was flushing the entire work queue, so
there is no need for the double vhost_work_dev_flush calls in
vhost_scsi_flush.

And we do not need to call vhost_poll_flush for each poller because
that call also ends up flushing the same work queue thread the
vhost_work_dev_flush call flushed.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210525174733.6212-3-michael.christie@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:54 -04:00
Mike Christie
1465cb6117 vhost: remove work arg from vhost_work_flush
vhost_work_flush doesn't do anything with the work arg. This patch drops
it and then renames vhost_work_flush to vhost_work_dev_flush to reflect
that the function flushes all the works in the dev and not just a
specific queue or work item.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210525174733.6212-2-michael.christie@oracle.com
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:54 -04:00
Xie Yongji
d00d8da586 virtio_console: Assure used length from device is limited
The buf->len might come from an untrusted device. This
ensures the value would not exceed the size of the buffer
to avoid data corruption or loss.
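
A minimal sketch of such a bound, with a hypothetical helper name: the
length reported by the device is clamped to the size of the buffer the
driver actually allocated.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Clamp a device-reported used length to the buffer's real size, so
 * later reads and copies cannot run past the end of the buffer.
 */
static size_t clamp_used_len(size_t reported_len, size_t buf_size)
{
	return reported_len < buf_size ? reported_len : buf_size;
}
```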

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210525125622.1203-1-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:53 -04:00
Stefan Hajnoczi
63947b3434 virtio-blk: limit seg_max to a safe value
The struct virtio_blk_config seg_max value is read from the device and
incremented by 2 to account for the request header and status byte
descriptors added by the driver.

In preparation for supporting untrusted virtio-blk devices, protect
against integer overflow and limit the value to a safe maximum.
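
The overflow guard can be sketched as follows. The cap of 2048 and the
helper name are assumptions for illustration; the real limit is whatever
the driver considers safe.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical upper bound on descriptors per request. */
#define SAFE_MAX_SEGMENTS 2048u

/*
 * Turn the device-provided seg_max into a descriptor count that also
 * covers the request header and the status byte. Clamping before the
 * addition means seg_max + 2 can never wrap around.
 */
static uint32_t total_descriptors(uint32_t seg_max)
{
	if (seg_max > SAFE_MAX_SEGMENTS - 2)
		seg_max = SAFE_MAX_SEGMENTS - 2;
	return seg_max + 2;
}
```

Without the clamp, a device reporting 0xffffffff would make the sum wrap
to 1.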

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20210524154020.98195-1-stefanha@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:53 -04:00
Shaokun Zhang
7a43ce37cd vhost: Remove the repeated declaration
Function 'vhost_vring_ioctl' is declared twice, remove the repeated
declaration.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Link: https://lore.kernel.org/r/1621857884-19964-1-git-send-email-zhangshaokun@hisilicon.com
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:53 -04:00
Jason Wang
94e48d6aaf vp_vdpa: correct the return value when fail to map notification
We forgot to assign an error value when we fail to map the notification
during probe. This patch fixes it.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 11d8ffed00b23 ("vp_vdpa: switch to use vp_modern_map_vq_notify()")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210624035939.26618-1-jasowang@redhat.com
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:52 -04:00
Xie Yongji
3f2869cace virtio_net: Fix error handling in virtnet_restore()
Do some cleanups in virtnet_restore() when virtnet_cpu_notif_add() failed.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210517084516.332-1-xieyongji@bytedance.com
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:52 -04:00
Xie Yongji
b71ba22e7c virtio-blk: Fix memory leak among suspend/resume procedure
The vblk->vqs should be freed before we call init_vqs()
in virtblk_restore().

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210517084332.280-1-xieyongji@bytedance.com
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:52 -04:00
Zhu Lingshan
42326903c6 vDPA/ifcvf: reuse pre-defined macros for device ids and vendor ids
This commit would reuse pre-defined macros for ifcvf device ids
and vendor ids

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Link: https://lore.kernel.org/r/20210510081015.4212-3-lingshan.zhu@intel.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:51 -04:00
Zhu Lingshan
d61914ea6a virtio: update virtio id table, add transitional ids
This commit updates virtio id table by adding transitional device
ids

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Link: https://lore.kernel.org/r/20210510081015.4212-2-lingshan.zhu@intel.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:51 -04:00
Zhu Lingshan
5f1b73a275 vDPA/ifcvf: implement doorbell mapping for ifcvf
This commit implements doorbell mapping feature for ifcvf.
This feature maps the notify page to userspace, to eliminate
vmexit when kicking a vq.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Link: https://lore.kernel.org/r/20210602084550.289599-3-lingshan.zhu@intel.com
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:51 -04:00
Zhu Lingshan
04c6ad8f22 vDPA/ifcvf: record virtio notify base
This commit records the virtio notify base physical address and
calculates the doorbell physical address for vqs.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210602084550.289599-2-lingshan.zhu@intel.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:51 -04:00
Wan Jiabing
e22626a876 vdpa_sim_blk: remove duplicate include of linux/blkdev.h
In commit 7d189f617f83f ("vdpa_sim_blk: implement ramdisk behaviour")
linux/blkdev.h was included here causing the duplicate include.
Remove the later duplicate include.

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Link: https://lore.kernel.org/r/20210510024307.7143-1-wanjiabing@vivo.com
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:50 -04:00
Stefano Garzarella
8693059284 vhost-iotlb: fix vhost_iotlb_del_range() documentation
Trivial change for the vhost_iotlb_del_range() documentation,
fixing the function name in the comment block.

Discovered with `make C=2 M=drivers/vhost`:
../drivers/vhost/iotlb.c:92: warning: expecting prototype for vring_iotlb_del_range(). Prototype was for vhost_iotlb_del_range() instead

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20210504135444.158716-1-sgarzare@redhat.com
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:50 -04:00
Sohaib
4f118472d4 virtio_blk: cleanups: remove check obsoleted by CONFIG_LBDAF removal
Prior to 72deb455b5ec ("block: remove CONFIG_LBDAF"), it was optional
whether 32-bit kernels supported block devices and/or files larger than
2 TiB (assuming a sector size of 512 bytes).
But now sector_t and blkcnt_t are always 64-bit in size.
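
The 2 TiB figure follows from the arithmetic: a 32-bit sector index
addresses 2^32 sectors of 512 bytes each. A small sketch, with a
hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest device size in bytes that can be addressed when sector
 * numbers have the given width in bits.
 */
static uint64_t max_device_bytes(unsigned sector_index_bits,
				 uint64_t sector_size)
{
	assert(sector_index_bits < 64);
	return (UINT64_C(1) << sector_index_bits) * sector_size;
}
```

Calling max_device_bytes(32, 512) gives 2^41 bytes, which is 2 TiB.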

Suggested-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Sohaib Mohammed <sohaib.amhmd@gmail.com>
Link: https://lore.kernel.org/r/20210430103611.77345-1-sohaib.amhmd@gmail.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 04:50:49 -04:00
Linus Torvalds
3dbdb38e28 Merge branch 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - cgroup.kill is added which implements atomic killing of the whole
   subtree.

   Down the line, this should be able to replace the multiple userland
   implementations of "keep killing till empty".

 - PSI can now be turned off at boot time to avoid overhead for
   configurations which don't care about PSI.

* 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: make per-cgroup pressure stall tracking configurable
  cgroup: Fix kernel-doc
  cgroup: inline cgroup_task_freeze()
  tests/cgroup: test cgroup.kill
  tests/cgroup: move cg_wait_for(), cg_prepare_for_wait()
  tests/cgroup: use cgroup.kill in cg_killall()
  docs/cgroup: add entry for cgroup.kill
  cgroup: introduce cgroup.kill
2021-07-01 17:22:14 -07:00
Linus Torvalds
e267992f9e Merge branch 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu
Pull percpu updates from Dennis Zhou:

 - percpu chunk depopulation - depopulate backing pages for chunks with
   empty pages when we exceed a global threshold without those pages.
   This lets us reclaim a portion of memory that would previously be
   lost until the full chunk would be freed (possibly never).

 - memcg accounting cleanup - previously separate chunks were managed
   for normal allocations and __GFP_ACCOUNT allocations. These are now
   consolidated which cleans up the code quite a bit.

 - a few misc clean ups for clang warnings

* 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
  percpu: optimize locking in pcpu_balance_workfn()
  percpu: initialize best_upa variable
  percpu: rework memcg accounting
  mm, memcg: introduce mem_cgroup_kmem_disabled()
  mm, memcg: mark cgroup_memory_nosocket, nokmem and noswap as __ro_after_init
  percpu: make symbol 'pcpu_free_slot' static
  percpu: implement partial chunk depopulation
  percpu: use pcpu_free_slot instead of pcpu_nr_slots - 1
  percpu: factor out pcpu_check_block_hint()
  percpu: split __pcpu_balance_workfn()
  percpu: fix a comment about the chunks ordering
2021-07-01 17:17:24 -07:00
Linus Torvalds
19b4385922 - added support for OpenEmbed SOM9331 board
- Ingenic fixes/improvements
 - other fixes and cleanups

Merge tag 'mips_5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS updates from Thomas Bogendoerfer:

 - add support for OpenEmbed SOM9331 board

 - Ingenic fixes/improvements

 - other fixes and cleanups

* tag 'mips_5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (39 commits)
  MIPS: Fix PKMAP with 32-bit MIPS huge page support
  MIPS: CI20: Add second percpu timer for SMP.
  MIPS: CI20: Reduce clocksource to 750 kHz.
  MIPS: Ingenic: Add MAC syscon nodes for Ingenic SoCs.
  dt-bindings: clock: Add documentation for MAC PHY control bindings.
  MIPS: X1830: Respect cell count of common properties.
  MIPS: set mips32r5 for virt extensions
  MIPS: loongsoon64: Reserve memory below starting pfn to prevent Oops
  MIPS: MT extensions are not available on MIPS32r1
  mips/kvm: Use BUG_ON instead of if condition followed by BUG
  MIPS: OCTEON: octeon-usb: Use devm_platform_get_and_ioremap_resource()
  MIPS: add PMD table accounting into MIPS'pmd_alloc_one
  MIPS: Loongson64: fix spelling of SPDX tag
  MIPS: ingenic: rs90: Add dedicated VRAM memory region
  MIPS: ingenic: gcw0: Set codec to cap-less mode for FM radio
  MIPS: ingenic: jz4780: Fix I2C nodes to match DT doc
  MIPS: ingenic: Select CPU_SUPPORTS_CPUFREQ && MIPS_EXTERNAL_TIMER
  MIPS: Kconfig: ingenic: Ensure MACH_INGENIC_GENERIC selects all SoCs
  MIPS: cpu-probe: Fix FPU detection on Ingenic JZ4760(B)
  MIPS: boot: Support specifying UART port on Ingenic SoCs
  ...
2021-07-01 17:03:11 -07:00
Linus Torvalds
a32b344e6f This is the bulk of pin control changes for the v5.14 kernel:
New drivers:
 
 - Last merge window we created a driver for the Ralink RT2880.
   We are now moving the Ralink SoC pin control drivers out of the MIPS
   architecture code and into the pin control subsystem. This concerns
   RT288X, MT7620, RT305X, RT3883 and MT7621.
 
 - Qualcomm SM6125 SoC pin control driver.
 
 - Qualcomm spmi-gpio support for PM7325.
 
 - Qualcomm spmi-mpp also handles PMI8994 (just a compatible string)
 
 - Mediatek MT8365 SoC pin controller.
 
 - New device HID for the AMD GPIO controller.
 
 Improvements:
 
 - Pin bias config support for a slew of Renesas pin controllers.
 
 - Incremental improvements and non-urgent bug fixes to the Renesas
   SoC drivers.
 
 - Implement irq_set_wake on the AMD pin controller so we can wake
   up from external pin events.
 
 Misc:
 
 - Devicetree bindings for the Apple M1 pin controller; we will probably
   see a proper driver for this soon as well.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAmDeTu0ACgkQQRCzN7AZ
 XXO2Yg/+LHbqYX8V+Ig1ZcY4p5bfbGyyC6QG6g3d/kzzCmsjHFgmDFQoZ+LoRx+p
 FRUSvmiR0VERMZCEepHsZgzns6ezzJfBt4Cu/388d4iYZppaETpQV47TzqY3eP7Q
 4Shu2wIKwd7C3vNrCifub0JOYAAEsqdlHd75g0bqhal9hgH/MgYQSq9F22/TKAFl
 hteFwyw5L4OwKIDUpqDOIcG8thhHYWrQy77/Pp82/TVnmO9gamt863dKBjIg6iF9
 c+pmIWI8K2mBhNO+epGG4VSroUudIBwKV88nwUjKSe+pu0VAU7lit/V0Uh1IhG0s
 FUHHGDeF62Ncn4SOYetlnSlKbQkhJaBDV2sDgQ3xzqvs1P3WEHRWqYIh1egq5iW6
 /KtpSlRLQ/aO+k0iN66pErpAfsGNFAxkqlCSypyJG7ROnb2rADzZ0ftEKQb8RzZb
 nypPupOO5/bFfQHbQtFORDaNu9MUTR5PR04eTPMoApG0nv7zY+kcJ6iJuKE9spLb
 ahoxLstfQ/fKK27yms72E6PqwanuUEzcQv7gjhuHmFEjNrW1ARUqoa5hpdAzhZOX
 20P8SZWkSeUZnqB26YQq+1U9p6wV0064Vp+jYY/wzQpV40dgX9oumiRkxCWCzpjt
 6mw6x9txlrEEu+2WadW8yZd4ewKvWFLEGI+C/83pnI5NF1Dp0Go=
 =Ajcr
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control updates from Linus Walleij:
 "This is the bulk of pin control changes for the v5.14 kernel. Not so
  much going on. No core changes, just drivers.

  The most interesting would be that MIPS Ralink is migrating to pin
  control and we have some bindings but not yet code for the Apple M1
  pin controller.

  New drivers:

   - Last merge window we created a driver for the Ralink RT2880. We are
     now moving the Ralink SoC pin control drivers out of the MIPS
     architecture code and into the pin control subsystem. This concerns
     RT288X, MT7620, RT305X, RT3883 and MT7621.

   - Qualcomm SM6125 SoC pin control driver.

   - Qualcomm spmi-gpio support for PM7325.

   - Qualcomm spmi-mpp also handles PMI8994 (just a compatible string)

   - Mediatek MT8365 SoC pin controller.

   - New device HID for the AMD GPIO controller.

  Improvements:

   - Pin bias config support for a slew of Renesas pin controllers.

   - Incremental improvements and non-urgent bug fixes to the Renesas
     SoC drivers.

   - Implement irq_set_wake on the AMD pin controller so we can wake up
     from external pin events.

  Misc:

   - Devicetree bindings for the Apple M1 pin controller; we will
     probably see a proper driver for this soon as well"

* tag 'pinctrl-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (54 commits)
  pinctrl: ralink: rt305x: add missing include
  pinctrl: stm32: check for IRQ MUX validity during alloc()
  pinctrl: zynqmp: some code cleanups
  drivers: qcom: pinctrl: Add pinctrl driver for sm6125
  dt-bindings: pinctrl: qcom: sm6125: Document SM6125 pinctrl driver
  dt-bindings: pinctrl: mcp23s08: add documentation for reset-gpios
  pinctrl: mcp23s08: Add optional reset GPIO
  pinctrl: mediatek: fix mode encoding
  pinctrl: mcp23s08: Fix missing unlock on error in mcp23s08_irq()
  pinctrl: bcm: Constify static pinmux_ops
  pinctrl: bcm: Constify static pinctrl_ops
  pinctrl: ralink: move RT288X SoC pinmux config into a new 'pinctrl-rt288x.c' file
  pinctrl: ralink: move MT7620 SoC pinmux config into a new 'pinctrl-mt7620.c' file
  pinctrl: ralink: move RT305X SoC pinmux config into a new 'pinctrl-rt305x.c' file
  pinctrl: ralink: move RT3883 SoC pinmux config into a new 'pinctrl-rt3883.c' file
  pinctrl: ralink: move MT7621 SoC pinmux config into a new 'pinctrl-mt7621.c' file
  pinctrl: ralink: move ralink architecture pinmux header into the driver
  pinctrl: single: config: enable the pin's input
  pinctrl: mtk: Fix mt8365 Kconfig dependency
  pinctrl: mcp23s08: fix race condition in irq handler
  ...
2021-07-01 16:57:14 -07:00