1155524 Commits

Author SHA1 Message Date
Liming Wu
62b763ad76 vhost-test: remove meaningless debug info
remove printk as it is meaningless.

Signed-off-by: Liming Wu <liming.wu@jaguarmicro.com>
Message-Id: <20230105070357.274-1-liming.wu@jaguarmicro.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2023-02-20 19:26:58 -05:00
Jason Wang
6c3d329e64 vdpa_sim: get rid of DMA ops
We used to (ab)use the DMA ops for setting up identical mappings in
the IOTLB. This patch tries to get rid of the those unnecessary DMA
ops by maintaining a simple identical/passthrough mappings by
default. When bound to virtio_vdpa driver, DMA API will simply use PA
as the IOVA and we will be all fine. When the vDPA bus tries to setup
customized mapping (e.g when bound to vhost-vDPA), the
identical/passthrough mapping will be removed.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221223060021.28011-1-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
2023-02-20 19:26:58 -05:00
Jason Wang
0899774cb3 vdpa_sim_net: vendor satistics
This patch adds support for basic vendor stats that include counters
for tx, rx and cvq.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221223055548.27810-5-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2023-02-20 19:26:57 -05:00
Jason Wang
5dbb063a3e vdpa_sim: support vendor statistics
This patch adds a new config ops callback to allow individual
simulator to implement the vendor stats callback.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221223055548.27810-4-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2023-02-20 19:26:57 -05:00
Jason Wang
bb105d514a vdpasim: customize allocation size
Allow individual simulator to customize the allocation size.

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221223055548.27810-3-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:57 -05:00
Jason Wang
0497f23e73 vdpa_sim: switch to use __vdpa_alloc_device()
This allows us to control the allocation size of the structure.

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221223055548.27810-2-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:57 -05:00
Jason Wang
2f8200efe7 vdpa_sim: use weak barriers
vDPA simulators are software emulated device, so let's switch to use
weak barriers to avoid extra overhead in the driver.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221221062146.15356-1-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
2023-02-20 19:26:57 -05:00
Suwan Kim
07b679f70d virtio-blk: support completion batching for the IRQ path
This patch adds completion batching to the IRQ path. It reuses batch
completion code of virtblk_poll(). It collects requests to io_comp_batch
and processes them all at once. It can boost up the performance by 2%.

To validate the performance improvement and stabilty, I did fio test with
4 vCPU VM and 12 vCPU VM respectively. Both VMs have 8GB ram and the same
number of HW queues as vCPU.
The fio cammad is as follows and I ran the fio 5 times and got IOPS average.
(io_uring, randread, direct=1, bs=512, iodepth=64 numjobs=2,4)

Test result shows about 2% improvement.

           4 vcpu VM       |   numjobs=2   |   numjobs=4
      -----------------------------------------------------------
        fio without patch  |  367.2K IOPS  |   397.6K IOPS
      -----------------------------------------------------------
        fio with patch     |  372.8K IOPS  |   407.7K IOPS

           12 vcpu VM      |   numjobs=2   |   numjobs=4
      -----------------------------------------------------------
        fio without patch  |  363.6K IOPS  |   374.8K IOPS
      -----------------------------------------------------------
        fio with patch     |  373.8K IOPS  |   385.3K IOPS

Signed-off-by: Suwan Kim <suwan.kim027@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20221221145456.281218-3-suwan.kim027@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:57 -05:00
Suwan Kim
489e18f3d7 virtio-blk: set req->state to MQ_RQ_COMPLETE after polling I/O is finished
Driver should set req->state to MQ_RQ_COMPLETE after it finishes to process
req. But virtio-blk doesn't set MQ_RQ_COMPLETE after virtblk_poll() handles
req and req->state still remains MQ_RQ_IN_FLIGHT. Fortunately so far there
is no issue about it because blk_mq_end_request_batch() sets req->state to
MQ_RQ_IDLE.

In this patch, virblk_poll() calls blk_mq_complete_request_remote() to set
req->state to MQ_RQ_COMPLETE before it adds req to a batch completion list.
So it properly sets req->state after polling I/O is finished.

Fixes: 4e0400525691 ("virtio-blk: support polling I/O")
Signed-off-by: Suwan Kim <suwan.kim027@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20221221145456.281218-2-suwan.kim027@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:57 -05:00
Bagas Sanjaya
2b034e82ff docs: driver-api: virtio: commentize spec version checking
A sentence that checks for later spec version is meant for developers
hacking the documentation source. Make it comment block (hidden from
actual output).

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Message-Id: <20221220095828.27588-4-bagasdotme@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:57 -05:00
Bagas Sanjaya
ae8d2247af docs: driver-api: virtio: slightly reword virtqueues allocation paragraph
"It's at this stage that" means "At this point", so use the latter as it
is more effective.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Message-Id: <20221220095828.27588-3-bagasdotme@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:56 -05:00
Bagas Sanjaya
fb25d45694 docs: driver-api: virtio: parenthesize external reference targets
Parenthesize targets to links in "References" section to distinguish
them from remaining texts.

While at it, describe the second target.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Message-Id: <20221220095828.27588-2-bagasdotme@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:56 -05:00
Sebastien Boeuf
f9d9f57ef0 vdpa_sim: Implement resume vdpa op
Implement resume operation for vdpa_sim devices, so vhost-vdpa will
offer that backend feature and userspace can effectively resume the
device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Message-Id: <15a4566826033c5dd9a2167e5cfb0ef4d90cea49.1672742878.git.sebastien.boeuf@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2023-02-20 19:26:56 -05:00
Sebastien Boeuf
3b688d7a08 vhost-vdpa: uAPI to resume the device
This new ioctl adds support for resuming the device from userspace.

This is required when trying to restore the device in a functioning
state after it's been suspended. It is already possible to reset a
suspended device, but that means the device must be reconfigured and
all the IOMMU/IOTLB mappings must be recreated. This new operation
allows the device to be resumed without going through a full reset.

This is particularly useful when trying to perform offline migration of
a virtual machine (also known as snapshot/restore) as it allows the VMM
to resume the virtual machine back to a running state after the snapshot
is performed.

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Message-Id: <73b75fb87d25cff59768b4955a81fe7ffe5b4770.1672742878.git.sebastien.boeuf@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2023-02-20 19:26:56 -05:00
Sebastien Boeuf
69106b6fb3 vhost-vdpa: Introduce RESUME backend feature bit
Userspace knows if the device can be resumed or not by checking this
feature bit.

It's only exposed if the vdpa driver backend implements the resume()
operation callback. Userspace trying to negotiate this feature when it
hasn't been exposed will result in an error.

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Message-Id: <b18db236ba3d990cdb41278eb4703be9201d9514.1672742878.git.sebastien.boeuf@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2023-02-20 19:26:56 -05:00
Sebastien Boeuf
1538a8a49e vdpa: Add resume operation
Add a new operation to allow a vDPA device to be resumed after it has
been suspended. Trying to resume a device that wasn't suspended will
result in a no-op.

This operation is optional. If it's not implemented, the associated
backend feature bit will not be exposed. And if the feature bit is not
exposed, invoking this operation will return an error.

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Message-Id: <6e05c4b31b47f3e29cb2bd7ebd56c81f84b8f48a.1672742878.git.sebastien.boeuf@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2023-02-20 19:26:56 -05:00
Alvaro Karsz
51a8f9d7f5 virtio: vdpa: new SolidNET DPU driver.
This commit includes:
 1) The driver to manage the controlplane over vDPA bus.
 2) A HW monitor device to read health values from the DPU.

Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230110165638.123745-4-alvaro.karsz@solid-run.com>
Message-Id: <20230209075128.78915-1-alvaro.karsz@solid-run.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:56 -05:00
Alvaro Karsz
d089d69cc1 PCI: Avoid FLR for SolidRun SNET DPU rev 1
This patch fixes a FLR bug on the SNET DPU rev 1 by setting the
PCI_DEV_FLAGS_NO_FLR_RESET flag.

As there is a quirk to avoid FLR (quirk_no_flr), I added a new quirk
to check the rev ID before calling to quirk_no_flr.

Without this patch, a SNET DPU rev 1 may hang when FLR is applied.

Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Message-Id: <20230110165638.123745-3-alvaro.karsz@solid-run.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Alvaro Karsz
db6c4dee4c PCI: Add SolidRun vendor ID
Add SolidRun vendor ID to pci_ids.h

The vendor ID is used in 2 different source files, the SNET vDPA driver
and PCI quirks.

Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Message-Id: <20230110165638.123745-2-alvaro.karsz@solid-run.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Eugenio Pérez
d8b3832a78 vdpa_sim_net: Offer VIRTIO_NET_F_STATUS
VIRTIO_NET_S_LINK_UP is already returned in config reads since vdpasim
creation, but the feature bit was not offered to the driver.

Tested modifying VIRTIO_NET_S_LINK_UP and different values of "status"
in qemu virtio-net options, using vhost_vdpa.

Not considering as a fix, because there should be no driver trusting in
this config read before the feature flag.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20221117155502.1394700-1-eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
46fc0917bb vDPA/ifcvf: implement features provisioning
This commit implements features provisioning for ifcvf, that means:
1)checkk whether the provisioned features are supported by
the management device
2)vDPA device only presents selected feature bits

Examples:
a)The management device supported features:
$ vdpa mgmtdev show pci/0000:01:00.5
pci/0000:01:00.5:
  supported_classes net
  max_supported_vqs 9
  dev_features MTU MAC MRG_RXBUF CTRL_VQ MQ ANY_LAYOUT VERSION_1 ACCESS_PLATFORM

b)Provision a vDPA device with all supported features:
$ vdpa dev add name vdpa0 mgmtdev pci/0000:01:00.5
$ vdpa/vdpa dev config show vdpa0
vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false max_vq_pairs 4 mtu 1500
  negotiated_features MRG_RXBUF CTRL_VQ MQ VERSION_1 ACCESS_PLATFORM

c)Provision a vDPA device with a subset of the supported features:
$ vdpa dev add name vdpa0 mgmtdev pci/0000:01:00.5 device_features 0x300020020
$ vdpa dev config show vdpa0
mac 00:e8:ca:11:be:05 link up link_announce false
  negotiated_features CTRL_VQ VERSION_1 ACCESS_PLATFORM

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20221125145724.1129962-13-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
267000e980 vDPA/ifcvf: retire ifcvf_private_to_vf
This commit retires ifcvf_private_to_vf, because
the vf is already a member of the adapter,
so it could be easily addressed by adapter->vf.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20221125145724.1129962-12-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
93139037b5 vDPA/ifcvf: allocate the adapter in dev_add()
The adapter is the container of the vdpa_device,
this commits allocate the adapter in dev_add()
rather than in probe(). So that the vdpa_device()
could be re-created when the userspace creates
the vdpa device, and free-ed in dev_del()

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-11-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
6a3b2f179b vDPA/ifcvf: manage ifcvf_hw in the mgmt_dev
This commit allocates the hw structure in the
management device structure. So the hardware
can be initialized once the management device
is allocated in probe.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-10-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
7cfd36b7e8 vDPA/ifcvf: ifcvf_request_irq works on ifcvf_hw
All ifcvf_request_irq's callees are refactored
to work on ifcvf_hw, so it should be decoupled
from the adapter as well

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-9-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
a70d833e69 vDPA/ifcvf: decouple config/dev IRQ requester and vectors allocator from the adapter
This commit decouples the config irq requester, the device
shared irq requester and the MSI vectors allocator from
the adapter. So they can be safely invoked since probe
before the adapter is allocated.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-8-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
f9a9ffb2e4 vDPA/ifcvf: decouple vq irq requester from the adapter
This commit decouples the vq irq requester from the adapter,
so that these functions can be invoked since probe.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-7-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Zhu Lingshan
23dac55cec vDPA/ifcvf: decouple config IRQ releaser from the adapter
This commit decouples config IRQ releaser from the adapter,
so that it could be invoked once probe or in err handlers.
ifcvf_free_irq() works on ifcvf_hw in this commit

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-6-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Zhu Lingshan
004cbcabab vDPA/ifcvf: decouple vq IRQ releasers from the adapter
This commit decouples the IRQ releasers from the
adapter, so that these functions could be
safely invoked once probe

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-5-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Zhu Lingshan
66e3970b16 vDPA/ifcvf: alloc the mgmt_dev before the adapter
This commit reverses the order of allocating the
management device and the adapter. So that it would
be possible to move the allocation of the adapter
to dev_add().

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-4-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Zhu Lingshan
af8eb69a62 vDPA/ifcvf: decouple config space ops from the adapter
This commit decopules the config space ops from the
adapter layer, so these functions can be invoked
once the device is probed.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-3-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Zhu Lingshan
d59f633dd0 vDPA/ifcvf: decouple hw features manipulators from the adapter
This commit gets rid of ifcvf_adapter in hw features related
functions in ifcvf_base. Then these functions are more rubust
and de-coupling from the ifcvf_adapter layer. So these
functions could be invoded once the device is probed, even
before the adapter is allocaed.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-2-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Eli Cohen
0a59975088 vdpa/mlx5: Add RX counters to debugfs
For each interface, either VLAN tagged or untagged, add two hardware
counters: one for unicast and another for multicast. The counters count
RX packets and bytes and can be read through debugfs:

$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/untagged/mcast/packets
$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/untagged/ucast/bytes

This feature is controlled via the config option
MLX5_VDPA_STEERING_DEBUG. It is off by default as it may have some
impact on performance.

includes a fixup By Yang Yingliang <yangyingliang@huawei.com>:

vdpa/mlx5: fix check wrong pointer in mlx5_vdpa_add_mac_vlan_rules()

The local variable 'rule' is not used anymore, fix return value
check after calling mlx5_add_flow_rules().

Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-9-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Message-Id: <20230104074418.1737510-1-yangyingliang@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eli Cohen <elic@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2023-02-20 19:26:54 -05:00
Eli Cohen
2942210043 vdpa/mlx5: Add debugfs subtree
Add debugfs subtree and expose flow table ID and TIR number. This
information can be used by external tools to do extended
troubleshooting.

The information can be retrieved like so:
$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/table_id
$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/tirn

Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-8-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Eli Cohen
72c67e9b90 vdpa/mlx5: Move some definitions to a new header file
Move some definitions from mlx5_vnet.c to newly added header file
mlx5_vnet.h. We need these definitions for the following patches that
add debugfs tree to expose information vital for debug.

Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-7-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Michael S. Tsirkin
b16a1756c7 virtio_blk: mark all zone fields LE
zone is a virtio 1.x feature so all fields are LE,
they are handled as such, but have mistakenly been labeled
__virtioXX in the header.  This results in a bunch of sparse warnings.

Use the __leXX tags to make sparse happy.

Message-Id: <20221222193214.55146-1-mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-15 06:46:22 -05:00
Michael S. Tsirkin
2a9c844e89 virtio_blk: zone append in header type tweak
virtio blk returns a 64 bit append_sector in an input buffer,
in LE format. This field is not tagged as LE correctly, so
even though the generated code is ok, we get warnings from sparse:

drivers/block/virtio_blk.c:332:33: sparse: sparse: cast to restricted __le64

Make sparse happy by using the correct type.

Message-Id: <20221220125154.564265-1-mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-15 06:46:22 -05:00
Michael S. Tsirkin
04e5421e6f virtio_blk: temporary variable type tweak
virtblk_result returns blk_status_t which is a bitwise restricted type,
so we are not supposed to stuff it in a plain int temporary variable.
All we do with it is pass it on to a function expecting blk_status_t so
the generated code is ok, but we get warnings from sparse:

drivers/block/virtio_blk.c:326:36: sparse: sparse: incorrect type in initializer (different base types) @@     expected int status @@
+got restricted blk_status_t @@
drivers/block/virtio_blk.c:334:33: sparse: sparse: incorrect type in argument 2 (different base types) @@     expected restricted
+blk_status_t [usertype] error @@     got int status @@

Make sparse happy by using the correct type.

Message-Id: <20221220124152.523531-1-mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2023-02-15 06:46:22 -05:00
Dmitry Fomichev
95bfec41bd virtio-blk: add support for zoned block devices
This patch adds support for Zoned Block Devices (ZBDs) to the kernel
virtio-blk driver.

The patch accompanies the virtio-blk ZBD support draft that is now
being proposed for standardization. The latest version of the draft is
linked at

https://github.com/oasis-tcs/virtio-spec/issues/143 .

The QEMU zoned device code that implements these protocol extensions
has been developed by Sam Li and it is currently in review at the QEMU
mailing list.

A number of virtblk request structure changes has been introduced to
accommodate the functionality that is specific to zoned block devices
and, most importantly, make room for carrying the Zoned Append sector
value from the device back to the driver along with the request status.

The zone-specific code in the patch is heavily influenced by NVMe ZNS
code in drivers/nvme/host/zns.c, but it is simpler because the proposed
virtio ZBD draft only covers the zoned device features that are
relevant to the zoned functionality provided by Linux block layer.

includes the following fixup:

virtio-blk: fix probe without CONFIG_BLK_DEV_ZONED

When building without CONFIG_BLK_DEV_ZONED, VIRTIO_BLK_F_ZONED
is excluded from array of driver features.
As a result virtio_has_feature panics in virtio_check_driver_offered_feature
since that by design verifies that a feature we are checking for
is listed in the feature array.

To fix, replace the call to virtio_has_feature with a stub.

Message-Id: <20221016034127.330942-3-dmitry.fomichev@wdc.com>
Co-developed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-Id: <20221220112340.518841-1-mst@redhat.com>
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Reported-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Debugged-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2023-02-15 06:46:22 -05:00
Ricardo Cañuelo
d16c0cd273 docs: driver-api: virtio: virtio on Linux
Basic doc about Virtio on Linux and a short tutorial on Virtio drivers.

includes the following fixup:

virtio: fix virtio_config_ops kerneldocs

Fixes two warning messages when building htmldocs:

    warning: duplicate section name 'Note'
    warning: expecting prototype for virtio_config_ops().
             Prototype was for vq_callback_t() instead

Message-Id: <20221010064359.1324353-2-ricardo.canuelo@collabora.com>
Signed-off-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20221220100035.2712449-1-ricardo.canuelo@collabora.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-15 06:46:22 -05:00
Michael Sammler
d5ff73bbb0 virtio_pmem: populate numa information
Compute the numa information for a virtio_pmem device from the memory
range of the device. Previously, the target_node was always 0 since
the ndr_desc.target_node field was never explicitly set. The code for
computing the numa node is taken from cxl_pmem_region_probe in
drivers/cxl/pmem.c.

Signed-off-by: Michael Sammler <sammler@google.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Tested-by: Mina Almasry <almasrymina@google.com>
Message-Id: <20221115214036.1571015-1-sammler@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-15 06:46:22 -05:00
Eugenio Pérez
0e84f918fa vdpa_sim: not reset state in vdpasim_queue_ready
vdpasim_queue_ready calls vringh_init_iotlb, which resets split indexes.
But it can be called after setting a ring base with
vdpasim_set_vq_state.

Fix it by stashing them. They're still resetted in vdpasim_vq_reset.

This was discovered and tested live migrating the vdpa_sim_net device.

Fixes: 2c53d0f64c06 ("vdpasim: vDPA device simulator")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230118164359.1523760-2-eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
2023-02-13 07:25:00 -05:00
Linus Torvalds
ceaa837f96 Linux 6.2-rc8 v6.2-rc8 2023-02-12 14:10:17 -08:00
John Paul Adrian Glaubitz
80510b63f7 MAINTAINERS: Add myself as maintainer for arch/sh (SUPERH)
Both Rich Felker and Yoshinori Sato haven't done any work on arch/sh
for a while. As I have been maintaining Debian's sh4 port since 2014,
I am interested to keep the architecture alive.

Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-02-12 13:57:09 -08:00
Linus Torvalds
5e98e916f9 tracing: Fix showing of TASK_COMM_LEN instead of its value
The TASK_COMM_LEN was converted from a macro into an enum so that BTF
 would have access to it. But this unfortunately caused TASK_COMM_LEN to
 display in the format fields of trace events, as they are created by the
 TRACE_EVENT() macro and such, macros convert to their values, where as
 enums do not.
 
 To handle this, instead of using the field itself to be display, save the
 value of the array size as another field in the trace_event_fields
 structure, and use that instead. Not only does this fix the issue, but
 also converts the other trace events that have this same problem (but were
 not breaking tooling). With this change, the original work around
 b3bc8547d3be6 ("tracing: Have TRACE_DEFINE_ENUM affect trace event types
 as well") could be reverted (but that should be done in the merge window).
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY+lOqxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6quYPAQD+9j+RPIUm9Ms4XCIEOXkFI04yjsbd
 rQSRcpYBWyP59AEAnZNYNwE11vDsKBGxPrOPgGYe4Pzfr5yOWY84mgaxgwo=
 =iYsE
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:
 "Fix showing of TASK_COMM_LEN instead of its value

  The TASK_COMM_LEN was converted from a macro into an enum so that BTF
  would have access to it. But this unfortunately caused TASK_COMM_LEN
  to display in the format fields of trace events, as they are created
  by the TRACE_EVENT() macro and such, macros convert to their values,
  where as enums do not.

  To handle this, instead of using the field itself to be display, save
  the value of the array size as another field in the trace_event_fields
  structure, and use that instead.

  Not only does this fix the issue, but also converts the other trace
  events that have this same problem (but were not breaking tooling).

  With this change, the original work around b3bc8547d3be6 ("tracing:
  Have TRACE_DEFINE_ENUM affect trace event types as well") could be
  reverted (but that should be done in the merge window)"

* tag 'trace-v6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix TASK_COMM_LEN in trace event format file
2023-02-12 13:52:17 -08:00
Linus Torvalds
711e9a4d52 for-6.2-rc7-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmPo41YACgkQxWXV+ddt
 WDsPXA/8DPCp1PEvmkJ998wBCgSuoVvG9b4l1HOI0aFWC/giJWYsTdBF/+rFP/83
 +UFBmxDsbG8tMoq73Dw8XxTvmYwRUyCdtn/AmKkGpu/l9KF4fnM+RTIh94e4DaH7
 O1R5zPVOX34ScgL/bR6Hmcrw8a7q6yUmW9xORR40AAbYOccUld4nvUZOI+hVUbtN
 84pphG+U4KowtX2J4fqLWALGU/2hDP9Aiq3aKOdupoiRYJacx3FoMP4aaEblJlMk
 ViLJYBXrJ+6v71frjT4LgSdDd7+l6QEaHHlQwIxMrf3r7AXUkMerwoiOhasMRXTB
 WnZjC8XeS9yogY6Ls5/gIEEWB7buz6TFJwm3rwfXMM+0OQ1g0RFvjXQPD8sOLazS
 X/5ToML8SZYpfkmIMnP+hBnmAMFKpjC06o40cN5/96xkqqMAwL7ws+XIlso/Hx+l
 Lu01cgnDLluRflWtVwMLmrhOGLStjbiDJKmG4zKl/WsyqGdodjIUyCOjhB0Wy0CN
 RMrkvOUwngTfAdWQYTHDdxkTdn1+b/nB+N9BvLbD8Dt+Q5H7loGR+0mS5xsRNg4Q
 jDY0yLDtR6bDxvcp4L2Vz1ezn+dSo8XAR9zqd4pT+7mZ6tLsf0R5F3iedAZkaqQC
 1uVkjiHyi1Gq/6iKRwf72rQMNKdDmAgM+sDx0uQK5JyG8ZGqgLA=
 =KGNk
 -----END PGP SIGNATURE-----

Merge tag 'for-6.2-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - one more fix for a tree-log 'write time corruption' report, update
   the last dir index directly and don't keep in the log context

 - do VFS-level inode lock around FIEMAP to prevent a deadlock with
   concurrent fsync, the extent-level lock is not sufficient

 - don't cache a single-device filesystem device to avoid cases when a
   loop device is reformatted and the entry gets stale

* tag 'for-6.2-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: free device in btrfs_close_devices for a single device filesystem
  btrfs: lock the inode in shared mode before starting fiemap
  btrfs: simplify update of last_dir_index_offset when logging a directory
2023-02-12 11:26:36 -08:00
Linus Torvalds
e2bca0ebf7 USB fixes for 6.2-rc8
Here are 2 small USB driver fixes that resolve some reported regressions
 and one new device quirk.  Specifically these are:
   - new quirk for Alcor Link AK9563 smartcard reader
   - revert of u_ether gadget change in 6.2-rc1 that caused problems
   - typec pin probe fix
 
 All of these have been in linux-next with no reported problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY+jGFQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylYkQCgybIKsO7j9Mfi5nTsZktsWJRexu8AoMGuckH6
 zko7UCEUrN5mk4xy5w9p
 =1LuO
 -----END PGP SIGNATURE-----

Merge tag 'usb-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are 2 small USB driver fixes that resolve some reported
  regressions and one new device quirk. Specifically these are:

   - new quirk for Alcor Link AK9563 smartcard reader

   - revert of u_ether gadget change in 6.2-rc1 that caused problems

   - typec pin probe fix

  All of these have been in linux-next with no reported problems"

* tag 'usb-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: core: add quirk for Alcor Link AK9563 smartcard reader
  usb: typec: altmodes/displayport: Fix probe pin assign check
  Revert "usb: gadget: u_ether: Do not make UDC parent of the net device"
2023-02-12 11:18:57 -08:00
Linus Torvalds
dd78af9fde Final EFI fix for v6.2
A fix from Darren to widen the SMBIOS match for detecting Ampere Altra
 machines with problematic firmware. In the mean time, we are working on
 a more precise check, but this is still work in progress.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmPo134ACgkQw08iOZLZ
 jyT7Ywv/YrEUEm6dlWDSCkhWdli7B6pcwWhcvbsPcDLZWSDuo4vIJhtav1eeWkjs
 E2YDewXRSwewDlHeQPX+v25loVNKqbeBUDK6kX/DkNsk+jRVVM4Xt9myJA9XO//v
 YDO7Srbkbk/GlBkZFNUkVYPaEy5aKVO7l/hQZy+GTYhuA/UXZVtlZrqA0EJNOnT0
 xR7s+yXUNd0gBsgvTypnRACviL1qgZkY/yso51Gv/oXxzsVqm8K1XGsRVZblHwHK
 YfhgytI/kj6mZ4I6WaOYiCt5NTq+GT7g8lMUmHISHNXxl9qzvaZ51jV2Cxf/9Bck
 1RyfsIh3JoLHBlwCrfKRqIooitRENXWlIj+8PxZYG2/ONov7MkqEork7mSb1ITJw
 0uqb0tClIZE23C+fdHI7fctbNrh+CQLr1RjSz7iNX+HUWsXJRag6bDrjFXzwqQnx
 tLur+4QbpC8KbpwDoEQu74wveacJ6kn4r0KeKRWTp7IRsdA7NH+wQl6IJkMKr49A
 41UnT1x8
 =g+5h
 -----END PGP SIGNATURE-----

Merge tag 'efi-fixes-for-v6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fix from Ard Biesheuvel:
 "A fix from Darren to widen the SMBIOS match for detecting Ampere Altra
  machines with problematic firmware. In the mean time, we are working
  on a more precise check, but this is still work in progress"

* tag 'efi-fixes-for-v6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  arm64: efi: Force the use of SetVirtualAddressMap() on eMAG and Altra Max machines
2023-02-12 11:13:29 -08:00
Linus Torvalds
49a0bdb0a3 powerpc fixes for 6.2 #5
- Fix interrupt exit race with security mitigation switching.
 
  - Don't select ARCH_WANTS_NO_INSTR until warnings are fixed.
 
  - Build fix for CONFIG_NUMA=n.
 
 Thanks to: Nicholas Piggin, Randy Dunlap, Sachin Sant.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmPojjYTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPn4D/9opcq75nPUVTBmYnM5ar4MK4VtSp5b
 iOe12LV8W8bT0wy5A7RRoOkV+bBNHr3057S8W78Y/darkbiD+F29/Hf2pMiwxhLf
 l+Kl4GQbrFrerqIK6tgIZrtm/kooUOtSfzpYRduOTHnt8N6uXtg43glbr1/uIKZ8
 X21wJtnlCfhiysuf9zoa1VFB43QoFa6zE1zyTgo91Ofr+4tZJN5v66YYzO2fK+co
 PTLjrv45tpo/srnxGFbfY6yZmQkGu0j7Z17V9as1HZ8od9y7jCbIBS+hHz6Drc2K
 835VRc5pfeEO9RHobQoGOEgGxqvMCW0fbgvr2sMqaqc4z6ddtqJc9IDV/ed3q95C
 5BlxNqwT2KquC/PwDem1qg0KzebP2Q3r8sKLlenEsxh1CFYdG+cBOAyQ3w6bY8Gh
 rRhKTDJjmOIABVQf5ZCAMKvbdOMz2peA7tnbWU/PMVKkpE7GYtkrFsme83lfq+L6
 u7Kjvw1GJHcz1EhA58xYk2vSzJXyO6c8f4hbRXSuhQaVHLFmiLKp6l7dZ+2DJKkX
 CPRTpm7xCegYJoSUNyT8UrP1XPoln1ECEZHTo8U4L1o1wUWmoWafaXdpzbyhl36i
 mfyOp0JT2HwzdziFd6iCWDPovOGmqSdTgEH2PkCYl+gofKRGSYR7D1qs+e+B/zn3
 Bnw/bgl5bGausQ==
 =pSxf
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix interrupt exit race with security mitigation switching.

 - Don't select ARCH_WANTS_NO_INSTR until warnings are fixed.

 - Build fix for CONFIG_NUMA=n.

Thanks to Nicholas Piggin, Randy Dunlap, and Sachin Sant.

* tag 'powerpc-6.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s/interrupt: Fix interrupt exit race with security mitigation switch
  powerpc/kexec_file: fix implicit decl error
  powerpc: Don't select ARCH_WANTS_NO_INSTR
2023-02-12 11:08:15 -08:00
David Chen
462a8e08e0 Fix page corruption caused by racy check in __free_pages
When we upgraded our kernel, we started seeing some page corruption like
the following consistently:

  BUG: Bad page state in process ganesha.nfsd  pfn:1304ca
  page:0000000022261c55 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x1304ca
  flags: 0x17ffffc0000000()
  raw: 0017ffffc0000000 ffff8a513ffd4c98 ffffeee24b35ec08 0000000000000000
  raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
  page dumped because: nonzero mapcount
  CPU: 0 PID: 15567 Comm: ganesha.nfsd Kdump: loaded Tainted: P    B      O      5.10.158-1.nutanix.20221209.el7.x86_64 #1
  Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
  Call Trace:
   dump_stack+0x74/0x96
   bad_page.cold+0x63/0x94
   check_new_page_bad+0x6d/0x80
   rmqueue+0x46e/0x970
   get_page_from_freelist+0xcb/0x3f0
   ? _cond_resched+0x19/0x40
   __alloc_pages_nodemask+0x164/0x300
   alloc_pages_current+0x87/0xf0
   skb_page_frag_refill+0x84/0x110
   ...

Sometimes, it would also show up as corruption in the free list pointer
and cause crashes.

After bisecting the issue, we found the issue started from commit
e320d3012d25 ("mm/page_alloc.c: fix freeing non-compound pages"):

	if (put_page_testzero(page))
		free_the_page(page, order);
	else if (!PageHead(page))
		while (order-- > 0)
			free_the_page(page + (1 << order), order);

So the problem is the check PageHead is racy because at this point we
already dropped our reference to the page.  So even if we came in with
compound page, the page can already be freed and PageHead can return
false and we will end up freeing all the tail pages causing double free.

Fixes: e320d3012d25 ("mm/page_alloc.c: fix freeing non-compound pages")
Link: https://lore.kernel.org/lkml/BYAPR02MB448855960A9656EEA81141FC94D99@BYAPR02MB4488.namprd02.prod.outlook.com/
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-02-12 10:30:05 -08:00