The mlx5 driver was not updating its control virtqueue size at set_vq_num
and instead always initialized it to MLX5_CVQ_MAX_ENT (16) at
setup_cvq_vring.
QEMU would try to set the size to 64 by default; however, because the CVQ
size was always initialized to 16, an error would be thrown when sending
more than 16 control messages (as used-ring entry 17 is initialized to 0).
For example, starting a guest with x-svq=on and then executing the
following command would produce the error below:
# for i in {1..20}; do ifconfig eth0 hw ether XX:xx:XX:xx:XX:XX; done
qemu-system-x86_64: Insufficient written data (0)
[ 435.331223] virtio_net virtio0: Failed to set mac address by vq command.
SIOCSIFHWADDR: Invalid argument
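A minimal sketch of the fix, assuming illustrative helper and field names
(to_mvdev(), is_ctrl_vq_idx()); only the vringh-backed CVQ layout and
MLX5_CVQ_MAX_ENT come from the driver. The size handed to .set_vq_num for
the CVQ index is recorded and later used instead of the hardcoded maximum:

    static void mlx5_vdpa_set_vq_num(struct vdpa_device *vdev, u16 idx, u32 num)
    {
        struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);

        if (is_ctrl_vq_idx(mvdev, idx)) {
            /* remember the CVQ size requested by userspace instead of
             * ignoring it; setup_cvq_vring() then passes this value to
             * vringh_init_iotlb() rather than MLX5_CVQ_MAX_ENT
             */
            mvdev->cvq.vring.vring.num = num;
            return;
        }
        /* ... data VQ handling unchanged ... */
    }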
Acked-by: Dragos Tatulea <dtatulea@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <20240216142502.78095-1-jonah.palmer@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Fixes: 5262912ef3cf ("vdpa/mlx5: Add support for control VQ and MAC setting")
Track allocated mrs in a list and show a warning when leaks are detected
on device free or reset.
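A rough sketch of the mechanism, with illustrative list-field and helper
names (only struct list_head and the list iteration helpers are the real
kernel APIs): every mr is linked into a per-device list on creation, and
any entry still present at free/reset time is reported as a leak.

    struct mlx5_vdpa_mr {
        struct list_head mr_list;   /* linked into mvdev->mr_list_head on create */
        /* ... */
    };

    static void mlx5_vdpa_show_mr_leaks(struct mlx5_vdpa_dev *mvdev)
    {
        struct mlx5_vdpa_mr *mr;

        /* any entry still on the list at free/reset time is a leaked mkey */
        list_for_each_entry(mr, &mvdev->mr_list_head, mr_list)
            dev_warn(&mvdev->vdev.dev, "mkey leaked: mr still allocated\n");
    }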
Reviewed-by: Gal Pressman <gal@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20231225151203.152687-9-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Deleting the old mr during mr update (.set_map) and then modifying the
vqs with the new mr is not a good flow for firmware. The firmware
expects that mkeys are deleted after there are no more vqs referencing
them.
Introduce reference counting for mrs to fix this. It is the only way to
make sure that mkeys are not in use by vqs.
An mr reference is taken when the mr is associated with the mr asid table
and when the mr is linked to the vq on create/modify. The reference is
released when the mkey is unlinked from the vq (through modify/destroy)
and from the mr asid table.
To make things consistent, get rid of mlx5_vdpa_destroy_mr and use
get/put semantics everywhere.
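A minimal sketch of the get/put semantics, assuming an illustrative
refcount placement and destroy helper (refcount_inc() and
refcount_dec_and_test() are the real kernel primitives): the mkey is only
destroyed once the last user, vq or asid table slot, drops its reference.

    struct mlx5_vdpa_mr {
        refcount_t refcount;        /* set to 1 with refcount_set() on create */
        /* ... */
    };

    static void mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr)
    {
        if (mr)
            refcount_inc(&mr->refcount);
    }

    static void mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr)
    {
        if (mr && refcount_dec_and_test(&mr->refcount))
            destroy_mr_resources(mvdev, mr);   /* illustrative: mkey deleted only when unused */
    }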
Reviewed-by: Gal Pressman <gal@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20231225151203.152687-8-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Instead of tearing down and setting up vq resources, use vq
suspend/resume during .set_map to speed things up a bit.
The vq mr is updated with the new mapping while the vqs are suspended.
If the device doesn't support resumable vqs, do the old teardown and
setup dance.
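A hedged sketch of the .set_map fast path described above; all helper
names (is_resumable(), suspend_vqs(), resume_vqs(),
setup/teardown_vq_resources(), mlx5_vdpa_update_mr()) are illustrative of
the flow, not exact driver symbols.

    static int change_map(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_mr *new_mr,
                          unsigned int asid)
    {
        bool teardown = !is_resumable(ndev);   /* no resumable-vq capability */
        int err;

        if (teardown)
            teardown_vq_resources(ndev);       /* old dance */
        else
            suspend_vqs(ndev);                 /* vq objects stay around */

        mlx5_vdpa_update_mr(&ndev->mvdev, new_mr, asid);

        if (teardown)
            err = setup_vq_resources(ndev);
        else
            err = resume_vqs(ndev);            /* vqs pick up the new mkey */
        return err;
    }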
Reviewed-by: Gal Pressman <gal@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20231225151203.152687-7-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
.set_vq_state will set the indices and mark the fields to be modified in
the hw vq.
Advertise that the device supports changing the vq state when the device
is in DRIVER_OK state and suspended.
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231225151203.152687-6-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Addresses get set by .set_vq_address. The hw vq addresses will be updated
on the next modify_virtqueue.
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20231225151203.152687-5-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Implement vdpa vq and device resume if capability detected. Add support
for suspend -> ready state change.
Reviewed-by: Gal Pressman <gal@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20231225151203.152687-4-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add a bitmask variable that tracks hw vq field changes that
are supposed to be modified on next hw vq change command.
This will be useful to set multiple vq fields when resuming the vq.
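A small sketch of the mechanism; the flag names and the
issue_modify_cmd() helper are illustrative, not the firmware interface
definitions. Callers OR a bit into modified_fields for each field they
touch, and the next modify command applies them all at once.

    #define MVQ_MODIFY_STATE      BIT_ULL(0)   /* illustrative bit assignments */
    #define MVQ_MODIFY_ADDRS      BIT_ULL(1)
    #define MVQ_MODIFY_AVAIL_IDX  BIT_ULL(2)

    static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
                                struct mlx5_vdpa_virtqueue *mvq, int state)
    {
        u64 fields = mvq->modified_fields | MVQ_MODIFY_STATE;
        int err;

        err = issue_modify_cmd(ndev, mvq, fields, state);   /* one command for all changes */
        if (!err)
            mvq->modified_fields = 0;
        return err;
    }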
Reviewed-by: Gal Pressman <gal@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20231225151203.152687-3-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
mlx5_vdpa does not preserve userland's view of vring base for the control
queue in the following sequence:
  ioctl VHOST_SET_VRING_BASE
  ioctl VHOST_VDPA_SET_STATUS VIRTIO_CONFIG_S_DRIVER_OK
      mlx5_vdpa_set_status()
          setup_cvq_vring()
              vringh_init_iotlb()
                  vringh_init_kern()
                      vrh->last_avail_idx = 0;
  ioctl VHOST_GET_VRING_BASE
To fix, restore the value of cvq->vring.last_avail_idx after calling
vringh_init_iotlb.
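A sketch of the fix inside setup_cvq_vring(), abbreviated but close to
the shape of the driver code; vringh_init_iotlb() and the struct vringh
index fields are the real APIs, the surrounding context is trimmed.

    static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
    {
        struct mlx5_control_vq *cvq = &mvdev->cvq;
        u16 idx = cvq->vring.last_avail_idx;   /* set earlier via VHOST_SET_VRING_BASE */
        int err;

        err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
                                MLX5_CVQ_MAX_ENT, false,
                                (struct vring_desc *)(uintptr_t)cvq->desc_addr,
                                (struct vring_avail *)(uintptr_t)cvq->driver_addr,
                                (struct vring_used *)(uintptr_t)cvq->device_addr);
        if (!err)
            /* vringh_init_iotlb() zeroed the indices; put userland's value back */
            cvq->vring.last_avail_idx = cvq->vring.last_used_idx = idx;
        return err;
    }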
Fixes: 5262912ef3cf ("vdpa/mlx5: Add support for control VQ and MAC setting")
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <1699014387-194368-1-git-send-email-steven.sistare@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Since commit 6f5312f80183 ("vdpa/mlx5: Add support for running with
virtio_vdpa"), mlx5_vdpa starts with a preallocated 1:1 DMA MR at device
creation time. This 1:1 DMA MR is implicitly destroyed when the first
.set_map call is invoked, at which point callers like vhost-vdpa start to
set up custom mappings. When the .reset callback is invoked, the custom
mappings are cleared and the 1:1 DMA MR is re-created.
In order to reduce excessive memory mapping cost in live migration, it
is desirable to decouple the vhost-vdpa IOTLB abstraction from the
virtio device life cycle, i.e. mappings can be kept around intact across
virtio device reset. Leverage the .reset_map callback, which is meant to
destroy the regular MR (including cvq mapping) on the given ASID and
recreate the initial DMA mapping. That way, the device .reset op runs
free from having to maintain and clean up memory mappings by itself.
Additionally, implement .compat_reset to cater for older userspace,
which may expect the mappings to be cleared during reset.
Co-developed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Message-Id: <1697880319-4937-7-git-send-email-si-wei.liu@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Offer this backend feature, as mlx5 is compatible with it. It allows
live migration with CVQ, dynamically switching between passthrough and
shadow virtqueue.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230703142514.363256-1-eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
For the following sequence:
- cvq group is in ASID 0
- .set_map(1, cvq_iotlb)
- .set_group_asid(cvq_group, 1)
... the cvq mapping from ASID 0 will be used. This is not always correct
behaviour.
This patch adds support for the above mentioned flow by saving the iotlb
on each .set_map and updating the cvq iotlb with it on a cvq group change.
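A rough sketch of the flow; group2asid[] and MLX5_VDPA_CVQ_GROUP reflect
the driver, while saved_iotlb[] and update_cvq_iotlb() are illustrative
placeholders for the saved mapping and the prune+dup update.

    static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,
                                   unsigned int asid)
    {
        struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);

        if (group != MLX5_VDPA_CVQ_GROUP)
            return -EINVAL;

        mvdev->group2asid[group] = asid;

        /* the CVQ now lives in 'asid': replay the iotlb remembered from the
         * last .set_map on that ASID so the CVQ mapping is not lost
         */
        return update_cvq_iotlb(mvdev, mvdev->saved_iotlb[asid], asid);
    }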
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20231018171456.1624030-18-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Tested-by: Si-Wei Liu <si-wei.liu@oracle.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Vq descriptor mappings are supported in hardware by filling in an
additional mkey which contains the descriptor mappings to the hw vq.
A previous patch in this series added support for hw mkey (mr) creation
for ASID 1.
This patch fills in both the vq data and vq descriptor mkeys based on
group ASID mapping.
The feature is signaled to the vdpa core through the presence of the
.get_vq_desc_group op.
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20231018171456.1624030-16-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Tested-by: Si-Wei Liu <si-wei.liu@oracle.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Introduce the vq descriptor group and an mr per ASID. Until now,
.set_map on ASID 1 only updated the cvq iotlb. From now on it also
creates an mkey for it. The current patch doesn't use it, but follow-up
patches will add hardware support for mapping the vq descriptors.
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20231018171456.1624030-15-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Tested-by: Si-Wei Liu <si-wei.liu@oracle.com>
Tested-by: Lei Yang <leiyang@redhat.com>
The current flow for updating an mr works directly on mvdev->mr which
makes it cumbersome to handle multiple new mr structs.
This patch makes the flow more straightforward by having
mlx5_vdpa_create_mr return a new mr which will update the old mr (if
any). The old mr will be deleted and unlinked from mvdev. For the case
when the iotlb is empty (not NULL), the old mr will be cleared.
This change paves the way for adding mrs for different ASIDs.
The initialized bool is no longer needed as mr is now a pointer in the
mlx5_vdpa_dev struct which will be NULL when not initialized.
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20231018171456.1624030-14-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Tested-by: Si-Wei Liu <si-wei.liu@oracle.com>
Tested-by: Lei Yang <leiyang@redhat.com>
This patch adapts the mr creation/deletion code to be able to work with
any given mr struct pointer. All the APIs are adapted to take an extra
parameter for the mr.
mlx5_vdpa_create/delete_mr no longer needs an ASID parameter. The
check is done in the caller instead (mlx5_set_map).
This change is needed for a followup patch which will introduce an
additional mr for the vq descriptor data.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231018171456.1624030-12-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Tested-by: Si-Wei Liu <si-wei.liu@oracle.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Make mlx5_destroy_mr symmetric to mlx5_create_mr.
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20231018171456.1624030-11-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Tested-by: Si-Wei Liu <si-wei.liu@oracle.com>
Tested-by: Lei Yang <leiyang@redhat.com>
The handling of the cvq iotlb is currently coupled with the creation
and destruction of the hardware mkeys (mr).
This patch moves cvq iotlb handling into its own function and shifts it
to a scope that is not related to mr handling. As cvq handling is just a
prune_iotlb + dup_iotlb cycle, put it all in the same "update" function.
Finally, the destruction path is handled by directly pruning the iotlb.
After this move is done the ASID mr code can be collapsed into a single
function.
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20231018171456.1624030-8-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Tested-by: Si-Wei Liu <si-wei.liu@oracle.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Necessary for upcoming cvq separation from mr allocation.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20231018171456.1624030-7-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Tested-by: Si-Wei Liu <si-wei.liu@oracle.com>
Tested-by: Lei Yang <leiyang@redhat.com>
A firmware error is triggered when configuring a 9k MTU on the PF after
switching to switchdev mode and then using a vdpa device with larger
(1k) rings:
mlx5_cmd_out_err: CREATE_GENERAL_OBJECT(0xa00) op_mod(0xd) failed, status bad resource(0x5), syndrome (0xf6db90), err(-22)
This is due to the fact that the hw VQ size parameters are computed
based on the umem_1/2/3_buffer_param_a/b capabilities, and all device
capabilities are read only once, when the driver is moved to switchdev
mode.
The problematic configuration flow looks like this:
1) Create VF
2) Unbind VF
3) Switch PF to switchdev mode.
4) Bind VF
5) Set PF MTU to 9k
6) create vDPA device
7) Start VM with vDPA device and 1K queue size
Note that setting the MTU before step 3) doesn't trigger this issue.
This patch reads the aforementioned umem parameters at the latest
possible point, before the VQs of the device are created.
v2:
- Allocate output with kmalloc to reduce stack frame size.
- Removed stable from cc.
Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20230831155702.1080754-1-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
The error path in setup_driver deletes the debugfs entry but doesn't
clear the pointer. During .dev_del the invalid pointer will be released
again causing a crash.
This patch fixes the issue by always clearing the debugfs entry in
mlx5_vdpa_remove_debugfs. Also, stop removing the debugfs entry in
.dev_del op: the debugfs entry is already handled within the
setup_driver/teardown_driver scope.
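A minimal sketch of the fix; debugfs_remove_recursive() is the real API
(and tolerates a NULL argument), while the function signature and field
name mirror the driver only loosely and are abbreviated here.

    void mlx5_vdpa_remove_debugfs(struct mlx5_vdpa_net *ndev)
    {
        debugfs_remove_recursive(ndev->debugfs);
        ndev->debugfs = NULL;   /* .dev_del can no longer free a stale dentry */
    }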
Cc: stable@vger.kernel.org
Fixes: f0417e72add5 ("vdpa/mlx5: Add and remove debugfs in setup/teardown driver")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Message-Id: <20230829174014.928189-2-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/sfc/tc.c
fa165e194997 ("sfc: don't unregister flow_indr if it was never registered")
3bf969e88ada ("sfc: add MAE table machinery for conntrack table")
https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mlx5_vdpa_destroy_mr can be called from .set_map with data ASID after
the control virtqueue ASID iotlb has been populated. The control vq
iotlb must not be cleared, since it will not be populated again.
So call the ASID-aware destroy function, which makes sure that the
right vq resource is destroyed.
Fixes: 8fcd20c30704 ("vdpa/mlx5: Support different address spaces for control and data")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Message-Id: <20230802171231.11001-5-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
The standard specifies that the initial number of queues is the
default, which is 1 (1 tx, 1 rx).
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230727172354.68243-2-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
This commit enables dynamic allocation of EQs at runtime, allowing
for more flexibility in managing completion EQs and reducing the memory
overhead of driver load. Whenever a CQ is created for a given vector
index, the driver looks up whether there is an already mapped completion
EQ for that vector; if so, it is used. Otherwise, a new EQ is allocated
on demand and then used for the CQ completion events.
Add a protection lock to the EQ table to protect from concurrent EQ
creation attempts.
While at it, replace mlx5_vector2irqn()/mlx5_vector2eqn() with
mlx5_comp_eqn_get() and mlx5_comp_irqn_get() which will allocate an
EQ on demand if no EQ is found for the given vector.
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add support for generating interrupts from the device directly to the
vCPU in the VM, thus avoiding the overhead on the host CPU.
When supported, the driver will attempt to allocate vectors for each
data virtqueue. If a vector for a virtqueue cannot be provided, it will
use the QP mode where notifications go through the driver.
In addition, we add a shutdown callback to make sure allocated
interrupts are released in case of shutdown to allow clean shutdown.
Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Message-Id: <20230607190007.290505-1-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Extend the list of possible features that can be supported by firmware.
Note that different versions of firmware may or may not support these
features. The driver is made aware of them by querying the firmware.
While doing this, improve the code so we use enum names instead of hard
coded numerical values.
The new features supported by the driver are the following:
VIRTIO_NET_F_MRG_RXBUF
VIRTIO_NET_F_HOST_ECN
VIRTIO_NET_F_GUEST_ECN
VIRTIO_NET_F_GUEST_TSO6
VIRTIO_NET_F_GUEST_TSO4
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20230321112809.221432-3-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eugenio Pérez Martin <eperezma@redhat.com>
The following patch adds driver support for VIRTIO_NET_F_MRG_RXBUF.
Current firmware versions show degradation in packet rate when using
MRG_RXBUF. Users who favor memory savings over packet rate could enable
this feature, but we want to keep it off by default.
One can still enable it when creating the vdpa device using the vdpa
tool, by providing features that include it.
For example:
$ vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 device_features 0x300cb982b
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20230321112809.221432-2-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Current code ignores link state updates if VIRTIO_NET_F_STATUS was not
negotiated. However, link state updates could be received before feature
negotiation has completed, therefore causing link state events to be
lost, possibly leaving the link state down.
Modify the code so the link state notifier is registered after DRIVER_OK
was negotiated, and carry out the registration only if
VIRTIO_NET_F_STATUS was negotiated. Unregister the notifier when the
device is reset.
Fixes: 033779a708f0 ("vdpa/mlx5: make MTU/STATUS presence conditional on feature bits")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20230417110343.138319-1-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The right place to add the debugfs create is in
setup_driver() and remove it in teardown_driver().
Current code adds the debugfs when creating the device but resetting a
device will remove the debugfs subtree and subsequent set_driver will
not be able to create the files since the debugfs pointer is NULL.
Fixes: 294221004322 ("vdpa/mlx5: Add debugfs subtree")
Signed-off-by: Eli Cohen <elic@nvidia.com>
v3 -> v4:
Fix error flow in setup_driver()
Message-Id: <20230403114039.11102-1-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Otherwise the virtqueue object to instate could point to an invalid
address that was unmapped from the MTT:
mlx5_core 0000:41:04.2: mlx5_cmd_out_err:782:(pid 8321):
CREATE_GENERAL_OBJECT(0xa00) op_mod(0xd) failed, status
bad parameter(0x3), syndrome (0x5fa1c), err(-22)
Fixes: cae15c2ed8e6 ("vdpa/mlx5: Implement susupend virtqueue callback")
Cc: Eli Cohen <elic@nvidia.com>
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Message-Id: <1676424640-11673-1-git-send-email-si-wei.liu@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
This patch implements features provisioning for mlx5_vdpa.
1) Validate that the provisioned features are a subset of the parent
features.
2) Clear features that are not wanted by userspace.
For example:
# vdpa mgmtdev show
pci/0000:41:04.2:
supported_classes net
max_supported_vqs 65
dev_features CSUM GUEST_CSUM MTU MAC HOST_TSO4 HOST_TSO6 STATUS CTRL_VQ CTRL_VLAN MQ CTRL_MAC_ADDR VERSION_1 ACCESS_PLATFORM
1) Provision vDPA device with all features derived from the parent
# vdpa dev add name vdpa1 mgmtdev pci/0000:41:04.2
# vdpa dev config show
vdpa1: mac e4:11:c6:d3:45:f0 link up link_announce false max_vq_pairs 1 mtu 1500
negotiated_features CSUM GUEST_CSUM MTU HOST_TSO4 HOST_TSO6 STATUS CTRL_VQ CTRL_VLAN MQ CTRL_MAC_ADDR VERSION_1 ACCESS_PLATFORM
2) Provision vDPA device with a subset of parent features
# vdpa dev add name vdpa1 mgmtdev pci/0000:41:04.2 device_features 0x300020000
# vdpa dev config show
vdpa1:
negotiated_features CTRL_VQ VERSION_1 ACCESS_PLATFORM
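A hedged sketch of the two rules above as they would look on the
device-add path; struct vdpa_dev_set_config and the
VDPA_ATTR_DEV_FEATURES mask bit are the real vdpa core interfaces, the
function itself is illustrative.

    static int check_provisioned_features(u64 parent_features,
                                          const struct vdpa_dev_set_config *cfg,
                                          u64 *device_features)
    {
        if (cfg->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) {
            /* 1) provisioned features must be a subset of the parent's */
            if (cfg->device_features & ~parent_features)
                return -EINVAL;
            /* 2) expose only what userspace asked for */
            *device_features = cfg->device_features;
        } else {
            *device_features = parent_features;
        }
        return 0;
    }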
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Message-Id: <1675725124-7375-7-git-send-email-si-wei.liu@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The spec says:
mtu only exists if VIRTIO_NET_F_MTU is set
status only exists if VIRTIO_NET_F_STATUS is set
We should only present MTU and STATUS conditionally depending on
the feature bits.
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Message-Id: <1675725124-7375-6-git-send-email-si-wei.liu@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch implements a per-virtqueue DMA device for mlx5_vdpa. This is
needed for virtio_vdpa to work for the CVQ, which is backed by vringh and
not DMA. We simply advertise the vDPA device itself as the DMA device
for the CVQ; the DMA API can then simply use PA, so the identical mapping
for the CVQ can still be used. Otherwise the identical (1:1) mapping
won't work when a platform IOMMU is enabled, since the IOVA is allocated
on demand and is not necessarily the PA.
This fixes the following crash when the mlx5 vDPA device is bound to
virtio-vdpa with a platform IOMMU enabled but not in passthrough mode
(a sketch of the new op follows the trace below):
BUG: unable to handle page fault for address: ff2fb3063deb1002
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 1393001067 P4D 1393002067 PUD 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 55 PID: 8923 Comm: kworker/u112:3 Kdump: loaded Not tainted 6.1.0+ #7
Hardware name: Dell Inc. PowerEdge R750/0PJ80M, BIOS 1.5.4 12/17/2021
Workqueue: mlx5_vdpa_wq mlx5_cvq_kick_handler [mlx5_vdpa]
RIP: 0010:vringh_getdesc_iotlb+0x93/0x1d0 [vringh]
Code: 14 25 40 ef 01 00 83 82 c0 0a 00 00 01 48 2b 05 93 5a 1b ea 8b 4c 24 14 48 c1 f8 06 48 c1 e0 0c 48 03 05 90 5a 1b ea 48 01 c8 <0f> b7 00 83 aa c0 0a 00 00 01 65 ff 0d bc e4 41 3f 0f 84 05 01 00
RSP: 0018:ff46821ba664fdf8 EFLAGS: 00010282
RAX: ff2fb3063deb1002 RBX: 0000000000000a20 RCX: 0000000000000002
RDX: ff2fb318d2f94380 RSI: 0000000000000002 RDI: 0000000000000001
RBP: ff2fb3065e832410 R08: ff46821ba664fe00 R09: 0000000000000001
R10: 0000000000000000 R11: 000000000000000d R12: ff2fb3065e832488
R13: ff2fb3065e8324a8 R14: ff2fb3065e8324c8 R15: ff2fb3065e8324a8
FS: 0000000000000000(0000) GS:ff2fb3257fac0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ff2fb3063deb1002 CR3: 0000001392010006 CR4: 0000000000771ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
mlx5_cvq_kick_handler+0x89/0x2b0 [mlx5_vdpa]
process_one_work+0x1e2/0x3b0
? rescuer_thread+0x390/0x390
worker_thread+0x50/0x3a0
? rescuer_thread+0x390/0x390
kthread+0xd6/0x100
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
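A minimal sketch of the per-VQ DMA device op referenced above;
.get_vq_dma_dev is the real vdpa config op, while is_ctrl_vq_idx() and
to_mvdev() are illustrative helpers.

    static struct device *mlx5_get_vq_dma_dev(struct vdpa_device *vdev, u16 idx)
    {
        struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);

        if (is_ctrl_vq_idx(mvdev, idx))
            return &vdev->dev;    /* CVQ is vringh-backed: DMA API falls back to PA */

        return vdev->dma_dev;     /* data VQs keep the real DMA device */
    }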
Reviewed-by: Eli Cohen <elic@nvidia.com>
Tested-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230119061525.75068-6-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
For each interface, either VLAN tagged or untagged, add two hardware
counters: one for unicast and another for multicast. The counters count
RX packets and bytes and can be read through debugfs:
$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/untagged/mcast/packets
$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/untagged/ucast/bytes
This feature is controlled via the config option
MLX5_VDPA_STEERING_DEBUG. It is off by default as it may have some
impact on performance.
Includes a fixup by Yang Yingliang <yangyingliang@huawei.com>:
vdpa/mlx5: fix check wrong pointer in mlx5_vdpa_add_mac_vlan_rules()
The local variable 'rule' is not used anymore, fix return value
check after calling mlx5_add_flow_rules().
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-9-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Message-Id: <20230104074418.1737510-1-yangyingliang@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eli Cohen <elic@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Add debugfs subtree and expose flow table ID and TIR number. This
information can be used by external tools to do extended
troubleshooting.
The information can be retrieved like so:
$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/table_id
$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/tirn
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-8-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Move some definitions from mlx5_vnet.c to newly added header file
mlx5_vnet.h. We need these definitions for the following patches that
add debugfs tree to expose information vital for debug.
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-7-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When qemu uses different address spaces for data and control virtqueues,
the current code would overwrite the control virtqueue iotlb through the
dup_iotlb call. Fix this by referring to the address space identifier
and the group-to-asid mapping to determine which mapping needs to be
updated. We also move the address space logic from mlx5 net to the core
directory.
Reported-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-6-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
event_handler runs under atomic context and may not acquire reslock. We
can still guarantee that the handler won't be called after suspend by
clearing nb_registered, unregistering the handler and flushing the
workqueue.
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-5-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Delete the old MAC from the table, not the new one, which is not there
yet.
Fixes: baf2ad3f6a98 ("vdpa/mlx5: Add RX MAC VLAN filter support")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-4-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Check whether VIRTIO_NET_F_CTRL_VLAN has been negotiated and return an
error if a control VQ VLAN command is received while it has not.
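A short sketch of the check at the top of the VLAN control command
handler; the virtio constants come from the uapi headers, the handler
and helper names mirror the driver only loosely.

    static virtio_net_ctrl_ack handle_ctrl_vlan(struct mlx5_vdpa_dev *mvdev, u8 cmd)
    {
        struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);

        if (!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VLAN)))
            return VIRTIO_NET_ERR;   /* feature not negotiated: reject the command */

        /* ... handle VIRTIO_NET_CTRL_VLAN_ADD / VIRTIO_NET_CTRL_VLAN_DEL ... */
        return VIRTIO_NET_OK;
    }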
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-3-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Set the VLAN id to the header values field instead of overwriting the
headers criteria field.
Before this fix, VLAN filtering would not really work and tagged packets
would be forwarded unfiltered to the TIR.
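A sketch of the corrected match setup; MLX5_SET() and the
fte_match_set_lyr_2_4 field names are the real mlx5 interface, the
wrapper function is illustrative. headers_c carries the match criteria
(mask), headers_v the values to match against.

    static void set_vlan_match(void *headers_c, void *headers_v, u16 vid)
    {
        MLX5_SET(fte_match_set_lyr_2_4, headers_c, cvlan_tag, 1);
        MLX5_SET(fte_match_set_lyr_2_4, headers_c, first_vid, 0xfff);  /* mask */
        MLX5_SET(fte_match_set_lyr_2_4, headers_v, cvlan_tag, 1);
        MLX5_SET(fte_match_set_lyr_2_4, headers_v, first_vid, vid);    /* value, not criteria */
    }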
Fixes: baf2ad3f6a98 ("vdpa/mlx5: Add RX MAC VLAN filter support")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-2-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
RQT objects require that a power-of-two value be configured for both
rqt_max_size and rqt_actual_size.
For create_rqt, make sure to round up to a power of two the value given
by the user who created the vdpa device, held in ndev->rqt_size. The
actual size is also rounded up to a power of two using the current
number of VQs given by ndev->cur_num_vqs.
The same goes for modify_rqt, where we need to make sure the actual size
is a power of two based on the new number of QPs.
Without this patch, an attempt to create a device with a non-power-of-two
number of QPs would result in an error from firmware.
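A sketch of the rounding; roundup_pow_of_two() and the rqtc field names
are the real kernel/mlx5 interfaces, the wrapper is illustrative, and the
divide by two is an assumption that each queue pair contributes one RX
queue to the RQT.

    static void fill_rqt_sizes(struct mlx5_vdpa_net *ndev, void *rqtc)
    {
        MLX5_SET(rqtc, rqtc, rqt_max_size,
                 roundup_pow_of_two(ndev->rqt_size));
        MLX5_SET(rqtc, rqtc, rqt_actual_size,
                 roundup_pow_of_two(ndev->cur_num_vqs / 2));
    }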
Fixes: 52893733f2c5 ("vdpa/mlx5: Add multiqueue support")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220912125019.833708-1-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Initialize the err local variable to -EAGAIN so that it is returned if
the asid cannot be found, thus avoiding returning an uninitialized
value.
Fixes: 8fcd20c30704 ("vdpa/mlx5: Support different address spaces for control and data")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220811134010.952291-1-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Partition virtqueues to two different address spaces: one for control
virtqueue which is implemented in software, and one for data virtqueues.
Based-on: <20220526124338.36247-1-eperezma@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220714113927.85729-3-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Implement the suspend callback, allowing the virtqueues to be suspended
so they stop processing descriptors. This is required in order to query
a consistent state of the virtqueue while live migration is taking
place.
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220714113927.85729-2-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Use eth_broadcast_addr() to assign the broadcast address instead of
memset().
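For reference, the substitution looks like this (eth_broadcast_addr()
is from linux/etherdevice.h); the wrapper function is only for
illustration.

    static void fill_bcast(u8 mac[ETH_ALEN])
    {
        /* before: memset(mac, 0xff, ETH_ALEN); */
        eth_broadcast_addr(mac);   /* same bytes, clearer intent */
    }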
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Xu Qiang <xuqiang36@huawei.com>
Message-Id: <20220704021405.64545-1-xuqiang36@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Currently, the CVQ vringh is initialized inside setup_virtqueues(),
which is called every time a memory update is done. This is undesirable
since it resets all the context of the vring, including the available
and used indices.
Move the initialization to mlx5_vdpa_set_status() when
VIRTIO_CONFIG_S_DRIVER_OK is set.
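A hedged sketch of the new call site; VIRTIO_CONFIG_S_DRIVER_OK and the
device status field are real, while the surrounding handling is
abbreviated and the helper names are illustrative.

    static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
    {
        struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);

        if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
            !(mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
            /* initialize the CVQ vring once, here, rather than on every
             * memory update in setup_virtqueues(); indices are preserved
             */
            if (setup_cvq_vring(mvdev))
                mlx5_vdpa_warn(mvdev, "failed to set up CVQ vring\n");
        }
        /* ... rest of status handling ... */
    }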
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220613075958.511064-2-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>