IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Expand representor vport stat group to support all counters from the
vport stat group, to count all the traffic passing through the vport.
Fix current implementation where fill_stats and update_stats use
different structs.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Today multipath offload is only supported when the number of
nexthops is 2 which block the use of it in case of system with
2 NICs.
This patch solve it by enabling multipath offload per NIC if
2 nexthops of the route are its uplinks.
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Initialize the meter object with the TC police mtu parameter.
Use the hardware range destination to compare the pkt len to the mtu setting.
Assign the range destination hit/miss ft to the police conform/exceed
attributes.
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
TC police action may configure the maximum packet size to be handled by
the policer, in addition to byte/packet rate.
MTU check is realized in hardware using the range destination, specifying
a hit ft, if packet len is in the range, or miss ft otherwise.
Instantiate mtu green/red flow tables with a single match-all rule.
Add the green/red actions to the hit/miss table accordingly.
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
TC police action may configure the maximum packet size to be handled by
the policer, in addition to byte/packet rate.
Currently the post meter table steers the packet according to the meter
aso output.
Refactor the code to allow both metering and range post actions as a
pre-step for adding police mtu offload support.
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add support for matching on range.
The supported type of range is L2 frame size.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Up until now miss address in all the STEs was used to connect miss lists
and to link the last STE in the list to end anchor.
Match range STE will require special handling because its miss address is
part of the 'action'. That is, range action has hit and miss addresses.
Since the range action is always the last action, need to make sure that
its miss address isn't overwritten by the end anchor.
Adding new function mlx5dr_ste_is_miss_addr_set() to answer the question
whether the STE's miss address has already been set as part of STE
initialization. Use a callback that always returns false right now. Once
match range is added, a different callback will be used for that STE type.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In preparation for MATCH RANGE STE support, create a function
to set the miss address of an STE.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In many cases different actions will ask for the same definer format.
Instead of allocating new definer general object and running out of
definers, have an xarray of allocated definers and keep track of their
usage with refcounts: allocate a new definer only when there isn't
one with the same format already created, and destroy definer only
when its refcount runs down to zero.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
As preparation for range action support, moving the handling
of final ICM address for flow table action to a separate function.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This patch handles the following two changes w.r.t. is_fw_table function:
1. When SW steering is asked to create/destroy FW table, we allow for
creation/destruction of only termination tables. Rename mlx5_dr_is_fw_table
both to comply with the static function naming and to reflect that we're
actually checking for FW termination table.
2. When the action 'go to flow table' is created, the destination flow
table can be any FW table, not only termination table. Adding function
to check if the dest table is FW table. This function will also be used
by the later creation of range match action, so putting it the header file.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
SW steering is able to match only on the exact values of the packet fields,
as requested by the user: the user provides mask for the fields that are of
interest, and the exact values to be matched on when the traffic is handled.
Match Definer is a general FW object that defines which fields in the
packet will be referenced by the mask and tag of each STE. Match definer ID
is part of STE fields, and it defines how the HW needs to interpret the STE's
mask/tag values.
Till now SW steering used the definers that were managed by FW and implemented
the STE layout as described by the HW spec. Now that we're adding a new type
of STE, SW steering needs to define for the HW how it should interpret this
new STE's layout.
This is done with a programmable match definer.
The programmable definer allows to selects which fields will be included in
the definer, and their layout: it has up to 9 DW selectors 8 Byte selectors.
Each selector indicates a DW/Byte worth of fields out of the table that
is defined by HW spec by referencing the offset of the required DW/Byte.
This patch adds dr_cmd function to create and destroy MATCH_DEFINER
general object.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Range is a new flow destination type which allows matching on
a range of values instead of matching on a specific value.
Range flow destination has the following fields:
- hit_ft: flow table to forward the traffic in case of hit
- miss_ft: flow table to forward the traffic in case of miss
- field: which packet characteristic to match on
- min: minimal value for the selected field
- max: maximal value for the selected field
Note:
- In order to match, the value in the packet should meet
the following criteria: min <= value < max
- Currently, the only supported field type is L2 packet length
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Update full structure of match definer and add an ID of
the SELECT match definer type.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Eric Dumazet says:
====================
mlx4: better BIG-TCP support
mlx4 uses a bounce buffer in TX whenever the tx descriptors
wrap around the right edge of the ring.
Size of this bounce buffer was hard coded and can be
increased if/when needed.
====================
Link: https://lore.kernel.org/r/20221207141237.2575012-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test against MLX4_MAX_DESC_TXBBS only matters if the TX
bounce buffer is going to be used.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Google production kernel has increased MAX_SKB_FRAGS to 45
for BIG-TCP rollout.
Unfortunately mlx4 TX bounce buffer is not big enough whenever
an skb has up to 45 page fragments.
This can happen often with TCP TX zero copy, as one frag usually
holds 4096 bytes of payload (order-0 page).
Tested:
Kernel built with MAX_SKB_FRAGS=45
ip link set dev eth0 gso_max_size 185000
netperf -t TCP_SENDFILE
I made sure that "ethtool -G eth0 tx 64" was properly working,
ring->full_size being set to 15.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Wei Wang <weiwan@google.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MAX_DESC_SIZE is really the size of the bounce buffer used
when reaching the right side of TX ring buffer.
MAX_DESC_TXBBS get a MLX4_ prefix.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Saeed Mahameed says:
====================
Support tc police jump conform-exceed attribute
The tc police action conform-exceed option defines how to handle
packets which exceed or conform to the configured bandwidth limit.
One of the possible conform-exceed values is jump, which skips over
a specified number of actions.
This series adds support for conform-exceed jump action.
The series adds platform support for branching actions by providing
true/false flow attributes to the branching action.
This is necessary for supporting police jump, as each branch may
execute a different action list.
The first five patches are preparation patches:
- Patches 1 and 2 add support for actions with no destinations (e.g. drop)
- Patch 3 refactor the code for subsequent function reuse
- Patch 4 defines an abstract way for identifying terminating actions
- Patch 5 updates action list validations logic considering branching actions
The following three patches introduce an interface for abstracting branching
actions:
- Patch 6 introduces an abstract api for defining branching actions
- Patch 7 generically instantiates the branching flow attributes using
the abstract API
Patch 8 adds the platform support for jump actions, by executing the following
sequence:
a. Store the jumping flow attr
b. Identify the jump target action while iterating the actions list.
c. Instantiate a new flow attribute after the jump target action.
This is the flow attribute that the branching action should jump to.
d. Set the target post action id on:
d.1. The jumping attribute, thus realizing the jump functionality.
d.2. The attribute preceding the target jump attr, if not terminating.
The next patches apply the platform's branching attributes to the police
action:
- Patch 9 is a refactor patch
- Patch 10 initializes the post meter table with the red/green flow attributes,
as were initialized by the platform
- Patch 11 enables the offload of meter actions using jump conform-exceed
value.
====================
Link: https://lore.kernel.org/all/20221203221337.29267-1-saeed@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Separate the matchall police action validation from flower validation.
Isolate the action validation logic in the police action parser.
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-12-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instantiate the post meter actions with the platform initialized branching
action attributes.
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-11-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently post meter supports only the pipe/drop conform-exceed policy.
This assumption is reflected in several variable names.
Rename the following variables as a pre-step for using the generalized
branching action platform.
Rename fwd_green_rule/drop_red_rule to green_rule/red_rule respectively.
Repurpose red_counter/green_counter to act_counter/drop_counter to allow
police conform-exceed configurations that do not drop.
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-10-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Identify the jump target action when iterating the action list.
Initialize the jump target attr with the jumping attribute during the
parsing phase. Initialize the jumping attr post action with the target
during the offload phase.
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-9-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Initialize flow attribute for drop, accept, pipe and jump branching actions.
Instantiate a flow attribute instance according to the specified branch
control action. Store the branching attributes on the branching action
flow attribute during the parsing phase. Then, during the offload phase,
allocate the relevant mod header objects to the branching actions.
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-8-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Extend the act tc api to set the branch control params aligning with
the police conform/exceed use case.
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-7-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently the entire flow action list is validate for offload limitations.
For example, flow with both forward and drop actions are declared invalid
due to hardware restrictions.
However, a multi-table hardware model changes the limitations from a flow
scope to a single flow attribute scope.
Apply offload limitations to flow attributes instead of the entire flow.
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-6-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Extend act api to identify actions that terminate action list.
Pre-step for terminating branching actions.
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-5-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After the tc action parsing phase the flow attribute is initialized with
relevant eswitch offload objects such as tunnel, vlan, header modify and
counter attributes. The post processing is done both for fdb and post-action
attributes.
Reuse the flow attribute post parsing logic by both fdb and post-action
offloads.
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-4-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently create_flow_handle() assumes a null dest pointer when there
are no destinations.
This might not be the case as the caller may pass an allocated dest
array while setting the dest_num parameter to 0.
Assert null dest array for flow rules that have no destinations (e.g. drop
rule).
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rules with drop action are not required to have a destination.
Currently the destination list is allocated with the maximum number of
destinations and passed to the fs_core layer along with the actual number
of destinations.
Remove redundant passing of dest pointer when count of dest is 0.
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-2-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Roger Quadros says:
====================
net: ethernet: ti: am65-cpsw: Fix set channel operation
This contains a critical bug fix for the recently merged suspend/resume
support [1] that broke set channel operation. (ethtool -L eth0 tx <n>)
As there were 2 dependent patches on top of the offending commit [1]
first revert them and then apply them back after the correct fix.
[1] fd23df72f2be ("net: ethernet: ti: am65-cpsw: Add suspend/resume support")
====================
Link: https://lore.kernel.org/r/20221206094419.19478-1-rogerq@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On low power during system suspend the ALE table context is lost.
Save the ALE context before suspend and restore it after resume.
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During suspend resume the context of PORT_VLAN_REG is lost so
save it during suspend and restore it during resume for
host port and slave ports.
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add PM handlers for System suspend/resume.
As DMA driver doesn't yet support suspend/resume we free up
the DMA channels at suspend and acquire and initialize them
at resume.
In this revised approach we do not free the TX/RX IRQs at
am65_cpsw_nuss_common_stop() as it causes problems.
We will now free them only on .suspend() as we need to release
the DMA channels (as DMA looses context) and re-acquiring
them on .resume() may not necessarily give us the same
IRQs.
To make this easier:
- introduce am65_cpsw_nuss_remove_rx_chns() which is
similar to am65_cpsw_nuss_remove_tx_chns(). These will
be invoked in pm.suspend() to release the DMA channels
and free up the IRQs.
- move napi_add() and request_irq() calls to
am65_cpsw_nuss_init_rx/tx_chns() so we can invoke them
in pm.resume() to acquire the DMA channels and IRQs.
As CPTS looses contect during suspend/resume, invoke the
necessary CPTS suspend/resume helpers.
ALE_CLEAR command is issued in cpsw_ale_start() so no need
to issue it before the call to cpsw_ale_start().
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit fd23df72f2be317d38d9fde0a8996b8e7454fd2a.
This commit broke set channel operation. Revert this and
implement it with a different approach in a separate patch.
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 643cf0e3ab5ccee37b3c53c018bd476c45c4b70e.
This is to make it easier to revert the offending commit
fd23df72f2be ("net: ethernet: ti: am65-cpsw: Add suspend/resume support")
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 1af3cb3702d02167926a2bd18580cecb2d64fd94.
This is to make it easier to revert the offending commit
fd23df72f2be ("net: ethernet: ti: am65-cpsw: Add suspend/resume support")
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shay Drory says:
====================
devlink: Add port function attribute to enable/disable Roce and migratable
This series is a complete rewrite of the series "devlink: Add port
function attribute to enable/disable roce"
link:
https://lore.kernel.org/netdev/20221102163954.279266-1-danielj@nvidia.com/
Currently mlx5 PCI VF and SF are enabled by default for RoCE
functionality. And mlx5 PCI VF is disable by dafault for migratable
functionality.
Currently a user does not have the ability to disable RoCE for a PCI
VF/SF device before such device is enumerated by the driver.
User is also incapable to do such setting from smartnic scenario for a
VF from the smartnic.
Current 'enable_roce' device knob is limited to do setting only at
driverinit time. By this time device is already created and firmware has
already allocated necessary system memory for supporting RoCE.
Also, Currently a user does not have the ability to enable migratable
for a PCI VF.
The above are a hyper visor level control, to set the functionality of
devices passed through to guests.
This is achieved by extending existing 'port function' object to control
capabilities of a function. This enables users to control capability of
the device before enumeration.
Examples when user prefers to disable RoCE for a VF when using switchdev
mode:
$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev pf0vf0 flavour pcivf controller 0
pfnum 0 vfnum 0 external false splittable false
function:
hw_addr 00:00:00:00:00:00 roce enable
$ devlink port function set pci/0000:06:00.0/1 roce disable
$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev pf0vf0 flavour pcivf controller 0
pfnum 0 vfnum 0 external false splittable false
function:
hw_addr 00:00:00:00:00:00 roce disable
FAQs:
-----
1. What does roce enable/disable do?
Ans: It disables RoCE capability of the function before its enumerated,
so when driver reads the capability from the device firmware, it is
disabled.
At this point RDMA stack will not be able to create UD, QP1, RC, XRC
type of QPs. When RoCE is disabled, the GID table of all ports of the
device is disabled in the device and software stack.
2. How is the roce 'port function' option different from existing
devlink param?
Ans: RoCE attribute at the port function level disables the RoCE
capability at the specific function level; while enable_roce only does
at the software level.
3. Why is this option for disabling only RoCE and not the whole RDMA
device?
Ans: Because user still wants to use the RDMA device for non RoCE
commands in more memory efficient way.
====================
Link: https://lore.kernel.org/r/20221206185119.380138-1-shayd@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Implement devlink port function commands to enable / disable migratable.
This is used to control the migratable capability of the device.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Expose port function commands to enable / disable migratable
capability, this is used to set the port function as migratable.
Live migration is the process of transferring a live virtual machine
from one physical host to another without disrupting its normal
operation.
In order for a VM to be able to perform LM, all the VM components must
be able to perform migration. e.g.: to be migratable.
In order for VF to be migratable, VF must be bound to VFIO driver with
migration support.
When migratable capability is enabled for a function of the port, the
device is making the necessary preparations for the function to be
migratable, which might include disabling features which cannot be
migrated.
Example of LM with migratable function configuration:
Set migratable of the VF's port function.
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
vfnum 1
function:
hw_addr 00:00:00:00:00:00 migratable disable
$ devlink port function set pci/0000:06:00.0/2 migratable enable
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
vfnum 1
function:
hw_addr 00:00:00:00:00:00 migratable enable
Bind VF to VFIO driver with migration support:
$ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/unbind
$ echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:08:00.0/driver_override
$ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/bind
Attach VF to the VM.
Start the VM.
Perform LM.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Implement devlink port function commands to enable / disable RoCE.
This is used to control the RoCE device capabilities.
This patch implement infrastructure which will be used by downstream
patches that will add additional capabilities.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Downstream patch requires to get other function GENERAL2 caps while
mlx5_vport_get_other_func_cap() gets only one type of caps (general).
Rename it to represent this and introduce a generic implementation
of mlx5_vport_get_other_func_cap().
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Expose port function commands to enable / disable RoCE, this is used to
control the port RoCE device capabilities.
When RoCE is disabled for a function of the port, function cannot create
any RoCE specific resources (e.g GID table).
It also saves system memory utilization. For example disabling RoCE enable a
VF/SF saves 1 Mbytes of system memory per function.
Example of a PCI VF port which supports function configuration:
Set RoCE of the VF's port function.
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
vfnum 1
function:
hw_addr 00:00:00:00:00:00 roce enable
$ devlink port function set pci/0000:06:00.0/2 roce disable
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
vfnum 1
function:
hw_addr 00:00:00:00:00:00 roce disable
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
devlink port function hw_addr attr documentation is in mlx5 specific
file while there is nothing mlx5 specific about it.
Move it to devlink-port.rst.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In order to avoid partial request processing, validate the request
before processing it.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce IFC related capabilities to enable setting VF to be able to
perform live migration. e.g.: to be migratable.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel says:
====================
bridge: mcast: Preparations for EVPN extensions
This patchset was split from [1] and includes non-functional changes
aimed at making it easier to add additional netlink attributes later on.
Future extensions are available here [2].
The idea behind these patches is to create an MDB configuration
structure into which netlink messages are parsed into. The structure is
then passed in the entry creation / deletion call chain instead of
passing the netlink attributes themselves. The same pattern is used by
other rtnetlink objects such as routes and nexthops.
I initially tried to extend the current code, but it proved to be too
difficult, which is why I decided to refactor it to the extensible and
familiar pattern used by other rtnetlink objects.
Tested using existing selftests and using a new selftest that will be
submitted together with the planned extensions.
[1] https://lore.kernel.org/netdev/20221018120420.561846-1-idosch@nvidia.com/
[2] https://github.com/idosch/linux/commits/submit/mdb_v1
====================
Link: https://lore.kernel.org/r/20221206105809.363767-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The 'group' argument is not modified, so mark it as 'const'. It will
allow us to constify arguments of the callers of this function in future
patches.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Drop the first three arguments and instead extract them from the MDB
configuration structure.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>