IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Enhance the parsing of offloaded TC rules matches to handle vlans.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Parse TC vlan actions and set the required elements to allow offloading.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many virtualization systems use a policy under which a vlan tag is
pushed to packets sent by guests, and popped before the packet is
forwarded to the VM.
The current generation of the mlx5 HW doesn't fully support that on
a per flow level. As such, we are addressing the above common use
case with the SRIOV e-Switch abilities to push vlan into packets
sent by VFs and pop vlan from packets forwarded to VFs.
The HW can match on the correct vlan being present in packets
forwarded to VFs (eSwitch steering is done before stripping
the tag), so this part is offloaded as is.
A common practice for vlans is to avoid both push vlan and pop vlan
for inter-host VM/VM (east-west) communication because in this case,
push on egress cancels out with pop on ingress.
For supporting that, we use a global eswitch vlan pop policy, hence
allowing guest A to communicate with both remote VM B and local VM C.
This works since the HW pops the vlan only if it exists (e.g for
C --> A packets but not for B --> A packets).
On the slow path, when a VF vport has an offloaded flow which involves
pushing vlans, wheres another flow is not currently offloaded, the
packets from the 2nd flow seen by the VF representor on the host have
vlan. The VF rep driver removes such vlan before calling into the host
networking stack.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Factor the relevant code into a static inline helper (skb_from_cqe)
doing that.
Move the call to napi_gro_receive to be carried out just
after mlx5e_complete_rx_cqe returns.
Both changes are to be used for the VF representor as well
in the next commit.
This patch doesn't change any functionality.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Put the representors related to the source and dest vports and the
action in struct mlx5_esw_flow_attr which is used while setting the FDB rule.
This patch doesn't change any functionality.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The HW can be programmed to push vlan, pop vlan or both.
A factorization step towards using the push/pop capabilties in the
eswitch offloads mode. This patch doesn't add new functionality.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The structure we use for the eswitch vport representor (mlx5_eswitch_rep)
has some fields which are set from upper layers in the driver when they
register the rep. Use explicit setting on registration time for them and
avoid global memcpy. This patch doesn't add new functionality.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the vport value in the PF entry to be that of the uplink so
we can use it blindly over the tc / eswitch offload code without
translating it each time we deal with the uplink representor.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a NULL check before calling asynchronous MCDI completion functions
during device removal.
Fixes: 7014d7f6 ("sfc: allow asynchronous MCDI without completion function")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enabling SRIOV while the ixgbevf driver is loaded will result in all
mailbox requests from ixgbevf_open() being rejected by ixgbe because
adapter->clear_to_send is set to false on reset.
Call ixgbe_sriov_reinit() before pci_enable_sriov() to make sure that
mailbox requests are handled from the time ixgbevf is loaded.
Reported-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Instead of limiting the VFs if we don't use 4 queues for RSS in the PF we
can instead just limit the RSS queues used to a power of 2. By doing this
we can support use cases where VFs are using more queues than the PF is
currently using and can support RSS if so desired.
The only limitation on this is that we cannot support 3 queues of RSS in
the PF or VF. In either of these cases we should fall back to 2 queues in
order to be able to use the power of 2 masking provided by the psrtype
register.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The hardware redirection table can support more queues then the PF
currently has when SR-IOV is enabled. In order to account for this use the
RSS mask to trim of the bits that are not used.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The maximum queue count reported was 1, however support for multiple queues
with SR-IOV was added some time ago so we should report support for it to
the user so that they can select multiple queues if they so desire.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The value MDIO_PRTAD_NONE should be used to indicate no PHY address.
Not 0, not 0xFFFF. Use the MDIO_PRTAD_NONE value consistently.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Protect set_rlpml with mailbox lock to make sure the MTU configuration
is handled properly.
This change resolves an issue where set_rlpml can fail when the VF
interface is brought up:
ixgbevf 0000:03:1d.6: Failed to set MTU at 1500
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
All the MACs supported by ixgbe support pause frames, so indicate
that support in ethtool. Also set advertising according to requested
mode.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Set the read_reg_mdi and write_reg_mdi method pointers for
X550EM_A_10G_T devices to resolve jumping to NULL.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
These functions are only used in ixgbe_x550.c.
Fixes a warning when compiling with -Wmissing-prototypes
Reported-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Simplify the logic for setting VLNCTRL.VFE by checking the VMDQ flag
and 82598 MAC instead of having to maintain a list of MAC types.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The i40e_shutdown_adminq function never returns failure. There is no need to
check the non-0 return value. Clean up the unnecessary error checking and
warning against it.
Change-ID: Ibb616f09cfb93bd1a872ebf3241a15fb8354b31b
Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The i40e driver was incorrectly assuming that we would always be pulling
no more than 1 descriptor from each fragment. It is in fact possible for
us to end up with the case where 2 descriptors worth of data may be pulled
when a frame is larger than one of the pieces generated when aligning the
payload to either 4K or pieces smaller than 16K.
To adjust for this we just need to make certain to test all the way to the
end of the fragments as it is possible for us to span 2 descriptors in the
block before us so we need to guarantee that even the last 6 descriptors
have enough data to fill a full frame.
Change-ID: Ic2ecb4d6b745f447d334e66c14002152f50e2f99
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Function i40evf_up_complete() always returns success. Changed this to a
void type and removed the code that checks the return status and prints
an error message.
Change-ID: I8c400f174786b9c855f679e470f35af292fb50ad
Signed-off-by: Bimmy Pujari <bimmy.pujari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently disabling the link state from PF via
ip link set enp5s0f0 vf 0 state disable
doesn't disable the CARRIER on the VF.
This patch updates the carrier and starts/stops the tx queues based on the
link state notification from PF.
PF: enp5s0f0, VF: enp5s2
#modprobe i40e
#echo 2 > /sys/class/net/enp5s0f0/device/sriov_numvfs
#ip link set enp5s2 up
#ip -d link show enp5s2
175: enp5s2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether ea:4d:60:bc:6f:85 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64
#ip link set enp5s0f0 vf 0 state disable
#ip -d link show enp5s0f0
171: enp5s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 68:05:ca:2e:72:68 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64 numtxqueues 72 numrxqueues 72 portid 6805ca2e7268
vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state disable, trust off
vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
#ip -d link show enp5s2
175: enp5s2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
link/ether ea:4d:60:bc:6f:85 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64 numtxqueues 16 numrxqueues 16
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There is a sanitcy check for desc being null in the first line of
function i40evf_debug_aq. However, before that, aq_desc is cast from
desc, and aq_desc is being dereferenced on the assignment of len, so
this could be a potential null pointer deference. Fix this by moving
the initialization of len to the code block where len is being used
and hence at this point we know it is OK to dereference aq_desc.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes an issue where we were byte swapping the port
parameter, then byte swapping it again in function execution.
Obviously, that's unnecessary, so take it out of the function calls.
Without this patch, the udp based tunnel configuration would
not be correct.
Change-ID: I788d83c5bd5732170f1a81dbfa0b1ac3ca8ea5b7
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes an issue in the virt channel code, where a return
from i40e_find_vsi_from_id was not checked for NULL when applicable.
Without this patch, there is a risk for panic and static analysis
tools complain. This patch fixes the problem by adding the check
and adding an additional input check for similar reasons.
Change-ID: I7e9be88eb7a3addb50eadc451c8336d9e06f5394
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This conditional is backward, so the driver responds back to the VF with
the wrong opcode. Do the old switcheroo to fix this.
Change-ID: I384035b0fef8a3881c176de4b4672009b3400b25
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When using the debugfs to issue the "dump port" command
with NPAR enabled, the firmware reports back with invalid argument.
The issue occurs because the pf->mac_seid was used to perform the query.
This is fine when NPAR is disabled because the switch ID == pf->mac_seid,
however this is not the case when NPAR is enabled. This fix instead
goes through the VSI to determine the correct ID to use in either case.
Change-ID: I0cd67913a7f2c4a2962e06d39e32e7447cc55b6a
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Previously, when using ethtool to change the RSS hash key, ethtool would
report back saying the old key was still being used and no error was
reported. It was unclear whether it was being reported incorrectly or
being set incorrectly. Debugging revealed 'i40e_set_rxfh()' returned
zero immediately instead of setting the key because a user defined
indirection table is not supplied when changing the hash key.
This fix instead changes it such that if an indirection table is not
supplied, then a default one is created and the hash key is now
correctly set.
Change-ID: Iddb621897ecf208650272b7ee46702cad7b69a71
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Potential dangerous invalid pointer might be accessed if
the error happens when couple phy_device to net_device so
cleanup the error path.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) use new api [get|set]_link_ksettings instead
of [get|set]_settings old ones.
2) dev->phydev is sure being ready before calling
these callbacks, so removing all the sanity check
if it is existing.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
remove the unused variable for parsing PHY address
and the related logic for sanity test which would
be all already handled done when of_mdiobus_register
was called
Reported-by: Nelson Chang <nelson.chang@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
reuse phydev already in struct net_device instead of creating
another new one in private structure.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing dynamically source clock, TX/RX delay and interface mode
used by TRGMII hardware module inside PHY capability polling routine
for adapting to the various speed of RGMII used by external PHY for
GMAC0.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
adds PHY-mode "trgmii" as an extension for the operation
mode of the PHY interface for PHY_INTERFACE_MODE_TRGMII.
and adds a variable trgmii inside mtk_mac as the indication
to make the difference between the MAC connected to internal
switch or connected to external PHY by the given configuration
on the board and then to perform the corresponding setup on
TRGMII hardware module.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PHYSTS_CHG (the ftgmac100's PHY IRQ) is telling the system to go
look at the PHY registers for a link status change.
The interrupt was causing issues on Aspeed SoC where some board designs
had an active high configuration, some active low, and in some cases
repurposed for other functions. When misconfigured Linux would chew 100%
of CPU cycles servicing interrupts:
[ 20.280000] ftgmac100 1e660000.ethernet eth0: [ISR] = 0x200: PHYSTS_CHG
[ 20.280000] ftgmac100 1e660000.ethernet eth0: [ISR] = 0x200: PHYSTS_CHG
[ 20.280000] ftgmac100 1e660000.ethernet eth0: [ISR] = 0x200: PHYSTS_CHG
[ 20.300000] ftgmac100 1e660000.ethernet eth0: [ISR] = 0x200: PHYSTS_CHG
While in the ftgmac100 IP can be configured for high, low and edge
sensitivity the current driver always polls the PHY, so we chose to mask
out the interrupt.
See https://patchwork.ozlabs.org/patch/672099/ for more discussion.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Aspeed SoCs have a new MDIO interface as an option in the G4 and G5
SoCs. The old one is still available, so select it in order to remain
compatible with the ftgmac100 driver.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is stale interrupt (PHYSTS_CHG in ISR, bit#6 in 0x0) from
the bootloader (uboot) when enabling the MAC. The stale interrupts
aren't part of kernel and should be cleared.
This clears the stale interrupts in ISR (0x0) when enabling the MAC.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RXDES and TXDES registers bits in the ftgmac100 indicates EDO{R,T}R
at bit position 15 for the Faraday Tech IP. However, the version of this
IP present in the Aspeed SoCs has these bits at position 30 in the
registers.
It appers that ast2400 SoCs support both positions, with the 15th bit
marked as reserved but still functional. In the ast2500 this bit is
reused for another function, so we need a work around.
This was confirmed with engineers from Aspeed that using bit 30 is
correct for both the ast2400 and ast2500 SoCs.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
These bits are #defined at a fixed location. In order to support future
hardware that has chosen to move these bits around move the bits into a
member of the struct ftgmac100.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ftgmac100 hardware revision in e.g. the Aspeed AST2500 no longer
reserves all bits in RXDES#2 but instead uses the bottom 16 bits to
store MAC frame metadata. Avoid corruption by shifting struct page
pointers out to their own member in struct ftgmac100.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove an open coded simple_open() function and replace file
operations references to the function with simple_open()
instead.
Generated by: scripts/coccinelle/api/simple_open.cocci
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously we rang XDP SQ doorbell on every forwarded XDP packet.
Here we introduce a xmit more like mechanism that will queue up more
than one packet into SQ (up to RX napi budget) w/o notifying the hardware.
Once RX napi budget is consumed and we exit napi RX loop, we will
flush (doorbell) all XDP looped packets in case there are such.
XDP forward packet rate:
Comparing XDP with and w/o xmit more (bulk transmit):
RX Cores XDP TX XDP TX (xmit more)
---------------------------------------------------
1 6.5Mpps 12.4Mpps
2 13.2Mpps 24.2Mpps
4 25.2Mpps 36.3Mpps*
8 36.3Mpps* 36.3Mpps*
*My xmitter was limited to 36.3Mpps, so it is the bottleneck.
It seems that receive side can handle more.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding support for XDP_TX forwarding from xdp program.
Using XDP, now user can loop packets out of the same port.
We create a dedicated TX SQ for each channel that will serve
XDP programs that return XDP_TX action to loop packets back to
the wire directly from the channel RQ RX path.
For that RX pages will now need to be mapped bi-directionally,
and on XDP_TX action we will sync the page back to device then
queue it into SQ for transmission. The XDP xmit frame function will
report back to the RX path if the page was consumed (transmitted), if so,
RX path will forget about that page as if it were released to the stack.
Later on, on XDP TX completion, the page will be released back to the
page cache.
For simplicity this patch will hit a doorbell on every XDP TX packet.
Next patch will introduce a xmit more like mechanism that will
queue up more than one packet into SQ w/o notifying the hardware,
once RX napi loop is done we will hit doorbell once for all XDP TX
packets form the previous loop. This should drastically improve
XDP TX performance.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make a clear separate between Regular SQ (TXQ) and ICO SQ creation,
destruction and union their mutual information structures.
Don't allocate redundant TXQ skb/wqe_info/dma_fifo arrays for ICO SQ.
And have a different SQ edge for ICO SQ than TXQ SQ, to be more
accurate.
In preparation for XDP TX support.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the BPF_PROG_TYPE_PHYS_DEV hook in mlx5e driver.
When XDP is on we make sure to change channels RQs type to
MLX5_WQ_TYPE_LINKED_LIST rather than "striding RQ" type to
ensure "page per packet".
On XDP set, we fail if HW LRO is set and request from user to turn it
off. Since on ConnectX4-LX HW LRO is always on by default, this will be
annoying, but we prefer not to enforce LRO off from XDP set function.
Full channels reset (close/open) is required only when setting XDP
on/off.
When XDP set is called just to exchange programs, we will update
each RQ xdp program on the fly and for synchronization with current
data path RX activity of that RQ, we temporally disable that RQ and
ensure RX path is not running, quickly update and re-enable that RQ,
for that we do:
- rq.state = disabled
- napi_synnchronize
- xchg(rq->xdp_prg)
- rq.state = enabled
- napi_schedule // Just in case we've missed an IRQ
Packet rate performance testing was done with pktgen 64B packets and on
TX side and, TC drop action on RX side compared to XDP fast drop.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Comparison is done between:
1. Baseline, Before this patch with TC drop action
2. This patch with TC drop action
3. This patch with XDP RX fast drop
RX Cores Baseline(TC drop) TC drop XDP fast Drop
--------------------------------------------------------------
1 5.3Mpps 5.3Mpps 16.5Mpps
2 10.2Mpps 10.2Mpps 31.3Mpps
4 20.5Mpps 19.9Mpps 36.3Mpps*
*My xmitter was limited to 36.3Mpps, so it is the bottleneck.
It seems that receive side can handle more.
Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add two helper functions to allow dynamic changes of RQ type.
mlx5e_set_rq_priv_params and mlx5e_set_rq_type_params will be
used on netdev creation to determine the default RQ type.
This will be needed later for downstream patches of XDP support.
When enabling XDP we will dynamically move from striding RQ to
linked list RQ type.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before this patch LRO size was 64K, now with build_skb requires
extra room, headroom + sizeof(skb_shared_info) added to the data
buffer will make wqe size or page_frag_size slightly larger than
64K which will demand order 5 page instead of order 4 in 4K page systems.
We take those extra bytes from hardware LRO data size in order to not
increase the required page order for when hardware LRO is enabled.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have two types of RX RQs, and they use two separate sets of
info arrays and structures in RX data path function. Today those
structures are mutually exclusive per RQ type, hence one kind is
allocated on RQ creation according to the RQ type.
For better cache locality and to minimalize the
sizeof(struct mlx5e_rq), in this patch we define them as a union.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>