IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
We tell driver developers to always pass NAPI_POLL_WEIGHT
as the weight to netif_napi_add(). This may be confusing
to newcomers, drop the weight argument, those who really
need to tweak the weight can use netif_napi_add_weight().
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN
Link: https://lore.kernel.org/r/20220927132753.750069-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is the absolute minimum viable TC implementation to get traffic to
VFs and allow them to be tested; it supports no match fields besides
ingress port, no actions besides mirred and drop, and no stats.
Example usage:
tc filter add dev $PF parent ffff: flower skip_sw \
action mirred egress mirror dev $VFREP
tc filter add dev $VFREP parent ffff: flower skip_sw \
action mirred egress redirect dev $PF
gives a VF unfiltered access to the network out the physical port ($PF
acts here as a physical port representor).
More matches, actions, and counters will be added in subsequent patches.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Different versions of EF100 firmware and FPGA bitstreams support different
matching capabilities in the Match-Action Engine. Probe for these at
start of day; subsequent patches will validate TC offload requests
against the reported capabilities.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nothing inserts into this table yet, but we have code to remove rules
on FLOW_CLS_DESTROY or at driver teardown time, in both cases also
attempting to remove the corresponding hardware rules.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TC offload support will involve complex limitations on what matches and
actions a rule can do, in some cases potentially depending on rules
already offloaded. So add an ethtool private flag "log-tc-errors" which
controls reporting the reasons for un-offloadable TC rules at NETIF_INFO.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bind indirect blocks for recognised tunnel netdevices.
Currently these connect to a stub efx_tc_flower() that only returns
-EOPNOTSUPP; subsequent patches will implement flower offloads to the
Match-Action Engine.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bind direct blocks for the MAE-admin PF and each VF representor.
Currently these connect to a stub efx_tc_flower() that only returns
-EOPNOTSUPP; subsequent patches will implement flower offloads to the
Match-Action Engine.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A previous patch added a wrapper function to take a lock around
efx_mcdi_filter_table_remove(), but only changed EF10 VFs' method table
to call it. Change it in the PF method table too.
Fixes: 77eb40749d73 ("sfc: move table locking into filter_table_{probe,remove} methods")
Signed-off-by: Andy Moreton <andy.moreton@amd.com>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20220922211218.814-1-ecree@xilinx.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As in previous commit for sfc, fix TX channels offset when
efx_siena_separate_tx_channels is false (the default)
Fixes: 25bde571b4a8 ("sfc/siena: fix wrong tx channel offset with efx_separate_tx_channels")
Reported-by: Tianhao Zhao <tizhao@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Link: https://lore.kernel.org/r/20220915141653.15504-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Trying to get the channel from the tx_queue variable here is wrong
because we can only be here if tx_queue is NULL, so we shouldn't
dereference it. As the above comment in the code says, this is very
unlikely to happen, but it's wrong anyway so let's fix it.
I hit this issue because of a different bug that caused tx_queue to be
NULL. If that happens, this is the error message that we get here:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[...]
RIP: 0010:efx_hard_start_xmit+0x153/0x170 [sfc]
Fixes: 12804793b17c ("sfc: decouple TXQ type from label")
Reported-by: Tianhao Zhao <tizhao@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20220914111135.21038-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In legacy interrupt mode the tx_channel_offset was hardcoded to 1, but
that's not correct if efx_sepparate_tx_channels is false. In that case,
the offset is 0 because the tx queues are in the single existing channel
at index 0, together with the rx queue.
Without this fix, as soon as you try to send any traffic, it tries to
get the tx queues from an uninitialized channel getting these errors:
WARNING: CPU: 1 PID: 0 at drivers/net/ethernet/sfc/tx.c:540 efx_hard_start_xmit+0x12e/0x170 [sfc]
[...]
RIP: 0010:efx_hard_start_xmit+0x12e/0x170 [sfc]
[...]
Call Trace:
<IRQ>
dev_hard_start_xmit+0xd7/0x230
sch_direct_xmit+0x9f/0x360
__dev_queue_xmit+0x890/0xa40
[...]
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[...]
RIP: 0010:efx_hard_start_xmit+0x153/0x170 [sfc]
[...]
Call Trace:
<IRQ>
dev_hard_start_xmit+0xd7/0x230
sch_direct_xmit+0x9f/0x360
__dev_queue_xmit+0x890/0xa40
[...]
Fixes: c308dfd1b43e ("sfc: fix wrong tx channel offset with efx_separate_tx_channels")
Reported-by: Tianhao Zhao <tizhao@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20220914103648.16902-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make the device inactive when the system shutdown callback has been
invoked. This is achieved by freezing the driver and disabling the
PCI bus mastering.
Co-developed-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20220906105620.26179-1-pieter.jansen-van-vuuren@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The previous patch add support for PTP over IPv6/UDP (only for 8000
series and newer) and this one add support for PTP over 802.3.
Tested: sync as master and as slave is correct with ptp4l. PTP over IPv4
and IPv6 still works fine.
Suggested-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit bd4a2697e5e2 ("sfc: use hardware tx timestamps for more than
PTP") added support for hardware timestamping on TX for cards of the
8000 series and newer, in an effort to provide support for other
transports other than IPv4/UDP.
However, timestamping was still not working on RX for these other
transports. This patch add support for PTP over IPv6/UDP.
Tested: sync as master and as slave is correct using ptp4l from linuxptp
package, both with IPv4 and IPv6.
Suggested-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for the support of PTP over IPv6/UDP and Ethernet in next
patches, allow a more flexible way of adding and removing RX filters for
PTP. Right now, only 2 filters are allowed, which are the ones needed
for PTP over IPv4/UDP.
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Petr Machata <petrm@nvidia.com> # For drivers/net/ethernet/mellanox/mlxsw
Acked-by: Geoff Levand <geoff@infradead.org> # For ps3_gelic_net and spider_net_ethtool
Acked-by: Tom Lendacky <thomas.lendacky@amd.com> # For drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
Acked-by: Marcin Wojtas <mw@semihalf.com> # For drivers/net/ethernet/marvell/mvpp2
Reviewed-by: Leon Romanovsky <leonro@nvidia.com> # For drivers/net/ethernet/mellanox/mlx{4|5}
Reviewed-by: Shay Agroskin <shayagr@amazon.com> # For drivers/net/ethernet/amazon/ena
Acked-by: Krzysztof Hałasa <khalasa@piap.pl> # For IXP4xx Ethernet
Link: https://lore.kernel.org/r/20220830201457.7984-3-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It's not truly a ring, but the maximum length of the list of queued RX
SKBs is analogous to an RX ring size, so use that API to configure it.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Representors do not want to be subject to the PF's Ethernet address
filters, since traffic from VFs will typically have a destination
either elsewhere on the link segment or on an overlay network.
So, create a dynamic m-port with promiscuous and all-multicast
filters, and set it as the egress port of representor default rules.
Since the m-port is an alias of the calling PF's own m-port, traffic
will still be delivered to the PF's RXQs, but it will be subject to
the VNRX filter rules installed on the dynamic m-port (specified by
the v-port ID field of the filter spec).
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We need to be able to drop the efx->filter_sem in ef100_filter_table_up()
so that we can call functions that insert filters (and thus take that
rwsem for read), which means the efx->type->filter_table_probe method
needs to be responsible for taking the lock in the first place.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Default rules are low-priority switching rules which the hardware uses
in the absence of higher-priority rules. Each representor requires a
corresponding rule matching traffic from its representee VF and
delivering to the PF (where a check on INGRESS_MPORT in
__ef100_rx_packet() will direct it to the representor). No rule is
required in the reverse direction, because representor TX uses a TX
override descriptor to bypass the MAE and deliver directly to the VF.
Since inserting any rule into the MAE disables the firmware's own
default rules, also insert a pair of rules to connect the PF to the
physical network port and vice-versa.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the source m-port of a packet in __ef100_rx_packet() is a VF,
hand off the packet to the corresponding representor with
efx_ef100_rep_rx_packet().
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If not, for now drop them and warn. A subsequent patch will look up
the source m-port to try and find a representor to deliver them to.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Traffic delivered to the (MAE admin) PF could be from either the wire
or a VF. The INGRESS_MPORT field of the RX prefix distinguishes these;
base_mport is the value this field will have for traffic from the wire
(which should be delivered to the PF's netdevice, not a representor).
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Representor RX uses a NAPI context driven by a 'fake interrupt': when
the parent PF receives a packet destined for the representor, it adds
it to an SKB list (efv->rx_list), and schedules NAPI if the 'fake
interrupt' is primed. The NAPI poll then pulls packets off this list
and feeds them to the stack with netif_receive_skb_list().
This scheme allows us to decouple representor RX from the parent PF's
RX fast-path.
This patch implements the 'top half', which builds an SKB, copies data
into it from the RX buffer (which can then be released), adds it to
the queue and fires the 'fake interrupt' if necessary.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds the 'bottom half' napi->poll routine for representor RX.
See the next patch (with the top half) for an explanation of the 'fake
interrupt' scheme used to drive this NAPI context.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Implement .ndo_get_stats64() method to read values out of struct
efx_rep_sw_stats.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sending a PTP packet can imply to use the normal TX driver datapath but
invoked from the driver's ptp worker. The kernel generic TX code
disables softirqs and preemption before calling specific driver TX code,
but the ptp worker does not. Although current ptp driver functionality
does not require it, there are several reasons for doing so:
1) The invoked code is always executed with softirqs disabled for non
PTP packets.
2) Better if a ptp packet transmission is not interrupted by softirq
handling which could lead to high latencies.
3) netdev_xmit_more used by the TX code requires preemption to be
disabled.
Indeed a solution for dealing with kernel preemption state based on static
kernel configuration is not possible since the introduction of dynamic
preemption level configuration at boot time using the static calls
functionality.
Fixes: f79c957a0b537 ("drivers: net: sfc: use netdev_xmit_more helper")
Signed-off-by: Alejandro Lucero <alejandro.lucero-palau@amd.com>
Link: https://lore.kernel.org/r/20220726064504.49613-1-alejandro.lucero-palau@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since representors piggy-back on the PF's queues for TX, they can
only accept new TXes while the PF is up. Thus, any operation which
detaches the PF must first detach all its VFreps.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement .ndo_start_xmit() by calling into the parent PF's TX path.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A non-null efv in __ef100_enqueue_skb() indicates that the packet is
from that representor, should be transmitted with a suitable option
descriptor (to instruct the switch to deliver it to the representee),
and should not be accounted to the parent PF's stats or BQL.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An MAE port, or m-port, is a port (source/destination for traffic) on
the Match-Action Engine (the internal switch on EF100).
Representors will use their representee's m-port for two purposes: as
a destination override on TX from the representor, and as a source
match in 'default rules' to steer representee traffic (when not
matched by e.g. a TC flower rule) to representor RX via the parent
PF's receive queue.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No net_device_ops yet, just a placeholder netdev created per VF.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One PCIe function per network port (more precisely, per m-port group) is
responsible for configuring the Match-Action Engine which performs
switching and packet modification in the slice to support flower/OVS
offload. The GRP_MAE bit in the privilege mask indicates whether a
given function has this capability.
At probe time, call MCDIs to read the calling function's privilege mask,
and store the GRP_MAE bit in a new ef100_nic_data member.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When creating VFs a kernel panic can happen when calling to
efx_ef10_try_update_nic_stats_vf.
When releasing a DMA coherent buffer, sometimes, I don't know in what
specific circumstances, it has to unmap memory with vunmap. It is
disallowed to do that in IRQ context or with BH disabled. Otherwise, we
hit this line in vunmap, causing the crash:
BUG_ON(in_interrupt());
This patch reenables BH to release the buffer.
Log messages when the bug is hit:
kernel BUG at mm/vmalloc.c:2727!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 6 PID: 1462 Comm: NetworkManager Kdump: loaded Tainted: G I --------- --- 5.14.0-119.el9.x86_64 #1
Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS 2.8.2 08/27/2020
RIP: 0010:vunmap+0x2e/0x30
...skip...
Call Trace:
__iommu_dma_free+0x96/0x100
efx_nic_free_buffer+0x2b/0x40 [sfc]
efx_ef10_try_update_nic_stats_vf+0x14a/0x1c0 [sfc]
efx_ef10_update_stats_vf+0x18/0x40 [sfc]
efx_start_all+0x15e/0x1d0 [sfc]
efx_net_open+0x5a/0xe0 [sfc]
__dev_open+0xe7/0x1a0
__dev_change_flags+0x1d7/0x240
dev_change_flags+0x21/0x60
...skip...
Fixes: d778819609a2 ("sfc: DMA the VF stats only when requested")
Reported-by: Ma Yuying <yuma@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20220713092116.21238-1-ihuguet@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Use after free is detected by kfence when disabling sriov. What was read
after being freed was vf->pci_dev: it was freed from pci_disable_sriov
and later read in efx_ef10_sriov_free_vf_vports, called from
efx_ef10_sriov_free_vf_vswitching.
Set the pointer to NULL at release time to not trying to read it later.
Reproducer and dmesg log (note that kfence doesn't detect it every time):
$ echo 1 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs
$ echo 0 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs
BUG: KFENCE: use-after-free read in efx_ef10_sriov_free_vf_vswitching+0x82/0x170 [sfc]
Use-after-free read at 0x00000000ff3c1ba5 (in kfence-#224):
efx_ef10_sriov_free_vf_vswitching+0x82/0x170 [sfc]
efx_ef10_pci_sriov_disable+0x38/0x70 [sfc]
efx_pci_sriov_configure+0x24/0x40 [sfc]
sriov_numvfs_store+0xfe/0x140
kernfs_fop_write_iter+0x11c/0x1b0
new_sync_write+0x11f/0x1b0
vfs_write+0x1eb/0x280
ksys_write+0x5f/0xe0
do_syscall_64+0x5c/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
kfence-#224: 0x00000000edb8ef95-0x00000000671f5ce1, size=2792, cache=kmalloc-4k
allocated by task 6771 on cpu 10 at 3137.860196s:
pci_alloc_dev+0x21/0x60
pci_iov_add_virtfn+0x2a2/0x320
sriov_enable+0x212/0x3e0
efx_ef10_sriov_configure+0x67/0x80 [sfc]
efx_pci_sriov_configure+0x24/0x40 [sfc]
sriov_numvfs_store+0xba/0x140
kernfs_fop_write_iter+0x11c/0x1b0
new_sync_write+0x11f/0x1b0
vfs_write+0x1eb/0x280
ksys_write+0x5f/0xe0
do_syscall_64+0x5c/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
freed by task 6771 on cpu 12 at 3170.991309s:
device_release+0x34/0x90
kobject_cleanup+0x3a/0x130
pci_iov_remove_virtfn+0xd9/0x120
sriov_disable+0x30/0xe0
efx_ef10_pci_sriov_disable+0x57/0x70 [sfc]
efx_pci_sriov_configure+0x24/0x40 [sfc]
sriov_numvfs_store+0xfe/0x140
kernfs_fop_write_iter+0x11c/0x1b0
new_sync_write+0x11f/0x1b0
vfs_write+0x1eb/0x280
ksys_write+0x5f/0xe0
do_syscall_64+0x5c/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: 3c5eb87605e85 ("sfc: create vports for VFs and assign random MAC addresses")
Reported-by: Yanghang Liu <yanghliu@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20220712062642.6915-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The netdev probe will be used when moving from vDPA to EF100 BAR config.
The netdev remove will be used when moving from EF100 to vDPA BAR config.
In the process, change several log messages to pci_ instead of netif_
to remove the "(unregistered net_device)" text.
Signed-off-by: Jonathan Cooper <jonathan.s.cooper@amd.com>
Co-developed-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minor fix to existing code to make later patch checkpatch clean.
Signed-off-by: Jonathan Cooper <jonathan.s.cooper@amd.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>