63 Commits

Author SHA1 Message Date
Jakub Kicinski
de42873367 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY+bZrwAKCRDbK58LschI
 gzi4AP4+TYo0jnSwwkrOoN9l4f5VO9X8osmj3CXfHBv7BGWVxAD/WnvA3TDZyaUd
 agIZTkRs6BHF9He8oROypARZxTeMLwM=
 =nO1C
 -----END PGP SIGNATURE-----

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-02-11

We've added 96 non-merge commits during the last 14 day(s) which contain
a total of 152 files changed, 4884 insertions(+), 962 deletions(-).

There is a minor conflict in drivers/net/ethernet/intel/ice/ice_main.c
between commit 5b246e533d01 ("ice: split probe into smaller functions")
from the net-next tree and commit 66c0e13ad236 ("drivers: net: turn on
XDP features") from the bpf-next tree. Remove the hunk given ice_cfg_netdev()
is otherwise there a 2nd time, and add XDP features to the existing
ice_cfg_netdev() one:

        [...]
        ice_set_netdev_features(netdev);
        netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
                               NETDEV_XDP_ACT_XSK_ZEROCOPY;
        ice_set_ops(netdev);
        [...]

Stephen's merge conflict mail:
https://lore.kernel.org/bpf/20230207101951.21a114fa@canb.auug.org.au/

The main changes are:

1) Add support for BPF trampoline on s390x which finally allows to remove many
   test cases from the BPF CI's DENYLIST.s390x, from Ilya Leoshkevich.

2) Add multi-buffer XDP support to ice driver, from Maciej Fijalkowski.

3) Add capability to export the XDP features supported by the NIC.
   Along with that, add a XDP compliance test tool,
   from Lorenzo Bianconi & Marek Majtyka.

4) Add __bpf_kfunc tag for marking kernel functions as kfuncs,
   from David Vernet.

5) Add a deep dive documentation about the verifier's register
   liveness tracking algorithm, from Eduard Zingerman.

6) Fix and follow-up cleanups for resolve_btfids to be compiled
   as a host program to avoid cross compile issues,
   from Jiri Olsa & Ian Rogers.

7) Batch of fixes to the BPF selftest for xdp_hw_metadata which resulted
   when testing on different NICs, from Jesper Dangaard Brouer.

8) Fix libbpf to better detect kernel version code on Debian, from Hao Xiang.

9) Extend libbpf to add an option for when the perf buffer should
   wake up, from Jon Doron.

10) Follow-up fix on xdp_metadata selftest to just consume on TX
    completion, from Stanislav Fomichev.

11) Extend the kfuncs.rst document with description on kfunc
    lifecycle & stability expectations, from David Vernet.

12) Fix bpftool prog profile to skip attaching to offline CPUs,
    from Tonghao Zhang.

====================

Link: https://lore.kernel.org/r/20230211002037.8489-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 17:51:27 -08:00
Haiyang Zhang
18a048370b net: mana: Fix accessing freed irq affinity_hint
After calling irq_set_affinity_and_hint(), the cpumask pointer is
saved in desc->affinity_hint, and will be used later when reading
/proc/irq/<num>/affinity_hint. So the cpumask variable needs to be
persistent. Otherwise, we are accessing freed memory when reading
the affinity_hint file.

Also, need to clear affinity_hint before free_irq(), otherwise there
is a one-time warning and stack trace during module unloading:

 [  243.948687] WARNING: CPU: 10 PID: 1589 at kernel/irq/manage.c:1913 free_irq+0x318/0x360
 ...
 [  243.948753] Call Trace:
 [  243.948754]  <TASK>
 [  243.948760]  mana_gd_remove_irqs+0x78/0xc0 [mana]
 [  243.948767]  mana_gd_remove+0x3e/0x80 [mana]
 [  243.948773]  pci_device_remove+0x3d/0xb0
 [  243.948778]  device_remove+0x46/0x70
 [  243.948782]  device_release_driver_internal+0x1fe/0x280
 [  243.948785]  driver_detach+0x4e/0xa0
 [  243.948787]  bus_remove_driver+0x70/0xf0
 [  243.948789]  driver_unregister+0x35/0x60
 [  243.948792]  pci_unregister_driver+0x44/0x90
 [  243.948794]  mana_driver_exit+0x14/0x3fe [mana]
 [  243.948800]  __do_sys_delete_module.constprop.0+0x185/0x2f0

To fix the bug, use the persistent mask, cpumask_of(cpu#), and set
affinity_hint to NULL before freeing the IRQ, as required by free_irq().

Cc: stable@vger.kernel.org
Fixes: 71fa6887eeca ("net: mana: Assign interrupts to CPUs based on NUMA nodes")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/1675718929-19565-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07 21:30:22 -08:00
Marek Majtyka
66c0e13ad2 drivers: net: turn on XDP features
A summary of the flags being set for various drivers is given below.
Note that XDP_F_REDIRECT_TARGET and XDP_F_FRAG_TARGET are features
that can be turned off and on at runtime. This means that these flags
may be set and unset under RTNL lock protection by the driver. Hence,
READ_ONCE must be used by code loading the flag value.

Also, these flags are not used for synchronization against the availability
of XDP resources on a device. It is merely a hint, and hence the read
may race with the actual teardown of XDP resources on the device. This
may change in the future, e.g. operations taking a reference on the XDP
resources of the driver, and in turn inhibiting turning off this flag.
However, for now, it can only be used as a hint to check whether device
supports becoming a redirection target.

Turn 'hw-offload' feature flag on for:
 - netronome (nfp)
 - netdevsim.

Turn 'native' and 'zerocopy' features flags on for:
 - intel (i40e, ice, ixgbe, igc)
 - mellanox (mlx5).
 - stmmac
 - netronome (nfp)

Turn 'native' features flags on for:
 - amazon (ena)
 - broadcom (bnxt)
 - freescale (dpaa, dpaa2, enetc)
 - funeth
 - intel (igb)
 - marvell (mvneta, mvpp2, octeontx2)
 - mellanox (mlx4)
 - mtk_eth_soc
 - qlogic (qede)
 - sfc
 - socionext (netsec)
 - ti (cpsw)
 - tap
 - tsnep
 - veth
 - xen
 - virtio_net.

Turn 'basic' (tx, pass, aborted and drop) features flags on for:
 - netronome (nfp)
 - cavium (thunder)
 - hyperv.

Turn 'redirect_target' feature flag on for:
 - amanzon (ena)
 - broadcom (bnxt)
 - freescale (dpaa, dpaa2)
 - intel (i40e, ice, igb, ixgbe)
 - ti (cpsw)
 - marvell (mvneta, mvpp2)
 - sfc
 - socionext (netsec)
 - qlogic (qede)
 - mellanox (mlx5)
 - tap
 - veth
 - virtio_net
 - xen

Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Marek Majtyka <alardam@gmail.com>
Link: https://lore.kernel.org/r/3eca9fafb308462f7edb1f58e451d59209aa07eb.1675245258.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-02 20:48:23 -08:00
Haiyang Zhang
20e3028c39 net: mana: Fix IRQ name - add PCI and queue number
The PCI and queue number info is missing in IRQ names.

Add PCI and queue number to IRQ names, to allow CPU affinity
tuning scripts to work.

Cc: stable@vger.kernel.org
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Link: https://lore.kernel.org/r/1674161950-19708-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-20 18:17:17 -08:00
Linus Torvalds
ab425febda v6.2 merge window pull request
Usual size of updates, a new driver a most of the bulk focusing on rxe:
 
 - Usual typos, style, and language updates
 
 - Driver updates for mlx5, irdma, siw, rts, srp, hfi1, hns, erdma, mlx4, srp
 
 - Lots of RXE updates
   * Improve reply error handling for bad MR operations
   * Code tidying
   * Debug printing uses common loggers
   * Remove half implemented RD related stuff
   * Support IBA's recently defined Atomic Write and Flush operations
 
 - erdma support for atomic operations
 
 - New driver "mana" for Ethernet HW available in Azure VMs. This driver
   only supports DPDK
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY5eIggAKCRCFwuHvBreF
 YeX7AP9+l5Y9J48OmK7y/YgADNo9g05agXp3E8EuUDmBU+PREgEAigdWaJVf2oea
 IctVja0ApLW5W+wsFt8Qh+V4PMiYTAM=
 =Q5V+
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Usual size of updates, a new driver, and most of the bulk focusing on
  rxe:

   - Usual typos, style, and language updates

   - Driver updates for mlx5, irdma, siw, rts, srp, hfi1, hns, erdma,
     mlx4, srp

   - Lots of RXE updates:
      * Improve reply error handling for bad MR operations
      * Code tidying
      * Debug printing uses common loggers
      * Remove half implemented RD related stuff
      * Support IBA's recently defined Atomic Write and Flush operations

   - erdma support for atomic operations

   - New driver 'mana' for Ethernet HW available in Azure VMs. This
     driver only supports DPDK"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (122 commits)
  IB/IPoIB: Fix queue count inconsistency for PKEY child interfaces
  RDMA: Add missed netdev_put() for the netdevice_tracker
  RDMA/rxe: Enable RDMA FLUSH capability for rxe device
  RDMA/cm: Make QP FLUSHABLE for supported device
  RDMA/rxe: Implement flush completion
  RDMA/rxe: Implement flush execution in responder side
  RDMA/rxe: Implement RC RDMA FLUSH service in requester side
  RDMA/rxe: Extend rxe packet format to support flush
  RDMA/rxe: Allow registering persistent flag for pmem MR only
  RDMA/rxe: Extend rxe user ABI to support flush
  RDMA: Extend RDMA kernel verbs ABI to support flush
  RDMA: Extend RDMA user ABI to support flush
  RDMA/rxe: Fix incorrect responder length checking
  RDMA/rxe: Fix oops with zero length reads
  RDMA/mlx5: Remove not-used IB_FLOW_SPEC_IB define
  RDMA/hns: Fix XRC caps on HIP08
  RDMA/hns: Fix error code of CMD
  RDMA/hns: Fix page size cap from firmware
  RDMA/hns: Fix PBL page MTR find
  RDMA/hns: Fix AH attr queried by query_qp
  ...
2022-12-14 09:27:13 -08:00
Jakub Kicinski
837e8ac871 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 18:19:59 -08:00
Haiyang Zhang
18010ff776 net: mana: Fix race on per-CQ variable napi work_done
After calling napi_complete_done(), the NAPIF_STATE_SCHED bit may be
cleared, and another CPU can start napi thread and access per-CQ variable,
cq->work_done. If the other thread (for example, from busy_poll) sets
it to a value >= budget, this thread will continue to run when it should
stop, and cause memory corruption and panic.

To fix this issue, save the per-CQ work_done variable in a local variable
before napi_complete_done(), so it won't be corrupted by a possible
concurrent thread after napi_complete_done().

Also, add a flag bit to advertise to the NIC firmware: the NAPI work_done
variable race is fixed, so the driver is able to reliably support features
like busy_poll.

Cc: stable@vger.kernel.org
Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1670010190-28595-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-06 11:21:34 +01:00
Leon Romanovsky
3574cfdca2 RDMA/mana: Remove redefinition of basic u64 type
gdma_obj_handle_t is no more than redefinition of basic
u64 type. Remove such obfuscation.

Link: https://lore.kernel.org/r/3c1e821279e6a165d058655d2343722d6650e776.1668160486.git.leonro@nvidia.com
Acked-by: Long Li <longli@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-14 10:16:46 +02:00
Jakub Kicinski
79b0872b10 Merge branch 'mana-shared-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Long Li says:

====================
Introduce Microsoft Azure Network Adapter (MANA) RDMA driver [netdev prep]

The first 11 patches which modify the MANA Ethernet driver to support
RDMA driver.

* 'mana-shared-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  net: mana: Define data structures for protection domain and memory registration
  net: mana: Define data structures for allocating doorbell page from GDMA
  net: mana: Define and process GDMA response code GDMA_STATUS_MORE_ENTRIES
  net: mana: Define max values for SGL entries
  net: mana: Move header files to a common location
  net: mana: Record port number in netdev
  net: mana: Export Work Queue functions for use by RDMA driver
  net: mana: Set the DMA device max segment size
  net: mana: Handle vport sharing between devices
  net: mana: Record the physical address for doorbell page region
  net: mana: Add support for auxiliary device
====================

Link: https://lore.kernel.org/all/1667502990-2559-1-git-send-email-longli@linuxonhyperv.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 12:07:19 -08:00
Nathan Huckleberry
0c9ef08a4d net: mana: Fix return type of mana_start_xmit()
The ndo_start_xmit field in net_device_ops is expected to be of type
netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev).

The mismatched return type breaks forward edge kCFI since the underlying
function definition does not match the function hook definition. A new
warning in clang will catch this at compile time:

  drivers/net/ethernet/microsoft/mana/mana_en.c:382:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit         = mana_start_xmit,
                                    ^~~~~~~~~~~~~~~
  1 error generated.

The return type of mana_start_xmit should be changed from int to
netdev_tx_t.

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1703
Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
[nathan: Rebase on net-next and resolve conflicts
         Add note about new clang warning]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20221109002629.1446680-1-nathan@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-10 12:28:30 +01:00
Ajay Sharma
28c66cfa45 net: mana: Define data structures for protection domain and memory registration
The MANA hardware support protection domain and memory registration for use
in RDMA environment. Add those definitions and expose them for use by the
RDMA driver.

Signed-off-by: Ajay Sharma <sharmaajay@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-12-git-send-email-longli@linuxonhyperv.com
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 07:57:27 +02:00
Ajay Sharma
de372f2a9c net: mana: Define and process GDMA response code GDMA_STATUS_MORE_ENTRIES
When doing memory registration, the PF may respond with
GDMA_STATUS_MORE_ENTRIES to indicate a follow request is needed. This is
not an error and should be processed as expected.

Signed-off-by: Ajay Sharma <sharmaajay@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-10-git-send-email-longli@linuxonhyperv.com
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 07:57:27 +02:00
Long Li
aa56549792 net: mana: Define max values for SGL entries
The number of maximum SGl entries should be computed from the maximum
WQE size for the intended queue type and the corresponding OOB data
size. This guarantees the hardware queue can successfully queue requests
up to the queue depth exposed to the upper layer.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-9-git-send-email-longli@linuxonhyperv.com
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 07:57:27 +02:00
Long Li
fd325cd648 net: mana: Move header files to a common location
In preparation to add MANA RDMA driver, move all the required header files
to a common location for use by both Ethernet and RDMA drivers.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-8-git-send-email-longli@linuxonhyperv.com
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 07:57:26 +02:00
Long Li
d44089e555 net: mana: Record port number in netdev
The port number is useful for user-mode application to identify this
net device based on port index. Set to the correct value in ndev.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-7-git-send-email-longli@linuxonhyperv.com
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 07:57:26 +02:00
Long Li
4c0ff7a106 net: mana: Export Work Queue functions for use by RDMA driver
RDMA device may need to create Ethernet device queues for use by Queue
Pair type RAW. This allows a user-mode context accesses Ethernet hardware
queues. Export the supporting functions for use by the RDMA driver.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-6-git-send-email-longli@linuxonhyperv.com
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 07:57:26 +02:00
Ajay Sharma
6fe254160b net: mana: Set the DMA device max segment size
MANA hardware doesn't have any restrictions on the DMA segment size, set it
to the max allowed value.

Signed-off-by: Ajay Sharma <sharmaajay@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-5-git-send-email-longli@linuxonhyperv.com
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 07:57:26 +02:00
Long Li
b5c1c9855b net: mana: Handle vport sharing between devices
For outgoing packets, the PF requires the VF to configure the vport with
corresponding protection domain and doorbell ID for the kernel or user
context. The vport can't be shared between different contexts.

Implement the logic to exclusively take over the vport by either the
Ethernet device or RDMA device.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-4-git-send-email-longli@linuxonhyperv.com
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 07:57:26 +02:00
Long Li
f3dc096246 net: mana: Record the physical address for doorbell page region
For supporting RDMA device with multiple user contexts with their
individual doorbell pages, record the start address of doorbell page
region for use by the RDMA driver to allocate user context doorbell IDs.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-3-git-send-email-longli@linuxonhyperv.com
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 07:57:26 +02:00
Long Li
a69839d432 net: mana: Add support for auxiliary device
In preparation for supporting MANA RDMA driver, add support for auxiliary
device in the Ethernet driver. The RDMA device is modeled as an auxiliary
device to the Ethernet device.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-2-git-send-email-longli@linuxonhyperv.com
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 07:57:26 +02:00
Saurabh Sengar
71fa6887ee net: mana: Assign interrupts to CPUs based on NUMA nodes
In large VMs with multiple NUMA nodes, network performance is usually
best if network interrupts are all assigned to the same virtual NUMA
node. This patch assigns online CPU according to a numa aware policy,
local cpus are returned first, followed by non-local ones, then it wraps
around.

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1667282761-11547-1-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-03 12:21:41 +01:00
Thomas Gleixner
068c38ad88 net: Remove the obsolte u64_stats_fetch_*_irq() users (drivers).
Now that the 32bit UP oddity is gone and 32bit uses always a sequence
count, there is no need for the fetch_irq() variants anymore.

Convert to the regular interface.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28 20:13:54 -07:00
Linus Torvalds
504c25cb76 Including fixes from wifi, netfilter and can.
A handful of awaited fixes here - revert of the FEC changes,
 bluetooth fix, fixes for iwlwifi spew.
 
 We added a warning in PHY/MDIO code which is triggering on
 a couple of platforms in a false-positive-ish way. If we can't
 iron that out over the week we'll drop it and re-add for 6.1.
 
 I've added a new "follow up fixes" section for fixes to fixes
 in 6.0-rcs but it may actually give the false impression that
 those are problematic or that more testing time would have
 caught them. So likely a one time thing.
 
 Follow up fixes:
 
  - nf_tables_addchain: fix nft_counters_enabled underflow
 
  - ebtables: fix memory leak when blob is malformed
 
  - nf_ct_ftp: fix deadlock when nat rewrite is needed
 
 Current release - regressions:
 
  - Revert "fec: Restart PPS after link state change"
  - Revert "net: fec: Use a spinlock to guard `fep->ptp_clk_on`"
 
  - Bluetooth: fix HCIGETDEVINFO regression
 
  - wifi: mt76: fix 5 GHz connection regression on mt76x0/mt76x2
 
  - mptcp: fix fwd memory accounting on coalesce
 
  - rwlock removal fall out:
    - ipmr: always call ip{,6}_mr_forward() from RCU read-side
      critical section
    - ipv6: fix crash when IPv6 is administratively disabled
 
  - tcp: read multiple skbs in tcp_read_skb()
 
  - mdio_bus_phy_resume state warning fallout:
    - eth: ravb: fix PHY state warning splat during system resume
    - eth: sh_eth: fix PHY state warning splat during system resume
 
 Current release - new code bugs:
 
  - wifi: iwlwifi: don't spam logs with NSS>2 messages
 
  - eth: mtk_eth_soc: enable XDP support just for MT7986 SoC
 
 Previous releases - regressions:
 
  - bonding: fix NULL deref in bond_rr_gen_slave_id
 
  - wifi: iwlwifi: mark IWLMEI as broken
 
 Previous releases - always broken:
 
  - nf_conntrack helpers:
    - irc: tighten matching on DCC message
    - sip: fix ct_sip_walk_headers
    - osf: fix possible bogus match in nf_osf_find()
 
  - ipvlan: fix out-of-bound bugs caused by unset skb->mac_header
 
  - core: fix flow symmetric hash
 
  - bonding, team: unsync device addresses on ndo_stop
 
  - phy: micrel: fix shared interrupt on LAN8814
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmMsj3EACgkQMUZtbf5S
 IrsUgQ//eXxuUZeGTg7cgJKPFJelrZ3iL16B1+s2qX94GPIqXRAShgC78iM7IbSe
 y3vR/7YVE7sKXm88wnLefMQVXPp0cE2p0+8++E/j4zcRZsM5sHb2+d3gW6nos2ed
 U8Ldm7LzWUNt/o1ZHDqZWBSoreFkmbFyHO6FVPCuH11tFUJqxJ/SP860mwo6tbuT
 HOoVphKis41IMEXCgybs2V0DAQewba0gejzAmySDy8epNhOj2F4Vo6aadnUCI68U
 HrIFYe2wiEi6MZDsB9zpRXc9seb6ZBKbBjgQnTK7MwfBEQCzxtR2lkNobJM1WbdL
 nYwHBOJ16yX0BnlSpUEepv6iJYY5Q7FS35Wk3Rq5Mik6DaEir6vVSBdRxHpYOkO2
 KPIyyMMAA5E8mAtqH3PcpnwDK+9c3KlZYYKXxIp2IjQm87DpOZJFynwsC3Crmbzo
 C7UTMav2nkHljoapMLUwzqyw2ip+Qo14XA043FDPUru1sXY9CY6q50XZa5GmrNKh
 xyaBdp4Ckj1kOuXUR9jz3Rq8skOZ8lNGHtiCdgPZitWhNKW1YORJihC7/9zdieCR
 1gOE7Dpz/MhVmFn2e8S5O3TkU5lXfALfPDJi4QiML5VLHXd/nCE5sHPiOBWcoo4w
 2djKbIGpLRnO6qMs4NkWNmPbG+/ouvpM+lewqn+xU4TGyn/NTbI=
 =wrep
 -----END PGP SIGNATURE-----

Merge tag 'net-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from wifi, netfilter and can.

  A handful of awaited fixes here - revert of the FEC changes, bluetooth
  fix, fixes for iwlwifi spew.

  We added a warning in PHY/MDIO code which is triggering on a couple of
  platforms in a false-positive-ish way. If we can't iron that out over
  the week we'll drop it and re-add for 6.1.

  I've added a new "follow up fixes" section for fixes to fixes in
  6.0-rcs but it may actually give the false impression that those are
  problematic or that more testing time would have caught them. So
  likely a one time thing.

  Follow up fixes:

   - nf_tables_addchain: fix nft_counters_enabled underflow

   - ebtables: fix memory leak when blob is malformed

   - nf_ct_ftp: fix deadlock when nat rewrite is needed

  Current release - regressions:

   - Revert "fec: Restart PPS after link state change" and the related
     "net: fec: Use a spinlock to guard `fep->ptp_clk_on`"

   - Bluetooth: fix HCIGETDEVINFO regression

   - wifi: mt76: fix 5 GHz connection regression on mt76x0/mt76x2

   - mptcp: fix fwd memory accounting on coalesce

   - rwlock removal fall out:
      - ipmr: always call ip{,6}_mr_forward() from RCU read-side
        critical section
      - ipv6: fix crash when IPv6 is administratively disabled

   - tcp: read multiple skbs in tcp_read_skb()

   - mdio_bus_phy_resume state warning fallout:
      - eth: ravb: fix PHY state warning splat during system resume
      - eth: sh_eth: fix PHY state warning splat during system resume

  Current release - new code bugs:

   - wifi: iwlwifi: don't spam logs with NSS>2 messages

   - eth: mtk_eth_soc: enable XDP support just for MT7986 SoC

  Previous releases - regressions:

   - bonding: fix NULL deref in bond_rr_gen_slave_id

   - wifi: iwlwifi: mark IWLMEI as broken

  Previous releases - always broken:

   - nf_conntrack helpers:
      - irc: tighten matching on DCC message
      - sip: fix ct_sip_walk_headers
      - osf: fix possible bogus match in nf_osf_find()

   - ipvlan: fix out-of-bound bugs caused by unset skb->mac_header

   - core: fix flow symmetric hash

   - bonding, team: unsync device addresses on ndo_stop

   - phy: micrel: fix shared interrupt on LAN8814"

* tag 'net-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
  selftests: forwarding: add shebang for sch_red.sh
  bnxt: prevent skb UAF after handing over to PTP worker
  net: marvell: Fix refcounting bugs in prestera_port_sfp_bind()
  net: sched: fix possible refcount leak in tc_new_tfilter()
  net: sunhme: Fix packet reception for len < RX_COPY_THRESHOLD
  udp: Use WARN_ON_ONCE() in udp_read_skb()
  selftests: bonding: cause oops in bond_rr_gen_slave_id
  bonding: fix NULL deref in bond_rr_gen_slave_id
  net: phy: micrel: fix shared interrupt on LAN8814
  net/smc: Stop the CLC flow if no link to map buffers on
  ice: Fix ice_xdp_xmit() when XDP TX queue number is not sufficient
  net: atlantic: fix potential memory leak in aq_ndev_close()
  can: gs_usb: gs_usb_set_phys_id(): return with error if identify is not supported
  can: gs_usb: gs_can_open(): fix race dev->can.state condition
  can: flexcan: flexcan_mailbox_read() fix return value for drop = true
  net: sh_eth: Fix PHY state warning splat during system resume
  net: ravb: Fix PHY state warning splat during system resume
  netfilter: nf_ct_ftp: fix deadlock when nat rewrite is needed
  netfilter: ebtables: fix memory leak when blob is malformed
  netfilter: nf_tables: fix percpu memory leak at nf_tables_addchain()
  ...
2022-09-22 10:58:13 -07:00
Haiyang Zhang
6fd2c68da5 net: mana: Add rmb after checking owner bits
Per GDMA spec, rmb is necessary after checking owner_bits, before
reading EQ or CQ entries.

Add rmb in these two places to comply with the specs.

Cc: stable@vger.kernel.org
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Reported-by: Sinan Kaya <Sinan.Kaya@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/r/1662928805-15861-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19 16:23:19 -07:00
Vitaly Kuznetsov
8409fe92d8 PCI: Move PCI_VENDOR_ID_MICROSOFT/PCI_DEVICE_ID_HYPERV_VIDEO definitions to pci_ids.h
There are already three places in kernel which define
PCI_VENDOR_ID_MICROSOFT and two for PCI_DEVICE_ID_HYPERV_VIDEO and
there's a need to use these from core VMBus code. Move the defines where
they belong.

No functional change.

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_ids.h
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20220827130345.1320254-2-vkuznets@redhat.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-09-05 17:00:54 +00:00
Haiyang Zhang
7a8938cd02 net: mana: Add support of XDP_REDIRECT action
Add a handler of the XDP_REDIRECT return code from a XDP program. The
packets will be flushed at the end of each RX/CQ NAPI poll cycle.
ndo_xdp_xmit() is implemented by sharing the code in mana_xdp_tx().
Ethtool per queue counters are added for XDP redirect and xmit operations.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-06-16 10:40:25 +02:00
Dexuan Cui
1566e7d620 net: mana: Add the Linux MANA PF driver
This minimal PF driver runs on bare metal.
Currently Ethernet TX/RX works. SR-IOV management is not supported yet.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Co-developed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-06-16 10:40:25 +02:00
Jakub Kicinski
b707b89f7b eth: switch to netif_napi_add_weight()
Switch all Ethernet drivers which use custom napi weights
to the new API.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 11:33:57 +01:00
Jakub Kicinski
16d083e28f net: switch to netif_napi_add_tx()
Switch net callers to the new API not requiring
the NAPI_POLL_WEIGHT argument.

Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://lore.kernel.org/r/20220504163725.550782-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-05 15:54:12 -07:00
Haiyang Zhang
68f8313550 net: mana: Remove unnecessary check of cqe_type in mana_process_rx_cqe()
The switch statement already ensures cqe_type == CQE_RX_OKAY at that
point.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05 15:26:00 +00:00
Haiyang Zhang
e4b7621982 net: mana: Add handling of CQE_RX_TRUNCATED
The proper way to drop this kind of CQE is advancing rxq tail
without indicating the packet to the upper network layer.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05 15:26:00 +00:00
Haiyang Zhang
a6bf5703f1 net: mana: Reuse XDP dropped page
Reuse the dropped page in RX path to save page allocation
overhead.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-31 15:39:58 +00:00
Haiyang Zhang
d356abb95b net: mana: Add counter for XDP_TX
This counter will show up in ethtool stat. It is the
number of packets received and forwarded by XDP program.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-31 15:39:58 +00:00
Haiyang Zhang
f90f84201e net: mana: Add counter for packet dropped by XDP
This counter will show up in ethtool stat data.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-31 15:39:58 +00:00
Gustavo A. R. Silva
10cdc794da net: mana: Use struct_size() helper in mana_gd_create_dma_region()
Make use of the struct_size() helper instead of an open-coded version,
in order to avoid any potential type mistakes or integer overflows that,
in the worst scenario, could lead to heap overflows.

Also, address the following sparse warnings:
drivers/net/ethernet/microsoft/mana/gdma_main.c:677:24: warning: using sizeof on a flexible structure

Link: https://github.com/KSPP/linux/issues/174
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-25 12:59:18 +00:00
David S. Miller
e63a023489 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-12-30

The following pull-request contains BPF updates for your *net-next* tree.

We've added 72 non-merge commits during the last 20 day(s) which contain
a total of 223 files changed, 3510 insertions(+), 1591 deletions(-).

The main changes are:

1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii.

2) Beautify and de-verbose verifier logs, from Christy.

3) Composable verifier types, from Hao.

4) bpf_strncmp helper, from Hou.

5) bpf.h header dependency cleanup, from Jakub.

6) get_func_[arg|ret|arg_cnt] helpers, from Jiri.

7) Sleepable local storage, from KP.

8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-31 14:35:40 +00:00
Jakub Kicinski
3b80b73a4b net: Add includes masked by netdevice.h including uapi/bpf.h
Add missing includes unmasked by the subsequent change.

Mostly network drivers missing an include for XDP_PACKET_HEADROOM.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211230012742.770642-2-kuba@kernel.org
2021-12-29 20:03:05 -08:00
Dexuan Cui
6cc74443a7 net: mana: Add RX fencing
RX fencing allows the driver to know that any prior change to the RQs has
finished, e.g. when the RQs are disabled/enabled or the hashkey/indirection
table are changed, RX fencing is required.

Remove the previous workaround "ssleep(1)" and add the real support for
RX fencing as the PF driver supports the MANA_FENCE_RQ request now (any
old PF driver not supporting the request won't be used in production).

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/20211216001748.8751-1-decui@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16 20:27:32 -08:00
Paolo Abeni
c8064e5b4a bpf: Let bpf_warn_invalid_xdp_action() report more info
In non trivial scenarios, the action id alone is not sufficient to
identify the program causing the warning. Before the previous patch,
the generated stack-trace pointed out at least the involved device
driver.

Let's additionally include the program name and id, and the relevant
device name.

If the user needs additional infos, he can fetch them via a kernel
probe, leveraging the arguments added here.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/ddb96bb975cbfddb1546cf5da60e77d5100b533c.1638189075.git.pabeni@redhat.com
2021-12-13 22:28:27 +01:00
Jakub Kicinski
3150a73366 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-09 13:23:02 -08:00
José Expósito
9acfc57fa2 net: mana: Fix memory leak in mana_hwc_create_wq
If allocating the DMA buffer fails, mana_hwc_destroy_wq was called
without previously storing the pointer to the queue.

In order to avoid leaking the pointer to the queue, store it as soon as
it is allocated.

Addresses-Coverity-ID: 1484720 ("Resource leak")
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/r/20211208223723.18520-1-jose.exposito89@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-09 07:58:41 -08:00
Haiyang Zhang
ed5356b53f net: mana: Add XDP support
Add support of XDP for the MANA driver.

Supported XDP actions:
	XDP_PASS, XDP_TX, XDP_DROP, XDP_ABORTED

XDP actions not yet supported:
	XDP_REDIRECT

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22 13:20:19 +00:00
Colin Ian King
8f1bc38bbb net: mana: Fix spelling mistake "calledd" -> "called"
There is a spelling mistake in a dev_info message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/r/20211108201817.43121-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-09 19:16:55 -08:00
Dexuan Cui
635096a86e net: mana: Support hibernation and kexec
Implement the suspend/resume/shutdown callbacks for hibernation/kexec.

Add mana_gd_setup() and mana_gd_cleanup() for some common code, and
use them in the mand_gd_* callbacks.

Reuse mana_probe/remove() for the hibernation path.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 13:21:49 +00:00
Dexuan Cui
62ea8b77ed net: mana: Improve the HWC error handling
Currently when the HWC creation fails, the error handling is flawed,
e.g. if mana_hwc_create_channel() -> mana_hwc_establish_channel() fails,
the resources acquired in mana_hwc_init_queues() is not released.

Enhance mana_hwc_destroy_channel() to do the proper cleanup work and
call it accordingly.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 13:21:49 +00:00
Dexuan Cui
3c37f35735 net: mana: Report OS info to the PF driver
The PF driver might use the OS info for statistical purposes.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 13:21:49 +00:00
Dexuan Cui
6c7ea69653 net: mana: Fix the netdev_err()'s vPort argument in mana_init_port()
Use the correct port index rather than 0.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 13:21:49 +00:00
Haiyang Zhang
a137c069fb net: mana: Allow setting the number of queues while the NIC is down
The existing code doesn't allow setting the number of queues while the
NIC is down.

Update the ethtool handler functions to support setting the number of
queues while the NIC is at down state.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 14:56:19 +01:00
Jakub Kicinski
e15f5972b8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
tools/testing/selftests/net/ioam6.sh
  7b1700e009cc ("selftests: net: modify IOAM tests for undef bits")
  bf77b1400a56 ("selftests: net: Test for the IOAM encapsulation with IPv6")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-14 16:50:14 -07:00
Haiyang Zhang
be0499369d net: mana: Fix error handling in mana_create_rxq()
Fix error handling in mana_create_rxq() when
cq->gdma_id >= gc->max_num_cqs.

Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1633698691-31721-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-08 17:00:04 -07:00