Commit Graph

14113 Commits

Author SHA1 Message Date
Junxian Huang
efb9cbf664 RDMA/hns: Fix unnecessary err return when using invalid congest control algorithm
Add a default congest control algorithm so that driver won't return
an error when the configured algorithm is invalid.

Fixes: f91696f2f0 ("RDMA/hns: Support congestion control type selection according to the FW")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20231028093242.670325-1-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-13 10:25:29 +02:00
Philipp Stanner
c170d4ff21 RDMA/hfi1: Copy userspace arrays safely
Currently, memdup_user() is utilized at two positions to copy userspace
arrays. This is done without overflow checks.

Use the new wrapper memdup_array_user() to copy the arrays more safely.

Suggested-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Link: https://lore.kernel.org/r/20231102191308.52046-2-pstanner@redhat.com
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-13 10:21:33 +02:00
Shigeru Yoshida
0550d4604e RDMA/core: Fix uninit-value access in ib_get_eth_speed()
KMSAN reported the following uninit-value access issue:

lo speed is unknown, defaulting to 1000
=====================================================
BUG: KMSAN: uninit-value in ib_get_width_and_speed drivers/infiniband/core/verbs.c:1889 [inline]
BUG: KMSAN: uninit-value in ib_get_eth_speed+0x546/0xaf0 drivers/infiniband/core/verbs.c:1998
 ib_get_width_and_speed drivers/infiniband/core/verbs.c:1889 [inline]
 ib_get_eth_speed+0x546/0xaf0 drivers/infiniband/core/verbs.c:1998
 siw_query_port drivers/infiniband/sw/siw/siw_verbs.c:173 [inline]
 siw_get_port_immutable+0x6f/0x120 drivers/infiniband/sw/siw/siw_verbs.c:203
 setup_port_data drivers/infiniband/core/device.c:848 [inline]
 setup_device drivers/infiniband/core/device.c:1244 [inline]
 ib_register_device+0x1589/0x1df0 drivers/infiniband/core/device.c:1383
 siw_device_register drivers/infiniband/sw/siw/siw_main.c:72 [inline]
 siw_newlink+0x129e/0x13d0 drivers/infiniband/sw/siw/siw_main.c:490
 nldev_newlink+0x8fd/0xa60 drivers/infiniband/core/nldev.c:1763
 rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
 rdma_nl_rcv+0xe8a/0x1120 drivers/infiniband/core/netlink.c:259
 netlink_unicast_kernel net/netlink/af_netlink.c:1342 [inline]
 netlink_unicast+0xf4b/0x1230 net/netlink/af_netlink.c:1368
 netlink_sendmsg+0x1242/0x1420 net/netlink/af_netlink.c:1910
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg net/socket.c:745 [inline]
 ____sys_sendmsg+0x997/0xd60 net/socket.c:2588
 ___sys_sendmsg+0x271/0x3b0 net/socket.c:2642
 __sys_sendmsg net/socket.c:2671 [inline]
 __do_sys_sendmsg net/socket.c:2680 [inline]
 __se_sys_sendmsg net/socket.c:2678 [inline]
 __x64_sys_sendmsg+0x2fa/0x4a0 net/socket.c:2678
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

Local variable lksettings created at:
 ib_get_eth_speed+0x4b/0xaf0 drivers/infiniband/core/verbs.c:1974
 siw_query_port drivers/infiniband/sw/siw/siw_verbs.c:173 [inline]
 siw_get_port_immutable+0x6f/0x120 drivers/infiniband/sw/siw/siw_verbs.c:203

CPU: 0 PID: 11257 Comm: syz-executor.1 Not tainted 6.6.0-14500-g1c41041124bd #10
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
=====================================================

If __ethtool_get_link_ksettings() fails, `netdev_speed` is set to the
default value, SPEED_1000. In this case, if `lanes` field of struct
ethtool_link_ksettings is not initialized, an uninitialized value is passed
to ib_get_width_and_speed(). This causes the above issue. This patch
resolves the issue by initializing `lanes` to 0.

Fixes: cb06b6b3f6 ("RDMA/core: Get IB width and speed from netdev")
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Link: https://lore.kernel.org/r/20231108143113.1360567-1-syoshida@redhat.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-13 10:19:07 +02:00
Bernard Metzler
476b7c7e00 RDMA/siw: Use ib_umem_get() to pin user pages
Abandon siw private code to pin user pages during user
memory registration, but use ib_umem_get() instead.
This will help maintaining the driver in case of changes
to the memory subsystem.

Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20231104075643.195186-1-bmt@zurich.ibm.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-11-13 10:14:00 +02:00
Linus Torvalds
43468456c9 RDMA for v6.7
The usual collection of patches:
 
 - Bug fixes for hns, mlx5, and hfi1
 
 - Hardening patches for size_*, counted_by, strscpy
 
 - rts fixes from static analysis
 
 - Dump SRQ objects in rdma netlink, with hns support
 
 - Fix a performance regression in mlx5 MR deregistration
 
 - New XDR (200Gb/lane) link speed
 
 - SRQ record doorbell latency optimization for hns
 
 - IPSEC support for mlx5 multi-port mode
 
 - ibv_rereg_mr() support for irdma
 
 - Affiliated event support for bnxt_re
 
 - Opt out for the spec compliant qkey security enforcement as we
   discovered SW that breaks under enforcement
 
 - Comment and trivial updates
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZUEvXwAKCRCFwuHvBreF
 YQ/iAPkBl7vCH+oFGJKnh9/pUjKrWwRYBkGNgNesH7lX4Q/cIAEA+R3bAI2d4O4i
 tMrRRowTnxohbVRebuy4Pfsoo6m9dA4=
 =TxMJ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Nothing exciting this cycle, most of the diffstat is changing SPDX
  'or' to 'OR'.

  Summary:

   - Bugfixes for hns, mlx5, and hfi1

   - Hardening patches for size_*, counted_by, strscpy

   - rts fixes from static analysis

   - Dump SRQ objects in rdma netlink, with hns support

   - Fix a performance regression in mlx5 MR deregistration

   - New XDR (200Gb/lane) link speed

   - SRQ record doorbell latency optimization for hns

   - IPSEC support for mlx5 multi-port mode

   - ibv_rereg_mr() support for irdma

   - Affiliated event support for bnxt_re

   - Opt out for the spec compliant qkey security enforcement as we
     discovered SW that breaks under enforcement

   - Comment and trivial updates"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (50 commits)
  IB/mlx5: Fix init stage error handling to avoid double free of same QP and UAF
  RDMA/mlx5: Fix mkey cache WQ flush
  RDMA/hfi1: Workaround truncation compilation error
  IB/hfi1: Fix potential deadlock on &irq_src_lock and &dd->uctxt_lock
  RDMA/core: Remove NULL check before dev_{put, hold}
  RDMA/hfi1: Remove redundant assignment to pointer ppd
  RDMA/mlx5: Change the key being sent for MPV device affiliation
  RDMA/bnxt_re: Fix clang -Wimplicit-fallthrough in bnxt_re_handle_cq_async_error()
  RDMA/hns: Fix init failure of RoCE VF and HIP08
  RDMA/hns: Fix unnecessary port_num transition in HW stats allocation
  RDMA/hns: The UD mode can only be configured with DCQCN
  RDMA/hns: Add check for SL
  RDMA/hns: Fix signed-unsigned mixed comparisons
  RDMA/hns: Fix uninitialized ucmd in hns_roce_create_qp_common()
  RDMA/hns: Fix printing level of asynchronous events
  RDMA/core: Add support to set privileged QKEY parameter
  RDMA/bnxt_re: Do not report SRQ error in srq notification
  RDMA/bnxt_re: Report async events and errors
  RDMA/bnxt_re: Update HW interface headers
  IB/mlx5: Fix rdma counter binding for RAW QP
  ...
2023-11-02 15:20:30 -10:00
Linus Torvalds
6ed92e559a SCSI misc on 20231102
Updates to the usual drivers (ufs, megaraid_sas, lpfc, target, ibmvfc,
 scsi_debug) plus the usual assorted minor fixes and updates.  The
 major change this time around is a prep patch for rethreading of the
 driver reset handler API not to take a scsi_cmd structure which starts
 to reduce various drivers' dependence on scsi_cmd in error handling.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZUORLiYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishQ4WAQDDIhzp
 /PiJBBtt0U9ii/lYqRLrOVnN0extKEgEGO+FbwEAssKgs+5Jn/7XCgdpSrx8Co3/
 0cPXrZGxs7tFpFWLZjM=
 =AlRU
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "Updates to the usual drivers (ufs, megaraid_sas, lpfc, target, ibmvfc,
  scsi_debug) plus the usual assorted minor fixes and updates.

  The major change this time around is a prep patch for rethreading of
  the driver reset handler API not to take a scsi_cmd structure which
  starts to reduce various drivers' dependence on scsi_cmd in error
  handling"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (132 commits)
  scsi: ufs: core: Leave space for '\0' in utf8 desc string
  scsi: ufs: core: Conversion to bool not necessary
  scsi: ufs: core: Fix race between force complete and ISR
  scsi: megaraid: Fix up debug message in megaraid_abort_and_reset()
  scsi: aic79xx: Fix up NULL command in ahd_done()
  scsi: message: fusion: Initialize return value in mptfc_bus_reset()
  scsi: mpt3sas: Fix loop logic
  scsi: snic: Remove useless code in snic_dr_clean_pending_req()
  scsi: core: Add comment to target_destroy in scsi_host_template
  scsi: core: Clean up scsi_dev_queue_ready()
  scsi: pmcraid: Add missing scsi_device_put() in pmcraid_eh_target_reset_handler()
  scsi: target: core: Fix kernel-doc comment
  scsi: pmcraid: Fix kernel-doc comment
  scsi: core: Handle depopulation and restoration in progress
  scsi: ufs: core: Add support for parsing OPP
  scsi: ufs: core: Add OPP support for scaling clocks and regulators
  scsi: ufs: dt-bindings: common: Add OPP table
  scsi: scsi_debug: Add param to control sdev's allow_restart
  scsi: scsi_debug: Add debugfs interface to fail target reset
  scsi: scsi_debug: Add new error injection type: Reset LUN failed
  ...
2023-11-02 15:13:50 -10:00
Linus Torvalds
426ee5196d sysctl-6.7-rc1
To help make the move of sysctls out of kernel/sysctl.c not incur a size
 penalty sysctl has been changed to allow us to not require the sentinel, the
 final empty element on the sysctl array. Joel Granados has been doing all this
 work. On the v6.6 kernel we got the major infrastructure changes required to
 support this. For v6.7-rc1 we have all arch/ and drivers/ modified to remove
 the sentinel. Both arch and driver changes have been on linux-next for a bit
 less than a month. It is worth re-iterating the value:
 
   - this helps reduce the overall build time size of the kernel and run time
      memory consumed by the kernel by about ~64 bytes per array
   - the extra 64-byte penalty is no longer inncurred now when we move sysctls
     out from kernel/sysctl.c to their own files
 
 For v6.8-rc1 expect removal of all the sentinels and also then the unneeded
 check for procname == NULL.
 
 The last 2 patches are fixes recently merged by Krister Johansen which allow
 us again to use softlockup_panic early on boot. This used to work but the
 alias work broke it. This is useful for folks who want to detect softlockups
 super early rather than wait and spend money on cloud solutions with nothing
 but an eventual hung kernel. Although this hadn't gone through linux-next it's
 also a stable fix, so we might as well roll through the fixes now.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmVCqKsSHG1jZ3JvZkBr
 ZXJuZWwub3JnAAoJEM4jHQowkoinEgYQAIpkqRL85DBwems19Uk9A27lkctwZ6Fc
 HdslQCObQTsbuKVimZFP4IL2beUfUE0cfLZCXlzp+4nRDOf6vyhyf3w19jPQtI0Q
 YdqwTk9y6G5VjDsb35QK0+UBloY/kZ1H3/LW4uCwjXTuksUGmWW2Qvey35696Scv
 hDMLADqKQmdpYxLUaNi9QyYbEAjYtOai2ezg3+i7hTG168t1k/Ab2BxIFrPVsCR2
 FAiq05L4ugWjNskdsWBjck05JZsx9SK/qcAxpIPoUm4nGiFNHApXE0E0hs3vsnmn
 WIHIbxCQw8ZlUDlmw4S+0YH3NFFzFbWfmW8k2b0f2qZTJm/rU4KiJfcJVknkAUVF
 raFox6XDW0AUQ9L/NOUJ9ip5rup57GcFrMYocdJ3PPAvvmHKOb1D1O741p75RRcc
 9j7zwfIRrzjPUqzhsQS/GFjdJu3lJNmEBK1AcgrVry6WoItrAzJHKPPDC7TwaNmD
 eXpjxMl1sYzzHqtVh4hn+xkUYphj/6gTGMV8zdo+/FopFswgeJW9G8kHtlEWKDPk
 MRIKwACmfetP6f3ngHunBg+BOipbjCANL7JI0nOhVOQoaULxCCPx+IPJ6GfSyiuH
 AbcjH8DGI7fJbUkBFoF0dsRFZ2gH8ds1PYMbWUJ6x3FtuCuv5iIuvQYoaWU6itm7
 6f0KvCogg0fU
 =Qf50
 -----END PGP SIGNATURE-----

Merge tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux

Pull sysctl updates from Luis Chamberlain:
 "To help make the move of sysctls out of kernel/sysctl.c not incur a
  size penalty sysctl has been changed to allow us to not require the
  sentinel, the final empty element on the sysctl array. Joel Granados
  has been doing all this work. On the v6.6 kernel we got the major
  infrastructure changes required to support this. For v6.7-rc1 we have
  all arch/ and drivers/ modified to remove the sentinel. Both arch and
  driver changes have been on linux-next for a bit less than a month. It
  is worth re-iterating the value:

   - this helps reduce the overall build time size of the kernel and run
     time memory consumed by the kernel by about ~64 bytes per array

   - the extra 64-byte penalty is no longer inncurred now when we move
     sysctls out from kernel/sysctl.c to their own files

  For v6.8-rc1 expect removal of all the sentinels and also then the
  unneeded check for procname == NULL.

  The last two patches are fixes recently merged by Krister Johansen
  which allow us again to use softlockup_panic early on boot. This used
  to work but the alias work broke it. This is useful for folks who want
  to detect softlockups super early rather than wait and spend money on
  cloud solutions with nothing but an eventual hung kernel. Although
  this hadn't gone through linux-next it's also a stable fix, so we
  might as well roll through the fixes now"

* tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (23 commits)
  watchdog: move softlockup_panic back to early_param
  proc: sysctl: prevent aliased sysctls from getting passed to init
  intel drm: Remove now superfluous sentinel element from ctl_table array
  Drivers: hv: Remove now superfluous sentinel element from ctl_table array
  raid: Remove now superfluous sentinel element from ctl_table array
  fw loader: Remove the now superfluous sentinel element from ctl_table array
  sgi-xp: Remove the now superfluous sentinel element from ctl_table array
  vrf: Remove the now superfluous sentinel element from ctl_table array
  char-misc: Remove the now superfluous sentinel element from ctl_table array
  infiniband: Remove the now superfluous sentinel element from ctl_table array
  macintosh: Remove the now superfluous sentinel element from ctl_table array
  parport: Remove the now superfluous sentinel element from ctl_table array
  scsi: Remove now superfluous sentinel element from ctl_table array
  tty: Remove now superfluous sentinel element from ctl_table array
  xen: Remove now superfluous sentinel element from ctl_table array
  hpet: Remove now superfluous sentinel element from ctl_table array
  c-sky: Remove now superfluous sentinel element from ctl_talbe array
  powerpc: Remove now superfluous sentinel element from ctl_table arrays
  riscv: Remove now superfluous sentinel element from ctl_table array
  x86/vdso: Remove now superfluous sentinel element from ctl_table array
  ...
2023-11-01 20:51:41 -10:00
Linus Torvalds
89ed67ef12 Networking changes for 6.7.
Core & protocols
 ----------------
 
  - Support usec resolution of TCP timestamps, enabled selectively by
    a route attribute.
 
  - Defer regular TCP ACK while processing socket backlog, try to send
    a cumulative ACK at the end. Increase single TCP flow performance
    on a 200Gbit NIC by 20% (100Gbit -> 120Gbit).
 
  - The Fair Queuing (FQ) packet scheduler:
    - add built-in 3 band prio / WRR scheduling
    - support bypass if the qdisc is mostly idle (5% speed up for TCP RR)
    - improve inactive flow reporting
    - optimize the layout of structures for better cache locality
 
  - Support TCP Authentication Option (RFC 5925, TCP-AO), a more modern
    replacement for the old MD5 option.
 
  - Add more retransmission timeout (RTO) related statistics to TCP_INFO.
 
  - Support sending fragmented skbs over vsock sockets.
 
  - Make sure we send SIGPIPE for vsock sockets if socket was shutdown().
 
  - Add sysctl for ignoring lower limit on lifetime in Router
    Advertisement PIO, based on an in-progress IETF draft.
 
  - Add sysctl to control activation of TCP ping-pong mode.
 
  - Add sysctl to make connection timeout in MPTCP configurable.
 
  - Support rcvlowat and notsent_lowat on MPTCP sockets, to help apps
    limit the number of wakeups.
 
  - Support netlink GET for MDB (multicast forwarding), allowing user
    space to request a single MDB entry instead of dumping the entire
    table.
 
  - Support selective FDB flushing in the VXLAN tunnel driver.
 
  - Allow limiting learned FDB entries in bridges, prevent OOM attacks.
 
  - Allow controlling via configfs netconsole targets which were created
    via the kernel cmdline at boot, rather than via configfs at runtime.
 
  - Support multiple PTP timestamp event queue readers with different
    filters.
 
  - MCTP over I3C.
 
 BPF
 ---
 
  - Add new veth-like netdevice where BPF program defines the logic
    of the xmit routine. It can operate in L3 and L2 mode.
 
  - Support exceptions - allow asserting conditions which should
    never be true but are hard for the verifier to infer.
    With some extra flexibility around handling of the exit / failure.
    https://lwn.net/Articles/938435/
 
  - Add support for local per-cpu kptr, allow allocating and storing
    per-cpu objects in maps. Access to those objects operates on
    the value for the current CPU. This allows to deprecate local
    one-off implementations of per-CPU storage like
    BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE maps.
 
  - Extend cgroup BPF sockaddr hooks for UNIX sockets. The use case is
    for systemd to re-implement the LogNamespace feature which allows
    running multiple instances of systemd-journald to process the logs
    of different services.
 
  - Enable open-coded task_vma iteration, after maple tree conversion
    made it hard to directly walk VMAs in tracing programs.
 
  - Add open-coded task, css_task and css iterator support.
    One of the use cases is customizable OOM victim selection via BPF.
 
  - Allow source address selection with bpf_*_fib_lookup().
 
  - Add ability to pin BPF timer to the current CPU.
 
  - Prevent creation of infinite loops by combining tail calls and
    fentry/fexit programs.
 
  - Add missed stats for kprobes to retrieve the number of missed kprobe
    executions and subsequent executions of BPF programs.
 
  - Inherit system settings for CPU security mitigations.
 
  - Add BPF v4 CPU instruction support for arm32 and s390x.
 
 Changes to common code
 ----------------------
 
  - overflow: add DEFINE_FLEX() for on-stack definition of structs
    with flexible array members.
 
  - Process doc update with more guidance for reviewers.
 
 Driver API
 ----------
 
  - Simplify locking in WiFi (cfg80211 and mac80211 layers), use wiphy
    mutex in most places and remove a lot of smaller locks.
 
  - Create a common DPLL configuration API. Allow configuring
    and querying state of PLL circuits used for clock syntonization,
    in network time distribution.
 
  - Unify fragmented and full page allocation APIs in page pool code.
    Let drivers be ignorant of PAGE_SIZE.
 
  - Rework PHY state machine to avoid races with calls to phy_stop().
 
  - Notify DSA drivers of MAC address changes on user ports, improve
    correctness of offloads which depend on matching port MAC addresses.
 
  - Allow antenna control on injected WiFi frames.
 
  - Reduce the number of variants of napi_schedule().
 
  - Simplify error handling when composing devlink health messages.
 
 Misc
 ----
 
  - A lot of KCSAN data race "fixes", from Eric.
 
  - A lot of __counted_by() annotations, from Kees.
 
  - A lot of strncpy -> strscpy and printf format fixes.
 
  - Replace master/slave terminology with conduit/user in DSA drivers.
 
  - Handful of KUnit tests for netdev and WiFi core.
 
 Removed
 -------
 
  - AppleTalk COPS.
 
  - AppleTalk ipddp.
 
  - TI AR7 CPMAC Ethernet driver.
 
 Drivers
 -------
 
  - Ethernet high-speed NICs:
    - Intel (100G, ice, idpf):
      - add a driver for the Intel E2000 IPUs
      - make CRC/FCS stripping configurable
      - cross-timestamping for E823 devices
      - basic support for E830 devices
      - use aux-bus for managing client drivers
      - i40e: report firmware versions via devlink
    - nVidia/Mellanox:
      - support 4-port NICs
      - increase max number of channels to 256
      - optimize / parallelize SF creation flow
    - Broadcom (bnxt):
      - enhance NIC temperature reporting
      - support PAM4 speeds and lane configuration
    - Marvell OcteonTX2:
      - PTP pulse-per-second output support
      - enable hardware timestamping for VFs
    - Solarflare/AMD:
      - conntrack NAT offload and offload for tunnels
    - Wangxun (ngbe/txgbe):
      - expose HW statistics
    - Pensando/AMD:
      - support PCI level reset
      - narrow down the condition under which skbs are linearized
    - Netronome/Corigine (nfp):
      - support CHACHA20-POLY1305 crypto in IPsec offload
 
  - Ethernet NICs embedded, slower, virtual:
    - Synopsys (stmmac):
      - add Loongson-1 SoC support
      - enable use of HW queues with no offload capabilities
      - enable PPS input support on all 5 channels
      - increase TX coalesce timer to 5ms
    - RealTek USB (r8152): improve efficiency of Rx by using GRO frags
    - xen: support SW packet timestamping
    - add drivers for implementations based on TI's PRUSS (AM64x EVM)
 
  - nVidia/Mellanox Ethernet datacenter switches:
    - avoid poor HW resource use on Spectrum-4 by better block selection
      for IPv6 multicast forwarding and ordering of blocks in ACL region
 
  - Ethernet embedded switches:
    - Microchip:
      - support configuring the drive strength for EMI compliance
      - ksz9477: partial ACL support
      - ksz9477: HSR offload
      - ksz9477: Wake on LAN
    - Realtek:
      - rtl8366rb: respect device tree config of the CPU port
 
  - Ethernet PHYs:
    - support Broadcom BCM5221 PHYs
    - TI dp83867: support hardware LED blinking
 
  - CAN:
    - add support for Linux-PHY based CAN transceivers
    - at91_can: clean up and use rx-offload helpers
 
  - WiFi:
    - MediaTek (mt76):
      - new sub-driver for mt7925 USB/PCIe devices
      - HW wireless <> Ethernet bridging in MT7988 chips
      - mt7603/mt7628 stability improvements
    - Qualcomm (ath12k):
      - WCN7850:
        - enable 320 MHz channels in 6 GHz band
        - hardware rfkill support
        - enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS
          to make scan faster
        - read board data variant name from SMBIOS
      - QCN9274: mesh support
    - RealTek (rtw89):
      - TDMA-based multi-channel concurrency (MCC)
    - Silicon Labs (wfx):
      - Remain-On-Channel (ROC) support
 
  - Bluetooth:
    - ISO: many improvements for broadcast support
    - mark BCM4378/BCM4387 as BROKEN_LE_CODED
    - add support for QCA2066
    - btmtksdio: enable Bluetooth wakeup from suspend
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmU8XsYACgkQMUZtbf5S
 Irv19RAAnud/24OOF5XMEJkIcYlnfqximh4XO6PujRSYkSkOUJdZTF6iJPgf3pSP
 YpwoHYbYKHYfeOf8+3bTNESiQNSnoVmvmvwiS6/7lZ3behHUrGLQzW9Htc3EZyWH
 2h6QkDZ5OOjfg0bwYSfp3vXkmMH2k8WE9Y0NvCkhcohqZi13Rmp14RnyPmNb2d1V
 yZRYDMSM133KqE6gnBr1Ct65IEvnKeGlCUN2mTGqOJgdn6DZMsyxvtt0y4rmN7Ab
 41+CgPU5SfxfbYpW+Dl2HJpgfte3WrC57KC6AM0PAPJzPmQWgeB/m9mjz/apj6Bg
 bhsEIo7FdvbCnQm3yWPhK2OgCAcSwLr8jfGMU+Q+W4VnL5SRRR3Rm0zjsze+kHNP
 OfqJgxzl3DpvoJqVBy1h5FGcZt0XHwhksm4cTxWqIahsF+veY0ECBXbuBBQx9XTF
 Y7INfI8ulg7wISJs+CJfIClYkgOibTw2u8taBS5ikbtgxNqp5D4QqODn7UefQap1
 PR/IDYODF+zRgmMJLeBqSa6fij6BkfOEDiOWak5kggBoZdtbtmeKI6tzze06CNdW
 lWv1WEhRufxnwK+IuWsEkjhiMbs2WGLvkJ5JbgQV9BfqHfIfiqBCrcWtT/WbQnGt
 lmU46CXh1t/FZEqbmK9h+8vsIIfrcDl6jb5npEiKPRG00vDKRTM=
 =46nS
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Support usec resolution of TCP timestamps, enabled selectively by a
     route attribute.

   - Defer regular TCP ACK while processing socket backlog, try to send
     a cumulative ACK at the end. Increase single TCP flow performance
     on a 200Gbit NIC by 20% (100Gbit -> 120Gbit).

   - The Fair Queuing (FQ) packet scheduler:
       - add built-in 3 band prio / WRR scheduling
       - support bypass if the qdisc is mostly idle (5% speed up for TCP RR)
       - improve inactive flow reporting
       - optimize the layout of structures for better cache locality

   - Support TCP Authentication Option (RFC 5925, TCP-AO), a more modern
     replacement for the old MD5 option.

   - Add more retransmission timeout (RTO) related statistics to
     TCP_INFO.

   - Support sending fragmented skbs over vsock sockets.

   - Make sure we send SIGPIPE for vsock sockets if socket was
     shutdown().

   - Add sysctl for ignoring lower limit on lifetime in Router
     Advertisement PIO, based on an in-progress IETF draft.

   - Add sysctl to control activation of TCP ping-pong mode.

   - Add sysctl to make connection timeout in MPTCP configurable.

   - Support rcvlowat and notsent_lowat on MPTCP sockets, to help apps
     limit the number of wakeups.

   - Support netlink GET for MDB (multicast forwarding), allowing user
     space to request a single MDB entry instead of dumping the entire
     table.

   - Support selective FDB flushing in the VXLAN tunnel driver.

   - Allow limiting learned FDB entries in bridges, prevent OOM attacks.

   - Allow controlling via configfs netconsole targets which were
     created via the kernel cmdline at boot, rather than via configfs at
     runtime.

   - Support multiple PTP timestamp event queue readers with different
     filters.

   - MCTP over I3C.

  BPF:

   - Add new veth-like netdevice where BPF program defines the logic of
     the xmit routine. It can operate in L3 and L2 mode.

   - Support exceptions - allow asserting conditions which should never
     be true but are hard for the verifier to infer. With some extra
     flexibility around handling of the exit / failure:

          https://lwn.net/Articles/938435/

   - Add support for local per-cpu kptr, allow allocating and storing
     per-cpu objects in maps. Access to those objects operates on the
     value for the current CPU.

     This allows to deprecate local one-off implementations of per-CPU
     storage like BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE maps.

   - Extend cgroup BPF sockaddr hooks for UNIX sockets. The use case is
     for systemd to re-implement the LogNamespace feature which allows
     running multiple instances of systemd-journald to process the logs
     of different services.

   - Enable open-coded task_vma iteration, after maple tree conversion
     made it hard to directly walk VMAs in tracing programs.

   - Add open-coded task, css_task and css iterator support. One of the
     use cases is customizable OOM victim selection via BPF.

   - Allow source address selection with bpf_*_fib_lookup().

   - Add ability to pin BPF timer to the current CPU.

   - Prevent creation of infinite loops by combining tail calls and
     fentry/fexit programs.

   - Add missed stats for kprobes to retrieve the number of missed
     kprobe executions and subsequent executions of BPF programs.

   - Inherit system settings for CPU security mitigations.

   - Add BPF v4 CPU instruction support for arm32 and s390x.

  Changes to common code:

   - overflow: add DEFINE_FLEX() for on-stack definition of structs with
     flexible array members.

   - Process doc update with more guidance for reviewers.

  Driver API:

   - Simplify locking in WiFi (cfg80211 and mac80211 layers), use wiphy
     mutex in most places and remove a lot of smaller locks.

   - Create a common DPLL configuration API. Allow configuring and
     querying state of PLL circuits used for clock syntonization, in
     network time distribution.

   - Unify fragmented and full page allocation APIs in page pool code.
     Let drivers be ignorant of PAGE_SIZE.

   - Rework PHY state machine to avoid races with calls to phy_stop().

   - Notify DSA drivers of MAC address changes on user ports, improve
     correctness of offloads which depend on matching port MAC
     addresses.

   - Allow antenna control on injected WiFi frames.

   - Reduce the number of variants of napi_schedule().

   - Simplify error handling when composing devlink health messages.

  Misc:

   - A lot of KCSAN data race "fixes", from Eric.

   - A lot of __counted_by() annotations, from Kees.

   - A lot of strncpy -> strscpy and printf format fixes.

   - Replace master/slave terminology with conduit/user in DSA drivers.

   - Handful of KUnit tests for netdev and WiFi core.

  Removed:

   - AppleTalk COPS.

   - AppleTalk ipddp.

   - TI AR7 CPMAC Ethernet driver.

  Drivers:

   - Ethernet high-speed NICs:
      - Intel (100G, ice, idpf):
         - add a driver for the Intel E2000 IPUs
         - make CRC/FCS stripping configurable
         - cross-timestamping for E823 devices
         - basic support for E830 devices
         - use aux-bus for managing client drivers
         - i40e: report firmware versions via devlink
      - nVidia/Mellanox:
         - support 4-port NICs
         - increase max number of channels to 256
         - optimize / parallelize SF creation flow
      - Broadcom (bnxt):
         - enhance NIC temperature reporting
         - support PAM4 speeds and lane configuration
      - Marvell OcteonTX2:
         - PTP pulse-per-second output support
         - enable hardware timestamping for VFs
      - Solarflare/AMD:
         - conntrack NAT offload and offload for tunnels
      - Wangxun (ngbe/txgbe):
         - expose HW statistics
      - Pensando/AMD:
         - support PCI level reset
         - narrow down the condition under which skbs are linearized
      - Netronome/Corigine (nfp):
         - support CHACHA20-POLY1305 crypto in IPsec offload

   - Ethernet NICs embedded, slower, virtual:
      - Synopsys (stmmac):
         - add Loongson-1 SoC support
         - enable use of HW queues with no offload capabilities
         - enable PPS input support on all 5 channels
         - increase TX coalesce timer to 5ms
      - RealTek USB (r8152): improve efficiency of Rx by using GRO frags
      - xen: support SW packet timestamping
      - add drivers for implementations based on TI's PRUSS (AM64x EVM)

   - nVidia/Mellanox Ethernet datacenter switches:
      - avoid poor HW resource use on Spectrum-4 by better block
        selection for IPv6 multicast forwarding and ordering of blocks
        in ACL region

   - Ethernet embedded switches:
      - Microchip:
         - support configuring the drive strength for EMI compliance
         - ksz9477: partial ACL support
         - ksz9477: HSR offload
         - ksz9477: Wake on LAN
      - Realtek:
         - rtl8366rb: respect device tree config of the CPU port

   - Ethernet PHYs:
      - support Broadcom BCM5221 PHYs
      - TI dp83867: support hardware LED blinking

   - CAN:
      - add support for Linux-PHY based CAN transceivers
      - at91_can: clean up and use rx-offload helpers

   - WiFi:
      - MediaTek (mt76):
         - new sub-driver for mt7925 USB/PCIe devices
         - HW wireless <> Ethernet bridging in MT7988 chips
         - mt7603/mt7628 stability improvements
      - Qualcomm (ath12k):
         - WCN7850:
            - enable 320 MHz channels in 6 GHz band
            - hardware rfkill support
            - enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS to
              make scan faster
            - read board data variant name from SMBIOS
        - QCN9274: mesh support
      - RealTek (rtw89):
         - TDMA-based multi-channel concurrency (MCC)
      - Silicon Labs (wfx):
         - Remain-On-Channel (ROC) support

   - Bluetooth:
      - ISO: many improvements for broadcast support
      - mark BCM4378/BCM4387 as BROKEN_LE_CODED
      - add support for QCA2066
      - btmtksdio: enable Bluetooth wakeup from suspend"

* tag 'net-next-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1816 commits)
  net: pcs: xpcs: Add 2500BASE-X case in get state for XPCS drivers
  net: bpf: Use sockopt_lock_sock() in ip_sock_set_tos()
  net: mana: Use xdp_set_features_flag instead of direct assignment
  vxlan: Cleanup IFLA_VXLAN_PORT_RANGE entry in vxlan_get_size()
  iavf: delete the iavf client interface
  iavf: add a common function for undoing the interrupt scheme
  iavf: use unregister_netdev
  iavf: rely on netdev's own registered state
  iavf: fix the waiting time for initial reset
  iavf: in iavf_down, don't queue watchdog_task if comms failed
  iavf: simplify mutex_trylock+sleep loops
  iavf: fix comments about old bit locks
  doc/netlink: Update schema to support cmd-cnt-name and cmd-max-name
  tools: ynl: introduce option to process unknown attributes or types
  ipvlan: properly track tx_errors
  netdevsim: Block until all devices are released
  nfp: using napi_build_skb() to replace build_skb()
  net: dsa: microchip: ksz9477: Fix spelling mistake "Enery" -> "Energy"
  net: dsa: microchip: Ensure Stable PME Pin State for Wake-on-LAN
  net: dsa: microchip: Refactor switch shutdown routine for WoL preparation
  ...
2023-10-31 05:10:11 -10:00
George Kennedy
2ef422f063 IB/mlx5: Fix init stage error handling to avoid double free of same QP and UAF
In the unlikely event that workqueue allocation fails and returns NULL in
mlx5_mkey_cache_init(), delete the call to
mlx5r_umr_resource_cleanup() (which frees the QP) in
mlx5_ib_stage_post_ib_reg_umr_init().  This will avoid attempted double
free of the same QP when __mlx5_ib_add() does its cleanup.

Resolves a splat:

   Syzkaller reported a UAF in ib_destroy_qp_user

   workqueue: Failed to create a rescuer kthread for wq "mkey_cache": -EINTR
   infiniband mlx5_0: mlx5_mkey_cache_init:981:(pid 1642):
   failed to create work queue
   infiniband mlx5_0: mlx5_ib_stage_post_ib_reg_umr_init:4075:(pid 1642):
   mr cache init failed -12
   ==================================================================
   BUG: KASAN: slab-use-after-free in ib_destroy_qp_user (drivers/infiniband/core/verbs.c:2073)
   Read of size 8 at addr ffff88810da310a8 by task repro_upstream/1642

   Call Trace:
   <TASK>
   kasan_report (mm/kasan/report.c:590)
   ib_destroy_qp_user (drivers/infiniband/core/verbs.c:2073)
   mlx5r_umr_resource_cleanup (drivers/infiniband/hw/mlx5/umr.c:198)
   __mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4178)
   mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402)
   ...
   </TASK>

   Allocated by task 1642:
   __kmalloc (./include/linux/kasan.h:198 mm/slab_common.c:1026
   mm/slab_common.c:1039)
   create_qp (./include/linux/slab.h:603 ./include/linux/slab.h:720
   ./include/rdma/ib_verbs.h:2795 drivers/infiniband/core/verbs.c:1209)
   ib_create_qp_kernel (drivers/infiniband/core/verbs.c:1347)
   mlx5r_umr_resource_init (drivers/infiniband/hw/mlx5/umr.c:164)
   mlx5_ib_stage_post_ib_reg_umr_init (drivers/infiniband/hw/mlx5/main.c:4070)
   __mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4168)
   mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402)
   ...

   Freed by task 1642:
   __kmem_cache_free (mm/slub.c:1826 mm/slub.c:3809 mm/slub.c:3822)
   ib_destroy_qp_user (drivers/infiniband/core/verbs.c:2112)
   mlx5r_umr_resource_cleanup (drivers/infiniband/hw/mlx5/umr.c:198)
   mlx5_ib_stage_post_ib_reg_umr_init (drivers/infiniband/hw/mlx5/main.c:4076
   drivers/infiniband/hw/mlx5/main.c:4065)
   __mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4168)
   mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402)
   ...

Fixes: 04876c12c1 ("RDMA/mlx5: Move init and cleanup of UMR to umr.c")
Link: https://lore.kernel.org/r/1698170518-4006-1-git-send-email-george.kennedy@oracle.com
Suggested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: George Kennedy <george.kennedy@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-31 11:16:05 -03:00
Moshe Shemesh
a53e215f90 RDMA/mlx5: Fix mkey cache WQ flush
The cited patch tries to ensure no pending works on the mkey cache
workqueue by disabling adding new works and call flush_workqueue().
But this workqueue also has delayed works which might still be pending
the delay time to be queued.

Add cancel_delayed_work() for the delayed works which waits to be queued
and then the flush_workqueue() will flush all works which are already
queued and running.

Fixes: 374012b004 ("RDMA/mlx5: Fix mkey cache possible deadlock on cleanup")
Link: https://lore.kernel.org/r/b8722f14e7ed81452f791764a26d2ed4cfa11478.1698256179.git.leon@kernel.org
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-31 10:57:49 -03:00
Jason Gunthorpe
162e348024 Merge tag 'v6.6' into rdma.git for-next
Resolve conflict by taking the spin_lock hunk from for-next:

 https://lore.kernel.org/r/20230928113851.5197a1ec@canb.auug.org.au

Required for the next patch.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-31 10:54:48 -03:00
Linus Torvalds
14ab6d425e vfs-6.7.ctime
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTppYgAKCRCRxhvAZXjc
 okIHAP9anLz1QDyMLH12ASuHjgBc0Of3jcB6NB97IWGpL4O21gEA46ohaD+vcJuC
 YkBLU3lXqQ87nfu28ExFAzh10hG2jwM=
 =m4pB
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull vfs inode time accessor updates from Christian Brauner:
 "This finishes the conversion of all inode time fields to accessor
  functions as discussed on list. Changing timestamps manually as we
  used to do before is error prone. Using accessors function makes this
  robust.

  It does not contain the switch of the time fields to discrete 64 bit
  integers to replace struct timespec and free up space in struct inode.
  But after this, the switch can be trivially made and the patch should
  only affect the vfs if we decide to do it"

* tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (86 commits)
  fs: rename inode i_atime and i_mtime fields
  security: convert to new timestamp accessors
  selinux: convert to new timestamp accessors
  apparmor: convert to new timestamp accessors
  sunrpc: convert to new timestamp accessors
  mm: convert to new timestamp accessors
  bpf: convert to new timestamp accessors
  ipc: convert to new timestamp accessors
  linux: convert to new timestamp accessors
  zonefs: convert to new timestamp accessors
  xfs: convert to new timestamp accessors
  vboxsf: convert to new timestamp accessors
  ufs: convert to new timestamp accessors
  udf: convert to new timestamp accessors
  ubifs: convert to new timestamp accessors
  tracefs: convert to new timestamp accessors
  sysv: convert to new timestamp accessors
  squashfs: convert to new timestamp accessors
  server: convert to new timestamp accessors
  client: convert to new timestamp accessors
  ...
2023-10-30 09:47:13 -10:00
Linus Torvalds
df9c65b5fc vfs-6.7.iov_iter
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTppQwAKCRCRxhvAZXjc
 om2kAP4u+eLsrhJHfUPUttGUEkSkZE+5/s/f1A/1GcV51usLSgEAu8urxAnP49GW
 INaDABXaFfKx8/KI/H2YFZPKGwlNEwY=
 =PDDE
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.7.iov_iter' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull iov_iter updates from Christian Brauner:
 "This contain's David's iov_iter cleanup work to convert the iov_iter
  iteration macros to inline functions:

   - Remove last_offset from iov_iter as it was only used by ITER_PIPE

   - Add a __user tag on copy_mc_to_user()'s dst argument on x86 to
     match that on powerpc and get rid of a sparse warning

   - Convert iter->user_backed to user_backed_iter() in the sound PCM
     driver

   - Convert iter->user_backed to user_backed_iter() in a couple of
     infiniband drivers

   - Renumber the type enum so that the ITER_* constants match the order
     in iterate_and_advance*()

   - Since the preceding patch puts UBUF and IOVEC at 0 and 1, change
     user_backed_iter() to just use the type value and get rid of the
     extra flag

   - Convert the iov_iter iteration macros to always-inline functions to
     make the code easier to follow. It uses function pointers, but they
     get optimised away

   - Move the check for ->copy_mc to _copy_from_iter() and
     copy_page_from_iter_atomic() rather than in memcpy_from_iter_mc()
     where it gets repeated for every segment. Instead, we check once
     and invoke a side function that can use iterate_bvec() rather than
     iterate_and_advance() and supply a different step function

   - Move the copy-and-csum code to net/ where it can be in proximity
     with the code that uses it

   - Fold memcpy_and_csum() in to its two users

   - Move csum_and_copy_from_iter_full() out of line and merge in
     csum_and_copy_from_iter() since the former is the only caller of
     the latter

   - Move hash_and_copy_to_iter() to net/ where it can be with its only
     caller"

* tag 'vfs-6.7.iov_iter' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
  iov_iter, net: Move hash_and_copy_to_iter() to net/
  iov_iter, net: Merge csum_and_copy_from_iter{,_full}() together
  iov_iter, net: Fold in csum_and_memcpy()
  iov_iter, net: Move csum_and_copy_to/from_iter() to net/
  iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  iov_iter: Convert iterate*() to inline funcs
  iov_iter: Derive user-backedness from the iterator type
  iov_iter: Renumber ITER_* constants
  infiniband: Use user_backed_iter() to see if iterator is UBUF/IOVEC
  sound: Fix snd_pcm_readv()/writev() to use iov access functions
  iov_iter, x86: Be consistent about the __user tag on copy_mc_to_user()
  iov_iter: Remove last_offset from iov_iter as it was for ITER_PIPE
2023-10-30 09:24:21 -10:00
Leon Romanovsky
d4b2d16571 RDMA/hfi1: Workaround truncation compilation error
Increase name array to be large enough to overcome the following
compilation error.

drivers/infiniband/hw/hfi1/efivar.c: In function ‘read_hfi1_efi_var’:
drivers/infiniband/hw/hfi1/efivar.c:124:44: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=]
  124 |         snprintf(name, sizeof(name), "%s-%s", prefix_name, kind);
      |                                            ^
drivers/infiniband/hw/hfi1/efivar.c:124:9: note: ‘snprintf’ output 2 or more bytes (assuming 65) into a destination of size 64
  124 |         snprintf(name, sizeof(name), "%s-%s", prefix_name, kind);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/hw/hfi1/efivar.c:133:52: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=]
  133 |                 snprintf(name, sizeof(name), "%s-%s", prefix_name, kind);
      |                                                    ^
drivers/infiniband/hw/hfi1/efivar.c:133:17: note: ‘snprintf’ output 2 or more bytes (assuming 65) into a destination of size 64
  133 |                 snprintf(name, sizeof(name), "%s-%s", prefix_name, kind);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[6]: *** [scripts/Makefile.build:243: drivers/infiniband/hw/hfi1/efivar.o] Error 1

Fixes: c03c08d50b ("IB/hfi1: Check upper-case EFI variables")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/238fa39a8fd60e87a5ad7e1ca6584fcdf32e9519.1698159993.git.leonro@nvidia.com
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-25 11:13:57 +03:00
Chengfeng Ye
2f19c4b839 IB/hfi1: Fix potential deadlock on &irq_src_lock and &dd->uctxt_lock
handle_receive_interrupt_napi_sp() running inside interrupt handler
could introduce inverse lock ordering between &dd->irq_src_lock
and &dd->uctxt_lock, if read_mod_write() is preempted by the isr.

          [CPU0]                                        |          [CPU1]
hfi1_ipoib_dev_open()                                   |
--> hfi1_netdev_enable_queues()                         |
--> enable_queues(rx)                                   |
--> hfi1_rcvctrl()                                      |
--> set_intr_bits()                                     |
--> read_mod_write()                                    |
--> spin_lock(&dd->irq_src_lock)                        |
                                                        | hfi1_poll()
                                                        | --> poll_next()
                                                        | --> spin_lock_irq(&dd->uctxt_lock)
                                                        |
                                                        | --> hfi1_rcvctrl()
                                                        | --> set_intr_bits()
                                                        | --> read_mod_write()
                                                        | --> spin_lock(&dd->irq_src_lock)
<interrupt>                                             |
   --> handle_receive_interrupt_napi_sp()               |
   --> set_all_fastpath()                               |
   --> hfi1_rcd_get_by_index()                          |
   --> spin_lock_irqsave(&dd->uctxt_lock)               |

This flaw was found by an experimental static analysis tool I am
developing for irq-related deadlock.

To prevent the potential deadlock, the patch use spin_lock_irqsave()
on &dd->irq_src_lock inside read_mod_write() to prevent the possible
deadlock scenario.

Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Link: https://lore.kernel.org/r/20230926101116.2797-1-dg573847474@gmail.com
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-25 11:12:43 +03:00
Yang Li
7a1c2abf9a RDMA/core: Remove NULL check before dev_{put, hold}
The call netdev_{put, hold} of dev_{put, hold} will check NULL,
so there is no need to check before using dev_{put, hold},
remove it to silence the warning:

./drivers/infiniband/core/nldev.c:375:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7047
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231024003815.89742-1-yang.lee@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-24 18:16:04 +03:00
Colin Ian King
b55706366c RDMA/hfi1: Remove redundant assignment to pointer ppd
Pointer ppd is being assigned a value in a for-loop however it
is never read. The assignment is redundant and can be removed.
Cleans up clang scan build warning:

drivers/infiniband/hw/hfi1/init.c:1030:3: warning: Value stored
to 'ppd' is never read [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20231023141733.667807-1-colin.i.king@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-24 17:33:25 +03:00
Patrisious Haddad
02e7d139e5 RDMA/mlx5: Change the key being sent for MPV device affiliation
Change the key that we send from IB driver to EN driver regarding the
MPV device affiliation, since at that stage the IB device is not yet
initialized, so its index would be zero for different IB devices and
cause wrong associations between unrelated master and slave devices.

Instead use a unique value from inside the core device which is already
initialized at this stage.

Fixes: 0d293714ac ("RDMA/mlx5: Send events from IB driver about device affiliation state")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Link: https://lore.kernel.org/r/ac7e66357d963fc68d7a419515180212c96d137d.1697705185.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-24 17:10:58 +03:00
Nathan Chancellor
9040c0d96f RDMA/bnxt_re: Fix clang -Wimplicit-fallthrough in bnxt_re_handle_cq_async_error()
Clang warns (or errors with CONFIG_WERROR=y):

  drivers/infiniband/hw/bnxt_re/main.c:1114:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
   1114 |         default:
        |         ^
  drivers/infiniband/hw/bnxt_re/main.c:1114:2: note: insert 'break;' to avoid fall-through
   1114 |         default:
        |         ^
        |         break;
  1 error generated.

Clang is a little more pedantic than GCC, which does not warn when
falling through to a case that is just break or return. Clang's version
is more in line with the kernel's own stance in deprecated.rst, which
states that all switch/case blocks must end in either break,
fallthrough, continue, goto, or return. Add the missing break to silence
the warning.

Closes: https://github.com/ClangBuiltLinux/linux/issues/1950
Fixes: b02fd3f79e ("RDMA/bnxt_re: Report async events and errors")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20231020-bnxt_re-implicit-fallthrough-v1-1-b5c19534a6c9@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-22 10:31:01 +03:00
Junxian Huang
07f06e0e5c RDMA/hns: Fix init failure of RoCE VF and HIP08
During device init, a struct for HW stats will be allocated. As HW
stats are not supported for VF and HIP08, currently
hns_roce_alloc_hw_port_stats() returns NULL in this case. However,
ib-core considers the returned NULL pointer as memory allocation
failure and returns ENOMEM, eventually leading to the failure of VF
and HIP08 init.

In the case where the driver does not support the .alloc_hw_port_stats()
ops, ib-core will return EOPNOTSUPP and ignore this error code in the
upper layer function. So for VF and HIP08, just don't set the HW stats
ops to ib-core.

Fixes: 5a87279591 ("RDMA/hns: Support hns HW stats")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20231017125239.164455-8-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-19 09:49:40 +03:00
Junxian Huang
b4a797b894 RDMA/hns: Fix unnecessary port_num transition in HW stats allocation
The num_ports capability of devices should be compared with the
number of port(i.e. the input param "port_num") but not the port
index(i.e. port_num - 1).

Fixes: 5a87279591 ("RDMA/hns: Support hns HW stats")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20231017125239.164455-7-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-19 09:49:40 +03:00
Luoyouming
27c5fd271d RDMA/hns: The UD mode can only be configured with DCQCN
Due to hardware limitations, only DCQCN is supported for UD. Therefore, the
default algorithm for UD is set to DCQCN.

Fixes: f91696f2f0 ("RDMA/hns: Support congestion control type selection according to the FW")
Signed-off-by: Luoyouming <luoyouming@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20231017125239.164455-6-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-19 09:49:40 +03:00
Luoyouming
5e617c18b1 RDMA/hns: Add check for SL
SL set by users may exceed the capability of devices. So add check
for this situation.

Fixes: fba429fcf9 ("RDMA/hns: Fix missing fields in address vector")
Fixes: 70f9252158 ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT")
Fixes: f0cb411aad ("RDMA/hns: Use new interface to modify QP context")
Signed-off-by: Luoyouming <luoyouming@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20231017125239.164455-5-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-19 09:49:40 +03:00
Chengchang Tang
b5f9efff10 RDMA/hns: Fix signed-unsigned mixed comparisons
The ib_mtu_enum_to_int() and uverbs_attr_get_len() may returns a negative
value. In this case, mixed comparisons of signed and unsigned types will
throw wrong results.

This patch adds judgement for this situation.

Fixes: 30b707886a ("RDMA/hns: Support inline data in extented sge space for RC")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20231017125239.164455-4-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-19 09:49:40 +03:00
Chengchang Tang
c64e9710f9 RDMA/hns: Fix uninitialized ucmd in hns_roce_create_qp_common()
ucmd in hns_roce_create_qp_common() are not initialized. But it works fine
until new member sdb_addr is added to struct hns_roce_ib_create_qp.

If the user-mode driver uses an old version ABI, then the value of the new
member will be undefined after ib_copy_from_udata().

This patch fixes it by initialize this variable to 0. And the default value
of the new member sdb_addr will be 0 which is invalid.

Fixes: 0425e3e6e0 ("RDMA/hns: Support flush cqe for hip08 in kernel space")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20231017125239.164455-3-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-19 09:49:40 +03:00
Chengchang Tang
9faef73ef4 RDMA/hns: Fix printing level of asynchronous events
The current driver will print all asynchronous events. Some of the
print levels are set improperly, e.g. SRQ limit reach and SRQ last
wqe reach, which may also occur during normal operation of the software.
Currently, the information of these event is printed as a warning,
which causes a large amount of printing even during normal use of the
application. As a result, the service performance deteriorates.

This patch fixes the printing storms by modifying the print level.

Fixes: b00a92c8f2 ("RDMA/hns: Move all prints out of irq handle")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20231017125239.164455-2-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-19 09:49:40 +03:00
Patrisious Haddad
465d6b42f1 RDMA/core: Add support to set privileged QKEY parameter
Add netlink command that enables/disables privileged QKEY by default.

It is disabled by default, since according to IB spec only privileged
users are allowed to use privileged QKEY.
According to the IB specification rel-1.6, section 3.5.3:
"QKEYs with the most significant bit set are considered controlled
QKEYs, and a HCA does not allow a consumer to arbitrarily specify a
controlled QKEY."

Using rdma tool,
$rdma system set privileged-qkey on

When enabled non-privileged users would be able to use
controlled QKEYs which are considered privileged.

Using rdma tool,
$rdma system set privileged-qkey off

When disabled only privileged users would be able to use
controlled QKEYs.

You can also use the command below to check the parameter state:
$rdma system show
netns shared privileged-qkey off copy-on-fork on

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Link: https://lore.kernel.org/r/90398be70a9d23d2aa9d0f9fd11d2c264c1be534.1696848201.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-19 09:31:00 +03:00
Jeff Layton
7e6481cebd
qib: convert to new timestamp accessors
Convert to using the new inode timestamp accessor functions.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20231004185347.80880-5-jlayton@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-18 13:26:16 +02:00
Chandramohan Akula
45cfa8864c RDMA/bnxt_re: Do not report SRQ error in srq notification
In the SRQ notification handler, do not report the SRQ_ERROR
in the default event case, as there was no error.

Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1697049097-31992-4-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-15 11:48:32 +03:00
Chandramohan Akula
b02fd3f79e RDMA/bnxt_re: Report async events and errors
Report QP, SRQ and CQ async events and errors.

Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1697049097-31992-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-15 11:48:32 +03:00
Chandramohan Akula
d60a779673 RDMA/bnxt_re: Update HW interface headers
Updating the HW structures for the affiliated event and error
reporting. Newly added interface structures will be used in the
followup patch.

Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1697049097-31992-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-15 11:48:32 +03:00
Patrisious Haddad
c1336bb4aa IB/mlx5: Fix rdma counter binding for RAW QP
Previously when we had a RAW QP, we bound a counter to it when it moved
to INIT state, using the counter context inside RQC.

But when we try to modify that counter later in RTS state we used
modify QP which tries to change the counter inside QPC instead of RQC.

Now we correctly modify the counter set_id inside of RQC instead of QPC
for the RAW QP.

Fixes: d14133dd41 ("IB/mlx5: Support set qp counter")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/2e5ab6713784a8fe997d19c508187a0dfecf2dfc.1696847964.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-15 11:04:01 +03:00
Mike Christie
194605d45d scsi: target: Have drivers report if they support direct submissions
In some cases, like with multiple LUN targets or where the target has to
respond to transport level requests from the receiving context it can be
better to defer cmd submission to a helper thread. If the backend driver
blocks on something like request/tag allocation it can block the entire
target submission path and other LUs and transport IO on that session.

In other cases like single LUN targets with storage that can support all
the commands that the target can queue, then it's best to submit the cmd
to the backend from the target's cmd receiving context.

Subsequent commits will allow the user to config what they prefer, but
drivers like loop can't directly submit because they can be called from a
context that can't sleep. And, drivers like vhost-scsi can support direct
submission, but need to keep their default behavior of deferring execution
to avoid possible regressions where the backend can block.

Make the drivers tell LIO core if they support direct submissions and their
current default, so we can prevent users from misconfiguring the system and
initialize devices correctly.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20230928020907.5730-2-michael.christie@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-10-13 15:53:57 -04:00
Jakub Kicinski
1bc60524ca Merge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Leon Romanovsky says:

====================
This PR is collected from
https://lore.kernel.org/all/cover.1695296682.git.leon@kernel.org

This series from Patrisious extends mlx5 to support IPsec packet offload
in multiport devices (MPV, see [1] for more details).

These devices have single flow steering logic and two netdev interfaces,
which require extra logic to manage IPsec configurations as they performed
on netdevs.

[1] https://lore.kernel.org/linux-rdma/20180104152544.28919-1-leon@kernel.org/

* 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Handle IPsec steering upon master unbind/bind
  net/mlx5: Configure IPsec steering for ingress RoCEv2 MPV traffic
  net/mlx5: Configure IPsec steering for egress RoCEv2 MPV traffic
  net/mlx5: Add create alias flow table function to ipsec roce
  net/mlx5: Implement alias object allow and create functions
  net/mlx5: Add alias flow table bits
  net/mlx5: Store devcom pointer inside IPsec RoCE
  net/mlx5: Register mlx5e priv to devcom in MPV mode
  RDMA/mlx5: Send events from IB driver about device affiliation state
  net/mlx5: Introduce ifc bits for migration in a chunk mode

====================

Link: https://lore.kernel.org/r/20231002083832.19746-1-leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 09:35:34 -07:00
Jakub Kicinski
0e6bb5b7f4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

kernel/bpf/verifier.c
  829955981c ("bpf: Fix verifier log for async callback return values")
  a923819fb2 ("bpf: Treat first argument as return value for bpf_throw")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-12 17:07:34 -07:00
Christian Marangi
73382e919f netdev: replace napi_reschedule with napi_schedule
Now that napi_schedule return a bool, we can drop napi_reschedule that
does the same exact function. The function comes from a very old commit
bfe13f54f5 ("ibm_emac: Convert to use napi_struct independent of struct
net_device") and the purpose is actually deprecated in favour of
different logic.

Convert every user of napi_reschedule to napi_schedule.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> # ath10k
Acked-by: Nick Child <nnac123@linux.ibm.com> # ibm
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for can/dev/rx-offload.c
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20231009133754.9834-3-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-11 17:28:06 -07:00
Joel Granados
6d07cc269b infiniband: Remove the now superfluous sentinel element from ctl_table array
This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)

Remove sentinel from iwcm_ctl_table and ucma_ctl_table

Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-10-11 12:16:13 -07:00
Sindhu Devale
5ac388db27 RDMA/irdma: Add support to re-register a memory region
Add support for reregister MR verb API by doing a de-register
followed by a register MR with the new attributes. Reuse resources
like iwmr handle and HW stag where possible.

Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20231004151306.228-1-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-09 08:39:49 +03:00
Leon Romanovsky
c38d23a544 RDMA/core: Require admin capabilities to set system parameters
Like any other set command, require admin permissions to do it.

Cc: stable@vger.kernel.org
Fixes: 2b34c55802 ("RDMA/core: Add command to set ib_core device net namspace sharing mode")
Link: https://lore.kernel.org/r/75d329fdd7381b52cbdf87910bef16c9965abb1f.1696443438.git.leon@kernel.org
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-10-05 20:01:47 +03:00
Chuck Lever
0aa44595d6 RDMA/core: Fix a couple of obvious typos in comments
Fix typos.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169643338101.8035.6826446669479247727.stgit@manet.1015granger.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-04 21:55:44 +03:00
Leon Romanovsky
16419098e8 IPsec packet offload support in multiport RoCE devices
This series from Patrisious extends mlx5 to support IPsec packet offload
in multiport devices (MPV, see [1] for more details).

These devices have single flow steering logic and two netdev interfaces,
which require extra logic to manage IPsec configurations as they performed
on netdevs.

Thanks

[1] https://lore.kernel.org/linux-rdma/20180104152544.28919-1-leon@kernel.org/

Link: https://lore.kernel.org/all/20231002083832.19746-1-leon@kernel.org
Signed-of-by: Leon Romanovsky <leon@kernel.org>

* mlx5-next: (576 commits)
  net/mlx5: Handle IPsec steering upon master unbind/bind
  net/mlx5: Configure IPsec steering for ingress RoCEv2 MPV traffic
  net/mlx5: Configure IPsec steering for egress RoCEv2 MPV traffic
  net/mlx5: Add create alias flow table function to ipsec roce
  net/mlx5: Implement alias object allow and create functions
  net/mlx5: Add alias flow table bits
  net/mlx5: Store devcom pointer inside IPsec RoCE
  net/mlx5: Register mlx5e priv to devcom in MPV mode
  RDMA/mlx5: Send events from IB driver about device affiliation state
  net/mlx5: Introduce ifc bits for migration in a chunk mode
  Linux 6.6-rc3
  ...
2023-10-04 21:21:49 +03:00
Kees Cook
964168970c IB/hfi1: Annotate struct tid_rb_node with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct tid_rb_node.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230929180431.3005464-7-keescook@chromium.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-02 14:44:54 +03:00
Kees Cook
2aba54a9e0 IB/mthca: Annotate struct mthca_icm_table with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct mthca_icm_table.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230929180431.3005464-6-keescook@chromium.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-02 14:44:54 +03:00
Kees Cook
bd8eec5bfa IB/srp: Annotate struct srp_fr_pool with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct srp_fr_pool.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230929180431.3005464-5-keescook@chromium.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-02 14:44:54 +03:00
Kees Cook
0bc018b7a7 RDMA/siw: Annotate struct siw_pbl with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct siw_pbl.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Bernard Metzler <bmt@zurich.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230929180431.3005464-4-keescook@chromium.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-02 14:44:54 +03:00
Kees Cook
ed7c64de62 RDMA/usnic: Annotate struct usnic_uiom_chunk with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct usnic_uiom_chunk.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Christian Benvenuti <benve@cisco.com>
Cc: Nelson Escobar <neescoba@cisco.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230929180431.3005464-3-keescook@chromium.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-02 14:44:54 +03:00
Kees Cook
fc424078f5 RDMA/core: Annotate struct ib_pkey_cache with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct ib_pkey_cache.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: "Håkon Bugge" <haakon.bugge@oracle.com>
Cc: Avihai Horon <avihaih@nvidia.com>
Cc: Anand Khoje <anand.a.khoje@oracle.com>
Cc: Mark Bloch <mbloch@nvidia.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230929180431.3005464-2-keescook@chromium.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-02 14:44:54 +03:00
Leon Romanovsky
c99a7457e5 RDMA/mlx5: Remove not-used cache disable flag
During execution of mlx5_mkey_cache_cleanup(), there is a guarantee
that MR are not registered and/or destroyed. It means that we don't
need newly introduced cache disable flag.

Fixes: 374012b004 ("RDMA/mlx5: Fix mkey cache possible deadlock on cleanup")
Link: https://lore.kernel.org/r/c7e9c9f98c8ae4a7413d97d9349b29f5b0a23dbe.1695921626.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-10-02 14:32:44 +03:00
Mark Zhang
e0fe97efdb RDMA/cma: Initialize ib_sa_multicast structure to 0 when join
Initialize the structure to 0 so that it's fields won't have random
values. For example fields like rec.traffic_class (as well as
rec.flow_label and rec.sl) is used to generate the user AH through:
  cma_iboe_join_multicast
    cma_make_mc_event
      ib_init_ah_from_mcmember

And a random traffic_class causes a random IP DSCP in RoCEv2.

Fixes: b5de0c60cc ("RDMA/cma: Fix use after free race in roce multicast join")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/20230927090511.603595-1-markzhang@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-02 13:10:40 +03:00
Yangyang Li
c9813b0b99 RDMA/hns: Support SRQ record doorbell
Compared with normal doorbell, using record doorbell can shorten the
process of ringing the doorbell and reduce the latency.

Add a flag HNS_ROCE_CAP_FLAG_SRQ_RECORD_DB to allow FW to
enable/disable SRQ record doorbell.

If the flag above is set, allocate the dma buffer for SRQ record
doorbell and write the buffer address into SRQC during SRQ creation.

For userspace SRQ, add a flag HNS_ROCE_RSP_SRQ_CAP_RECORD_DB to notify
userspace whether the SRQ record doorbell is enabled.

Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20230926130026.583088-1-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-02 11:47:08 +03:00
Patrisious Haddad
0d293714ac RDMA/mlx5: Send events from IB driver about device affiliation state
Send blocking events from IB driver whenever the device is done being
affiliated or if it is removed from an affiliation.

This is useful since now the EN driver can register to those event and
know when a device is affiliated or not.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/a7491c3e483cfd8d962f5f75b9a25f253043384a.1695296682.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-02 11:20:59 +03:00
Patrisious Haddad
8dc0fd2f56 RDMA/ipoib: Add support for XDR speed in ethtool
The IBTA specification 1.7 has new speed - XDR, supporting signaling
rate of 200Gb.

Ethtool support of IPoIB driver translates IB speed to signaling rate.
Added translation of XDR IB type to rate of 200Gb Ethernet speed.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/ca252b79b7114af967de3d65f9a38992d4d87a14.1695204156.git.leon@kernel.org
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-26 12:38:57 +03:00
Patrisious Haddad
4f4db19089 IB/mlx5: Adjust mlx5 rate mapping to support 800Gb
Adjust mlx5 function which maps the speed rate from IB spec values
to internal driver values to be able to handle speeds up to 800Gb.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/301c803d8486b0df8aefad3fb3cc10dc58671985.1695204156.git.leon@kernel.org
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-26 12:38:54 +03:00
Patrisious Haddad
b28ad32442 IB/mlx5: Rename 400G_8X speed to comply to naming convention
Rename 400G_8X speed to comply to naming convention.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/ac98447cac8379a43fbdb36d56e5fb2b741a97ff.1695204156.git.leon@kernel.org
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-26 12:38:50 +03:00
Patrisious Haddad
948f0bf5ad IB/mlx5: Add support for 800G_8X lane speed
Add a check for 800G_8X speed when querying PTYS and report it back
correctly when needed.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/26fd0b6e1fac071c3eb779657bb3d8ba47f47c4f.1695204156.git.leon@kernel.org
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-26 12:38:46 +03:00
Or Har-Toov
561b4a3ac6 IB/mlx5: Expose XDR speed through MAD
Under MAD query port, Report NDR speed when NDR is supported in the port
capability mask.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/d30bdec2a66a8a2edd1d84ee61453c58cf346b43.1695204156.git.leon@kernel.org
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-26 12:38:43 +03:00
Or Har-Toov
703289ce43 IB/core: Add support for XDR link speed
Add new IBTA speed XDR, the new rate that was added to Infiniband spec
as part of XDR and supporting signaling rate of 200Gb.

In order to report that value to rdma-core, add new u32 field to
query_port response.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/9d235fc600a999e8274010f0e18b40fa60540e6c.1695204156.git.leon@kernel.org
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-26 12:38:39 +03:00
Shay Drory
57e7071683 RDMA/mlx5: Implement mkeys management via LIFO queue
Currently, mkeys are managed via xarray. This implementation leads to
a degradation in cases many MRs are unregistered in parallel, due to xarray
internal implementation, for example: deregistration 1M MRs via 64 threads
is taking ~15% more time[1].

Hence, implement mkeys management via LIFO queue, which solved the
degradation.

[1]
2.8us in kernel v5.19 compare to 3.2us in kernel v6.4

Signed-off-by: Shay Drory <shayd@nvidia.com>
Link: https://lore.kernel.org/r/fde3d4cfab0f32f0ccb231cd113298256e1502c5.1695283384.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-26 12:36:18 +03:00
Shay Drory
374012b004 RDMA/mlx5: Fix mkey cache possible deadlock on cleanup
Fix the deadlock by refactoring the MR cache cleanup flow to flush the
workqueue without holding the rb_lock.
This adds a race between cache cleanup and creation of new entries which
we solve by denied creation of new entries after cache cleanup started.

Lockdep:
WARNING: possible circular locking dependency detected
 [ 2785.326074 ] 6.2.0-rc6_for_upstream_debug_2023_01_31_14_02 #1 Not tainted
 [ 2785.339778 ] ------------------------------------------------------
 [ 2785.340848 ] devlink/53872 is trying to acquire lock:
 [ 2785.341701 ] ffff888124f8c0c8 ((work_completion)(&(&ent->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0xc8/0x900
 [ 2785.343403 ]
 [ 2785.343403 ] but task is already holding lock:
 [ 2785.344464 ] ffff88817e8f1260 (&dev->cache.rb_lock){+.+.}-{3:3}, at: mlx5_mkey_cache_cleanup+0x77/0x250 [mlx5_ib]
 [ 2785.346273 ]
 [ 2785.346273 ] which lock already depends on the new lock.
 [ 2785.346273 ]
 [ 2785.347720 ]
 [ 2785.347720 ] the existing dependency chain (in reverse order) is:
 [ 2785.349003 ]
 [ 2785.349003 ] -> #1 (&dev->cache.rb_lock){+.+.}-{3:3}:
 [ 2785.350160 ]        __mutex_lock+0x14c/0x15c0
 [ 2785.350962 ]        delayed_cache_work_func+0x2d1/0x610 [mlx5_ib]
 [ 2785.352044 ]        process_one_work+0x7c2/0x1310
 [ 2785.352879 ]        worker_thread+0x59d/0xec0
 [ 2785.353636 ]        kthread+0x28f/0x330
 [ 2785.354370 ]        ret_from_fork+0x1f/0x30
 [ 2785.355135 ]
 [ 2785.355135 ] -> #0 ((work_completion)(&(&ent->dwork)->work)){+.+.}-{0:0}:
 [ 2785.356515 ]        __lock_acquire+0x2d8a/0x5fe0
 [ 2785.357349 ]        lock_acquire+0x1c1/0x540
 [ 2785.358121 ]        __flush_work+0xe8/0x900
 [ 2785.358852 ]        __cancel_work_timer+0x2c7/0x3f0
 [ 2785.359711 ]        mlx5_mkey_cache_cleanup+0xfb/0x250 [mlx5_ib]
 [ 2785.360781 ]        mlx5_ib_stage_pre_ib_reg_umr_cleanup+0x16/0x30 [mlx5_ib]
 [ 2785.361969 ]        __mlx5_ib_remove+0x68/0x120 [mlx5_ib]
 [ 2785.362960 ]        mlx5r_remove+0x63/0x80 [mlx5_ib]
 [ 2785.363870 ]        auxiliary_bus_remove+0x52/0x70
 [ 2785.364715 ]        device_release_driver_internal+0x3c1/0x600
 [ 2785.365695 ]        bus_remove_device+0x2a5/0x560
 [ 2785.366525 ]        device_del+0x492/0xb80
 [ 2785.367276 ]        mlx5_detach_device+0x1a9/0x360 [mlx5_core]
 [ 2785.368615 ]        mlx5_unload_one_devl_locked+0x5a/0x110 [mlx5_core]
 [ 2785.369934 ]        mlx5_devlink_reload_down+0x292/0x580 [mlx5_core]
 [ 2785.371292 ]        devlink_reload+0x439/0x590
 [ 2785.372075 ]        devlink_nl_cmd_reload+0xaef/0xff0
 [ 2785.372973 ]        genl_family_rcv_msg_doit.isra.0+0x1bd/0x290
 [ 2785.374011 ]        genl_rcv_msg+0x3ca/0x6c0
 [ 2785.374798 ]        netlink_rcv_skb+0x12c/0x360
 [ 2785.375612 ]        genl_rcv+0x24/0x40
 [ 2785.376295 ]        netlink_unicast+0x438/0x710
 [ 2785.377121 ]        netlink_sendmsg+0x7a1/0xca0
 [ 2785.377926 ]        sock_sendmsg+0xc5/0x190
 [ 2785.378668 ]        __sys_sendto+0x1bc/0x290
 [ 2785.379440 ]        __x64_sys_sendto+0xdc/0x1b0
 [ 2785.380255 ]        do_syscall_64+0x3d/0x90
 [ 2785.381031 ]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
 [ 2785.381967 ]
 [ 2785.381967 ] other info that might help us debug this:
 [ 2785.381967 ]
 [ 2785.383448 ]  Possible unsafe locking scenario:
 [ 2785.383448 ]
 [ 2785.384544 ]        CPU0                    CPU1
 [ 2785.385383 ]        ----                    ----
 [ 2785.386193 ]   lock(&dev->cache.rb_lock);
 [ 2785.386940 ]				lock((work_completion)(&(&ent->dwork)->work));
 [ 2785.388327 ]				lock(&dev->cache.rb_lock);
 [ 2785.389425 ]   lock((work_completion)(&(&ent->dwork)->work));
 [ 2785.390414 ]
 [ 2785.390414 ]  *** DEADLOCK ***
 [ 2785.390414 ]
 [ 2785.391579 ] 6 locks held by devlink/53872:
 [ 2785.392341 ]  #0: ffffffff84c17a50 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40
 [ 2785.393630 ]  #1: ffff888142280218 (&devlink->lock_key){+.+.}-{3:3}, at: devlink_get_from_attrs_lock+0x12d/0x2d0
 [ 2785.395324 ]  #2: ffff8881422d3c38 (&dev->lock_key){+.+.}-{3:3}, at: mlx5_unload_one_devl_locked+0x4a/0x110 [mlx5_core]
 [ 2785.397322 ]  #3: ffffffffa0e59068 (mlx5_intf_mutex){+.+.}-{3:3}, at: mlx5_detach_device+0x60/0x360 [mlx5_core]
 [ 2785.399231 ]  #4: ffff88810e3cb0e8 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x8d/0x600
 [ 2785.400864 ]  #5: ffff88817e8f1260 (&dev->cache.rb_lock){+.+.}-{3:3}, at: mlx5_mkey_cache_cleanup+0x77/0x250 [mlx5_ib]

Fixes: b958451783 ("RDMA/mlx5: Change the cache structure to an RB-tree")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-09-26 12:33:53 +03:00
Shay Drory
dab994bcc6 RDMA/mlx5: Fix NULL string error
checkpath is complaining about NULL string, change it to 'Unknown'.

Fixes: 37aa5c36aa ("IB/mlx5: Add UARs write-combining and non-cached mapping")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Link: https://lore.kernel.org/r/8638e5c14fadbde5fa9961874feae917073af920.1695203958.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-26 12:29:44 +03:00
Hamdan Igbaria
2fad8f06a5 RDMA/mlx5: Fix mutex unlocking on error flow for steering anchor creation
The mutex was not unlocked on some of the error flows.
Moved the unlock location to include all the error flow scenarios.

Fixes: e1f4a52ac1 ("RDMA/mlx5: Create an indirect flow table for steering anchor")
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com>
Link: https://lore.kernel.org/r/1244a69d783da997c0af0b827c622eb00495492e.1695203958.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-26 12:29:40 +03:00
Michael Guralnik
4f14c6c021 RDMA/mlx5: Fix assigning access flags to cache mkeys
After the change to use dynamic cache structure, new cache entries
can be added and the mkey allocation can no longer assume that all
mkeys created for the cache have access_flags equal to zero.

Example of a flow that exposes the issue:
A user registers MR with RO on a HCA that cannot UMR RO and the mkey is
created outside of the cache. When the user deregisters the MR, a new
cache entry is created to store mkeys with RO.

Later, the user registers 2 MRs with RO. The first MR is reused from the
new cache entry. When we try to get the second mkey from the cache we see
the entry is empty so we go to the MR cache mkey allocation flow which
would have allocated a mkey with no access flags, resulting the user getting
a MR without RO.

Fixes: dd1b913fb0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow")
Reviewed-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://lore.kernel.org/r/8a802700b82def3ace3f77cd7a9ad9d734af87e7.1695203958.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-26 12:29:33 +03:00
David Howells
7ebc540b35
infiniband: Use user_backed_iter() to see if iterator is UBUF/IOVEC
Use user_backed_iter() to see if iterator is UBUF/IOVEC rather than poking
inside the iterator.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20230925120309.1731676-5-dhowells@redhat.com
cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
cc: Jason Gunthorpe <jgg@ziepe.ca>
cc: Leon Romanovsky <leon@kernel.org>
cc: linux-rdma@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-25 14:30:28 +02:00
Christophe JAILLET
d7f393430a IB/mlx4: Fix the size of a buffer in add_port_entries()
In order to be sure that 'buff' is never truncated, its size should be
12, not 11.

When building with W=1, this fixes the following warnings:

  drivers/infiniband/hw/mlx4/sysfs.c: In function ‘add_port_entries’:
  drivers/infiniband/hw/mlx4/sysfs.c:268:34: error: ‘sprintf’ may write a terminating nul past the end of the destination [-Werror=format-overflow=]
    268 |                 sprintf(buff, "%d", i);
        |                                  ^
  drivers/infiniband/hw/mlx4/sysfs.c:268:17: note: ‘sprintf’ output between 2 and 12 bytes into a destination of size 11
    268 |                 sprintf(buff, "%d", i);
        |                 ^~~~~~~~~~~~~~~~~~~~~~
  drivers/infiniband/hw/mlx4/sysfs.c:286:34: error: ‘sprintf’ may write a terminating nul past the end of the destination [-Werror=format-overflow=]
    286 |                 sprintf(buff, "%d", i);
        |                                  ^
  drivers/infiniband/hw/mlx4/sysfs.c:286:17: note: ‘sprintf’ output between 2 and 12 bytes into a destination of size 11
    286 |                 sprintf(buff, "%d", i);
        |                 ^~~~~~~~~~~~~~~~~~~~~~

Fixes: c1e7e46612 ("IB/mlx4: Add iov directory in sysfs under the ib device")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/0bb1443eb47308bc9be30232cc23004c4d4cf43e.1695448530.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-23 21:53:24 +03:00
Justin Stitt
cb7ab7854b IB/qib: Replace deprecated strncpy
`strncpy` is deprecated for use on NUL-terminated destination strings [1]
and as such we should prefer more robust and less ambiguous string
interfaces.

We know `txselect_list` is expected to be NUL-terminated based on its
use in `param_get_string()`:
| int param_get_string(char *buffer, const struct kernel_param *kp)
| {
| 	const struct kparam_string *kps = kp->str;
| 	return scnprintf(buffer, PAGE_SIZE, "%s\n", kps->string);
| }

Note that `txselect_list` is assigned to `kp_txselect`'s string field:
| static struct kparam_string kp_txselect = {
| 	.string = txselect_list,
| 	.maxlen = MAX_ATTEN_LEN
| };

Wherein it is then assigned the set and get methods:
| module_param_call(txselect, setup_txselect, param_get_string,
| 		  &kp_txselect, S_IWUSR | S_IRUGO);

Considering the above, a suitable replacement is `strscpy` [2] due to
the fact that it guarantees NUL-termination on the destination buffer
without unnecessarily NUL-padding.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20230921-strncpy-drivers-infiniband-hw-qib-qib_iba7322-c-v1-1-373727763f5b@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-22 13:29:19 +03:00
Justin Stitt
c2d0c5b28a IB/hfi1: Replace deprecated strncpy
`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

We see that `buf` is expected to be NUL-terminated based on it's use
within a trace event wherein `is_misc_err_name` and `is_various_name`
map to `is_name` through `is_table`:
| TRACE_EVENT(hfi1_interrupt,
| 	    TP_PROTO(struct hfi1_devdata *dd, const struct is_table *is_entry,
| 		     int src),
| 	    TP_ARGS(dd, is_entry, src),
| 	    TP_STRUCT__entry(DD_DEV_ENTRY(dd)
| 			     __array(char, buf, 64)
| 			     __field(int, src)
| 			     ),
| 	    TP_fast_assign(DD_DEV_ASSIGN(dd);
| 			   is_entry->is_name(__entry->buf, 64,
| 					     src - is_entry->start);
| 			   __entry->src = src;
| 			   ),
| 	    TP_printk("[%s] source: %s [%d]", __get_str(dev), __entry->buf,
| 		      __entry->src)
| );

Considering the above, a suitable replacement is `strscpy_pad` due to
the fact that it guarantees NUL-termination on the destination buffer
while maintaining the NUL-padding behavior that strncpy provides.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20230921-strncpy-drivers-infiniband-hw-hfi1-chip-c-v1-1-37afcf4964d9@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-22 13:28:54 +03:00
Justin Stitt
f0cc82ca11 RDMA/irdma: Replace deprecated strncpy
`strncpy` is deprecated for use on NUL-terminated destination strings [1]
and as such we should prefer more robust and less ambiguous string
interfaces.

A suitable replacement is `strscpy_pad` due to the fact that it
guarantees NUL-termination on the destination buffer.

It is unclear to me whether `i40iw_client.name` requires NUL-padding but
have opted to keep the NUL-padding behavior that strncpy provides to
ensure no functional change.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20230921-strncpy-drivers-infiniband-hw-irdma-i40iw_if-c-v1-1-22d87aef7186@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-22 13:27:27 +03:00
Selvin Xavier
a83c692789 RDMA/bnxt_re: Decrement resource stats correctly
rc_qp_count and ud_qp_count is not decremented during qp destroy.
Fix this.

Fixes: cb95709e0d ("bnxt_re: Update the hw counters for resource stats")
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1695199280-13520-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-21 11:56:23 +03:00
Selvin Xavier
9fc5f9a92f RDMA/bnxt_re: Fix the handling of control path response data
Flag that indicate control path command completion should be cleared
only after copying the command response data. As soon as the is_in_used
flag is clear, the waiting thread can proceed with wrong response
data.  This wrong data is causing multiple issues like wrong lkey
used in data traffic and wrong AH Id etc.

Use a memory barrier to ensure that the response data
is copied and visible to the process waiting on a different
cpu core before clearing the is_in_used flag.

Clear the is_in_used after copying the command response.

Fixes: bcfee4ce3e ("RDMA/bnxt_re: remove redundant cmdq_bitmap")
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1695199280-13520-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-21 11:56:23 +03:00
wenglianfa
58c49c097f RDMA/hns: Support SRQ restrack ops for hns driver
The SRQ restrack attributes come from the context maintained by ROCEE.

Example:
$ rdma res show srq -jp -dd
[ {
        "ifindex": 0,
        "ifname": "hns_0",
        "srqn": 0,
        "type": "BASIC",
        "lqpn": [ "14-15","22-23" ],
        "pdn": 2,
        "pid": 1224,
        "comm": "ib_send_bw",{
            "drv_srqn": 0,
            "drv_wqe_cnt": 512,
            "drv_max_gs": 2,
            "drv_xrcdn": 0
        }
    } ]

$ rdma res show srq link hns_0 -jpr
[ {
        "ifindex": 0,
        "ifname": "hns_0",
        "data": [ 149,0,0,0,0,0,0,0,0,0,0,0,119,101,120,99,0,
		  46,62,31,0,0,0,0,3,0,0,1,0,58,62,31,0,0,0,0,
		  30,159,15,0,0,0,64,5,0,0,0,0,0,0,0,0,0,0,0,
		  9,0,0,0,0,0,0,0,0 ]
    } ]

Signed-off-by: wenglianfa <wenglianfa@huawei.com>
Link: https://lore.kernel.org/r/20230918131110.3987498-4-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-20 10:51:45 +03:00
wenglianfa
aebf8145e1 RDMA/core: Add support to dump SRQ resource in RAW format
Add support to dump SRQ resource in raw format. It enable drivers to
return the entire device specific SRQ context without setting each
field separately.

Example:
$ rdma res show srq -r
dev hns3 149000...

$ rdma res show srq -j -r
[{"ifindex":0,"ifname":"hns3","data":[149,0,0,...]}]

Signed-off-by: wenglianfa <wenglianfa@huawei.com>
Link: https://lore.kernel.org/r/20230918131110.3987498-3-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-20 10:50:54 +03:00
wenglianfa
0e32d7d43b RDMA/core: Add dedicated SRQ resource tracker function
Add a dedicated callback function for SRQ resource tracker.

Signed-off-by: wenglianfa <wenglianfa@huawei.com>
Link: https://lore.kernel.org/r/20230918131110.3987498-2-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-20 10:50:54 +03:00
Ilpo Järvinen
8bf7187d97 RDMA/hfi1: Use FIELD_GET() to extract Link Width
Use FIELD_GET() to extract PCIe Negotiated Link Width field instead of
custom masking and shifting, and remove extract_width() which only
wraps that FIELD_GET().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20230919125648.1920-2-ilpo.jarvinen@linux.intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dean Luick <dean.luick@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-20 10:30:38 +03:00
Zhu Yanjun
c5930a1aa0 RDMA/rtrs: Fix the problem of variable not initialized fully
No functionality change. The variable which is not initialized fully
will introduce potential risks.

Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://lore.kernel.org/r/20230919020806.534183-1-yanjun.zhu@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-19 12:30:33 +03:00
Zhu Yanjun
20a02837fb RDMA/rtrs: Require holding rcu_read_lock explicitly
No functional change. The function get_next_path_rr needs to hold
rcu_read_lock. As such, if no rcu read lock, warnings will pop out.

Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://lore.kernel.org/r/20230919073727.540207-1-yanjun.zhu@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-19 11:23:17 +03:00
Gustavo A. R. Silva
81760bedc6 RDMA/core: Use size_{add,sub,mul}() in calls to struct_size()
If, for any reason, the open-coded arithmetic causes a wraparound,
the protection that `struct_size()` provides against potential integer
overflows is defeated. Fix this by hardening calls to `struct_size()`
with `size_add()`, `size_sub()` and `size_mul()`.

Fixes: 467f432a52 ("RDMA/core: Split port and device counter sysfs attributes")
Fixes: a4676388e2 ("RDMA/core: Simplify how the gid_attrs sysfs is created")
Fixes: e9dd5daf88 ("IB/umad: Refactor code to use cdev_device_add()")
Fixes: 324e227ea7 ("RDMA/device: Add ib_device_get_by_netdev()")
Fixes: 5aad26a7ea ("IB/core: Use struct_size() in kzalloc()")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZQdt4NsJFwwOYxUR@work
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-19 10:33:45 +03:00
David Ahern
4ececeb839 IB/hfi1: Remove open coded reference to skb frag offset
frag offset should be obtained from skb_frag_off; update the code.

Signed-off-by: David Ahern <dsahern@kernel.org>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/20230911215753.12325-1-dsahern@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-18 14:24:15 +03:00
Leon Romanovsky
18126c7676 RDMA/cma: Fix truncation compilation warning in make_cma_ports
The following compilation error is false alarm as RDMA devices don't
have such large amount of ports to actually cause to format truncation.

drivers/infiniband/core/cma_configfs.c: In function ‘make_cma_ports’:
drivers/infiniband/core/cma_configfs.c:223:57: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=]
  223 |                 snprintf(port_str, sizeof(port_str), "%u", i + 1);
      |                                                         ^
drivers/infiniband/core/cma_configfs.c:223:17: note: ‘snprintf’ output between 2 and 11 bytes into a destination of size 10
  223 |                 snprintf(port_str, sizeof(port_str), "%u", i + 1);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[5]: *** [scripts/Makefile.build:243: drivers/infiniband/core/cma_configfs.o] Error 1

Fixes: 045959db65 ("IB/cma: Add configfs for rdma_cm")
Link: https://lore.kernel.org/r/a7e3b347ee134167fa6a3787c56ef231a04bc8c2.1694434639.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-09-18 11:52:10 +03:00
Cheng Xu
b2abdffb50 RDMA/erdma: Fix NULL pointer access in regmr_cmd
Fix the crash of regmr_cmd called by erdma_ib_alloc_mr. The reason is
that mr->mem.mtt is not initialized but it is accessed in regmr_cmd.

The call trace information:

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 <...>
 RIP: 0010:regmr_cmd+0x170/0x1c0 [erdma]
 <...>
Call Trace:
 ? __die+0x20/0x70
 ? page_fault_oops+0x66/0x150
 ? do_user_addr_fault+0x61/0x660
 ? exc_page_fault+0x65/0x140
 ? asm_exc_page_fault+0x22/0x30
 ? regmr_cmd+0x170/0x1c0 [erdma]
 ? preempt_count_add+0x70/0xa0
 ? _raw_spin_lock_irqsave+0x19/0x50
 ? _raw_spin_unlock_irqrestore+0x1b/0x40
 ? erdma_alloc_idx+0x51/0x90 [erdma]
 erdma_get_dma_mr+0xa3/0x120 [erdma]
 __ib_alloc_pd+0xeb/0x1c0 [ib_core]

Fixes: 7244b4aa42 ("RDMA/erdma: Refactor the storage structure of MTT entries")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/3d140c1d-524a-4dbe-a51c-aee4f7ecafdb@moroto.mountain/
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230908060559.80203-1-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-18 10:42:19 +03:00
Dan Carpenter
6b5f0749ce RDMA/erdma: Fix error code in erdma_create_scatter_mtt()
The erdma_create_scatter_mtt() function is supposed to return error
pointers.  Returning NULL will lead to an Oops.

Fixes: ed10435d35 ("RDMA/erdma: Implement hierarchical MTT")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/1eb400d5-d8a3-4a8e-b3da-c43c6c377f86@moroto.mountain
Acked-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-18 09:12:00 +03:00
Konstantin Meskhidze
c489800e0d RDMA/uverbs: Fix typo of sizeof argument
Since size of 'hdr' pointer and '*hdr' structure is equal on 64-bit
machines issue probably didn't cause any wrong behavior. But anyway,
fixing of typo is required.

Fixes: da0f60df7b ("RDMA/uverbs: Prohibit write() calls with too small buffers")
Co-developed-by: Ivanov Mikhail <ivanov.mikhail1@huawei-partners.com>
Signed-off-by: Ivanov Mikhail <ivanov.mikhail1@huawei-partners.com>
Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
Link: https://lore.kernel.org/r/20230905103258.1738246-1-konstantin.meskhidze@huawei.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-11 15:01:09 +03:00
Artem Chernyshev
8fb8a82086 RDMA/cxgb4: Check skb value for failure to allocate
get_skb() can fail to allocate skb, so check it.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 5be78ee924 ("RDMA/cxgb4: Fix LE hash collision bug for active open connection")
Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru>
Link: https://lore.kernel.org/r/20230905124048.284165-1-artem.chernyshev@red-soft.ru
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-11 14:55:53 +03:00
Bernard Metzler
53a3f77704 RDMA/siw: Fix connection failure handling
In case immediate MPA request processing fails, the newly
created endpoint unlinks the listening endpoint and is
ready to be dropped. This special case was not handled
correctly by the code handling the later TCP socket close,
causing a NULL dereference crash in siw_cm_work_handler()
when dereferencing a NULL listener. We now also cancel
the useless MPA timeout, if immediate MPA request
processing fails.

This patch furthermore simplifies MPA processing in general:
Scheduling a useless TCP socket read in sk_data_ready() upcall
is now surpressed, if the socket is already moved out of
TCP_ESTABLISHED state.

Fixes: 6c52fdc244 ("rdma/siw: connection management")
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20230905145822.446263-1-bmt@zurich.ibm.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-11 14:53:23 +03:00
Bart Van Assche
e193b7955d RDMA/srp: Do not call scsi_done() from srp_abort()
After scmd_eh_abort_handler() has called the SCSI LLD eh_abort_handler
callback, it performs one of the following actions:
* Call scsi_queue_insert().
* Call scsi_finish_command().
* Call scsi_eh_scmd_add().
Hence, SCSI abort handlers must not call scsi_done(). Otherwise all
the above actions would trigger a use-after-free. Hence remove the
scsi_done() call from srp_abort(). Keep the srp_free_req() call
before returning SUCCESS because we may not see the command again if
SUCCESS is returned.

Cc: Bob Pearson <rpearsonhpe@gmail.com>
Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: d853667091 ("IB/srp: Avoid having aborted requests hang")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230823205727.505681-1-bvanassche@acm.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-11 14:15:58 +03:00
Rohit Chavan
e57b0eef66 RDMA/core: Fix repeated words in comments
Corrected the repeated words in the documentation of
rdma_replace_ah_attr() and ib_resolve_unicast_gid_dmac()
functions.

Signed-off-by: Rohit Chavan <roheetchavan@gmail.com>
Link: https://lore.kernel.org/r/20230824081304.408-1-roheetchavan@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-11 14:14:59 +03:00
Krzysztof Kozlowski
3ec648c631 IB: Use capital "OR" for multiple licenses in SPDX
Documentation/process/license-rules.rst and checkpatch expect the SPDX
identifier syntax for multiple licenses to use capital "OR".  Correct it
to keep consistent format and avoid copy-paste issues.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230823092912.122674-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-11 14:14:00 +03:00
Linus Torvalds
b89b029377 SCSI misc on 20230902
Updates to the usual drivers (ufs, lpfc, qla2xxx, mpi3mr, libsas) and
 the usual minor updates and bug fixes but no significant core changes.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZPLlcCYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishXH4AP9Mm8Hk
 ICCySnXJ/VcxrAU2ekUlPxGT8CNT2WqRhIqKegEAu0jagPyNcTjX+oBCNVOM33zU
 p5uU6IEHDA3d36qu04w=
 =8gks
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "Updates to the usual drivers (ufs, lpfc, qla2xxx, mpi3mr, libsas) and
  the usual minor updates and bug fixes but no significant core changes"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (116 commits)
  scsi: storvsc: Handle additional SRB status values
  scsi: libsas: Delete sas_ata_task.retry_count
  scsi: libsas: Delete sas_ata_task.stp_affil_pol
  scsi: libsas: Delete sas_ata_task.set_affil_pol
  scsi: libsas: Delete sas_ssp_task.task_prio
  scsi: libsas: Delete sas_ssp_task.enable_first_burst
  scsi: libsas: Delete sas_ssp_task.retry_count
  scsi: libsas: Delete struct scsi_core
  scsi: libsas: Delete enum sas_phy_type
  scsi: libsas: Delete enum sas_class
  scsi: libsas: Delete sas_ha_struct.lldd_module
  scsi: target: Fix write perf due to unneeded throttling
  scsi: lpfc: Do not abuse UUID APIs and LPFC_COMPRESS_VMID_SIZE
  scsi: pm8001: Remove unused declarations
  scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock
  scsi: elx: sli4: Remove code duplication
  scsi: bfa: Replace one-element array with flexible-array member in struct fc_rscn_pl_s
  scsi: qla2xxx: Remove unused declarations
  scsi: pmcraid: Use pci_dev_id() to simplify the code
  scsi: pm80xx: Set RETFIS when requested by libsas
  ...
2023-09-02 12:02:41 -07:00
Linus Torvalds
f7e97ce269 v6.6 merge window RDMA pull request
Many small changes across the subystem, some highlights:
 
 - Usual driver cleanups in qedr, siw, erdma, hfi1, mlx4/5, irdma, mthca,
   hns, and bnxt_re
 
 - siw now works over tunnel and other netdevs with a MAC address by
   removing assumptions about a MAC/GID from the connection manager
 
 - "Doorbell Pacing" for bnxt_re - this is a best effort scheme to allow
   userspace to slow down the doorbell rings if the HW gets full
 
 - irdma egress VLAN priority, better QP/WQ sizing
 
 - rxe bug fixes in queue draining and srq resizing
 
 - Support more ethernet speed options in the core layer
 
 - DMABUF support for bnxt_re
 
 - Multi-stage MTT support for erdma to allow much bigger MR registrations
 
 - A irdma fix with a CVE that came in too late to go to -rc, missing
   bounds checking for 0 length MRs
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZPEqkAAKCRCFwuHvBreF
 YZrNAPoCBfU+VjCKNr2yqF7s52os5ZdBV7Uuh4txHcXWW9H7GAD/f19i2u62fzNu
 C27jj4cztemMBb8mgwyxPw/wLg7NLwY=
 =pC6k
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Many small changes across the subystem, some highlights:

   - Usual driver cleanups in qedr, siw, erdma, hfi1, mlx4/5, irdma,
     mthca, hns, and bnxt_re

   - siw now works over tunnel and other netdevs with a MAC address by
     removing assumptions about a MAC/GID from the connection manager

   - "Doorbell Pacing" for bnxt_re - this is a best effort scheme to
     allow userspace to slow down the doorbell rings if the HW gets full

   - irdma egress VLAN priority, better QP/WQ sizing

   - rxe bug fixes in queue draining and srq resizing

   - Support more ethernet speed options in the core layer

   - DMABUF support for bnxt_re

   - Multi-stage MTT support for erdma to allow much bigger MR
     registrations

   - A irdma fix with a CVE that came in too late to go to -rc, missing
     bounds checking for 0 length MRs"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (87 commits)
  IB/hfi1: Reduce printing of errors during driver shut down
  RDMA/hfi1: Move user SDMA system memory pinning code to its own file
  RDMA/hfi1: Use list_for_each_entry() helper
  RDMA/mlx5: Fix trailing */ formatting in block comment
  RDMA/rxe: Fix redundant break statement in switch-case.
  RDMA/efa: Fix wrong resources deallocation order
  RDMA/siw: Call llist_reverse_order in siw_run_sq
  RDMA/siw: Correct wrong debug message
  RDMA/siw: Balance the reference of cep->kref in the error path
  Revert "IB/isert: Fix incorrect release of isert connection"
  RDMA/bnxt_re: Fix kernel doc errors
  RDMA/irdma: Prevent zero-length STAG registration
  RDMA/erdma: Implement hierarchical MTT
  RDMA/erdma: Refactor the storage structure of MTT entries
  RDMA/erdma: Renaming variable names and field names of struct erdma_mem
  RDMA/hns: Support hns HW stats
  RDMA/hns: Dump whole QP/CQ/MR resource in raw
  RDMA/irdma: Add missing kernel-doc in irdma_setup_umode_qp()
  RDMA/mlx4: Copy union directly
  RDMA/irdma: Drop unused kernel push code
  ...
2023-09-01 16:49:33 -07:00
Linus Torvalds
bd6c11bc43 Networking changes for 6.6.
Core
 ----
 
  - Increase size limits for to-be-sent skb frag allocations. This
    allows tun, tap devices and packet sockets to better cope with large
    writes operations.
 
  - Store netdevs in an xarray, to simplify iterating over netdevs.
 
  - Refactor nexthop selection for multipath routes.
 
  - Improve sched class lifetime handling.
 
  - Add backup nexthop ID support for bridge.
 
  - Implement drop reasons support in openvswitch.
 
  - Several data races annotations and fixes.
 
  - Constify the sk parameter of routing functions.
 
  - Prepend kernel version to netconsole message.
 
 Protocols
 ---------
 
  - Implement support for TCP probing the peer being under memory
    pressure.
 
  - Remove hard coded limitation on IPv6 specific info placement
    inside the socket struct.
 
  - Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated
    per socket scaling factor.
 
  - Scaling-up the IPv6 expired route GC via a separated list of
    expiring routes.
 
  - In-kernel support for the TLS alert protocol.
 
  - Better support for UDP reuseport with connected sockets.
 
  - Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR
    header size.
 
  - Get rid of additional ancillary per MPTCP connection struct socket.
 
  - Implement support for BPF-based MPTCP packet schedulers.
 
  - Format MPTCP subtests selftests results in TAP.
 
  - Several new SMC 2.1 features including unique experimental options,
    max connections per lgr negotiation, max links per lgr negotiation.
 
 BPF
 ---
 
  - Multi-buffer support in AF_XDP.
 
  - Add multi uprobe BPF links for attaching multiple uprobes
    and usdt probes, which is significantly faster and saves extra fds.
 
  - Implement an fd-based tc BPF attach API (TCX) and BPF link support on
    top of it.
 
  - Add SO_REUSEPORT support for TC bpf_sk_assign.
 
  - Support new instructions from cpu v4 to simplify the generated code and
    feature completeness, for x86, arm64, riscv64.
 
  - Support defragmenting IPv(4|6) packets in BPF.
 
  - Teach verifier actual bounds of bpf_get_smp_processor_id()
    and fix perf+libbpf issue related to custom section handling.
 
  - Introduce bpf map element count and enable it for all program types.
 
  - Add a BPF hook in sys_socket() to change the protocol ID
    from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy.
 
  - Introduce bpf_me_mcache_free_rcu() and fix OOM under stress.
 
  - Add uprobe support for the bpf_get_func_ip helper.
 
  - Check skb ownership against full socket.
 
  - Support for up to 12 arguments in BPF trampoline.
 
  - Extend link_info for kprobe_multi and perf_event links.
 
 Netfilter
 ---------
 
  - Speed-up process exit by aborting ruleset validation if a
    fatal signal is pending.
 
  - Allow NLA_POLICY_MASK to be used with BE16/BE32 types.
 
 Driver API
 ----------
 
  - Page pool optimizations, to improve data locality and cache usage.
 
  - Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the need
    for raw ioctl() handling in drivers.
 
  - Simplify genetlink dump operations (doit/dumpit) providing them
    the common information already populated in struct genl_info.
 
  - Extend and use the yaml devlink specs to [re]generate the split ops.
 
  - Introduce devlink selective dumps, to allow SF filtering SF based on
    handle and other attributes.
 
  - Add yaml netlink spec for netlink-raw families, allow route, link and
    address related queries via the ynl tool.
 
  - Remove phylink legacy mode support.
 
  - Support offload LED blinking to phy.
 
  - Add devlink port function attributes for IPsec.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - Broadcom ASP 2.0 (72165) ethernet controller
    - MediaTek MT7988 SoC
    - Texas Instruments AM654 SoC
    - Texas Instruments IEP driver
    - Atheros qca8081 phy
    - Marvell 88Q2110 phy
    - NXP TJA1120 phy
 
  - WiFi:
    - MediaTek mt7981 support
 
  - Can:
    - Kvaser SmartFusion2 PCI Express devices
    - Allwinner T113 controllers
    - Texas Instruments tcan4552/4553 chips
 
  - Bluetooth:
    - Intel Gale Peak
    - Qualcomm WCN3988 and WCN7850
    - NXP AW693 and IW624
    - Mediatek MT2925
 
 Drivers
 -------
 
  - Ethernet NICs:
    - nVidia/Mellanox:
      - mlx5:
        - support UDP encapsulation in packet offload mode
        - IPsec packet offload support in eswitch mode
        - improve aRFS observability by adding new set of counters
        - extends MACsec offload support to cover RoCE traffic
        - dynamic completion EQs
      - mlx4:
        - convert to use auxiliary bus instead of custom interface logic
    - Intel
      - ice:
        - implement switchdev bridge offload, even for LAG interfaces
        - implement SRIOV support for LAG interfaces
      - igc:
        - add support for multiple in-flight TX timestamps
    - Broadcom:
      - bnxt:
        - use the unified RX page pool buffers for XDP and non-XDP
        - use the NAPI skb allocation cache
    - OcteonTX2:
      - support Round Robin scheduling HTB offload
      - TC flower offload support for SPI field
    - Freescale:
      -  add XDP_TX feature support
    - AMD:
      - ionic: add support for PCI FLR event
      - sfc:
        - basic conntrack offload
        - introduce eth, ipv4 and ipv6 pedit offloads
    - ST Microelectronics:
      - stmmac: maximze PTP timestamping resolution
 
  - Virtual NICs:
    - Microsoft vNIC:
      - batch ringing RX queue doorbell on receiving packets
      - add page pool for RX buffers
    - Virtio vNIC:
      - add per queue interrupt coalescing support
    - Google vNIC:
      - add queue-page-list mode support
 
  - Ethernet high-speed switches:
    - nVidia/Mellanox (mlxsw):
      - add port range matching tc-flower offload
      - permit enslavement to netdevices with uppers
 
  - Ethernet embedded switches:
    - Marvell (mv88e6xxx):
      - convert to phylink_pcs
    - Renesas:
      - r8A779fx: add speed change support
      - rzn1: enables vlan support
 
  - Ethernet PHYs:
    - convert mv88e6xxx to phylink_pcs
 
  - WiFi:
    - Qualcomm Wi-Fi 7 (ath12k):
      - extremely High Throughput (EHT) PHY support
    - RealTek (rtl8xxxu):
      - enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
        RTL8192EU and RTL8723BU
    - RealTek (rtw89):
      - Introduce Time Averaged SAR (TAS) support
 
  - Connector:
    - support for event filtering
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmTt1ZoSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkgFUP/REFaYWdWUvAzmWeezyx9dqgZMfSOjWq
 9QvySiA94OAOcjIYkb7wfzQ5BBAZqaBQ/f8XqWwS1EDDDEBs8sP1cxmABKwW7Hsr
 qFRu2sOqLzKBk223d0jIgEocfQaFpGbF71gXoTlDivBjBi5UxWm9bF0XnbYWcKgO
 /QEvzNosi9uNdi85Fzmv62J6YzAdidEpwGsM7X2CfejwNRmStxAEg/NwvRR0Hyiq
 OJCo97omEgTRaUle8nc64PDx33u4h5kQ1BkaeHEv0rbE3hftFC2YPKn/InmqSFGz
 6ew2xnrGPR37LCuAiCcIIv6yR7K0eu0iYJ7jXwZxBDqxGavEPuwWGBoCP6qFiitH
 ZLWhIrAUrdmSbySkTOCONhJ475qFAuQoYHYpZnX/bJZUHlSsb/9lwDJYJQGpVfd1
 /daqJVSb7lhaifmNO1iNd/ibCIXq9zapwtkRwA897M8GkZBTsnVvazFld1Em+Se3
 Bx6DSDUVBqVQ9fpZG2IAGD6odDwOzC1lF2IoceFvK9Ff6oE0psI+A0qNLMkHxZbW
 Qlo7LsNe53hpoCC+yHTfXX7e/X8eNt0EnCGOQJDusZ0Nr3K7H4LKFA0i8UBUK05n
 4lKnnaSQW7GQgdofLWt103OMDR9GoDxpFsm7b1X9+AEk6Fz6tq50wWYeMZETUKYP
 DCW8VGFOZjZM
 =9CsR
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
 "Core:

   - Increase size limits for to-be-sent skb frag allocations. This
     allows tun, tap devices and packet sockets to better cope with
     large writes operations

   - Store netdevs in an xarray, to simplify iterating over netdevs

   - Refactor nexthop selection for multipath routes

   - Improve sched class lifetime handling

   - Add backup nexthop ID support for bridge

   - Implement drop reasons support in openvswitch

   - Several data races annotations and fixes

   - Constify the sk parameter of routing functions

   - Prepend kernel version to netconsole message

  Protocols:

   - Implement support for TCP probing the peer being under memory
     pressure

   - Remove hard coded limitation on IPv6 specific info placement inside
     the socket struct

   - Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per
     socket scaling factor

   - Scaling-up the IPv6 expired route GC via a separated list of
     expiring routes

   - In-kernel support for the TLS alert protocol

   - Better support for UDP reuseport with connected sockets

   - Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR
     header size

   - Get rid of additional ancillary per MPTCP connection struct socket

   - Implement support for BPF-based MPTCP packet schedulers

   - Format MPTCP subtests selftests results in TAP

   - Several new SMC 2.1 features including unique experimental options,
     max connections per lgr negotiation, max links per lgr negotiation

  BPF:

   - Multi-buffer support in AF_XDP

   - Add multi uprobe BPF links for attaching multiple uprobes and usdt
     probes, which is significantly faster and saves extra fds

   - Implement an fd-based tc BPF attach API (TCX) and BPF link support
     on top of it

   - Add SO_REUSEPORT support for TC bpf_sk_assign

   - Support new instructions from cpu v4 to simplify the generated code
     and feature completeness, for x86, arm64, riscv64

   - Support defragmenting IPv(4|6) packets in BPF

   - Teach verifier actual bounds of bpf_get_smp_processor_id() and fix
     perf+libbpf issue related to custom section handling

   - Introduce bpf map element count and enable it for all program types

   - Add a BPF hook in sys_socket() to change the protocol ID from
     IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy

   - Introduce bpf_me_mcache_free_rcu() and fix OOM under stress

   - Add uprobe support for the bpf_get_func_ip helper

   - Check skb ownership against full socket

   - Support for up to 12 arguments in BPF trampoline

   - Extend link_info for kprobe_multi and perf_event links

  Netfilter:

   - Speed-up process exit by aborting ruleset validation if a fatal
     signal is pending

   - Allow NLA_POLICY_MASK to be used with BE16/BE32 types

  Driver API:

   - Page pool optimizations, to improve data locality and cache usage

   - Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the
     need for raw ioctl() handling in drivers

   - Simplify genetlink dump operations (doit/dumpit) providing them the
     common information already populated in struct genl_info

   - Extend and use the yaml devlink specs to [re]generate the split ops

   - Introduce devlink selective dumps, to allow SF filtering SF based
     on handle and other attributes

   - Add yaml netlink spec for netlink-raw families, allow route, link
     and address related queries via the ynl tool

   - Remove phylink legacy mode support

   - Support offload LED blinking to phy

   - Add devlink port function attributes for IPsec

  New hardware / drivers:

   - Ethernet:
      - Broadcom ASP 2.0 (72165) ethernet controller
      - MediaTek MT7988 SoC
      - Texas Instruments AM654 SoC
      - Texas Instruments IEP driver
      - Atheros qca8081 phy
      - Marvell 88Q2110 phy
      - NXP TJA1120 phy

   - WiFi:
      - MediaTek mt7981 support

   - Can:
      - Kvaser SmartFusion2 PCI Express devices
      - Allwinner T113 controllers
      - Texas Instruments tcan4552/4553 chips

   - Bluetooth:
      - Intel Gale Peak
      - Qualcomm WCN3988 and WCN7850
      - NXP AW693 and IW624
      - Mediatek MT2925

  Drivers:

   - Ethernet NICs:
      - nVidia/Mellanox:
         - mlx5:
            - support UDP encapsulation in packet offload mode
            - IPsec packet offload support in eswitch mode
            - improve aRFS observability by adding new set of counters
            - extends MACsec offload support to cover RoCE traffic
            - dynamic completion EQs
         - mlx4:
            - convert to use auxiliary bus instead of custom interface
              logic
      - Intel
         - ice:
            - implement switchdev bridge offload, even for LAG
              interfaces
            - implement SRIOV support for LAG interfaces
         - igc:
            - add support for multiple in-flight TX timestamps
      - Broadcom:
         - bnxt:
            - use the unified RX page pool buffers for XDP and non-XDP
            - use the NAPI skb allocation cache
      - OcteonTX2:
         - support Round Robin scheduling HTB offload
         - TC flower offload support for SPI field
      - Freescale:
         - add XDP_TX feature support
      - AMD:
         - ionic: add support for PCI FLR event
         - sfc:
            - basic conntrack offload
            - introduce eth, ipv4 and ipv6 pedit offloads
      - ST Microelectronics:
         - stmmac: maximze PTP timestamping resolution

   - Virtual NICs:
      - Microsoft vNIC:
         - batch ringing RX queue doorbell on receiving packets
         - add page pool for RX buffers
      - Virtio vNIC:
         - add per queue interrupt coalescing support
      - Google vNIC:
         - add queue-page-list mode support

   - Ethernet high-speed switches:
      - nVidia/Mellanox (mlxsw):
         - add port range matching tc-flower offload
         - permit enslavement to netdevices with uppers

   - Ethernet embedded switches:
      - Marvell (mv88e6xxx):
         - convert to phylink_pcs
      - Renesas:
         - r8A779fx: add speed change support
         - rzn1: enables vlan support

   - Ethernet PHYs:
      - convert mv88e6xxx to phylink_pcs

   - WiFi:
      - Qualcomm Wi-Fi 7 (ath12k):
         - extremely High Throughput (EHT) PHY support
      - RealTek (rtl8xxxu):
         - enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
           RTL8192EU and RTL8723BU
      - RealTek (rtw89):
         - Introduce Time Averaged SAR (TAS) support

   - Connector:
      - support for event filtering"

* tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits)
  net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show
  net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler
  net: stmmac: clarify difference between "interface" and "phy_interface"
  r8152: add vendor/device ID pair for D-Link DUB-E250
  devlink: move devlink_notify_register/unregister() to dev.c
  devlink: move small_ops definition into netlink.c
  devlink: move tracepoint definitions into core.c
  devlink: push linecard related code into separate file
  devlink: push rate related code into separate file
  devlink: push trap related code into separate file
  devlink: use tracepoint_enabled() helper
  devlink: push region related code into separate file
  devlink: push param related code into separate file
  devlink: push resource related code into separate file
  devlink: push dpipe related code into separate file
  devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper
  devlink: push shared buffer related code into separate file
  devlink: push port related code into separate file
  devlink: push object register/unregister notifications into separate helpers
  inet: fix IP_TRANSPARENT error handling
  ...
2023-08-29 11:33:01 -07:00
Linus Torvalds
615e95831e v6.6-vfs.ctime
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZOXTKAAKCRCRxhvAZXjc
 oifJAQCzi/p+AdQu8LA/0XvR7fTwaq64ZDCibU4BISuLGT2kEgEAuGbuoFZa0rs2
 XYD/s4+gi64p9Z01MmXm2XO1pu3GPg0=
 =eJz5
 -----END PGP SIGNATURE-----

Merge tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs timestamp updates from Christian Brauner:
 "This adds VFS support for multi-grain timestamps and converts tmpfs,
  xfs, ext4, and btrfs to use them. This carries acks from all relevant
  filesystems.

  The VFS always uses coarse-grained timestamps when updating the ctime
  and mtime after a change. This has the benefit of allowing filesystems
  to optimize away a lot of metadata updates, down to around 1 per
  jiffy, even when a file is under heavy writes.

  Unfortunately, this has always been an issue when we're exporting via
  NFSv3, which relies on timestamps to validate caches. A lot of changes
  can happen in a jiffy, so timestamps aren't sufficient to help the
  client decide to invalidate the cache.

  Even with NFSv4, a lot of exported filesystems don't properly support
  a change attribute and are subject to the same problems with timestamp
  granularity. Other applications have similar issues with timestamps
  (e.g., backup applications).

  If we were to always use fine-grained timestamps, that would improve
  the situation, but that becomes rather expensive, as the underlying
  filesystem would have to log a lot more metadata updates.

  This introduces fine-grained timestamps that are used when they are
  actively queried.

  This uses the 31st bit of the ctime tv_nsec field to indicate that
  something has queried the inode for the mtime or ctime. When this flag
  is set, on the next mtime or ctime update, the kernel will fetch a
  fine-grained timestamp instead of the usual coarse-grained one.

  As POSIX generally mandates that when the mtime changes, the ctime
  must also change the kernel always stores normalized ctime values, so
  only the first 30 bits of the tv_nsec field are ever used.

  Filesytems can opt into this behavior by setting the FS_MGTIME flag in
  the fstype. Filesystems that don't set this flag will continue to use
  coarse-grained timestamps.

  Various preparatory changes, fixes and cleanups are included:

   - Fixup all relevant places where POSIX requires updating ctime
     together with mtime. This is a wide-range of places and all
     maintainers provided necessary Acks.

   - Add new accessors for inode->i_ctime directly and change all
     callers to rely on them. Plain accesses to inode->i_ctime are now
     gone and it is accordingly rename to inode->__i_ctime and commented
     as requiring accessors.

   - Extend generic_fillattr() to pass in a request mask mirroring in a
     sense the statx() uapi. This allows callers to pass in a request
     mask to only get a subset of attributes filled in.

   - Rework timestamp updates so it's possible to drop the @now
     parameter the update_time() inode operation and associated helpers.

   - Add inode_update_timestamps() and convert all filesystems to it
     removing a bunch of open-coding"

* tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits)
  btrfs: convert to multigrain timestamps
  ext4: switch to multigrain timestamps
  xfs: switch to multigrain timestamps
  tmpfs: add support for multigrain timestamps
  fs: add infrastructure for multigrain timestamps
  fs: drop the timespec64 argument from update_time
  xfs: have xfs_vn_update_time gets its own timestamp
  fat: make fat_update_time get its own timestamp
  fat: remove i_version handling from fat_update_time
  ubifs: have ubifs_update_time use inode_update_timestamps
  btrfs: have it use inode_update_timestamps
  fs: drop the timespec64 arg from generic_update_time
  fs: pass the request_mask to generic_fillattr
  fs: remove silly warning from current_time
  gfs2: fix timestamp handling on quota inodes
  fs: rename i_ctime field to __i_ctime
  selinux: convert to ctime accessor functions
  security: convert to ctime accessor functions
  apparmor: convert to ctime accessor functions
  sunrpc: convert to ctime accessor functions
  ...
2023-08-28 09:31:32 -07:00
Jakub Kicinski
3c5066c6b0 Merge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Leon Romanovsky says:

====================
mlx5 MACsec RoCEv2 support

From Patrisious:

This series extends previously added MACsec offload support
to cover RoCE traffic either.

In order to achieve that, we need configure MACsec with offload between
the two endpoints, like below:

REMOTE_MAC=10:70:fd:43:71:c0

* ip addr add 1.1.1.1/16 dev eth2
* ip link set dev eth2 up
* ip link add link eth2 macsec0 type macsec encrypt on
* ip macsec offload macsec0 mac
* ip macsec add macsec0 tx sa 0 pn 1 on key 00 dffafc8d7b9a43d5b9a3dfbbf6a30c16
* ip macsec add macsec0 rx port 1 address $REMOTE_MAC
* ip macsec add macsec0 rx port 1 address $REMOTE_MAC sa 0 pn 1 on key 01 ead3664f508eb06c40ac7104cdae4ce5
* ip addr add 10.1.0.1/16 dev macsec0
* ip link set dev macsec0 up

And in a similar manner on the other machine, while noting the keys order
would be reversed and the MAC address of the other machine.

RDMA traffic is separated through relevant GID entries and in case
of IP ambiguity issue - meaning we have a physical GIDs and a MACsec
GIDs with the same IP/GID, we disable our physical GID in order
to force the user to only use the MACsec GID.

v0: https://lore.kernel.org/netdev/20230813064703.574082-1-leon@kernel.org/

* 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  RDMA/mlx5: Handles RoCE MACsec steering rules addition and deletion
  net/mlx5: Add RoCE MACsec steering infrastructure in core
  net/mlx5: Configure MACsec steering for ingress RoCEv2 traffic
  net/mlx5: Configure MACsec steering for egress RoCEv2 traffic
  IB/core: Reorder GID delete code for RoCE
  net/mlx5: Add MACsec priorities in RDMA namespaces
  RDMA/mlx5: Implement MACsec gid addition and deletion
  net/mlx5: Maintain fs_id xarray per MACsec device inside macsec steering
  net/mlx5: Remove netdevice from MACsec steering
  net/mlx5e: Move MACsec flow steering and statistics database from ethernet to core
  net/mlx5e: Rename MACsec flow steering functions/parameters to suit core naming style
  net/mlx5: Remove dependency of macsec flow steering on ethernet
  net/mlx5e: Move MACsec flow steering operations to be used as core library
  macsec: add functions to get macsec real netdevice and check offload
====================

Link: https://lore.kernel.org/r/20230821073833.59042-1-leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-24 11:32:18 -07:00
Petr Pavlu
7d22b1cb9d mlx4: Connect the infiniband part to the auxiliary bus
Use the auxiliary bus to perform device management of the infiniband
part of the mlx4 driver.

Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 08:25:28 +01:00
Petr Pavlu
73d68002a0 mlx4: Replace the mlx4_interface.event callback with a notifier
Use a notifier to implement mlx4_dispatch_event() in preparation to
switch mlx4_en and mlx4_ib to be an auxiliary device.

A problem is that if the mlx4_interface.event callback was replaced with
something as mlx4_adrv.event then the implementation of
mlx4_dispatch_event() would need to acquire a lock on a given device
before executing this callback. That is necessary because otherwise
there is no guarantee that the associated driver cannot get unbound when
the callback is running. However, taking this lock is not possible
because mlx4_dispatch_event() can be invoked from the hardirq context.
Using an atomic notifier allows the driver to accurately record when it
wants to receive these events and solves this problem.

A handler registration is done by both mlx4_en and mlx4_ib at the end of
their mlx4_interface.add callback. This matches the current situation
when mlx4_add_device() would enable events for a given device
immediately after this callback, by adding the device on the
mlx4_priv.list.

Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 08:25:27 +01:00
Petr Pavlu
7ba189ac52 mlx4: Use 'void *' as the event param of mlx4_dispatch_event()
Function mlx4_dispatch_event() takes an 'unsigned long' as its event
parameter. The actual value is none (MLX4_DEV_EVENT_CATASTROPHIC_ERROR),
a pointer to mlx4_eqe (MLX4_DEV_EVENT_PORT_MGMT_CHANGE), or a 32-bit
integer (remaining events).

In preparation to switch mlx4_en and mlx4_ib to be an auxiliary device,
the mlx4_interface.event callback is replaced with a notifier and
function mlx4_dispatch_event() gets updated to invoke
atomic_notifier_call_chain(). This requires forwarding the input 'param'
value from the former function to the latter. A problem is that the
notifier call takes 'void *' as its 'param' value, compared to
'unsigned long' used by mlx4_dispatch_event(). Re-passing the value
would need either punning it to 'void *' or passing down the address of
the input 'param'. Both approaches create a number of unnecessary casts.

Change instead the input 'param' of mlx4_dispatch_event() from
'unsigned long' to 'void *'. A mlx4_eqe pointer can be passed directly,
callers using an int value are adjusted to pass its address.

Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 08:25:27 +01:00
Petr Pavlu
71ab55a9af mlx4: Get rid of the mlx4_interface.get_dev callback
Simplify the mlx4 driver interface by removing mlx4_get_protocol_dev()
and the associated mlx4_interface.get_dev callbacks. This is done in
preparation to use an auxiliary bus to model the mlx4 driver structure.

The change is motivated by the following situation:
* The mlx4_en interface is being initialized by mlx4_en_add() and
  mlx4_en_activate().
* The latter activate function calls mlx4_en_init_netdev() ->
  register_netdev() to register a new net_device.
* A netdev event NETDEV_REGISTER is raised for the device.
* The netdev notififier mlx4_ib_netdev_event() is called and it invokes
  mlx4_ib_scan_netdevs() -> mlx4_get_protocol_dev() ->
  mlx4_en_get_netdev() [via mlx4_interface.get_dev].

This chain creates a problem when mlx4_en gets switched to be an
auxiliary driver. It contains two device calls which would both need to
take a respective device lock.

Avoid this situation by updating mlx4_ib_scan_netdevs() to no longer
call mlx4_get_protocol_dev() but instead to utilize the information
passed in net_device.parent and net_device.dev_port. This data is
sufficient to determine that an updated port is one that the mlx4_ib
driver should take care of and to keep mlx4_ib_dev.iboe.netdevs up to
date.

Following that, update mlx4_ib_get_netdev() to also not call
mlx4_get_protocol_dev() and instead scan all current netdevs to find
find a matching one. Note that mlx4_ib_get_netdev() is called early from
ib_register_device() and cannot use data tracked in
mlx4_ib_dev.iboe.netdevs which is not at that point yet set.

Finally, remove function mlx4_get_protocol_dev() and the
mlx4_interface.get_dev callbacks (only mlx4_en_get_netdev()) as they
became unused.

Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 08:25:27 +01:00
Douglas Miller
f5acc36b07 IB/hfi1: Reduce printing of errors during driver shut down
The driver prints unnecessary prints for error conditions on shutdown
remove them to quiet it down.

Signed-off-by: Douglas Miller <doug.miller@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/169271327832.1855761.3756156924805531643.stgit@awfm-02.cornelisnetworks.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-22 17:31:45 +03:00
Brendan Cunningham
d2c0234634 RDMA/hfi1: Move user SDMA system memory pinning code to its own file
Move user SDMA system memory page-pinning code from user_sdma.c to
pin_system.c. Put declarations for non-static functions in pinning.h.

System memory pinning is necessary for processing user SDMA requests but
actual steps are invisible to user SDMA request-processing code. Moving
system memory pinning code for user SDMA to its own file makes this
distinction apparent.

These changes have no effect on userspace.

Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com>
Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/169271327311.1855761.4736714053318724062.stgit@awfm-02.cornelisnetworks.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-22 17:31:45 +03:00
Jinjie Ruan
3d91dfe72a RDMA/hfi1: Use list_for_each_entry() helper
Convert list_for_each() to list_for_each_entry() so that the pos list_head
pointer and list_entry() call are no longer needed, which can
reduce a few lines of code. No functional changed.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/20230822033539.3692453-1-ruanjinjie@huawei.com
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-22 17:28:27 +03:00
Rohit Chavan
d3c2245754 RDMA/mlx5: Fix trailing */ formatting in block comment
Resolved a formatting issue where the trailing */ in a block comment
was placed on a same line instead of separate line.

Signed-off-by: Rohit Chavan <roheetchavan@gmail.com>
Link: https://lore.kernel.org/r/20230822120451.8215-1-roheetchavan@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-22 17:27:16 +03:00
Rohit Chavan
6812e06999 RDMA/rxe: Fix redundant break statement in switch-case.
Removed unreachable break statement after return.

Signed-off-by: Rohit Chavan <roheetchavan@gmail.com>
Link: https://lore.kernel.org/r/20230822091304.7312-1-roheetchavan@gmail.com
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-22 17:23:00 +03:00
Yonatan Nachum
dc202c57e9 RDMA/efa: Fix wrong resources deallocation order
When trying to destroy QP or CQ, we first decrease the refcount and
potentially free memory regions allocated for the object and then
request the device to destroy the object. If the device fails, the
object isn't fully destroyed so the user/IB core can try to destroy the
object again which will lead to underflow when trying to decrease an
already zeroed refcount.

Deallocate resources in reverse order of allocating them to safely free
them.

Fixes: ff6629f88c ("RDMA/efa: Do not delay freeing of DMA pages")
Reviewed-by: Michael Margolin <mrgolin@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Yonatan Nachum <ynachum@amazon.com>
Link: https://lore.kernel.org/r/20230822082725.31719-1-ynachum@amazon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-22 17:21:53 +03:00
Guoqing Jiang
9dfccb6d0d RDMA/siw: Call llist_reverse_order in siw_run_sq
We can call the function to get fifo list.

Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20230821133255.31111-4-guoqing.jiang@linux.dev
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-22 17:05:12 +03:00
Guoqing Jiang
bee024d204 RDMA/siw: Correct wrong debug message
We need to print num_sle first then pbl->max_buf per the condition.
Also replace mem->pbl with pbl while at it.

Fixes: 303ae1cdfd ("rdma/siw: application interface")
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20230821133255.31111-3-guoqing.jiang@linux.dev
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-22 17:05:12 +03:00
Guoqing Jiang
b056327bee RDMA/siw: Balance the reference of cep->kref in the error path
The siw_connect can go to err in below after cep is allocated successfully:

1. If siw_cm_alloc_work returns failure. In this case socket is not
assoicated with cep so siw_cep_put can't be called by siw_socket_disassoc.
We need to call siw_cep_put twice since cep->kref is increased once after
it was initialized.

2. If siw_cm_queue_work can't find a work, which means siw_cep_get is not
called in siw_cm_queue_work, so cep->kref is increased twice by siw_cep_get
and when associate socket with cep after it was initialized. So we need to
call siw_cep_put three times (one in siw_socket_disassoc).

3. siw_send_mpareqrep returns error, this scenario is similar as 2.

So we need to remove one siw_cep_put in the error path.

Fixes: 6c52fdc244 ("rdma/siw: connection management")
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20230821133255.31111-2-guoqing.jiang@linux.dev
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-22 17:05:12 +03:00
Leon Romanovsky
dfe261107c Revert "IB/isert: Fix incorrect release of isert connection"
Commit: 699826f4e3 ("IB/isert: Fix incorrect release of isert connection") is
causing problems on OPA when DEVICE_REMOVAL is happening.

 ------------[ cut here ]------------
 WARNING: CPU: 52 PID: 2117247 at drivers/infiniband/core/cq.c:359
ib_cq_pool_cleanup+0xac/0xb0 [ib_core]
 Modules linked in: nfsd nfs_acl target_core_user uio tcm_fc libfc
scsi_transport_fc tcm_loop target_core_pscsi target_core_iblock target_core_file
rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs
rfkill rpcrdma rdma_ucm ib_srpt sunrpc ib_isert iscsi_target_mod target_core_mod
opa_vnic ib_iser libiscsi ib_umad scsi_transport_iscsi rdma_cm ib_ipoib iw_cm
ib_cm hfi1(-) rdmavt ib_uverbs intel_rapl_msr intel_rapl_common sb_edac ib_core
x86_pkg_temp_thermal intel_powerclamp coretemp i2c_i801 mxm_wmi rapl iTCO_wdt
ipmi_si iTCO_vendor_support mei_me ipmi_devintf mei intel_cstate ioatdma
intel_uncore i2c_smbus joydev pcspkr lpc_ich ipmi_msghandler acpi_power_meter
acpi_pad xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg crct10dif_pclmul
crc32_pclmul crc32c_intel drm_kms_helper drm_shmem_helper ahci libahci
ghash_clmulni_intel igb drm libata dca i2c_algo_bit wmi fuse
 CPU: 52 PID: 2117247 Comm: modprobe Not tainted 6.5.0-rc1+ #1
 Hardware name: Intel Corporation S2600CWR/S2600CW, BIOS
SE5C610.86B.01.01.0014.121820151719 12/18/2015
 RIP: 0010:ib_cq_pool_cleanup+0xac/0xb0 [ib_core]
 Code: ff 48 8b 43 40 48 8d 7b 40 48 83 e8 40 4c 39 e7 75 b3 49 83
c4 10 4d 39 fc 75 94 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc <0f> 0b eb a1
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f
 RSP: 0018:ffffc10bea13fc80 EFLAGS: 00010206
 RAX: 000000000000010c RBX: ffff9bf5c7e66c00 RCX: 000000008020001d
 RDX: 000000008020001e RSI: fffff175221f9900 RDI: ffff9bf5c7e67640
 RBP: ffff9bf5c7e67600 R08: ffff9bf5c7e64400 R09: 000000008020001d
 R10: 0000000040000000 R11: 0000000000000000 R12: ffff9bee4b1e8a18
 R13: dead000000000122 R14: dead000000000100 R15: ffff9bee4b1e8a38
 FS:  00007ff1e6d38740(0000) GS:ffff9bfd9fb00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00005652044ecc68 CR3: 0000000889b5c005 CR4: 00000000001706e0
 Call Trace:
  <TASK>
  ? __warn+0x80/0x130
  ? ib_cq_pool_cleanup+0xac/0xb0 [ib_core]
  ? report_bug+0x195/0x1a0
  ? handle_bug+0x3c/0x70
  ? exc_invalid_op+0x14/0x70
  ? asm_exc_invalid_op+0x16/0x20
  ? ib_cq_pool_cleanup+0xac/0xb0 [ib_core]
  disable_device+0x9d/0x160 [ib_core]
  __ib_unregister_device+0x42/0xb0 [ib_core]
  ib_unregister_device+0x22/0x30 [ib_core]
  rvt_unregister_device+0x20/0x90 [rdmavt]
  hfi1_unregister_ib_device+0x16/0xf0 [hfi1]
  remove_one+0x55/0x1a0 [hfi1]
  pci_device_remove+0x36/0xa0
  device_release_driver_internal+0x193/0x200
  driver_detach+0x44/0x90
  bus_remove_driver+0x69/0xf0
  pci_unregister_driver+0x2a/0xb0
  hfi1_mod_cleanup+0xc/0x3c [hfi1]
  __do_sys_delete_module.constprop.0+0x17a/0x2f0
  ? exit_to_user_mode_prepare+0xc4/0xd0
  ? syscall_trace_enter.constprop.0+0x126/0x1a0
  do_syscall_64+0x5c/0x90
  ? syscall_exit_to_user_mode+0x12/0x30
  ? do_syscall_64+0x69/0x90
  ? syscall_exit_work+0x103/0x130
  ? syscall_exit_to_user_mode+0x12/0x30
  ? do_syscall_64+0x69/0x90
  ? exc_page_fault+0x65/0x150
  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
 RIP: 0033:0x7ff1e643f5ab
 Code: 73 01 c3 48 8b 0d 75 a8 1b 00 f7 d8 64 89 01 48 83 c8 ff c3
66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 45 a8 1b 00 f7 d8 64 89 01 48
 RSP: 002b:00007ffec9103cc8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
 RAX: ffffffffffffffda RBX: 00005615267fdc50 RCX: 00007ff1e643f5ab
 RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005615267fdcb8
 RBP: 00005615267fdc50 R08: 0000000000000000 R09: 0000000000000000
 R10: 00007ff1e659eac0 R11: 0000000000000206 R12: 00005615267fdcb8
 R13: 0000000000000000 R14: 00005615267fdcb8 R15: 00007ffec9105ff8
  </TASK>
 ---[ end trace 0000000000000000 ]---

And...

 restrack: ------------[ cut here ]------------
 infiniband hfi1_0: BUG: RESTRACK detected leak of resources
 restrack: Kernel PD object allocated by ib_isert is not freed
 restrack: Kernel CQ object allocated by ib_core is not freed
 restrack: Kernel QP object allocated by rdma_cm is not freed
 restrack: ------------[ cut here ]------------

Fixes: 699826f4e3 ("IB/isert: Fix incorrect release of isert connection")
Reported-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Closes: https://lore.kernel.org/all/921cd1d9-2879-f455-1f50-0053fe6a6655@cornelisnetworks.com
Link: https://lore.kernel.org/r/a27982d3235005c58f6d321f3fad5eb6e1beaf9e.1692604607.git.leonro@nvidia.com
Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-08-22 17:01:04 +03:00
Leon Romanovsky
c6c0052df2 RDMA/bnxt_re: Fix kernel doc errors
Fix set of the following errors due to use of wrong kernel doc format
to describe function parameters:

drivers/infiniband/hw/bnxt_re/qplib_rcfw.c:68: warning: Function parameter or member 'rcfw'
	not described in '__wait_for_resp'

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308180600.oOnkIAQV-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202308180401.iaj2ktTc-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202308180214.Lt9NAhbM-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202308180055.6zM4AK6V-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202308172136.ipx1wvs6-lkp@intel.com/
Link: https://lore.kernel.org/r/4b22c385f1b68590ace8f82f2985d14b20999432.1692539554.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-08-21 10:40:38 +03:00
Patrisious Haddad
58dbd6428a RDMA/mlx5: Handles RoCE MACsec steering rules addition and deletion
Add RoCE MACsec rules when a gid is added for the MACsec netdevice and
handle their cleanup when the gid is removed or the MACsec SA is deleted.
Also support alias IP for the MACsec device, as long as we don't have
more ips than what the gid table can hold.
In addition handle the case where a gid is added but there are still no
SAs added for the MACsec device, so the rules are added later on when
the SAs are added.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-08-20 12:35:24 +03:00
Patrisious Haddad
a019b1258d IB/core: Reorder GID delete code for RoCE
Reorder GID delete code so that the driver del_gid operation is executed
before nullifying the gid attribute ndev parameter, this allows drivers
to access the ndev during their gid delete operation, which makes more
sense since they had access to it during the gid addition operation.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-08-20 12:35:24 +03:00
Patrisious Haddad
758ce14aee RDMA/mlx5: Implement MACsec gid addition and deletion
Handle MACsec IP ambiguity issue, since mlx5 hw can't support
programming both the MACsec and the physical gid when they have the same
IP address, because it wouldn't know to whom to steer the traffic.
Hence in such case we delete the physical gid from the hw gid table,
which would then cause all traffic sent over it to fail, and we'll only
be able to send traffic over the MACsec gid.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-08-20 12:35:24 +03:00
Christopher Bednarz
bb6d73d9ad RDMA/irdma: Prevent zero-length STAG registration
Currently irdma allows zero-length STAGs to be programmed in HW during
the kernel mode fast register flow. Zero-length MR or STAG registration
disable HW memory length checks.

Improve gaps in bounds checking in irdma by preventing zero-length STAG or
MR registrations except if the IB_PD_UNSAFE_GLOBAL_RKEY is set.

This addresses the disclosure CVE-2023-25775.

Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Christopher Bednarz <christopher.n.bednarz@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230818144838.1758-1-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-19 19:28:17 +03:00
Cheng Xu
ed10435d35 RDMA/erdma: Implement hierarchical MTT
Hierarchical MTT allows large MR registration without the need of
continuous physical address. This commit adds the support of hierarchical
MTT support for erdma.

Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230817102151.75964-4-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-19 14:41:01 +03:00
Cheng Xu
7244b4aa42 RDMA/erdma: Refactor the storage structure of MTT entries
Currently our MTT only support inline mtt entries (0 level MTT) and
indirect MTT entries (1 level mtt), which will limit the maximum length
of MRs. In order to implement a multi-level MTT, we refactor the
structure of MTT first.

Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230817102151.75964-3-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-19 14:40:30 +03:00
Cheng Xu
d7cfbba90b RDMA/erdma: Renaming variable names and field names of struct erdma_mem
Currently, variable names and field names of struct erdma_mem contain
'mtt', which is not accurate. Renaming them with 'xxx_mem' or 'mem'.

Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230817102151.75964-2-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-19 14:36:29 +03:00
Chengchang Tang
5a87279591 RDMA/hns: Support hns HW stats
Support query hns HW stats for rdma-tool to help debugging.

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20230816091812.2899366-3-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-19 14:26:34 +03:00
Chengchang Tang
c4bb187379 RDMA/hns: Dump whole QP/CQ/MR resource in raw
Currently, some fields in the QP/CQ/MR resource can be dumped by
rdma-tool, but this information is not enough. It is very
inconvenient to continue to expand on the current field, and it
will also introduce some trouble to parse these raw data.

This patch dump whole resource in raw to avoid the above problems.

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20230816091812.2899366-2-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-19 14:26:08 +03:00
Jakub Kicinski
7ff57803d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/sfc/tc.c
  fa165e1949 ("sfc: don't unregister flow_indr if it was never registered")
  3bf969e88a ("sfc: add MAE table machinery for conntrack table")
https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 12:44:56 -07:00
Leon Romanovsky
5f513c8b97 RDMA/irdma: Add missing kernel-doc in irdma_setup_umode_qp()
Fix the following warning reported by kbuild:

drivers/infiniband/hw/irdma/verbs.c:584: warning: Function parameter
            or member 'udata' not described in 'irdma_setup_umode_qp'

Fixes: 3a84987204 ("RDMA/irdma: Allow accurate reporting on QP max send/recv WR")
Link: https://lore.kernel.org/r/2c9bcd2b773c400a1699bd7973e22bfba1e4b379.1692260011.git.leonro@nvidia.com
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308171620.m4MNACWz-lkp@intel.com/
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-08-18 13:48:16 -03:00
Gustavo A. R. Silva
18ddaeb03b RDMA/mlx4: Copy union directly
Copy union directly instead of using memcpy().

Note that in this case, a direct assignment is more readable and
consistent with the subsequent assignments.

This addresses the following -Wstringop-overflow warning seen in s390
with defconfig:
drivers/infiniband/hw/mlx4/main.c:296:33: warning: writing 16 bytes into a region of size 0 [-Wstringop-overflow=]
  296 |                                 memcpy(&port_gid_table->gids[free].gid,
      |                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  297 |                                        &attr->gid, sizeof(attr->gid));
      |                                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This helps with the ongoing efforts to globally enable
-Wstringop-overflow.

Link: https://github.com/KSPP/linux/issues/308
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZNvimeRAPkJ24zRG@work
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-16 09:28:33 +03:00
Shiraz Saleem
295c95aa7e RDMA/irdma: Drop unused kernel push code
The driver has code blocks for kernel push WQEs but does not
map the doorbell page rendering this mode non functional [1]

Remove code associated with this feature from the kernel fast
path as there is currently no plan of record to support this.

This also address a sparse issue reported by lkp.

drivers/infiniband/hw/irdma/uk.c:285:24: sparse: sparse: incorrect type in assignment (different base types) @@     expected bool [usertype] push_wqe:1 @@     got restricted __le32 [usertype] *push_db @@
drivers/infiniband/hw/irdma/uk.c:285:24: sparse:     expected bool [usertype] push_wqe:1
drivers/infiniband/hw/irdma/uk.c:285:24: sparse:     got restricted __le32 [usertype] *push_db
drivers/infiniband/hw/irdma/uk.c:386:24: sparse: sparse: incorrect type in assignment (different base types) @@     expected bool [usertype] push_wqe:1 @@     got restricted __le32 [usertype] *push_db @@

[1] https://lore.kernel.org/linux-rdma/20230815051809.GB22185@unreal/T/#t

Fixes: 272bba19d6 ("RDMA: Remove unnecessary ternary operators")
Fixes: 551c46edc7 ("RDMA/irdma: Add user/kernel shared libraries")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308110251.BV6BcwUR-lkp@intel.com/
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230816001209.1721-1-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-16 09:24:52 +03:00
Saravanan Vajravel
0a30e59f22 RDMA/bnxt_re: Add support for dmabuf pinned memory regions
Support the new verb which indicates dmabuf support.  bnxt doesn't support
ODP. So use the pinned version of the dmabuf APIs to enable bnxt_re
devices to work as dmabuf importer.

Link: https://lore.kernel.org/r/1690790473-25850-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-08-15 15:54:08 -03:00
Selvin Xavier
213d2b9bb2 RDMA/bnxt_re: Protect the PD table bitmap
Syncrhonization is required to avoid simultaneous allocation
of the PD. Add a new mutex lock to handle allocation from
the PD table.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1692032419-21680-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-15 09:06:06 +03:00
Kashyap Desai
811e0ce9e6 RDMA/bnxt_re: Initialize mutex dbq_lock
Fix the missing dbq_lock mutex initialization

Fixes: 2ad4e6303a ("RDMA/bnxt_re: Implement doorbell pacing algorithm")
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1692032419-21680-1-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-15 09:06:06 +03:00
Kalesh AP
ca60fd116c IB/core: Add more speed parsing in ib_get_width_and_speed()
When the Ethernet driver does not provide the number of lanes
in the __ethtool_get_link_ksettings() response, the function
ib_get_width_and_speed() does not take consideration of 50G,
100G and 200G speeds while calculating the IB width and speed.
Update the width and speed for the above netdev speeds.

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1690966823-8159-1-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-13 10:41:57 +03:00
Guoqing Jiang
25944c0681 RDMA/cxgb4: Set sq_sig_type correctly
Replace '0' with IB_SIGNAL_REQ_WR given the sq_sig_type is either
IB_SIGNAL_ALL_WR or IB_SIGNAL_REQ_WR per the below.

enum ib_sig_type {
	IB_SIGNAL_ALL_WR,
	IB_SIGNAL_REQ_WR
};

Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20230731092106.10396-1-guoqing.jiang@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-13 10:26:11 +03:00
Kashyap Desai
64b632654b RDMA/bnxt_re: Initialize dpi_tbl_lock mutex
Fix the missing dpi_tbl_lock mutex initialization.

Fixes: 0ac20faf5d ("RDMA/bnxt_re: Reorg the bar mapping")
Link: https://lore.kernel.org/r/1691642677-21369-4-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-08-10 16:35:54 -03:00
Kalesh AP
5ac8480ae4 RDMA/bnxt_re: Fix error handling in probe failure path
During bnxt_re_dev_init(), when bnxt_re_setup_chip_ctx() fails unregister
with L2 first before bailing out probe.

Fixes: ae8637e131 ("RDMA/bnxt_re: Add chip context to identify 57500 series")
Link: https://lore.kernel.org/r/1691642677-21369-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-08-10 16:35:54 -03:00
Selvin Xavier
5363fc488d RDMA/bnxt_re: Properly order ib_device_unalloc() to avoid UAF
ib_dealloc_device() should be called only after device cleanup.  Fix the
dealloc sequence.

Fixes: 6d758147c7 ("RDMA/bnxt_re: Use auxiliary driver interface")
Link: https://lore.kernel.org/r/1691642677-21369-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-08-10 16:35:54 -03:00
Yue Haibing
d952f54d01 RDMA/hns: Remove unused declaration hns_roce_modify_srq()
Commit c7bcb13442 ("RDMA/hns: Add SRQ support for hip08 kernel mode")
declared but never implemented this.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20230804130418.41728-1-yuehaibing@huawei.com
Reviewed-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-08 21:35:05 +03:00
Ivan Orlov
64917f4c35 RDMA: Make all 'class' structures const
Now that the driver core allows for struct class to be in read-only
memory, making all 'class' structures to be declared at build time
placing them into read-only memory, instead of having to be dynamically
allocated at load time.

Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: "Md. Haris Iqbal" <haris.iqbal@ionos.com>
Cc: Jack Wang <jinpu.wang@ionos.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Yishai Hadas <yishaih@nvidia.com>
Cc: Ivan Orlov <ivan.orlov0322@gmail.com>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/2023080427-commuting-crewless-cbee@gregkh
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-08 13:25:15 +03:00
Maher Sanalla
f14c1a14e6 net/mlx5: Allocate completion EQs dynamically
This commit enables the dynamic allocation of EQs at runtime, allowing
for more flexibility in managing completion EQs and reducing the memory
overhead of driver load. Whenever a CQ is created for a given vector
index, the driver will lookup to see if there is an already mapped
completion EQ for that vector, if so, utilize it. Otherwise, allocate a
new EQ on demand and then utilize it for the CQ completion events.

Add a protection lock to the EQ table to protect from concurrent EQ
creation attempts.

While at it, replace mlx5_vector2irqn()/mlx5_vector2eqn() with
mlx5_comp_eqn_get() and mlx5_comp_irqn_get() which will allocate an
EQ on demand if no EQ is found for the given vector.

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07 10:53:52 -07:00
Maher Sanalla
674dd4e2e0 net/mlx5: Rename mlx5_comp_vectors_count() to mlx5_comp_vectors_max()
To accurately represent its purpose, rename the function that retrieves
the value of maximum vectors from mlx5_comp_vectors_count() to
mlx5_comp_vectors_max().

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07 10:53:51 -07:00
Ruan Jinjie
849b1955ad RDMA: Remove unnecessary NULL values
The NULL initialization of the pointers assigned by kzalloc() first is
not necessary, because if the kzalloc() failed, the pointers will be
assigned NULL, otherwise it works as usual. so remove it.

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/20230804082102.3361961-1-ruanjinjie@huawei.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-07 16:56:57 +03:00
Xiang Yang
26b7d1a271 IB/uverbs: Fix an potential error pointer dereference
smatch reports the warning below:
drivers/infiniband/core/uverbs_std_types_counters.c:110
ib_uverbs_handler_UVERBS_METHOD_COUNTERS_READ() error: 'uattr'
dereferencing possible ERR_PTR()

The return value of uattr maybe ERR_PTR(-ENOENT), fix this by checking
the value of uattr before using it.

Fixes: ebb6796bd3 ("IB/uverbs: Add read counters support")
Signed-off-by: Xiang Yang <xiangyang3@huawei.com>
Link: https://lore.kernel.org/r/20230804022525.1916766-1-xiangyang3@huawei.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-07 16:49:59 +03:00
Chengchang Tang
9e03dbea2b RDMA/hns: Fix CQ and QP cache affinity
Currently, the affinity between QP cache and CQ cache is not
considered when assigning QPN, it will affect the message rate of HW.

Allocate QPN from QP cache with better CQ affinity to get better
performance.

Fixes: 71586dd200 ("RDMA/hns: Create QP with selected QPN for bank load balance")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20230804012711.808069-5-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-07 16:46:58 +03:00
Junxian Huang
c9c0bd3c17 RDMA/hns: Fix inaccurate error label name in init instance
This patch fixes inaccurate error label name in init instance.

Fixes: 70f9252158 ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20230804012711.808069-4-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-07 16:46:58 +03:00
Junxian Huang
706efac447 RDMA/hns: Fix incorrect post-send with direct wqe of wr-list
Currently, direct wqe is not supported for wr-list. RoCE driver excludes
direct wqe for wr-list by judging whether the number of wr is 1.

For a wr-list where the second wr is a length-error atomic wr, the
post-send driver handles the first wr and adds 1 to the wr number counter
firstly. While handling the second wr, the driver finds out a length error
and terminates the wr handle process, remaining the counter at 1. This
causes the driver mistakenly judges there is only 1 wr and thus enters
the direct wqe process, carrying the current length-error atomic wqe.

This patch fixes the error by adding a judgement whether the current wr
is a bad wr. If so, use the normal doorbell process but not direct wqe
despite the wr number is 1.

Fixes: 01584a5edc ("RDMA/hns: Add support of direct wqe")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20230804012711.808069-3-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-07 16:46:58 +03:00
Chengchang Tang
df1bcf90a6 RDMA/hns: Fix port active speed
HW supports a variety of different speed, but the current speed
is fixed.

The real speed should be querried from ethernet.

Fixes: 9a4435375c ("IB/hns: Add driver files for hns RoCE driver")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20230804012711.808069-2-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-07 16:46:58 +03:00
Kalesh AP
14611b9b98 RDMA/bnxt_re: Remove unnecessary variable initializations
Remove unnecessary variable initializations.

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1691052326-32143-7-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-07 16:39:42 +03:00
Kalesh AP
00d0427fd8 RDMA/bnxt_re: Avoid unnecessary memset
Avoid memset by initializing the variables during
declaration itself.

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1691052326-32143-6-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-07 16:39:42 +03:00
Kalesh AP
e59a5cec3f RDMA/bnxt_re: Cleanup bnxt_re_process_raw_qp_pkt_rx() function
- Remove unnecessary memset by initializing the variables
   during declaration itself.
 - Arranged variable declarartion in RCT order.

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1691052326-32143-5-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-07 16:39:42 +03:00
Selvin Xavier
c9f3e4e1d8 RDMA/bnxt_re: Fix the sideband buffer size handling for FW commands
bnxt_qplib_rcfw_alloc_sbuf allocates 24 bytes and it is better to fit
on stack variables. This way we can avoid unwanted kmalloc call.
Call dma_alloc_coherent directly instead of wrapper
bnxt_qplib_rcfw_alloc_sbuf.

Also, FW expects the side buffer  needs to be aligned to
BNXT_QPLIB_CMDQE_UNITS(16B). So align the size to have the
extra padding bytes.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1691052326-32143-4-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-07 16:39:42 +03:00
Kalesh AP
fd28c8a8c7 RDMA/bnxt_re: Remove a redundant flag
After the cited commit, BNXT_RE_FLAG_GOT_MSIX is redundant.
Remove it.

Fixes: 3034322113 ("bnxt_en: Remove runtime interrupt vector allocation")
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1691052326-32143-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-07 16:39:42 +03:00
Kalesh AP
f19fba1f79 RDMA/bnxt_re: Fix max_qp count for virtual functions
Driver has not accounted QP1 for virtual functions
when fetching device attributes and hence max_qp
count is one less than active_qp count. Fixed driver
so that it counts QP1 for virtual functions as well
while fetching device attributes

Fixes: ccd9d0d3df ("RDMA/bnxt_re: Enable RoCE on virtual functions")
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1691052326-32143-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-07 16:39:42 +03:00
Douglas Miller
4fdfaef71f IB/hfi1: Fix possible panic during hotplug remove
During hotplug remove it is possible that the update counters work
might be pending, and may run after memory has been freed.
Cancel the update counters work before freeing memory.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Signed-off-by: Douglas Miller <doug.miller@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/169099756100.3927190.15284930454106475280.stgit@awfm-02.cornelisnetworks.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-03 21:13:57 +03:00
Gustavo A. R. Silva
38313c6d2a RDMA/irdma: Replace one-element array with flexible-array member
One-element and zero-length arrays are deprecated. So, replace
one-element array in struct irdma_qvlist_info with flexible-array
member.

A patch for this was sent a while ago[1]. However, it seems that, at
the time, the changes were partially folded[2][3], and the actual
flexible-array transformation was omitted. This patch fixes that.

The only binary difference seen before/after changes is shown below:

|  drivers/infiniband/hw/irdma/hw.o
| @@ -868,7 +868,7 @@
| drivers/infiniband/hw/irdma/hw.c:484 (discriminator 2)
|	size += struct_size(iw_qvlist, qv_info, rf->msix_count);
|      55b:      imul   $0x45c,%rdi,%rdi
|-     562:      add    $0x10,%rdi
|+     562:      add    $0x4,%rdi

which is, of course, expected as it reflects the mistake made
while folding the patch I've mentioned above.

Worth mentioning is the fact that with this change we save 12 bytes
of memory, as can be inferred from the diff snapshot above. Notice
that:

$ pahole -C rdma_qv_info idrivers/infiniband/hw/irdma/hw.o
struct irdma_qv_info {
	u32                        v_idx;                /*     0     4 */
	u16                        ceq_idx;              /*     4     2 */
	u16                        aeq_idx;              /*     6     2 */
	u8                         itr_idx;              /*     8     1 */

	/* size: 12, cachelines: 1, members: 4 */
	/* padding: 3 */
	/* last cacheline: 12 bytes */
};

Link: https://lore.kernel.org/linux-hardening/20210525230038.GA175516@embeddedor/ [1]
Link: https://lore.kernel.org/linux-hardening/bf46b428deef4e9e89b0ea1704b1f0e5@intel.com/ [2]
Link: https://lore.kernel.org/linux-rdma/20210520143809.819-1-shiraz.saleem@intel.com/T/#u [3]
Fixes: 44d9e52977 ("RDMA/irdma: Implement device initialization definitions")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZMpsQrZadBaJGkt4@work
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-08-03 21:11:18 +03:00
Yue Haibing
2897f1925b RDMA/hns: Remove unused function declarations
commit b16f818847 ("RDMA/hns: Refactor eq code for hip06") left behind
hns_roce_cleanup_eq_table().

commit 773f841ab1 ("RDMA/hns: Avoid filling sgid index when modifying QP
to RTR") leave hns_get_gid_index() unused.

Remove both.

Link: https://lore.kernel.org/r/20230731135916.32392-1-yuehaibing@huawei.com
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-31 15:27:37 -03:00
Bob Pearson
5d122db2ff RDMA/rxe: Fix incomplete state save in rxe_requester
If a send packet is dropped by the IP layer in rxe_requester()
the call to rxe_xmit_packet() can fail with err == -EAGAIN.
To recover, the state of the wqe is restored to the state before
the packet was sent so it can be resent. However, the routines
that save and restore the state miss a significnt part of the
variable state in the wqe, the dma struct which is used to process
through the sge table. And, the state is not saved before the packet
is built which modifies the dma struct.

Under heavy stress testing with many QPs on a fast node sending
large messages to a slow node dropped packets are observed and
the resent packets are corrupted because the dma struct was not
restored. This patch fixes this behavior and allows the test cases
to succeed.

Fixes: 3050b99850 ("IB/rxe: Fix race condition between requester and completer")
Link: https://lore.kernel.org/r/20230721200748.4604-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-31 15:24:12 -03:00
Bob Pearson
cc28f35115 RDMA/rxe: Fix rxe_modify_srq
This patch corrects an error in rxe_modify_srq where if the
caller changes the srq size the actual new value is not returned
to the caller since it may be larger than what is requested.
Additionally it open codes the subroutine rcv_wqe_size() which
adds very little value, and makes some whitespace changes.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20230620140142.9452-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-31 15:24:12 -03:00
Bob Pearson
5993b75d0b RDMA/rxe: Fix unsafe drain work queue code
If create_qp does not fully succeed it is possible for qp cleanup
code to attempt to drain the send or recv work queues before the
queues have been created causing a seg fault. This patch checks
to see if the queues exist before attempting to drain them.

Link: https://lore.kernel.org/r/20230620135519.9365-3-rpearsonhpe@gmail.com
Reported-by: syzbot+2da1965168e7dbcba136@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-rdma/00000000000012d89205fe7cfe00@google.com/raw
Fixes: 49dc9c1f0c ("RDMA/rxe: Cleanup reset state handling in rxe_resp.c")
Fixes: fbdeb828a2 ("RDMA/rxe: Cleanup error state handling in rxe_comp.c")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-31 15:24:12 -03:00
Bob Pearson
e0ba8ff467 RDMA/rxe: Move work queue code to subroutines
This patch:
	- Moves code to initialize a qp send work queue to a
	  subroutine named rxe_init_sq.
	- Moves code to initialize a qp recv work queue to a
	  subroutine named rxe_init_rq.
	- Moves initialization of qp request and response packet
	  queues ahead of work queue initialization so that cleanup
	  of a qp if it is not fully completed can successfully
	  attempt to drain the packet queues without a seg fault.
	- Makes minor whitespace cleanups.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20230620135519.9365-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-31 15:24:12 -03:00