6218 Commits

Author SHA1 Message Date
Mark Zhang
45842fc627 IB/mlx5: Support statistic q counter configuration
Add support for ib callbacks counter_bind_qp(), counter_unbind_qp() and
counter_dealloc().

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05 10:22:55 -03:00
Mark Zhang
318d535cef IB/mlx5: Add counter set id as a parameter for mlx5_ib_query_q_counters()
Add counter set id as a parameter so that this API can be used for
querying any q counter.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05 10:22:55 -03:00
Mark Zhang
d14133dd41 IB/mlx5: Support set qp counter
Support bind a qp with counter. If counter is null then bind the qp to the
default counter. Different QP state has different operation:

- RESET: Set the counter field so that it will take effective during
  RST2INIT change;
- RTS: Issue an RTS2RTS change to update the QP counter;
- Other: Set the counter field and mark the counter_pending flag, when QP
  is moved to RTS state and this flag is set, then issue an RTS2RTS
  modification to update the counter.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05 10:22:55 -03:00
Jason Gunthorpe
5600a410ea Merge mlx5-next into rdma for-next
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Required for dependencies in the next patches.

* mlx5-next:
  net/mlx5: Add rts2rts_qp_counters_set_id field in hca cap
  net/mlx5: Properly name the generic WQE control field
  net/mlx5: Introduce TLS TX offload hardware bits and structures
  net/mlx5: Refactor mlx5_esw_query_functions for modularity
  net/mlx5: E-Switch prepare functions change handler to be modular
  net/mlx5: Introduce and use mlx5_eswitch_get_total_vports()
2019-07-05 10:16:19 -03:00
Daniel Kranzdorf
bcde9a83b1 RDMA/efa: Entropy in admin commands id
Make admin commands id easier to distinguish by using relevant bits from
the producer counter.
This allows us to differentiate admin commands with the same producer
index (happens after admin queue overlap), which is helpful when
debugging.

Signed-off-by: Daniel Kranzdorf <dkkranzd@amazon.com>
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-04 14:31:09 -03:00
Leon Romanovsky
50ba3c18a4 RDMA/mlx5: Use proper allocation API to get zeroed memory
There is no need in custom memory zeroing, because it can be done
by using kzalloc from the beginning.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-04 14:08:18 -03:00
Lijun Ou
9a601fc43e RDMA/hns: Fix building modular hns
The patch below wasn't fully tested for all combinations of module and
configs, and causes a compile failure:

WARNING: modpost: missing MODULE_LICENSE() in drivers/infiniband/hw/hns/hns_roce_ah.o
see include/linux/module.h for more information
WARNING: modpost: missing MODULE_LICENSE() in drivers/infiniband/hw/hns/hns_roce_alloc.o
see include/linux/module.h for more information
WARNING: modpost: missing MODULE_LICENSE() in drivers/infiniband/hw/hns/hns_roce_cmd.o
see include/linux/module.h for more information
WARNING: modpost: missing MODULE_LICENSE() in drivers/infiniband/hw/hns/hns_roce_cq.o
see include/linux/module.h for more information
WARNING: modpost: missing MODULE_LICENSE() in drivers/infiniband/hw/hns/hns_roce_db.o
see include/linux/module.h for more information
WARNING: modpost: missing MODULE_LICENSE() in drivers/infiniband/hw/hns/hns_roce_hem.o
see include/linux/module.h for more information
WARNING: modpost: missing MODULE_LICENSE() in drivers/infiniband/hw/hns/hns_roce_mr.o
see include/linux/module.h for more information
WARNING: modpost: missing MODULE_LICENSE() in drivers/infiniband/hw/hns/hns_roce_pd.o
see include/linux/module.h for more information
WARNING: modpost: missing MODULE_LICENSE() in drivers/infiniband/hw/hns/hns_roce_qp.o
see include/linux/module.h for more information
WARNING: modpost: missing MODULE_LICENSE() in drivers/infiniband/hw/hns/hns_roce_restrack.o
see include/linux/module.h for more information
WARNING: modpost: missing MODULE_LICENSE() in drivers/infiniband/hw/hns/hns_roce_srq.o
see include/linux/module.h for more information
ERROR: "hns_roce_bitmap_cleanup" [drivers/infiniband/hw/hns/hns_roce_srq.ko] undefined!
ERROR: "hns_roce_bitmap_init" [drivers/infiniband/hw/hns/hns_roce_srq.ko] undefined!
ERROR: "hns_roce_free_cmd_mailbox" [drivers/infiniband/hw/hns/hns_roce_srq.ko] undefined!
ERROR: "hns_roce_alloc_cmd_mailbox" [drivers/infiniband/hw/hns/hns_roce_srq.ko] undefined!
ERROR: "hns_roce_table_get" [drivers/infiniband/hw/hns/hns_roce_srq.ko] undefined!
ERROR: "hns_roce_bitmap_alloc" [drivers/infiniband/hw/hns/hns_roce_srq.ko] undefined!
ERROR: "hns_roce_table_find" [drivers/infiniband/hw/hns/hns_roce_srq.ko] undefined!

The fix is to put the module sub components in the right line.

Fixes: e9816ddf2a33 ("RDMA/hns: Cleanup unnecessary exported symbols")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-04 13:41:50 -03:00
Yishai Hadas
5832fdd35e IB/mlx5: DEVX cleanup mdev
No need any more to hold mlx5_core_dev on the devx_object, it can be
accessed from ib_dev.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03 17:13:44 -03:00
Yishai Hadas
ef1659ade3 IB/mlx5: Add DEVX support for CQ events
Add DEVX support for CQ events by creating and destroying the CQ via
mlx5_core and set an handler to manage its completions.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03 17:12:38 -03:00
Yishai Hadas
5ec9d8ee87 IB/mlx5: Implement DEVX dispatching event
Implement DEVX dispatching event by looking up for the applicable
subscriptions for the reported event and using their target fd to
signal/set the event.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03 17:12:38 -03:00
Yishai Hadas
7597385371 IB/mlx5: Enable subscription for device events over DEVX
Enable subscription for device events over DEVX.

Each subscription is added to the two level xarray data structure
according to its event number and the DEVX object information in case was
given with the given target fd.

Those events will be reported over the given fd once will occur.
Downstream patches will mange the dispatching to any subscription.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03 17:12:38 -03:00
Yishai Hadas
e337dd53ce IB/mlx5: Register DEVX with mlx5_core to get async events
Register DEVX with with mlx5_core to get async events.  This will enable
to dispatch the applicable events to its consumers in down stream patches.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03 17:12:37 -03:00
Yishai Hadas
2afc5e1b9c IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD
Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD and its initial
implementation.

This object is from type class FD and will be used to read DEVX
async events.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03 17:11:10 -03:00
Parav Pandit
2752b82316 net/mlx5: Introduce and use mlx5_eswitch_get_total_vports()
Instead MLX5_TOTAL_VPORTS, use mlx5_eswitch_get_total_vports().
mlx5_eswitch_get_total_vports() in subsequent patch accounts for SF
vports as well.
Expanding MLX5_TOTAL_VPORTS macro would require exposing SF internals to
more generic vport.h header file. Such exposure is not desired.
Hence a mlx5_eswitch_get_total_vports() is introduced.

Given that mlx5_eswitch_get_total_vports() API wants to work on const
mlx5_core_dev*, change its helper functions also to accept const *dev.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-03 12:50:42 -07:00
Jason Gunthorpe
69ea0582f3 Merge mlx5-next into rdma for-next
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Required for dependencies in the next patches.

Resolved the conflicts:
 - esw_destroy_offloads_acl_tables() use the newer mlx5_esw_for_all_vports()
   version
 - esw_offloads_steering_init() drop the cap test
 - esw_offloads_init() drop the extra function arguments

* branch 'mlx5-next': (39 commits)
  net/mlx5: Expose device definitions for object events
  net/mlx5: Report EQE data upon CQ completion
  net/mlx5: Report a CQ error event only when a handler was set
  net/mlx5: mlx5_core_create_cq() enhancements
  net/mlx5: Expose the API to register for ANY event
  net/mlx5: Use event mask based on device capabilities
  net/mlx5: Fix mlx5_core_destroy_cq() error flow
  net/mlx5: E-Switch, Handle UC address change in switchdev mode
  net/mlx5: E-Switch, Consider host PF for inline mode and vlan pop
  net/mlx5: E-Switch, Use iterator for vlan and min-inline setups
  net/mlx5: E-Switch, Reg/unreg function changed event at correct stage
  net/mlx5: E-Switch, Consolidate eswitch function number of VFs
  net/mlx5: E-Switch, Refactor eswitch SR-IOV interface
  net/mlx5: Handle host PF vport mac/guid for ECPF
  net/mlx5: E-Switch, Use correct flags when configuring vlan
  net/mlx5: Reduce dependency on enabled_vfs counter and num_vfs
  net/mlx5: Don't handle VF func change if host PF is disabled
  net/mlx5: Limit scope of mlx5_get_next_phys_dev() to PCI PF devices
  net/mlx5: Move pci status reg access mutex to mlx5_pci_init
  net/mlx5: Rename mlx5_pci_dev_type to mlx5_coredev_type
  ...

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03 16:50:26 -03:00
Parav Pandit
2f40cf30c8 IB/mlx5: Fixed reporting counters on 2nd port for Dual port RoCE
Currently during dual port IB device registration in below code flow,

ib_register_device()
  ib_device_register_sysfs()
    ib_setup_port_attrs()
      add_port()
        get_counter_table()
          get_perf_mad()
            process_mad()
              mlx5_ib_process_mad()

mlx5_ib_process_mad() fails on 2nd port when both the ports are not fully
setup at the device level (because 2nd port is unaffiliated).

As a result, get_perf_mad() registers different PMA counter group for 1st
and 2nd port, namely pma_counter_ext and pma_counter. However both ports
have the same capability and counter offsets.

Due to this when counters are read by the user via sysfs in below code
flow, counters are queried from wrong location from the device mainly from
PPCNT instead of VPORT counters.

show_pma_counter()
  get_perf_mad()
    process_mad()
      mlx5_ib_process_mad()
        process_pma_cmd()

This shows all zero counters for 2nd port.

To overcome this, process_pma_cmd() is invoked, and when unaffiliated port
is not yet setup during device registration phase, make the query on the
first port.  while at it, only process_pma_cmd() needs to work on the
native port number and underlying mdev, so shift the get, put calls to
where its needed inside process_pma_cmd().

Fixes: 212f2a87b74f ("IB/mlx5: Route MADs for dual port RoCE")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03 15:08:54 -03:00
Yishai Hadas
4e0e2ea188 net/mlx5: Report EQE data upon CQ completion
Report EQE data upon CQ completion to let upper layers use this data.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 21:00:20 +03:00
Yishai Hadas
38164b7719 net/mlx5: mlx5_core_create_cq() enhancements
Enhance mlx5_core_create_cq() to get the command out buffer from the
callers to let them use the output.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 20:59:32 +03:00
Yishai Hadas
b9a7ba5562 net/mlx5: Use event mask based on device capabilities
Use the reported device capabilities for the supported user events (i.e.
affiliated and un-affiliated) to set the EQ mask.

As the event mask can be up to 256 defined by 4 entries of u64 change
the applicable code to work accordingly.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 20:55:45 +03:00
YueHaibing
6044414fa8 RDMA/hns: Remove set but not used variable 'fclr_write_fail_flag'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/infiniband/hw/hns/hns_roce_hw_v2.c: In function 'hns_roce_function_clear':
drivers/infiniband/hw/hns/hns_roce_hw_v2.c:1135:7: warning:
 variable 'fclr_write_fail_flag' set but not used [-Wunused-but-set-variable]

It is never used, so can be removed.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03 14:52:15 -03:00
Liu, Changcheng
2e67e77584 RDMA/i40iw: Set queue pair state when being queried
The API for ib_query_qp requires the driver to set qp_state and
cur_qp_state on return, add the missing sets.

Fixes: d37498417947 ("i40iw: add files for iwarp interface")
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03 14:45:56 -03:00
Fuqian Huang
cda8cf56d8 IB/i40iw: Use kmemdup rather than open coding
Use kmemdump instead of kzmalloc + memcpy.

Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03 14:34:42 -03:00
Fuqian Huang
4c44d4634b IB: Remove unneeded memset
In commit af7ddd8a627c ("Merge tag 'dma-mapping-4.21' of
git://git.infradead.org/users/hch/dma-mapping"),
dma_alloc_coherent/dmam_alloc_coherent always zeroed the returned memory.
So the memset after a coherent allocation function is not needed.

Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03 14:26:49 -03:00
Bodong Wang
f6455de0b0 net/mlx5: E-Switch, Refactor eswitch SR-IOV interface
Devlink eswitch mode is not necessarily related to SR-IOV, e.g, ECPF
can be at offload mode when SR-IOV is not enabled.

Rename the interface and eswitch mode names to decouple from SR-IOV,
and cleanup eswitch messages accordingly.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Bodong Wang
b8ca123860 RDMA/mlx5: Cleanup rep when doing unload
When an IB rep is loaded, netdev for the same vport is saved for later
reference. However, it's not cleaned up when doing unload. For ECPF,
kernel crashes when driver is referring to the already removed netdev.

Following steps lead to a shown call trace:
1. Create n VFs from host PF
2. Distroy the VFs
3. Run "rdma link" from ARM

Call trace:
  mlx5_ib_get_netdev+0x9c/0xe8 [mlx5_ib]
  mlx5_query_port_roce+0x268/0x558 [mlx5_ib]
  mlx5_ib_rep_query_port+0x14/0x34 [mlx5_ib]
  ib_query_port+0x9c/0xfc [ib_core]
  fill_port_info+0x74/0x28c [ib_core]
  nldev_port_get_doit+0x1a8/0x1e8 [ib_core]
  rdma_nl_rcv_msg+0x16c/0x1c0 [ib_core]
  rdma_nl_rcv+0xe8/0x144 [ib_core]
  netlink_unicast+0x184/0x214
  netlink_sendmsg+0x288/0x354
  sock_sendmsg+0x18/0x2c
  __sys_sendto+0xbc/0x138
  __arm64_sys_sendto+0x28/0x34
  el0_svc_common+0xb0/0x100
  el0_svc_handler+0x6c/0x84
  el0_svc+0x8/0xc

Cleanup the rep and netdev reference when unloading IB rep.

Fixes: 26628e2d58c9 ("RDMA/mlx5: Move to single device multiport ports in switchdev mode")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Bodong Wang
2f69e591e4 {IB, net}/mlx5: E-Switch, Use index of rep for vport to IB port mapping
In the single IB device mode, the mapping between vport number and
rep relies on a counter. However for dynamic vport allocation, it is
desired to keep consistent map of eswitch vport and IB port.

Hence, simplify code to remove the free running counter and instead
use the available vport index during load/unload sequence from the
eswitch.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Suggested-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Dennis Dalessandro
09fbca8e62 IB/hfi1: No need to use try_module_get for debugfs
The call in debugfs.c for try_module_get() is not needed. A reference to
the module will be taken by the VFS layer as long as the owner field is
set in the file ops struct. So set this as well as remove the call.

Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:34:26 -03:00
Mike Marciniszyn
aa9b79ec37 IB/hfi1: Add missing INVALIDATE opcodes for trace
This was missed in the original implementation of the memory management
extensions.

Fixes: 0db3dfa03c08 ("IB/hfi1: Work request processing for fast register mr and invalidate")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:34:26 -03:00
Michael J. Ruhl
bf3b1e0ce0 IB/hfi1: Reduce excessive aspm inlines
Uninline the aspm API since it increases code space for no reason.

Move the aspm module param to the new aspm C file.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:34:26 -03:00
Michael J. Ruhl
2b0ad2da8f IB/{rdmavt, hfi1, qib}: Add helpers to hide SWQE WR details
Add some helper functions to hide struct rvt_swqe details.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:34:26 -03:00
Michael J. Ruhl
d310c4bf8a IB/{rdmavt, hfi1, qib}: Remove AH refcount for UD QPs
Historically rdmavt destroy_ah() has returned an -EBUSY when the AH has a
non-zero reference count.  IBTA 11.2.2 notes no such return value or error
case:

	Output Modifiers:
	- Verb results:
	- Operation completed successfully.
	- Invalid HCA handle.
	- Invalid address handle.

ULPs never test for this error and this will leak memory.

The reference count exists to allow for driver independent progress
mechanisms to process UD SWQEs in parallel with post sends.  The SWQE will
hold a reference count until the UD SWQE completes and then drops the
reference.

Fix by removing need to reference count the AH.  Add a UD specific
allocation to each SWQE entry to cache the necessary information for
independent progress.  Copy the information during the post send
processing.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:34:26 -03:00
Kamenee Arumugam
5136bfea7e IB/{hfi1, qib, rdmavt}: Put qp in error state when cq is full
When a completion queue is full, the associated queue pairs are not put
into the error state. According to the IBTA specification, this is a
violation.

Quote from IBTA spec:
C9-218: A Requester Class F error occurs when the CQ is inaccessible or
full and an attempt is made to complete a WQE.  The Affected QP shall be
moved to the error state and affiliated asynchronous errors generated as
described in 11.6.3.1 Affiliated Asynchronous Events on page 678. The
current WQE and any subsequent WQEs are left in an unknown state.

C11-37: The CI shall generate a CQ Error when a CQ overrun is
detected. This condition will result in an Affiliated Asynchronous Error
for any associated Work Queues when they attempt to use that
CQ. Completions can no longer be added to the CQ. It is not guaranteed
that completions present in the CQ at the time the error occurred can be
retrieved. Possible causes include a CQ overrun or a CQ protection error.

Put the qp in error state when cq is full. Implement a state called full
to continue to put other associated QPs in error state.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:34:26 -03:00
Kamenee Arumugam
239b0e52d8 IB/hfi1: Move rvt_cq_wc struct into uapi directory
The rvt_cq_wc struct elements are shared between rdmavt and the providers
but not in uapi directory.  As per the comment in
https://marc.info/?l=linux-rdma&m=152296522708522&w=2 The hfi1 driver and
the rdma core driver are not using shared structures in the uapi
directory.

In that case, move rvt_cq_wc struct into the rvt-abi.h header file and
create a rvt_k_cq_w for the kernel completion queue.

Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:32:16 -03:00
Jason Gunthorpe
371bb62158 Linux 5.2-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0Os1seHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGtx4H/j6i482XzcGFKTBm
 A7mBoQpy+kLtoUov4EtBAR62OuwI8rsahW9di37QKndPoQrczWaKBmr3De6LCdPe
 v3pl3O6wBbvH5ru+qBPFX9PdNbDvimEChh7LHxmMxNQq3M+AjZAZVJyfpoiFnx35
 Fbge+LZaH/k8HMwZmkMr5t9Mpkip715qKg2o9Bua6dkH0AqlcpLlC8d9a+HIVw/z
 aAsyGSU8jRwhoAOJsE9bJf0acQ/pZSqmFp0rDKqeFTSDMsbDRKLGq/dgv4nW0RiW
 s7xqsjb/rdcvirRj3rv9+lcTVkOtEqwk0PVdL9WOf7g4iYrb3SOIZh8ZyViaDSeH
 VTS5zps=
 =huBY
 -----END PGP SIGNATURE-----

Merge tag 'v5.2-rc6' into rdma.git for-next

For dependencies in next patches.

Resolve conflicts:
- Use uverbs_get_cleared_udata() with new cq allocation flow
- Continue to delete nes despite SPDX conflict
- Resolve list appends in mlx5_command_str()
- Use u16 for vport_rule stuff
- Resolve list appends in struct ib_client

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 21:18:23 -03:00
Jianbo Liu
669ff1e32f RDMA/mlx5: Add vport metadata matching for IB representors
If vport metadata matching is enabled in eswitch, the rule created
must be changed to match on the metadata, instead of source port.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:29 -07:00
Jianbo Liu
bb0ee7dcc4 net/mlx5: Add flow context for flow tag
Refactor the flow data structures, add new flow_context and move
flow_tag into it, as flow_tag doesn't belong to the rule action.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:28 -07:00
Colin Ian King
10dcc7448e RDMA/hns: fix spelling mistake "attatch" -> "attach"
There is a spelling mistake in an dev_err message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25 16:33:00 -03:00
Lijun Ou
e9816ddf2a RDMA/hns: Cleanup unnecessary exported symbols
This patch removes the hns-roce.ko for cleanup all the exported symbols in
common part.

Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25 14:48:44 -03:00
Dan Carpenter
b417c0879d RDMA/hns: Fix an error code in hns_roce_set_user_sq_size()
This function is supposed to return negative kernel error codes but here
it returns CMD_RST_PRC_EBUSY (2).  The error code eventually gets passed
to IS_ERR() and since it's not an error pointer it leads to an Oops in
hns_roce_v1_rsv_lp_qp()

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25 10:21:13 -03:00
Colin Ian King
7ef7587541 RDMA/hns: fix potential integer overflow on left shift
There is a potential integer overflow when int i is left shifted as this
is evaluated using 32 bit arithmetic but is being used in a context that
expects an expression of type dma_addr_t.  Fix this by casting integer i
to dma_addr_t before shifting to avoid the overflow.

Addresses-Coverity: ("Unintentional integer overflow")
Fixes: 2ac0bc5e725e ("RDMA/hns: Add a group interfaces for optimizing buffers getting flow")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25 10:18:19 -03:00
Matthew Wilcox
792c4e9d0b net/mlx5: Convert mkey_table to XArray
The lock protecting the data structure does not need to be an rwlock.  The
only read access to the lock is in an error path, and if that's limiting
your scalability, you have bigger performance problems.

Eliminate mlx5_mkey_table in favour of using the xarray directly.
reg_mr_callback must use GFP_ATOMIC for allocating XArray nodes as it may
be called in interrupt context.

This also fixes a minor bug where SRCU locking was being used on the radix
tree read side, when RCU was needed too.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-24 16:44:40 -07:00
Max Gurtovoy
7796d2a3bb RDMA/mlx5: Refactor MR descriptors allocation
Improve code readability using static helpers for each memory region
type. Re-use the common logic to get smaller functions that are easy
to maintain and reduce code duplication.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:28 -03:00
Max Gurtovoy
2563e2f30a RDMA/mlx5: Use PA mapping for PI handover
If possibe, avoid doing a UMR operation to register data and protection
buffers (via MTT/KLM mkeys). Instead, use the local DMA key and map the
SG lists using PA access. This is safe, since the internal key for data
and protection never exposed to the remote server (only signature key
might be exposed). If PA mappings are not possible, perform mapping
using MTT/KLM descriptors.

The setup of the tested benchmark (using iSER ULP):
 - 2 servers with 24 cores (1 initiator and 1 target)
 - ConnectX-4/ConnectX-5 adapters
 - 24 target sessions with 1 LUN each
 - ramdisk backstore
 - PI active

Performance results running fio (24 jobs, 128 iodepth) using
write_generate=1 and read_verify=1 (w/w.o patch):

bs      IOPS(read)        IOPS(write)
----    ----------        ----------
512   1266.4K/1262.4K    1720.1K/1732.1K
4k    793139/570902      1129.6K/773982
32k   72660/72086        97229/96164

Using write_generate=0 and read_verify=0 (w/w.o patch):
bs      IOPS(read)        IOPS(write)
----    ----------        ----------
512   1590.2K/1600.1K    1828.2K/1830.3K
4k    1078.1K/937272     1142.1K/815304
32k   77012/77369        98125/97435

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:28 -03:00
Israel Rukshin
de0ae958de RDMA/mlx5: Improve PI handover performance
In some loads, there is performance degradation when using KLM mkey
instead of MTT mkey. This is because KLM descriptor access is via
indirection that might require more HW resources and cycles.
Using KLM descriptor is not necessary when there are no gaps at the
data/metadata sg lists. As an optimization, use MTT mkey whenever it
is possible. For that matter, allocate internal MTT mkey and choose the
effective pi_mr for in transaction according to the required mapping
scheme.

The setup of the tested benchmark (using iSER ULP):
 - 2 servers with 24 cores (1 initiator and 1 target)
 - ConnectX-4/ConnectX-5 adapters
 - 24 target sessions with 1 LUN each
 - ramdisk backstore
 - PI active

Performance results running fio (24 jobs, 128 iodepth) using
write_generate=1 and read_verify=1 (w/w.o/baseline):

bs      IOPS(read)                IOPS(write)
----    ----------                ----------
512   1262.4K/1243.3K/1147.1K    1732.1K/1725.1K/1423.8K
4k    570902/571233/457874       773982/743293/642080
32k   72086/72388/71933          96164/71789/93249

Using write_generate=0 and read_verify=0 (w/w.o patch):
bs      IOPS(read)                IOPS(write)
----    ----------                ----------
512   1600.1K/1572.1K/1393.3K    1830.3K/1823.5K/1557.2K
4k    937272/921992/762934       815304/753772/646071
32k   77369/75052/72058          97435/73180/94612

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Suggested-by: Max Gurtovoy <maxg@mellanox.com>
Suggested-by: Idan Burstein <idanb@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:28 -03:00
Israel Rukshin
5c171cbe3a RDMA/mlx5: Remove unused IB_WR_REG_SIG_MR code
IB_WR_REG_SIG_MR is not needed after IB_WR_REG_MR_INTEGRITY
was used.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:28 -03:00
Max Gurtovoy
185eddc457 RDMA/core: Validate integrity handover device cap
Protect the case that a ULP tries to allocate a QP with signature
enabled flag while the LLD doesn't support this feature.
While we're here, also move integrity_en attribute from mlx5_qp to
ib_qp as a preparation for adding new integrity API to the rw-API
(that is part of ib_core module).

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Israel Rukshin
c0a6cbb9cb RDMA/core: Rename signature qp create flag and signature device capability
Rename IB_QP_CREATE_SIGNATURE_EN to IB_QP_CREATE_INTEGRITY_EN
and IB_DEVICE_SIGNATURE_HANDOVER to IB_DEVICE_INTEGRITY_HANDOVER.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Max Gurtovoy
38ca87c6f1 RDMA/mlx5: Introduce and implement new IB_WR_REG_MR_INTEGRITY work request
This new WR will be used to perform PI (protection information) handover
using the new API. Using the new API, the user will post a single WR that
will internally perform all the needed actions to complete PI operation.
This new WR will use a memory region that was allocated as
IB_MR_TYPE_INTEGRITY and was mapped using ib_map_mr_sg_pi to perform the
registration. In the old API, in order to perform a signature handover
operation, each ULP should perform the following:
1. Map and register the data buffers.
2. Map and register the protection buffers.
3. Post a special reg WR to configure the signature handover operation
   layout.
4. Invalidate the signature memory key.
5. Invalidate protection buffers memory key.
6. Invalidate data buffers memory key.

In the new API, the mapping of both data and protection buffers is
performed using a single call to ib_map_mr_sg_pi function. Also the
registration of the buffers and the configuration of the signature
operation layout is done by a single new work request called
IB_WR_REG_MR_INTEGRITY.
This patch implements this operation for mlx5 devices that are capable to
offload data integrity generation/validation while performing the actual
buffer transfer.
This patch will not remove the old signature API that is used by the iSER
initiator and target drivers. This will be done in the future.

In the internal implementation, for each IB_WR_REG_MR_INTEGRITY work
request, we are using a single UMR operation to register both data and
protection buffers using KLM's.
Afterwards, another UMR operation will describe the strided block format.
These will be followed by 2 SET_PSV operations to set the memory/wire
domains initial signature parameters passed by the user.
In the end of the whole transaction, only the signature memory key
(the one that exposed for the RDMA operation) will be invalidated.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Max Gurtovoy
22465bba39 RDMA/mlx5: Update set_sig_data_segment attribute for new signature API
Explicitly pass the sig_mr and the access flags for the mkey segment
configuration. This function will be used also in the new signature
API, so modify it in order to use it in both APIs. This is a preparation
commit before adding new signature API.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Max Gurtovoy
9ac7c4bcd3 RDMA/mlx5: Pass UMR segment flags instead of boolean
UMR ctrl segment flags can vary between UMR operations. for example,
using inline UMR or adding free/not-free checks for a memory key.
This is a preparation commit before adding new signature API that
will not need not-free checks for the internal memory key during the
UMR operation.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:26 -03:00