IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This is significantly bug fixes and general cleanups. The noteworthy new
features are fairly small:
- XRC support for HNS and improves RQ operations
- Bug fixes and updates for hns, mlx5, bnxt_re, hfi1, i40iw, rxe, siw and
qib
- Quite a few general cleanups on spelling, error handling, static checker
detections, etc
- Increase the number of device ports supported beyond 255. High port
count software switches now exist
- Several bug fixes for rtrs
- mlx5 Device Memory support for host controlled atomics
- Report SRQ tables through to rdma-tool
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmCMMHEACgkQOG33FX4g
mxri3Q//RAgIExCGHebQ9xkptZHVyTLLJMpiMl2cqk3ZVRdDZ7QdiQjIqY2KqlUK
nxBj7EXJeX6rV5a1xqCcOO1gBetB28TSwnCNE2ZqrXP5B59ISW8D052IWza3UkUz
WmHLARxHQlyKBWA4+ZAgfoUGL0NmWA8QPf56t/RK/3/OsuYnGzcnWmmFbt8XKFcH
NtO3KC45mKWDqqG0A0XRrLbEQz/ElO3OuPBqlBKgB3ZgGPzgsOUTOGkm1tCcZ89L
/pvZGB7SklKZdCX8TxdpVGd9h0zHl8pqh1yEzvTA1ypNAYSUId2mvZXluU8J5yJl
FLk7E1IxE5050FNEc7T5uZdUVntulYiqL2558coRI34l5w26pKGjIMxw/nTB8hg8
4ZfBtKVemIG6yzW5Up6iBpK7qWYpvLWVShwYAWhbNsjN7JGzJuh1gJnjbmYgyz2P
RTMU9wjFPLL2wZxg4LDHACVJNBb82j6KKuE+kZWpk11ro7INw9+7YwRuTo7/ezxC
BwXKu8wF4igwSigV55jM+WnGXLhxdC3qmx/2cbtWyLM/PzdRL96tM0RWW5v8/Nv7
teFhkt+f3RVqcfYH5K1qCXy3UFrxG6bxFSvcHHSBx2bdIrqhuTY5FqszAYImeW2j
iHoyIsuSuGu79HQgOzAQZsEyksWi6OYDvA9Q9VBoPP4bJ3DOAa4=
=vsXA
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"This is significantly bug fixes and general cleanups. The noteworthy
new features are fairly small:
- XRC support for HNS and improves RQ operations
- Bug fixes and updates for hns, mlx5, bnxt_re, hfi1, i40iw, rxe, siw
and qib
- Quite a few general cleanups on spelling, error handling, static
checker detections, etc
- Increase the number of device ports supported beyond 255. High port
count software switches now exist
- Several bug fixes for rtrs
- mlx5 Device Memory support for host controlled atomics
- Report SRQ tables through to rdma-tool"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (145 commits)
IB/qib: Remove redundant assignment to ret
RDMA/nldev: Add copy-on-fork attribute to get sys command
RDMA/bnxt_re: Fix a double free in bnxt_qplib_alloc_res
RDMA/siw: Fix a use after free in siw_alloc_mr
IB/hfi1: Remove redundant variable rcd
RDMA/nldev: Add QP numbers to SRQ information
RDMA/nldev: Return SRQ information
RDMA/restrack: Add support to get resource tracking for SRQ
RDMA/nldev: Return context information
RDMA/core: Add CM to restrack after successful attachment to a device
RDMA/cma: Skip device which doesn't support CM
RDMA/rxe: Fix a bug in rxe_fill_ip_info()
RDMA/mlx5: Expose private query port
RDMA/mlx4: Remove an unused variable
RDMA/mlx5: Fix type assignment for ICM DM
IB/mlx5: Set right RoCE l3 type and roce version while deleting GID
RDMA/i40iw: Fix error unwinding when i40iw_hmc_sd_one fails
RDMA/cxgb4: add missing qpid increment
IB/ipoib: Remove unnecessary struct declaration
RDMA/bnxt_re: Get rid of custom module reference counting
...
All other users of the dummy netdevice embed the netdev in other
structures:
init_dummy_netdev(&mal->dummy_dev);
init_dummy_netdev(ð->dummy_dev);
init_dummy_netdev(&ar->napi_dev);
init_dummy_netdev(&irq_grp->napi_ndev);
init_dummy_netdev(&wil->napi_ndev);
init_dummy_netdev(&trans_pcie->napi_dev);
init_dummy_netdev(&dev->napi_dev);
init_dummy_netdev(&bus->mux_dev);
The AIP and VNIC implementation turns that model inside out and used a
kfree() to free what appears to be a netdev struct when in reality, it is
a struct that enbodies the rx state as well as the dummy netdev used to
support napi_poll across disparate receive contexts. The relationship is
infered by the odd allocation:
const int netdev_size = sizeof(*dd->dummy_netdev) +
sizeof(struct hfi1_netdev_priv);
<snip>
dd->dummy_netdev = kcalloc_node(1, netdev_size, GFP_KERNEL, dd->node);
Correct the issue by:
- Correctly naming the alloc and free functions
- Renaming hfi1_netdev_priv to hfi1_netdev_rx
- Replacing dd dummy_netdev with a netdev_rx pointer
- Embedding the net_device in hfi1_netdev_rx
- Moving the init_dummy_netdev to the alloc routine
- Adjusting wrappers to fit the new model
Fixes: 6991abcb99 ("IB/hfi1: Add functions to receive accelerated ipoib packets")
Link: https://lore.kernel.org/r/1617026056-50483-11-git-send-email-dennis.dalessandro@cornelisnetworks.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Current code uses many different types when dealing with a port of a RDMA
device: u8, unsigned int and u32. Switch to u32 to clean up the logic.
This allows us to make (at least) the core view consistent and use the
same type. Unfortunately not all places can be converted. Many uverbs
functions expect port to be u8 so keep those places in order not to break
UAPIs. HW/Spec defined values must also not be changed.
With the switch to u32 we now can support devices with more than 255
ports. U32_MAX is reserved to make control logic a bit easier to deal
with. As a device with U32_MAX ports probably isn't going to happen any
time soon this seems like a non issue.
When a device with more than 255 ports is created uverbs will report the
RDMA device as having 255 ports as this is the max currently supported.
The verbs interface is not changed yet because the IBTA spec limits the
port size in too many places to be u8 and all applications that relies in
verbs won't be able to cope with this change. At this stage, we are
extending the interfaces that are using vendor channel solely
Once the limitation is lifted mlx5 in switchdev mode will be able to have
thousands of SFs created by the device. As the only instance of an RDMA
device that reports more than 255 ports will be a representor device and
it exposes itself as a RAW Ethernet only device CM/MAD/IPoIB and other
ULPs aren't effected by this change and their sysfs/interfaces that are
exposes to userspace can remain unchanged.
While here cleanup some alignment issues and remove unneeded sanity
checks (mainly in rdmavt),
Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.org
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Two earlier bug fixes have created a security problem in the hfi1
driver. One fix aimed to solve an issue where current->mm was not valid
when closing the hfi1 cdev. It attempted to do this by saving a cached
value of the current->mm pointer at file open time. This is a problem if
another process with access to the FD calls in via write() or ioctl() to
pin pages via the hfi driver. The other fix tried to solve a use after
free by taking a reference on the mm.
To fix this correctly we use the existing cached value of the mm in the
mmu notifier. Now we can check in the insert, evict, etc. routines that
current->mm matched what the notifier was registered for. If not, then
don't allow access. The register of the mmu notifier will save the mm
pointer.
Since in do_exit() the exit_mm() is called before exit_files(), which
would call our close routine a reference is needed on the mm. We rely on
the mmgrab done by the registration of the notifier, whereas before it was
explicit. The mmu notifier deregistration happens when the user context is
torn down, the creation of which triggered the registration.
Also of note is we do not do any explicit work to protect the interval
tree notifier. It doesn't seem that this is going to be needed since we
aren't actually doing anything with current->mm. The interval tree
notifier stuff still has a FIXME noted from a previous commit that will be
addressed in a follow on patch.
Cc: <stable@vger.kernel.org>
Fixes: e0cf75deab ("IB/hfi1: Fix mm_struct use after free")
Fixes: 3faa3d9a30 ("IB/hfi1: Make use of mm consistent")
Link: https://lore.kernel.org/r/20201125210112.104301.51331.stgit@awfm-01.aw.intel.com
Suggested-by: Jann Horn <jannh@google.com>
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
As described in earlier patches, ipoib netdev will share receive
contexts with existing VNIC netdev through a dummy netdev. The
following changes are made to achieve that:
- Set up netdev receive contexts after user contexts. A function is
added to count the available netdev receive contexts.
- Add functions to set/get receive map table free index.
- Rename NUM_VNIC_MAP_ENTRIES as NUM_NETDEV_MAP_ENTRIES.
- Let the dummy netdev own the receive contexts instead of VNIC.
- Allocate the dummy netdev when the hfi1 device is added and free it
when the device is removed.
- Initialize AIP RSM rules when the IpoIb rxq is initialized and
remove the rules when it is de-initialized.
- Convert VNIC to use the dummy netdev.
Link: https://lore.kernel.org/r/20200511160649.173205.4626.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch adds the interrupt handler function, the NAPI poll
function, and its associated helper functions for receiving
accelerated ipoib packets. While we are here, fix the formats
of two error printouts.
Link: https://lore.kernel.org/r/20200511160637.173205.64890.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Ipoib netdev will share receive contexts with existing VNIC netdev.
To achieve that, a dummy netdev is allocated with hfi1_devdata to
own the receive contexts, and ipoib and VNIC netdevs will be put
on top of it. Each receive context is associated with a single
NAPI object.
This patch adds the functions to receive incoming packets for
accelerated ipoib.
Link: https://lore.kernel.org/r/20200511160631.173205.54184.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Rename num_vnic_contexts as num_ndetdev_contexts since VNIC and ipoib
will share the same set of receive contexts.
Link: https://lore.kernel.org/r/20200511160625.173205.53306.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This is implementation of RSM rule for AIP packets.
AIP rule will use rule RSM2 and will match standard
Infiniband packet containg BTH (LNH==BTH) and
having Dest QPN prefixed with value 0x81. Spread between
receive contexts will be done using source QPN bits.
VNIC and AIP will share receive contexts, so their rules
will point to the same RMT entries and their shared
code is moved to separate functions.
If any of the rules is active RMT mapping will be skipped
for latter.
Changed function hfi1_vnic_is_rsm_full to be more general
and moved it from main header to chip.c.
Changed the order of RSM rules because AIP rule as
more specific one is needed to be placed before more
general QOS rule. Rules are occupying two last RSM
registers.
Link: https://lore.kernel.org/r/20200511160612.173205.73002.stgit@awfm-01.aw.intel.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The module parameter for KDETH qpns is being removed in favor
of always using the default value of 0x80 as the qpn prefix.
Defines have been added for various KDETH values including
the prefix of 0x80.
The reserved range now starts at the base value for KDETH
qpns (0x80) and extends up to and including the last qpn for
other reserved QP prefixed types.
Adjust other QP prefixed define names to match KDETH defined
names.
Link: https://lore.kernel.org/r/20200511160600.173205.27508.stgit@awfm-01.aw.intel.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The field kobj was added to hfi1_devdata structure to manage the life time
of the hfi1_devdata structure for PSM accesses:
commit e11ffbd575 ("IB/hfi1: Do not free hfi1 cdev parent structure early")
Later another mechanism user_refcount/user_comp was introduced to provide
the same functionality:
commit acd7c8fe14 ("IB/hfi1: Fix an Oops on pci device force remove")
This patch will remove this kobj field, as it is no longer needed.
Link: https://lore.kernel.org/r/20200316210500.7753.4145.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
All other code paths increment some form of drop counter.
This was missed in the original implementation.
Fixes: 82c2611daa ("staging/rdma/hfi1: Handle packets with invalid RHF on context 0")
Link: https://lore.kernel.org/r/20200106134228.119356.96828.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Packet receiving functions returns int value, and yet the return values
are not used at all.
This patch converts the functions to return void.
Link: https://lore.kernel.org/r/20200106134222.119356.84098.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch pushes special case drop logic into an API to be shared by all
interrupt handlers.
Additionally, convert do_drop to a bool.
Link: https://lore.kernel.org/r/20200106134203.119356.36962.stgit@awfm-01.aw.intel.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch eliminate special cases by adding a fast_handler member to the
receive context and changes to the fast handler as specified in the new
variable. Initialize the variable as soon as the setting for dma tail is
known when the context is created.
Setting fast path is called every time when any context has entered slow
path. Add function to check if contexts is using fast path and do not set
fast path when it is already done to improve RCD fastpath setting.
Link: https://lore.kernel.org/r/20200106134150.119356.87558.stgit@awfm-01.aw.intel.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch adds a set of accessor routines to access context members.
Link: https://lore.kernel.org/r/20191219211922.58387.26548.stgit@awfm-01.aw.intel.com
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This converts one of the two users of mmu_notifiers to use the new API.
The conversion is fairly straightforward, however the existing use of
notifiers here seems to be racey.
Link: https://lore.kernel.org/r/20191112202231.3856-7-jgg@ziepe.ca
Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This paves the way for another patch that reacts to a
flush sdma completion for RC.
Fixes: 81cd3891f0 ("IB/hfi1: Add support for 16B Management Packets")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Also remove hfi1_devs_list.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When disabling and removing a receive context, it is possible for an
asynchronous event (i.e IRQ) to occur. Because of this, there is a race
between cleaning up the context, and the context being used by the
asynchronous event.
cpu 0 (context cleanup)
rc->ref_count-- (ref_count == 0)
hfi1_rcd_free()
cpu 1 (IRQ (with rcd index))
rcd_get_by_index()
lock
ref_count+++ <-- reference count race (WARNING)
return rcd
unlock
cpu 0
hfi1_free_ctxtdata() <-- incorrect free location
lock
remove rcd from array
unlock
free rcd
This race will cause the following WARNING trace:
WARNING: CPU: 0 PID: 175027 at include/linux/kref.h:52 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
CPU: 0 PID: 175027 Comm: IMB-MPI1 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1
Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
Call Trace:
dump_stack+0x19/0x1b
__warn+0xd8/0x100
warn_slowpath_null+0x1d/0x20
hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
is_rcv_urgent_int+0x24/0x90 [hfi1]
general_interrupt+0x1b6/0x210 [hfi1]
__handle_irq_event_percpu+0x44/0x1c0
handle_irq_event_percpu+0x32/0x80
handle_irq_event+0x3c/0x60
handle_edge_irq+0x7f/0x150
handle_irq+0xe4/0x1a0
do_IRQ+0x4d/0xf0
common_interrupt+0x162/0x162
The race can also lead to a use after free which could be similar to:
general protection fault: 0000 1 SMP
CPU: 71 PID: 177147 Comm: IMB-MPI1 Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1
Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
task: ffff9962a8098000 ti: ffff99717a508000 task.ti: ffff99717a508000 __kmalloc+0x94/0x230
Call Trace:
? hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
hfi1_aio_write+0xba/0x110 [hfi1]
do_sync_readv_writev+0x7b/0xd0
do_readv_writev+0xce/0x260
? handle_mm_fault+0x39d/0x9b0
? pick_next_task_fair+0x5f/0x1b0
? sched_clock_cpu+0x85/0xc0
? __schedule+0x13a/0x890
vfs_writev+0x35/0x60
SyS_writev+0x7f/0x110
system_call_fastpath+0x22/0x27
Use the appropriate kref API to verify access.
Reorder context cleanup to ensure context removal before cleanup occurs
correctly.
Cc: stable@vger.kernel.org # v4.14.0+
Fixes: f683c80ca6 ("IB/hfi1: Resolve kernel panics by reference counting receive contexts")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This commit adds the TID RDMA READ pointers to the receiving opcode
handlers. It also adds TID RDMA READ header sizes to header size table.
A function to print the RHF EFLAGS errors is created so that it can be
shared by both IB and TID RDMA receiving functions.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
TID entries are used by hfi1 hardware to receive data payload from
incoming packets directly into a user buffer and thus avoid data copying
by software. This patch implements the functions for TID allocation,
freeing, and programming TID RcvArray entries in hardware for kernel
clients. TID entries are managed via lists of TID groups similar to PSM.
Furthermore, to track TID resource allocation for each request, software
flows are also allocated and freed as needed. Since software flows
consume large amount of memory for tracking TID allocation and freeing,
it is generally desirable to allocate them dynamically in the send queue
and only for TID RDMA requests, but pre-allocate them for receive queue
because the send queue could have thousands of entries while the receive
queue has only a limited number of entries.
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The hfi1 hardware flow is a hardware flow-control mechanism for a KDETH
data packet that is received on a hfi1 port. It validates the packet by
checking both the generation and sequence. Each QP that uses the TID RDMA
mechanism will allocate a hardware flow from its receiving context for
any incoming KDETH data packets.
This patch implements:
(1) a function to allocate hardware flow
(2) a function to free hardware flow
(3) a function to initialize hardware flow generation for a receiving
context
(4) a wait mechanism if the hardware flow is not available
(4) a function to remove the qp from the wait queue for hardware flow
when the qp is reset or destroyed.
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
OPFN allows a pair of connected RC QPs to exchange a set of parameters
in succession. The parameter exchange itself is done using the IB compare
and swap request with a special virtual address. The request is triggered
using a reserved IB work request opcode. This patch implements the OPFN
interface to initialize, start, process, and reset the OPFN request.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
OPFN (Omni Path Feature Negotiation) support discovery allows a RC QP to
announce that it supports OPFN and also discover if OPFN is supported by
the peer QP. OPFN parameter negotiation is skipped unless OPFN support is
first discovered. OPFN support is announced by claiming what was
the reserved bit in dword 1 of OmniPath modified base transport header
in requests and responses.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This has been a fairly typical cycle, with the usual sorts of driver
updates. Several series continue to come through which improve and
modernize various parts of the core code, and we finally are starting to
get the uAPI command interface cleaned up.
- Various driver fixes for bnxt_re, cxgb3/4, hfi1, hns, i40iw, mlx4, mlx5,
qib, rxe, usnic
- Rework the entire syscall flow for uverbs to be able to run over
ioctl(). Finally getting past the historic bad choice to use write()
for command execution
- More functional coverage with the mlx5 'devx' user API
- Start of the HFI1 series for 'TID RDMA'
- SRQ support in the hns driver
- Support for new IBTA defined 2x lane widths
- A big series to consolidate all the driver function pointers into
a big struct and have drivers provide a 'static const' version of the
struct instead of open coding initialization
- New 'advise_mr' uAPI to control device caching/loading of page tables
- Support for inline data in SRPT
- Modernize how umad uses the driver core and creates cdev's and sysfs
files
- First steps toward removing 'uobject' from the view of the drivers
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlwhV2oACgkQOG33FX4g
mxpF8A/9EkRCg6wCDC59maA53b5PjuNmD//9hXbycQPQSlxntI2PyYtxrzBqc0+2
yIaFFMehL41XNN6y1zfkl7ndl62McCH2TpiidU8RyTxVw/e3KsDD5sU6++atfHRo
M82RNfedDtxPG8TcCPKVLof6JHADApGSR1r4dCYfAnu7KFMyvlLmeYyx4r/2E6yC
iQPmtKVOdbGkuWGeX+brGEA0vg7FUOAvaysnxddjyh9hyem4h0SUR3Af/Ik0N5ME
PYzC+hMKbkPVBLoCWyg7QwUaqK37uWwguMQLtI2byF7FgbiK/lBQt6TsidR4Fw3p
EalL7uqxgCTtLYh918vxLFjdYt6laka9j7xKCX8M8d06sy/Lo8iV4hWjiTESfMFG
usqs7D6p09gA/y1KISji81j6BI7C92CPVK2drKIEnfyLgY5dBNFcv9m2H12lUCH2
NGbfCNVaTQVX6bFWPpy2Bt2y/Litsfxw5RviehD7jlG0lQjsXGDkZzsDxrMSSlNU
S79iiTJyK4kUZkXzrSSlN58pLBlbupJwm5MDjKmM+irsrsCHjGIULvc902qtnC3/
8ImiTtW6XvqLbgWXyy2Th8/ZgRY234p1ybhog+DFaGKUch0XqB7VXTV2OZm0GjcN
Fp4PUeBt+/gBgYqjpuffqQc1rI4uwXYSoz7wq9RBiOpw5zBFT1E=
=T0p1
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"This has been a fairly typical cycle, with the usual sorts of driver
updates. Several series continue to come through which improve and
modernize various parts of the core code, and we finally are starting
to get the uAPI command interface cleaned up.
- Various driver fixes for bnxt_re, cxgb3/4, hfi1, hns, i40iw, mlx4,
mlx5, qib, rxe, usnic
- Rework the entire syscall flow for uverbs to be able to run over
ioctl(). Finally getting past the historic bad choice to use
write() for command execution
- More functional coverage with the mlx5 'devx' user API
- Start of the HFI1 series for 'TID RDMA'
- SRQ support in the hns driver
- Support for new IBTA defined 2x lane widths
- A big series to consolidate all the driver function pointers into a
big struct and have drivers provide a 'static const' version of the
struct instead of open coding initialization
- New 'advise_mr' uAPI to control device caching/loading of page
tables
- Support for inline data in SRPT
- Modernize how umad uses the driver core and creates cdev's and
sysfs files
- First steps toward removing 'uobject' from the view of the drivers"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (193 commits)
RDMA/srpt: Use kmem_cache_free() instead of kfree()
RDMA/mlx5: Signedness bug in UVERBS_HANDLER()
IB/uverbs: Signedness bug in UVERBS_HANDLER()
IB/mlx5: Allocate the per-port Q counter shared when DEVX is supported
IB/umad: Start using dev_groups of class
IB/umad: Use class_groups and let core create class file
IB/umad: Refactor code to use cdev_device_add()
IB/umad: Avoid destroying device while it is accessed
IB/umad: Simplify and avoid dynamic allocation of class
IB/mlx5: Fix wrong error unwind
IB/mlx4: Remove set but not used variable 'pd'
RDMA/iwcm: Don't copy past the end of dev_name() string
IB/mlx5: Fix long EEH recover time with NVMe offloads
IB/mlx5: Simplify netdev unbinding
IB/core: Move query port to ioctl
RDMA/nldev: Expose port_cap_flags2
IB/core: uverbs copy to struct or zero helper
IB/rxe: Reuse code which sets port state
IB/rxe: Make counters thread safe
IB/mlx5: Use the correct commands for UMEM and UCTX allocation
...
A CA is supposed to ignore FECN bits in multicast, ACK, and CNP
packets. This patch corrects the behavior of the HFI1 driver in this
regard by ignoring FECNs in those packet types.
While fixing the above behavior, fix the extraction of the FECN and BECN
bits from the packet headers for both 9B and 16B packets.
Furthermore, this patch corrects the driver's response to a FECN in RDMA
READ RESPONSE packets. Instead of sending an "empty" ACK, the driver now
sends a CNP packet. While editing that code path, add the missing trace
for CNP packets.
Fixes: 88733e3b84 ("IB/hfi1: Add 16B UD support")
Fixes: f59fb9e051 ("IB/hfi1: Fix handling of FECN marked multicast packet")
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use rdma_set_device_sysfs_group() to register device attributes and
simplify the driver.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
User contexts use the receive URGENT interrupt. However, enabling
the IRQ SRC in the file_ops module is not as clean as it could be.
Augment the _rcvctl() function to be able to enable/disable the IRQ
source.
Use the new interface from file_ops to enable/disable the IRQ.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The current IRQ API is an all or nothing interface. This has two
problems:
1. All IRQs are enabled regardless of use
2. Moving from general interrupt to MSIx handling is difficult
Introduce a new API to enable/disable specific IRQs or a range of IRQs.
Do not enable and disable all IRQs in one step.
Rework various modules to enable/disable IRQs when needed.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The current method of allocating MSIx resources is a bit cumbersome,
and not very easily added to.
Refactor and re-order the code paths into a more consistent interface.
Update the interface so that allocations are not order dependent.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The current HFI1 MSIx API is difficult to follow, change, or add to.
In anticipation of moving to an more flexible API, move the current
MSIx functionality to the new msix.c module.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Currently several things occur before the hfi1_devdata structure is
allocated. This leads to an inconsistent logging ability and makes
it more difficult to restructure some code paths.
Allocate (and do a minimal init) the structure as soon as possible.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The tune_pcie_caps needs to occur sometime after PCI is enabled, but
before the HFI is enabled. Currently it is placed in the MSIx
allocation code which doesn't really fit. Moving it to just after
the gen3 bump.
Clean up the associated code (modules, etc.).
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The INTx IRQ support does not work for all HF1 IRQ handlers
(specifically the receive data IRQs).
Remove all supporting code for the INTx IRQ.
If the requested MSIx vector request is unsuccessful, do not allow the
driver to continue.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Many fields in ctxtdata are incorrectly sized and the organization of the
fields within the structure is a jumble.
Fix by:
- Correcting oversize fields.
- Putting fields common to all contexts at the top with hot fields
at the top.
- Moving PSM fields to the bottom of the structure.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Remove the sizeable cache of the chip sizing CSRs and replace with CSR
reads as needed.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Fields in this structure are sized excessively based on hardware
limitations and input values.
Fix by reducing fields as appropriate and repositioning to close holes in
the structure.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
It is only ever written.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The usage of this ctxt data field is not hot path and the value can be
computed on demand to cut down the ctxtdata bloat.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The field is based on a constant that can never change.
Use the define to assign the register instead.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This field should be in ctxtdata to allow for better locality of access by
eliminating a dd dereference.
The new field is now side-by-side with rcvhdrqentsize since the rhf_offset
is a function of the rcvhdrqentsize.
Both fields are now correctly sized as u8.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The current implementation precludes having receive context specific
packet type receive handlers.
Fix this by adding adding c99 const array for the existing handlers and
remove the current 72 bytes of pointers from devdata.
A new pointer in hfi1_ctxtdata will point to the const array.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The mutex exp_lock in struct hfi1_ctxtdata is used to protect all
Expected TID data of a user context. This patch renames it to exp_mutex
to better reflect its identity and prepare for upcoming patches.
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>