IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The current write wqe mechanism is to write to DDR first, and then notify
the hardware through doorbell to read the data. Direct wqe is a mechanism
to fill wqe directly into the hardware. In the case of light load, the wqe
will be filled into pcie bar space of the hardware, this will reduce one
memory access operation and therefore reduce the latency. SIMD
instructions allows cpu to write the 512 bits at one time to device
memory, thus it can be used for posting direct wqe.
Add direct wqe enable switch and address mapping.
Link: https://lore.kernel.org/r/20211207124901.42123-2-liangwenpeng@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Due to the discrete nature of the HIP08 timer unit, a requester might
finish the timeout period sooner, in elapsed real time, than its responder
does, even when both sides share the identical RNR timeout length included
in the RNR Nak packet and the responder indeed starts the timing prior to
the requester. Furthermore, if a 'providential' resend packet arrived
before the responder's timeout period expired, the responder is certainly
entitled to drop the packet silently in the light of IB protocol.
To address this problem, our team made good use of certain hardware facts:
1) The timing resolution regards the transmission arrangements is 1
microsecond, e.g. if cq_period field is set to 3, it would be
interpreted as 3 microsecond by hardware
2) A QPC field shall inform the hardware how many timing unit (ticks)
constitutes a full microsecond, which, by default, is 1000
3) It takes 14ns for the processor to handle a packet in the buffer, so
the RNR timeout length of 10ns would ensure our processing mechanism is
disabled during the entire timeout period and the packet won't be
dropped silently
To achieve (3), we permanently set the QPC field mentioned in (2) to zero
which nominally indicates every time tick is equivalent to a microsecond
in wall-clock time; now, a RNR timeout period at face value of 10 would
only last 10 ticks, which is 10ns in wall-clock time.
It's worth noting that we adapt the driver by magnifying certain
configuration parameters(cq_period, eq_period and ack_timeout)by 1000
given the user assumes the configuring timing unit to be microseconds.
Also, this particular improvisation is only deployed on HIP08 since other
hardware has already solved this issue.
Fixes: cfc85f3e4b7f ("RDMA/hns: Add profile support for hip08 driver")
Link: https://lore.kernel.org/r/20211209140655.49493-1-liangwenpeng@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently, the driver ignores the user's priority for flow steering
rules in FDB namespace. Change it and create the rule in the right
priority.
It will allow to create FDB steering rules in up to 16 different
priorities.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This patch doesn't add an additional namespaces, but just separates the
naming to be used by each FDB user, bypass and kernel.
Downstream patches will actually split this up and allow to have more
than single priority for the bypass users.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The driver uses irq_set_affinity_hint() to update the affinity_hint mask
that is consumed by the userspace to distribute the interrupts. However,
under the hood irq_set_affinity_hint() also applies the provided cpumask
(if not NULL) as the affinity for the given interrupt which is an
undocumented side effect.
To remove this side effect irq_set_affinity_hint() has been marked
as deprecated and new interfaces have been introduced. Hence, replace the
irq_set_affinity_hint() with the new interface irq_update_affinity_hint()
that only updates the affinity_hint pointer.
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://lore.kernel.org/r/20210903152430.244937-7-nitesh@redhat.com
Fix the wrongly reported max_send_wr and max_recv_wr attributes for user
QP by making sure to save their valuse on QP creation, so when query QP is
called the attributes will be reported correctly.
Fixes: cecbcddf6461 ("qedr: Add support for QP verbs")
Link: https://lore.kernel.org/r/20211206201314.124947-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The variable pkey is assigned from a macro. Then this variable is passed
to a function bth_init directly, and pkey is not used again. So remove it
and use the macro directly.
Fixes: 76251e15ea73 ("RDMA/rxe: Remove pkey table")
Link: https://lore.kernel.org/r/20211207194057.713289-1-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Completion events (CEs) are lost if the application is allowed to arm the
CQ more than two times when no new CE for this CQ has been generated by
the HW.
Check if arming has been done for the CQ and if not, arm the CQ for any
event otherwise promote to arm the CQ for any event only when the last arm
event was solicited.
Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Link: https://lore.kernel.org/r/20211201231509.1930-2-shiraz.saleem@intel.com
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Return IBV_WC_REM_OP_ERR for responder QP errors instead of
IBV_WC_REM_ACCESS_ERR.
Return IBV_WC_LOC_QP_OP_ERR for errors detected on the SQ with bad opcodes
Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions")
Link: https://lore.kernel.org/r/20211201231509.1930-1-shiraz.saleem@intel.com
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
'pchunk->bitmapbuf' is a bitmap. Its size (in number of bits) is stored in
'pchunk->sizeofbitmap'.
When it is allocated, the size (in bytes) is computed by:
size_in_bits >> 3
There are 2 issues (numbers bellow assume that longs are 64 bits):
- there is no guarantee here that 'pchunk->bitmapmem.size' is modulo
BITS_PER_LONG but bitmaps are stored as longs
(sizeofbitmap=8 bits will only allocate 1 byte, instead of 8 (1 long))
- the number of bytes is computed with a shift, not a round up, so we
may allocate less memory than needed
(sizeofbitmap=65 bits will only allocate 8 bytes (i.e. 1 long), when 2
longs are needed = 16 bytes)
Fix both issues by using 'bitmap_zalloc()' and remove the useless
'bitmapmem' from 'struct irdma_chunk'.
While at it, remove some useless NULL test before calling
kfree/bitmap_free.
Fixes: 915cc7ac0f8e ("RDMA/irdma: Add miscellaneous utility definitions")
Link: https://lore.kernel.org/r/5e670b640508e14b1869c3e8e4fb970d78cbe997.1638692171.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
When irdma_hmc_sd_one fails, 'chunk' is freed while its still on the PBLE
info list.
Add the chunk entry to the PBLE info list only after successful setting of
the SD in irdma_hmc_sd_one.
Fixes: e8c4dbc2fcac ("RDMA/irdma: Add PBLE resource manager")
Link: https://lore.kernel.org/r/20211207152135.2192-1-shiraz.saleem@intel.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This buffer is currently allocated in hfi1_init():
if (reinit)
ret = init_after_reset(dd);
else
ret = loadtime_init(dd);
if (ret)
goto done;
/* allocate dummy tail memory for all receive contexts */
dd->rcvhdrtail_dummy_kvaddr = dma_alloc_coherent(&dd->pcidev->dev,
sizeof(u64),
&dd->rcvhdrtail_dummy_dma,
GFP_KERNEL);
if (!dd->rcvhdrtail_dummy_kvaddr) {
dd_dev_err(dd, "cannot allocate dummy tail memory\n");
ret = -ENOMEM;
goto done;
}
The reinit triggered path will overwrite the old allocation and leak it.
Fix by moving the allocation to hfi1_alloc_devdata() and the deallocation
to hfi1_free_devdata().
Link: https://lore.kernel.org/r/20211129192008.101968.91302.stgit@awfm-01.cornelisnetworks.com
Cc: stable@vger.kernel.org
Fixes: 46b010d3eeb8 ("staging/rdma/hfi1: Workaround to prevent corruption during packet delivery")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The code tests the dma address which legitimately can be 0.
The code should test the kernel logical address to avoid leaking eager
buffer allocations that happen to map to a dma address of 0.
Fixes: 60368186fd85 ("IB/hfi1: Fix user-space buffers mapping with IOMMU enabled")
Link: https://lore.kernel.org/r/20211129191952.101968.17137.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Fix the following sparse warning:
drivers/infiniband/hw/bnxt_re/qplib_fp.c:1260:26: sparse: warning: incorrect type in assignment (different base types)
Fixes: 0e938533d96d ("RDMA/bnxt_re: Remove dynamic pkey table")
Link: https://lore.kernel.org/r/20211205204537.14184-1-kamalheib1@gmail.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The struct member variable create_flags is assigned twice. Remove the
unnecessary assignment.
Fixes: ece9ca97ccdc ("RDMA/uverbs: Do not check the input length on create_cq/qp paths")
Link: https://lore.kernel.org/r/20211207064607.541695-1-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
It is more general for ARM device drivers to use the device attribute to
map PCI BAR spaces.
Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
Link: https://lore.kernel.org/r/20211206133652.27476-1-liangwenpeng@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In 'pvrdma_uar_table_init()', the 'tbl->table' bitmap has just been
allocated, so no concurrent accesses can occur.
The other accesses to the 'tbl->table' bitmap are protected by the
'tbl->lock' spinlock, so no concurrent accesses can happen.
So prefer the non-atomic '__[set|clear]_bit()' functions to save a few
cycles.
Link: https://lore.kernel.org/r/271b0e2c316e2b4cf34ac6fbca0701edd2d882ec.1637870667.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
With preemption enabled (CONFIG_DEBUG_PREEMPT=y), the following appeared
when rnbd client tries to map remote block device.
BUG: using smp_processor_id() in preemptible [00000000] code: bash/1733
caller is debug_smp_processor_id+0x17/0x20
CPU: 0 PID: 1733 Comm: bash Not tainted 5.16.0-rc1 #5
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x5d/0x78
dump_stack+0x10/0x12
check_preemption_disabled+0xe4/0xf0
debug_smp_processor_id+0x17/0x20
rtrs_clt_update_all_stats+0x3b/0x70 [rtrs_client]
rtrs_clt_read_req+0xc3/0x380 [rtrs_client]
? rtrs_clt_init_req+0xe3/0x120 [rtrs_client]
rtrs_clt_request+0x1a7/0x320 [rtrs_client]
? 0xffffffffc0ab1000
send_usr_msg+0xbf/0x160 [rnbd_client]
? rnbd_clt_put_sess+0x60/0x60 [rnbd_client]
? send_usr_msg+0x160/0x160 [rnbd_client]
? sg_alloc_table+0x27/0xb0
? sg_zero_buffer+0xd0/0xd0
send_msg_sess_info+0xe9/0x180 [rnbd_client]
? rnbd_clt_put_sess+0x60/0x60 [rnbd_client]
? blk_mq_alloc_tag_set+0x2ef/0x370
rnbd_clt_map_device+0xba8/0xcd0 [rnbd_client]
? send_msg_open+0x200/0x200 [rnbd_client]
rnbd_clt_map_device_store+0x3e5/0x620 [rnbd_client
To supress the calltrace, let's call get_cpu_ptr/put_cpu_ptr pair in
rtrs_clt_update_rdma_stats to disable preemption when accessing per-cpu
variable.
While at it, let's make the similar change in rtrs_clt_update_wc_stats.
And for rtrs_clt_inc_failover_cnt, though it was only called inside rcu
section, but it still can be preempted in case CONFIG_PREEMPT_RCU is
enabled, so change it to {get,put}_cpu_ptr pair either.
Link: https://lore.kernel.org/r/20211128133501.38710-1-guoqing.jiang@linux.dev
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The RoCE spec requires RoCE devices to support only the default pkey.
However the bnxt_re driver maintains a 0xFFFF entry pkey table and uses
only the first entry. Remove the pkey table and hard code a table of
length one hard wired with the default pkey.
Link: https://lore.kernel.org/r/20211125033615.483750-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Use the addrconf_addr_eui48() helper function to set the sys_image_guid,
Also make sure the GUID is valid EUI-64 identifier.
Link: https://lore.kernel.org/r/20211124102336.427637-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The type of min_latency is ktime_t, so use KTIME_MAX to initialize the
initial value.
Fixes: dc3b66a0ce70 ("RDMA/rtrs-clt: Add a minimum latency multipath policy")
Link: https://lore.kernel.org/r/20211124081040.19533-1-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Guoqing Jiang <Guoqing.Jiang@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The existing tests are a little hard to comprehend. Use
check_add_overflow() instead.
Fixes: 04ded1672402 ("RDMA/cma: Verify private data length")
Link: https://lore.kernel.org/r/1637661978-18770-1-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Use 'bitmap_zalloc()' to simplify code, improve the semantic and avoid
some open-coded arithmetic in allocator arguments.
Using the 'zalloc' version of the allocator also saves a now useless
'bitmap_zero()' call.
Also change the corresponding 'kfree()' into 'bitmap_free()' to keep
consistency.
While at it, remove an extra space in a statement just a few lines above.
Link: https://lore.kernel.org/r/e396c4aa16cd8945d43877570a8f6d926cea555a.1637789139.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In 'mthca_buddy_init()', the 'buddy->bits[n]' bitmap has just been
allocated, so no concurrent accesses can occur.
The other accesses to the 'buddy->bits[n]' bitmap are protected by the
'buddy->lock' spinlock, so no concurrent accesses can occur.
So prefer the non-atomic '__[set|clear]_bit()' functions to save a few
cycles.
Link: https://lore.kernel.org/r/a19b88ccdbc03972fd97306b998731814283041f.1637785902.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Use 'bitmap_zalloc()' to simplify code, improve the semantic and avoid
some open-coded arithmetic in allocator arguments.
Using the 'zalloc' version of the allocator also saves a now useless
'bitmap_zero()' call.
Also change the corresponding 'kfree()' into 'bitmap_free()' to keep
consistency.
Link: https://lore.kernel.org/r/ea9031e28f453bc179033740f66f0c19293fcf0b.1637785902.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
When hns_roce_v2_destroy_qp() is called, the brief calling process of the
driver is as follows:
......
hns_roce_v2_destroy_qp
hns_roce_v2_qp_modify
hns_roce_cmd_mbox
hns_roce_qp_destroy
If hns_roce_cmd_mbox() detects that the hardware is being reset during the
execution of the hns_roce_cmd_mbox(), the driver will not be able to get
the return value from the hardware (the firmware cannot respond to the
driver's mailbox during the hardware reset phase).
The driver needs to wait for the hardware reset to complete before
continuing to execute hns_roce_qp_destroy(), otherwise it may happen that
the driver releases the resources but the hardware is still accessing. In
order to fix this problem, HNS RoCE needs to add a piece of code to wait
for the hardware reset to complete.
The original interface get_hw_reset_stat() is the instantaneous state of
the hardware reset, which cannot accurately reflect whether the hardware
reset is completed, so it needs to be replaced with the ae_dev_reset_cnt
interface.
The sign that the hardware reset is complete is that the return value of
the ae_dev_reset_cnt interface is greater than the original value
reset_cnt recorded by the driver.
Fixes: 6a04aed6afae ("RDMA/hns: Fix the chip hanging caused by sending mailbox&CMQ during reset")
Link: https://lore.kernel.org/r/20211123142402.26936-1-liangwenpeng@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
is_reset is used to indicate whether the hardware starts to reset. When
hns_roce_hw_v2_reset_notify_down() is called, the hardware has not yet
started to reset. If is_reset is set at this time, all mailbox operations
of resource destroy actions will be intercepted by driver. When the driver
cleans up resources, but the hardware is still accessed, the following
errors will appear:
arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000350100000010
arm-smmu-v3 arm-smmu-v3.2.auto: 0x000002088000003f
arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000a50e0800
arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000
arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000350100000010
arm-smmu-v3 arm-smmu-v3.2.auto: 0x000002088000043e
arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000a50a0800
arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000
arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000350100000010
arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000020880000436
arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000a50a0880
arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000
arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000350100000010
arm-smmu-v3 arm-smmu-v3.2.auto: 0x000002088000043a
arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000a50e0840
hns3 0000:35:00.0: INT status: CMDQ(0x0) HW errors(0x0) other(0x0)
arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000
hns3 0000:35:00.0: received unknown or unhandled event of vector0
arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000350100000010
{34}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 7
is_reset will be set correctly in check_aedev_reset_status(), so the
setting in hns_roce_hw_v2_reset_notify_down() should be deleted.
Fixes: 726be12f5ca0 ("RDMA/hns: Set reset flag when hw resetting")
Link: https://lore.kernel.org/r/20211123084809.37318-1-liangwenpeng@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
For the case of IB_MR_TYPE_DM the mr does doesn't have a umem, even though
it is a user MR. This causes function mlx5_free_priv_descs() to think that
it is a kernel MR, leading to wrongly accessing mr->descs that will get
wrong values in the union which leads to attempt to release resources that
were not allocated in the first place.
For example:
DMA-API: mlx5_core 0000:08:00.1: device driver tries to free DMA memory it has not allocated [device address=0x0000000000000000] [size=0 bytes]
WARNING: CPU: 8 PID: 1021 at kernel/dma/debug.c:961 check_unmap+0x54f/0x8b0
RIP: 0010:check_unmap+0x54f/0x8b0
Call Trace:
debug_dma_unmap_page+0x57/0x60
mlx5_free_priv_descs+0x57/0x70 [mlx5_ib]
mlx5_ib_dereg_mr+0x1fb/0x3d0 [mlx5_ib]
ib_dereg_mr_user+0x60/0x140 [ib_core]
uverbs_destroy_uobject+0x59/0x210 [ib_uverbs]
uobj_destroy+0x3f/0x80 [ib_uverbs]
ib_uverbs_cmd_verbs+0x435/0xd10 [ib_uverbs]
? uverbs_finalize_object+0x50/0x50 [ib_uverbs]
? lock_acquire+0xc4/0x2e0
? lock_acquired+0x12/0x380
? lock_acquire+0xc4/0x2e0
? lock_acquire+0xc4/0x2e0
? ib_uverbs_ioctl+0x7c/0x140 [ib_uverbs]
? lock_release+0x28a/0x400
ib_uverbs_ioctl+0xc0/0x140 [ib_uverbs]
? ib_uverbs_ioctl+0x7c/0x140 [ib_uverbs]
__x64_sys_ioctl+0x7f/0xb0
do_syscall_64+0x38/0x90
Fix it by reorganizing the dereg flow and mlx5_ib_mr structure:
- Move the ib_umem field into the user MRs structure in the union as it's
applicable only there.
- Function mlx5_ib_dereg_mr() will now call mlx5_free_priv_descs() only
in case there isn't udata, which indicates that this isn't a user MR.
Fixes: f18ec4223117 ("RDMA/mlx5: Use a union inside mlx5_ib_mr")
Link: https://lore.kernel.org/r/66bb1dd253c1fd7ceaa9fc411061eefa457b86fb.1637581144.git.leonro@nvidia.com
Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
On error handling path in rxe_qp_from_init() qp->sq.queue is freed and
then rxe_create_qp() will drop last reference to this object. qp clean up
function will try to free this queue one time and it causes UAF bug.
Fix it by zeroing queue pointer after freeing queue in rxe_qp_from_init().
Fixes: 514aee660df4 ("RDMA: Globally allocate and release QP memory")
Link: https://lore.kernel.org/r/20211121202239.3129-1-paskripkin@gmail.com
Reported-by: syzbot+aab53008a5adf26abe91@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Set the RDMA protocol to use at driver bind time based on the ice PF's
rdma_mode flag.
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Leszek Kaliszczuk <leszek.kaliszczuk@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Each member of Array[][] should be initialized on a separate line.
Link: https://lore.kernel.org/r/20211119140208.40416-7-liangwenpeng@huawei.com
Signed-off-by: Xinhao Liu <liuxinhao@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>