137 Commits

Author SHA1 Message Date
Jason Gunthorpe
8d7c7c0eeb RDMA: Add ib_virt_dma_to_page()
Make it clearer what is going on by adding a function to go back from the
"virtual" dma_addr to a kva and another to a struct page. This is used in the
ib_uses_virt_dma() style drivers (siw, rxe, hfi, qib).

Call them instead of a naked casting and  virt_to_page() when working with dma_addr
values encoded by the various ib_map functions.

This also fixes the virt_to_page() casting problem Linus Walleij has been
chasing.

Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/0-v2-05ea785520ed+10-ib_virt_page_jgg@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-16 11:08:07 +03:00
Tetsuo Handa
266e9b3475 RDMA/siw: Remove namespace check from siw_netdev_event()
syzbot is reporting that siw_netdev_event(NETDEV_UNREGISTER) cannot destroy
siw_device created after unshare(CLONE_NEWNET) due to net namespace check.
It seems that this check was by error there and should be removed.

Reported-by: syzbot <syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Suggested-by: Leon Romanovsky <leon@kernel.org>
Fixes: bdcf26bf9b3a ("rdma/siw: network and RDMA core interface")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/r/a44e9ac5-44e2-d575-9e30-02483cc7ffd1@I-love.SAKURA.ne.jp
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-03 21:24:21 +03:00
Daniil Dulov
271bfcfb83 RDMA/siw: Fix potential page_array out of range access
When seg is equal to MAX_ARRAY, the loop should break, otherwise
it will result in out of range access.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: b9be6f18cf9e ("rdma/siw: transmit path")
Signed-off-by: Daniil Dulov <d.dulov@aladdin.ru>
Link: https://lore.kernel.org/r/20230227091751.589612-1-d.dulov@aladdin.ru
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-13 13:54:49 +02:00
Linus Torvalds
8cbd92339d v6.3 RDMA pull request
Small cycle this time:
 
 - Minor driver updates for hfi1, cxgb4, erdma, hns, irdma, mlx5, siw, mana
 
 - inline CQE support for hns
 
 - Have mlx5 display device error codes
 
 - Pinned DMABUF support for irdma
 
 - Continued rxe cleanups, particularly converting the MRs to use xarray
 
 - Improvements to what can be cached in the mlx5 mkey cache
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY/gPmgAKCRCFwuHvBreF
 YW5IAP4xOAiTif4f87vD1twRU/ebq4VEX0r+C2NX5x5fwlCJrAEA7RLV8uG9Uii2
 ez0BuWNxfajuvFHntnZ1E+7UDP0S8gk=
 =CgUH
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Quite a small cycle this time, even with the rc8. I suppose everyone
  went to sleep over xmas.

   - Minor driver updates for hfi1, cxgb4, erdma, hns, irdma, mlx5, siw,
     mana

   - inline CQE support for hns

   - Have mlx5 display device error codes

   - Pinned DMABUF support for irdma

   - Continued rxe cleanups, particularly converting the MRs to use
     xarray

   - Improvements to what can be cached in the mlx5 mkey cache"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (61 commits)
  IB/mlx5: Extend debug control for CC parameters
  IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors
  IB/hfi1: Fix math bugs in hfi1_can_pin_pages()
  RDMA/irdma: Add support for dmabuf pin memory regions
  RDMA/mlx5: Use query_special_contexts for mkeys
  net/mlx5e: Use query_special_contexts for mkeys
  net/mlx5: Change define name for 0x100 lkey value
  net/mlx5: Expose bits for querying special mkeys
  RDMA/rxe: Fix missing memory barriers in rxe_queue.h
  RDMA/mana_ib: Fix a bug when the PF indicates more entries for registering memory on first packet
  RDMA/rxe: Remove rxe_alloc()
  RDMA/cma: Distinguish between sockaddr_in and sockaddr_in6 by size
  Subject: RDMA/rxe: Handle zero length rdma
  iw_cxgb4: Fix potential NULL dereference in c4iw_fill_res_cm_id_entry()
  RDMA/mlx5: Use rdma_umem_for_each_dma_block()
  RDMA/umem: Remove unused 'work' member from struct ib_umem
  RDMA/irdma: Cap MSIX used to online CPUs + 1
  RDMA/mlx5: Check reg_create() create for errors
  RDMA/restrack: Correct spelling
  RDMA/cxgb4: Fix potential null-ptr-deref in pass_establish()
  ...
2023-02-24 15:11:03 -08:00
Bernard Metzler
65a8fc30fb RDMA/siw: Fix user page pinning accounting
To avoid racing with other user memory reservations, immediately
account full amount of pages to be pinned.

Fixes: 2251334dcac9 ("rdma/siw: application buffer management")
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20230202101000.402990-1-bmt@zurich.ibm.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-06 14:46:50 +02:00
Peilin Ye
40e0b09081 net/sock: Introduce trace_sk_data_ready()
As suggested by Cong, introduce a tracepoint for all ->sk_data_ready()
callback implementations.  For example:

<...>
  iperf-609  [002] .....  70.660425: sk_data_ready: family=2 protocol=6 func=sock_def_readable
  iperf-609  [002] .....  70.660436: sk_data_ready: family=2 protocol=6 func=sock_def_readable
<...>

Suggested-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23 11:26:50 +00:00
Linus Torvalds
ed56954cf5 v6.2 merge window 2nd pull request
Fix two build warnings on 32 bit platforms
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY50UewAKCRCFwuHvBreF
 YazuAQCHY/yOiziPBqNvI9jSb/MVGh1oaZQZRTkt5fOvLSGt8gD/Rv30/h6EcKfG
 S5A6eQ3qrVzEhp7P8mST2Q/MkGmQ+AQ=
 =Qw7m
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Fix two build warnings on 32 bit platforms

  It seems the linux-next CI and 0-day bot are not testing enough 32 bit
  configurations, as soon as you merged the rdma pull request there were
  two instant reports of warnings on these sytems that I would have
  thought should have been covered by time in linux-next

  Anyhow, here are the fixes so people don't hit problems with -Werror"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/siw: Fix pointer cast warning
  RDMA/rxe: Fix compile warnings on 32-bit
2022-12-17 08:23:42 -06:00
Arnd Bergmann
5244ca8867 RDMA/siw: Fix pointer cast warning
The previous build fix left a remaining issue in configurations with
64-bit dma_addr_t on 32-bit architectures:

drivers/infiniband/sw/siw/siw_qp_tx.c: In function 'siw_get_pblpage':
drivers/infiniband/sw/siw/siw_qp_tx.c:32:37: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
   32 |                 return virt_to_page((void *)paddr);
      |                                     ^

Use the same double cast here that the driver uses elsewhere to convert
between dma_addr_t and void*.

Fixes: 0d1b756acf60 ("RDMA/siw: Pass a pointer to virt_to_page()")
Link: https://lore.kernel.org/r/20221215170347.2612403-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-16 16:07:38 -04:00
Linus Torvalds
ab425febda v6.2 merge window pull request
Usual size of updates, a new driver a most of the bulk focusing on rxe:
 
 - Usual typos, style, and language updates
 
 - Driver updates for mlx5, irdma, siw, rts, srp, hfi1, hns, erdma, mlx4, srp
 
 - Lots of RXE updates
   * Improve reply error handling for bad MR operations
   * Code tidying
   * Debug printing uses common loggers
   * Remove half implemented RD related stuff
   * Support IBA's recently defined Atomic Write and Flush operations
 
 - erdma support for atomic operations
 
 - New driver "mana" for Ethernet HW available in Azure VMs. This driver
   only supports DPDK
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY5eIggAKCRCFwuHvBreF
 YeX7AP9+l5Y9J48OmK7y/YgADNo9g05agXp3E8EuUDmBU+PREgEAigdWaJVf2oea
 IctVja0ApLW5W+wsFt8Qh+V4PMiYTAM=
 =Q5V+
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Usual size of updates, a new driver, and most of the bulk focusing on
  rxe:

   - Usual typos, style, and language updates

   - Driver updates for mlx5, irdma, siw, rts, srp, hfi1, hns, erdma,
     mlx4, srp

   - Lots of RXE updates:
      * Improve reply error handling for bad MR operations
      * Code tidying
      * Debug printing uses common loggers
      * Remove half implemented RD related stuff
      * Support IBA's recently defined Atomic Write and Flush operations

   - erdma support for atomic operations

   - New driver 'mana' for Ethernet HW available in Azure VMs. This
     driver only supports DPDK"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (122 commits)
  IB/IPoIB: Fix queue count inconsistency for PKEY child interfaces
  RDMA: Add missed netdev_put() for the netdevice_tracker
  RDMA/rxe: Enable RDMA FLUSH capability for rxe device
  RDMA/cm: Make QP FLUSHABLE for supported device
  RDMA/rxe: Implement flush completion
  RDMA/rxe: Implement flush execution in responder side
  RDMA/rxe: Implement RC RDMA FLUSH service in requester side
  RDMA/rxe: Extend rxe packet format to support flush
  RDMA/rxe: Allow registering persistent flag for pmem MR only
  RDMA/rxe: Extend rxe user ABI to support flush
  RDMA: Extend RDMA kernel verbs ABI to support flush
  RDMA: Extend RDMA user ABI to support flush
  RDMA/rxe: Fix incorrect responder length checking
  RDMA/rxe: Fix oops with zero length reads
  RDMA/mlx5: Remove not-used IB_FLOW_SPEC_IB define
  RDMA/hns: Fix XRC caps on HIP08
  RDMA/hns: Fix error code of CMD
  RDMA/hns: Fix page size cap from firmware
  RDMA/hns: Fix PBL page MTR find
  RDMA/hns: Fix AH attr queried by query_qp
  ...
2022-12-14 09:27:13 -08:00
David Hildenbrand
129e636fe9 RDMA/siw: remove FOLL_FORCE usage
GUP now supports reliable R/O long-term pinning in COW mappings, such
that we break COW early. MAP_SHARED VMAs only use the shared zeropage so
far in one corner case (DAXFS file with holes), which can be ignored
because GUP does not support long-term pinning in fsdax (see
check_vma_flags()).

Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required
for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop
using FOLL_FORCE, which is really only for ptrace access.

Link: https://lkml.kernel.org/r/20221116102659.70287-13-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Bernard Metzler <bmt@zurich.ibm.com>
Cc: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 15:58:59 -08:00
Bernard Metzler
60da2d11fc RDMA/siw: Set defined status for work completion with undefined status
A malicious user may write undefined values into memory mapped completion
queue elements status or opcode. Undefined status or opcode values will
result in out-of-bounds access to an array mapping siw internal
representation of opcode and status to RDMA core representation when
reaping CQ elements. While siw detects those undefined values, it did not
correctly set completion status to a defined value, thus defeating the
whole purpose of the check.

This bug leads to the following Smatch static checker warning:

	drivers/infiniband/sw/siw/siw_cq.c:96 siw_reap_cqe()
	error: buffer overflow 'map_cqe_status' 10 <= 21

Fixes: bdf1da5df9da ("RDMA/siw: Fix immediate work request flush to completion queue")
Link: https://lore.kernel.org/r/20221115170747.1263298-1-bmt@zurich.ibm.com
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-15 16:47:00 -04:00
Bernard Metzler
bdf1da5df9 RDMA/siw: Fix immediate work request flush to completion queue
Correctly set send queue element opcode during immediate work request
flushing in post sendqueue operation, if the QP is in ERROR state.
An undefined ocode value results in out-of-bounds access to an array
for mapping the opcode between siw internal and RDMA core representation
in work completion generation. It resulted in a KASAN BUG report
of type 'global-out-of-bounds' during NFSoRDMA testing.

This patch further fixes a potential case of a malicious user which may
write undefined values for completion queue elements status or opcode,
if the CQ is memory mapped to user land. It avoids the same out-of-bounds
access to arrays for status and opcode mapping as described above.

Fixes: 303ae1cdfdf7 ("rdma/siw: application interface")
Fixes: b0fff7317bb4 ("rdma/siw: completion queue methods")
Reported-by: Olga Kornievskaia <kolga@netapp.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20221107145057.895747-1-bmt@zurich.ibm.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-09 15:26:49 +02:00
Jason Gunthorpe
33331a728c Linux 6.0
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmM5/fMeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG+Y0H/jG0HQjUHfIOGjPQ
 T33eaF5POoZKzZmsTNITbtya7B3/OTZZRGdSq9B8t5tElyPnZrdIfaXds17mCI8y
 5DJUEQGdv6fqRttYLGKf6rk1kzABhhaTS8n9BgDFEsmvdqwG4AV6dLQr3HL09gTV
 Lu+Is86RPwmpgH0Z9u7DyFCF8yLjefyu0vl6eFm/SXjCE8gQM/LZQSi9mv5/YDxa
 MVKlsZKIkQm6P8a6r8wbGedKYBped4+4gYedsf/IcS0lvKdLIs/P7YgR5wKklSbM
 tqECvBOmKq1Fwj/oxH+fx0KLX/ZD1xxQwoQd+a9DOSo+BuPBt6KGojYT9gQRyFJp
 R7gyUCo=
 =2UOD
 -----END PGP SIGNATURE-----

Merge tag 'v6.0' into rdma.git for-next

Trvial merge conflicts against rdma.git for-rc resolved matching
linux-next:
            drivers/infiniband/hw/hns/hns_roce_hw_v2.c
            drivers/infiniband/hw/hns/hns_roce_main.c

https://lore.kernel.org/r/20220929124005.105149-1-broonie@kernel.org

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-06 19:48:45 -03:00
Bernard Metzler
a3c278807a RDMA/siw: Fix QP destroy to wait for all references dropped.
Delay QP destroy completion until all siw references to QP are
dropped. The calling RDMA core will free QP structure after
successful return from siw_qp_destroy() call, so siw must not
hold any remaining reference to the QP upon return.
A use-after-free was encountered in xfstest generic/460, while
testing NFSoRDMA. Here, after a TCP connection drop by peer,
the triggered siw_cm_work_handler got delayed until after
QP destroy call, referencing a QP which has already freed.

Fixes: 303ae1cdfdf7 ("rdma/siw: application interface")
Reported-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20220920082503.224189-1-bmt@zurich.ibm.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-20 21:23:52 +03:00
Bernard Metzler
754209850d RDMA/siw: Always consume all skbuf data in sk_data_ready() upcall.
For header and trailer/padding processing, siw did not consume new
skb data until minimum amount present to fill current header or trailer
structure, including potential payload padding. Not consuming any
data during upcall may cause a receive stall, since tcp_read_sock()
is not upcalling again if no new data arrive.
A NFSoRDMA client got stuck at RDMA Write reception of unaligned
payload, if the current skb did contain only the expected 3 padding
bytes, but not the 4 bytes CRC trailer. Expecting 4 more bytes already
arrived in another skb, and not consuming those 3 bytes in the current
upcall left the Write incomplete, waiting for the CRC forever.

Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
Reported-by: Olga Kornievskaia <kolga@netapp.com>
Tested-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20220920081202.223629-1-bmt@zurich.ibm.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-20 21:21:18 +03:00
Linus Walleij
0d1b756acf RDMA/siw: Pass a pointer to virt_to_page()
Functions that work on a pointer to virtual memory such as
virt_to_pfn() and users of that function such as
virt_to_page() are supposed to pass a pointer to virtual
memory, ideally a (void *) or other pointer. However since
many architectures implement virt_to_pfn() as a macro,
this function becomes polymorphic and accepts both a
(unsigned long) and a (void *).

If we instead implement a proper virt_to_pfn(void *addr)
function the following happens (occurred on arch/arm):

drivers/infiniband/sw/siw/siw_qp_tx.c:32:23: warning: incompatible
  integer to pointer conversion passing 'dma_addr_t' (aka 'unsigned int')
  to parameter of type 'const void *' [-Wint-conversion]
drivers/infiniband/sw/siw/siw_qp_tx.c:32:37: warning: passing argument
  1 of 'virt_to_pfn' makes pointer from integer without a cast
  [-Wint-conversion]
drivers/infiniband/sw/siw/siw_qp_tx.c:538:36: warning: incompatible
  integer to pointer conversion passing 'unsigned long long'
  to parameter of type 'const void *' [-Wint-conversion]

Fix this with an explicit cast. In one case where the SIW
SGE uses an unaligned u64 we need a double cast modifying the
virtual address (va) to a platform-specific uintptr_t before
casting to a (void *).

Fixes: b9be6f18cf9e ("rdma/siw: transmit path")
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20220902215918.603761-1-linus.walleij@linaro.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-04 10:21:59 +03:00
Tom Talpey
fc5e1acf6a RDMA/siw: Add missing Kconfig selections
The SoftiWARP Kconfig is missing "select" for CRYPTO and CRYPTO_CRC32C.

In addition, it improperly "depends on" LIBCRC32C, this should be a
"select", similar to net/sctp and others. As a dependency, SIW fails
to appear in generic configurations.

Link: https://lore.kernel.org/r/d366bf02-3271-754f-fc68-1a84016d0e19@talpey.com
Signed-off-by: Tom Talpey <tom@talpey.com>
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-01 10:12:01 +03:00
Cheng Xu
3056fc6c32 RDMA/siw: Fix duplicated reported IW_CM_EVENT_CONNECT_REPLY event
If siw_recv_mpa_rr returns -EAGAIN, it means that the MPA reply hasn't
been received completely, and should not report IW_CM_EVENT_CONNECT_REPLY
in this case. This may trigger a call trace in iw_cm. A simple way to
trigger this:
 server: ib_send_lat
 client: ib_send_lat -R <server_ip>

The call trace looks like this:

 kernel BUG at drivers/infiniband/core/iwcm.c:894!
 invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
 <...>
 Workqueue: iw_cm_wq cm_work_handler [iw_cm]
 Call Trace:
  <TASK>
  cm_work_handler+0x1dd/0x370 [iw_cm]
  process_one_work+0x1e2/0x3b0
  worker_thread+0x49/0x2e0
  ? rescuer_thread+0x370/0x370
  kthread+0xe5/0x110
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30
  </TASK>

Fixes: 6c52fdc244b5 ("rdma/siw: connection management")
Link: https://lore.kernel.org/r/dae34b5fd5c2ea2bd9744812c1d2653a34a94c67.1657706960.git.chengyou@linux.alibaba.com
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18 14:20:52 +03:00
Andrey Strachuk
aeea6cc067 RDMA: remove useless condition in siw_create_cq()
Comparison of 'cq' with NULL is useless since
'cq' is a result of container_of and cannot be NULL
in any reasonable scenario.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 303ae1cdfdf7 ("rdma/siw: application interface")
Link: https://lore.kernel.org/r/20220711151251.17089-1-strochuk@ispras.ru
Signed-off-by: Andrey Strachuk <strochuk@ispras.ru>
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18 12:27:28 +03:00
Jason Gunthorpe
a6f844da39 Linux 5.18
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmKKlIAeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGC3oH/iPm/fLG2sJut8My
 sU0RC9K+6ESV5h2Qy6k00/lqKstlu4EvBjw4V8vYpx3Q2+hbSFMn2SeWqqqT3Lkk
 Zb8KINCFuuyMtdCBb42PV0zhUf5pCQF7ocm/Ae4jllDHtPmqk3WJ6IGtZBK5JBlw
 z6RR/wKt0y0MRj9eZyPyYjOee2L2vuVh4tgnexK/4L8g2ZtMMRThhvUzSMWG4zxR
 STYYNp0uFcfT1Vt85+ODevFH4TvdECAj+SqAegN+seHLM17YY7M0/WiIYpxGRv8P
 lIpDQl4PBU8EBkpI5hkpJ/3qPincbuVOMLsYfxFtpcjjG12vGjFp2krGpS3TedZQ
 3mvaJ7c=
 =vLke
 -----END PGP SIGNATURE-----

Merge tag 'v5.18' into rdma.git for-next

Following patches have dependencies.

Resolve the merge conflict in
drivers/net/ethernet/mellanox/mlx5/core/main.c by keeping the new names
for the fs functions following linux-next:

https://lore.kernel.org/r/20220519113529.226bc3e2@canb.auug.org.au/

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24 12:40:28 -03:00
Bernard Metzler
a2d36b02c1 RDMA/siw: Enable siw on tunnel devices
Enable siw to attach to tunnel devices, there is no reason not to, siw
properly generates all packets already.

Link: https://lore.kernel.org/r/20220510143917.23735-1-bmt@zurich.ibm.com
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-11 13:52:38 -03:00
Cheng Xu
ef91271c65 RDMA/siw: Fix a condition race issue in MPA request processing
The calling of siw_cm_upcall and detaching new_cep with its listen_cep
should be atomistic semantics. Otherwise siw_reject may be called in a
temporary state, e,g, siw_cm_upcall is called but the new_cep->listen_cep
has not being cleared.

This fixes a WARN:

  WARNING: CPU: 7 PID: 201 at drivers/infiniband/sw/siw/siw_cm.c:255 siw_cep_put+0x125/0x130 [siw]
  CPU: 2 PID: 201 Comm: kworker/u16:22 Kdump: loaded Tainted: G            E     5.17.0-rc7 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  Workqueue: iw_cm_wq cm_work_handler [iw_cm]
  RIP: 0010:siw_cep_put+0x125/0x130 [siw]
  Call Trace:
   <TASK>
   siw_reject+0xac/0x180 [siw]
   iw_cm_reject+0x68/0xc0 [iw_cm]
   cm_work_handler+0x59d/0xe20 [iw_cm]
   process_one_work+0x1e2/0x3b0
   worker_thread+0x50/0x3a0
   ? rescuer_thread+0x390/0x390
   kthread+0xe5/0x110
   ? kthread_complete_and_exit+0x20/0x20
   ret_from_fork+0x1f/0x30
   </TASK>

Fixes: 6c52fdc244b5 ("rdma/siw: connection management")
Link: https://lore.kernel.org/r/d528d83466c44687f3872eadcb8c184528b2e2d4.1650526554.git.chengyou@linux.alibaba.com
Reported-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-04 20:59:54 -03:00
Jason Gunthorpe
e945c653c8 RDMA: Split kernel-only global device caps from uverbs device caps
Split out flags from ib_device::device_cap_flags that are only used
internally to the kernel into kernel_cap_flags that is not part of the
uapi. This limits the device_cap_flags to being the same bitmap that will
be copied to userspace.

This cleanly splits out the uverbs flags from the kernel flags to avoid
confusion in the flags bitmap.

Add some short comments describing which each of the kernel flags is
connected to. Remove unused kernel flags.

Link: https://lore.kernel.org/r/0-v2-22c19e565eef+139a-kern_caps_jgg@nvidia.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-06 15:02:13 -03:00
Bernard Metzler
b43a76f423 RDMA/siw: Fix broken RDMA Read Fence/Resume logic.
Code unconditionally resumed fenced SQ processing after next RDMA Read
completion, even if other RDMA Read responses are still outstanding, or
ORQ is full. Also adds comments for better readability of fence
processing, and removes orq_get_tail() helper, which is not needed
anymore.

Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
Fixes: a531975279f3 ("rdma/siw: main include file")
Link: https://lore.kernel.org/r/20220130170815.1940-1-bmt@zurich.ibm.com
Reported-by: Jared Holzman <jared.holzman@excelero.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-02-01 09:54:28 -04:00
Dan Carpenter
a75badebfd RDMA/siw: Fix refcounting leak in siw_create_qp()
The atomic_inc() needs to be paired with an atomic_dec() on the error
path.

Fixes: 514aee660df4 ("RDMA: Globally allocate and release QP memory")
Link: https://lore.kernel.org/r/20220118091104.GA11671@kili
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-28 11:41:17 -04:00
Linus Torvalds
f56caedaf9 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "146 patches.

  Subsystems affected by this patch series: kthread, ia64, scripts,
  ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak,
  dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap,
  memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb,
  userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp,
  ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and
  damon)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits)
  mm/damon: hide kernel pointer from tracepoint event
  mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
  mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
  mm/damon/dbgfs: remove an unnecessary variable
  mm/damon: move the implementation of damon_insert_region to damon.h
  mm/damon: add access checking for hugetlb pages
  Docs/admin-guide/mm/damon/usage: update for schemes statistics
  mm/damon/dbgfs: support all DAMOS stats
  Docs/admin-guide/mm/damon/reclaim: document statistics parameters
  mm/damon/reclaim: provide reclamation statistics
  mm/damon/schemes: account how many times quota limit has exceeded
  mm/damon/schemes: account scheme actions that successfully applied
  mm/damon: remove a mistakenly added comment for a future feature
  Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
  Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
  Docs/admin-guide/mm/damon/usage: remove redundant information
  Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
  mm/damon: convert macro functions to static inline functions
  mm/damon: modify damon_rand() macro to static inline function
  mm/damon: move damon_rand() definition into damon.h
  ...
2022-01-15 20:37:06 +02:00
Cai Huoqing
e085011393 RDMA/siw: make use of the helper function kthread_run_on_cpu()
Replace kthread_create/kthread_bind/wake_up_process() with
kthread_run_on_cpu() to simplify the code.

Link: https://lkml.kernel.org/r/20211022025711.3673-3-caihuoqing@baidu.com
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Cc: Bernard Metzler <bmt@zurich.ibm.com>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:23 +02:00
Jiapeng Chong
76937fa552 RDMA/siw: Use max() instead of doing it manually
Fix following coccicheck warning:

./drivers/infiniband/sw/siw/siw_verbs.c:665:28-29: WARNING opportunity for max().

Link: https://lore.kernel.org/r/1638439679-114250-1-git-send-email-jiapeng.chong@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-06 19:49:19 -04:00
Kamal Heib
0abfc79d72 RDMA/siw: Use helper function to set sys_image_guid
Use the addrconf_addr_eui48() helper function to set the sys_image_guid,
Also make sure the GUID is valid EUI-64 identifier.

Link: https://lore.kernel.org/r/20211124102336.427637-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-25 13:35:08 -04:00
Christophe JAILLET
8869574a6c RDMA: Remove redundant 'flush_workqueue()' calls
'destroy_workqueue()' already drains the queue before destroying it, so
there is no need to flush it explicitly.

Remove the redundant 'flush_workqueue()' calls.

This was generated with coccinelle:

@@
expression E;
@@
- 	flush_workqueue(E);
	destroy_workqueue(E);

Link: https://lore.kernel.org/r/ca7bac6e6c9c5cc8d04eec3944edb13de0e381a3.1633874776.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12 13:21:23 -03:00
Leon Romanovsky
514aee660d RDMA: Globally allocate and release QP memory
Convert QP object to follow IB/core general allocation scheme.  That
change allows us to make sure that restrack properly kref the memory.

Link: https://lore.kernel.org/r/48e767124758aeecc433360ddd85eaa6325b34d9.1627040189.git.leonro@nvidia.com
Reviewed-by: Gal Pressman <galpress@amazon.com> #efa
Tested-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> #rdma and core
Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:27 -03:00
Ira Weiny
9d649d594f RDMA/siw: Convert siw_tx_hdt() to kmap_local_page()
kmap() is being deprecated and will break uses of device dax after PKS
protection is introduced.[1]

The use of kmap() in siw_tx_hdt() is all thread local therefore
kmap_local_page() is a sufficient replacement and will work with pgmap
protected pages when those are implemented.

siw_tx_hdt() tracks pages used in a page_array.  It uses that array to
unmap pages which were mapped on function exit.  Not all entries in the
array are mapped and this is tracked in kmap_mask.

kunmap_local() takes a mapped address rather than a page.  Alter
siw_unmap_pages() to take the iov array to reuse the iov_base address of
each mapping.  Use PAGE_MASK to get the proper address for kunmap_local().

kmap_local_page() mappings are tracked in a stack and must be unmapped in
the opposite order they were mapped in.  Because segments are mapped into
the page array in increasing index order, modify siw_unmap_pages() to
unmap pages in decreasing order.

Use kmap_local_page() instead of kmap() to map pages in the page_array.

[1] https://lore.kernel.org/lkml/20201009195033.3208459-59-ira.weiny@intel.com/

Link: https://lore.kernel.org/r/20210624174814.2822896-1-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-15 14:42:21 -03:00
Ira Weiny
1ec50dd12a RDMA/siw: Remove kmap()
kmap() is being deprecated and will break uses of device dax after PKS
protection is introduced.[1]

These uses of kmap() in the SIW driver are thread local.  Therefore
kmap_local_page() is sufficient to use and will work with pgmap protected
pages when those are implemnted.

There is one more use of kmap() in this driver which is split into its own
patch because kmap_local_page() has strict ordering rules and the use of
the kmap_mask over multiple segments must be handled carefully.
Therefore, that conversion is handled in a stand alone patch.

Use kmap_local_page() instead of kmap() in the 'easy' cases.

[1] https://lore.kernel.org/lkml/20201009195033.3208459-59-ira.weiny@intel.com/

Link: https://lore.kernel.org/r/20210622061422.2633501-4-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-15 14:42:21 -03:00
Leon Romanovsky
a3d83276d9 RDMA/siw: Release xarray entry
The xarray entry is allocated in siw_qp_add(), but release was
missed in case zero-sized SQ was discovered.

Fixes: 661f385961f0 ("RDMA/siw: Fix handling of zero-sized Read and Receive Queues.")
Link: https://lore.kernel.org/r/f070b59d5a1114d5a4e830346755c2b3f141cde5.1620560472.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-10 14:06:45 -03:00
Leon Romanovsky
a568814a55 RDMA/siw: Properly check send and receive CQ pointers
The check for the NULL of pointer received from container_of() is
incorrect by definition as it points to some offset from NULL.

Change such check with proper NULL check of SIW QP attributes.

Fixes: 303ae1cdfdf7 ("rdma/siw: application interface")
Link: https://lore.kernel.org/r/a7535a82925f6f4c1f062abaa294f3ae6e54bdd2.1620560310.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-10 14:06:45 -03:00
Lv Yunlong
3093ee182f RDMA/siw: Fix a use after free in siw_alloc_mr
Our code analyzer reported a UAF.

In siw_alloc_mr(), it calls siw_mr_add_mem(mr,..). In the implementation of
siw_mr_add_mem(), mem is assigned to mr->mem and then mem is freed via
kfree(mem) if xa_alloc_cyclic() failed. Here, mr->mem still point to a
freed object. After, the execution continue up to the err_out branch of
siw_alloc_mr, and the freed mr->mem is used in siw_mr_drop_mem(mr).

My patch moves "mr->mem = mem" behind the if (xa_alloc_cyclic(..)<0) {}
section, to avoid the uaf.

Fixes: 2251334dcac9 ("rdma/siw: application buffer management")
Link: https://lore.kernel.org/r/20210426011647.3561-1-lyl2019@mail.ustc.edu.cn
Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
Reviewed-by: Bernard Metzler <bmt@zurich.ihm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-27 15:19:21 -03:00
Mark Bloch
1fb7f8973f RDMA: Support more than 255 rdma ports
Current code uses many different types when dealing with a port of a RDMA
device: u8, unsigned int and u32. Switch to u32 to clean up the logic.

This allows us to make (at least) the core view consistent and use the
same type. Unfortunately not all places can be converted. Many uverbs
functions expect port to be u8 so keep those places in order not to break
UAPIs.  HW/Spec defined values must also not be changed.

With the switch to u32 we now can support devices with more than 255
ports. U32_MAX is reserved to make control logic a bit easier to deal
with. As a device with U32_MAX ports probably isn't going to happen any
time soon this seems like a non issue.

When a device with more than 255 ports is created uverbs will report the
RDMA device as having 255 ports as this is the max currently supported.

The verbs interface is not changed yet because the IBTA spec limits the
port size in too many places to be u8 and all applications that relies in
verbs won't be able to cope with this change. At this stage, we are
extending the interfaces that are using vendor channel solely

Once the limitation is lifted mlx5 in switchdev mode will be able to have
thousands of SFs created by the device. As the only instance of an RDMA
device that reports more than 255 ports will be a representor device and
it exposes itself as a RAW Ethernet only device CM/MAD/IPoIB and other
ULPs aren't effected by this change and their sysfs/interfaces that are
exposes to userspace can remain unchanged.

While here cleanup some alignment issues and remove unneeded sanity
checks (mainly in rdmavt),

Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.org
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-26 09:31:21 -03:00
Leon Romanovsky
fdb68dd30e RDMA: Delete not-used static inline functions
Perform mass deletion of static inline functions that are not used.

Link: https://lore.kernel.org/r/20210314133908.291945-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-22 09:31:19 -03:00
Bernard Metzler
e35ecb466e RDMA/iwcm: Allow AFONLY binding for IPv6 addresses
Binding IPv6 address/port to AF_INET6 domain only is provided via
rdma_set_afonly(), but was not signalled to the provider.  Applications
like NFS/RDMA bind the same port to both IPv4 and IPv6 addresses
simultaneously and thus rely on it working correctly.

Link: https://lore.kernel.org/r/20210219143441.1068-1-bmt@zurich.ibm.com
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-10 15:30:45 -04:00
Kamal Heib
429fa96989 RDMA/siw: Fix calculation of tx_valid_cpus size
The size of tx_valid_cpus was calculated under the assumption that the
numa nodes identifiers are continuous, which is not the case in all archs
as this could lead to the following panic when trying to access an invalid
tx_valid_cpus index, avoid the following panic by using nr_node_ids
instead of num_online_nodes() to allocate the tx_valid_cpus size.

   Kernel attempted to read user page (8) - exploit attempt? (uid: 0)
   BUG: Kernel NULL pointer dereference on read at 0x00000008
   Faulting instruction address: 0xc0080000081b4a90
   Oops: Kernel access of bad area, sig: 11 [#1]
   LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
   Modules linked in: siw(+) rfkill rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm sunrpc ib_umad rdma_cm ib_cm iw_cm i40iw ib_uverbs ib_core i40e ses enclosure scsi_transport_sas ipmi_powernv ibmpowernv at24 ofpart ipmi_devintf regmap_i2c ipmi_msghandler powernv_flash uio_pdrv_genirq uio mtd opal_prd zram ip_tables xfs libcrc32c sd_mod t10_pi ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm drm vmx_crypto aacraid drm_panel_orientation_quirks dm_mod
   CPU: 40 PID: 3279 Comm: modprobe Tainted: G        W      X --------- ---  5.11.0-0.rc4.129.eln108.ppc64le #2
   NIP:  c0080000081b4a90 LR: c0080000081b4a2c CTR: c0000000007ce1c0
   REGS: c000000027fa77b0 TRAP: 0300   Tainted: G        W      X --------- ---   (5.11.0-0.rc4.129.eln108.ppc64le)
   MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 44224882  XER: 00000000
   CFAR: c0000000007ce200 DAR: 0000000000000008 DSISR: 40000000 IRQMASK: 0
   GPR00: c0080000081b4a2c c000000027fa7a50 c0080000081c3900 0000000000000040
   GPR04: c000000002023080 c000000012e1c300 000020072ad70000 0000000000000001
   GPR08: c000000001726068 0000000000000008 0000000000000008 c0080000081b5758
   GPR12: c0000000007ce1c0 c0000007fffc3000 00000001590b1e40 0000000000000000
   GPR16: 0000000000000000 0000000000000001 000000011ad68fc8 00007fffcc09c5c8
   GPR20: 0000000000000008 0000000000000000 00000001590b2850 00000001590b1d30
   GPR24: 0000000000043d68 000000011ad67a80 000000011ad67a80 0000000000100000
   GPR28: c000000012e1c300 c0000000020271c8 0000000000000001 c0080000081bf608
   NIP [c0080000081b4a90] siw_init_cpulist+0x194/0x214 [siw]
   LR [c0080000081b4a2c] siw_init_cpulist+0x130/0x214 [siw]
   Call Trace:
   [c000000027fa7a50] [c0080000081b4a2c] siw_init_cpulist+0x130/0x214 [siw] (unreliable)
   [c000000027fa7a90] [c0080000081b4e68] siw_init_module+0x40/0x2a0 [siw]
   [c000000027fa7b30] [c0000000000124f4] do_one_initcall+0x84/0x2e0
   [c000000027fa7c00] [c000000000267ffc] do_init_module+0x7c/0x350
   [c000000027fa7c90] [c00000000026a180] __do_sys_init_module+0x210/0x250
   [c000000027fa7db0] [c0000000000387e4] system_call_exception+0x134/0x230
   [c000000027fa7e10] [c00000000000d660] system_call_common+0xf0/0x27c
   Instruction dump:
   40810044 3d420000 e8bf0000 e88a82d0 3d420000 e90a82c8 792a1f24 7cc4302a
   7d2642aa 79291f24 7d25482a 7d295214 <7d4048a8> 7d4a3b78 7d4049ad 40c2fff4

Fixes: bdcf26bf9b3a ("rdma/siw: network and RDMA core interface")
Link: https://lore.kernel.org/r/20210201112922.141085-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-08 20:04:57 -04:00
Bernard Metzler
661f385961 RDMA/siw: Fix handling of zero-sized Read and Receive Queues.
During connection setup, the application may choose to zero-size inbound
and outbound READ queues, as well as the Receive queue.  This patch fixes
handling of zero-sized queues, but not prevents it.

Kamal Heib says in an initial error report:

 When running the blktests over siw the following shift-out-of-bounds is
 reported, this is happening because the passed IRD or ORD from the ulp
 could be zero which will lead to unexpected behavior when calling
 roundup_pow_of_two(), fix that by blocking zero values of ORD or IRD.

   UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
   shift exponent 64 is too large for 64-bit type 'long unsigned int'
   CPU: 20 PID: 3957 Comm: kworker/u64:13 Tainted: G S     5.10.0-rc6 #2
   Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.1.5 04/11/2016
   Workqueue: iw_cm_wq cm_work_handler [iw_cm]
   Call Trace:
    dump_stack+0x99/0xcb
    ubsan_epilogue+0x5/0x40
    __ubsan_handle_shift_out_of_bounds.cold.11+0xb4/0xf3
    ? down_write+0x183/0x3d0
    siw_qp_modify.cold.8+0x2d/0x32 [siw]
    ? __local_bh_enable_ip+0xa5/0xf0
    siw_accept+0x906/0x1b60 [siw]
    ? xa_load+0x147/0x1f0
    ? siw_connect+0x17a0/0x17a0 [siw]
    ? lock_downgrade+0x700/0x700
    ? siw_get_base_qp+0x1c2/0x340 [siw]
    ? _raw_spin_unlock_irqrestore+0x39/0x40
    iw_cm_accept+0x1f4/0x430 [iw_cm]
    rdma_accept+0x3fa/0xb10 [rdma_cm]
    ? check_flush_dependency+0x410/0x410
    ? cma_rep_recv+0x570/0x570 [rdma_cm]
    nvmet_rdma_queue_connect+0x1a62/0x2680 [nvmet_rdma]
    ? nvmet_rdma_alloc_cmds+0xce0/0xce0 [nvmet_rdma]
    ? lock_release+0x56e/0xcc0
    ? lock_downgrade+0x700/0x700
    ? lock_downgrade+0x700/0x700
    ? __xa_alloc_cyclic+0xef/0x350
    ? __xa_alloc+0x2d0/0x2d0
    ? rdma_restrack_add+0xbe/0x2c0 [ib_core]
    ? __ww_mutex_die+0x190/0x190
    cma_cm_event_handler+0xf2/0x500 [rdma_cm]
    iw_conn_req_handler+0x910/0xcb0 [rdma_cm]
    ? _raw_spin_unlock_irqrestore+0x39/0x40
    ? trace_hardirqs_on+0x1c/0x150
    ? cma_ib_handler+0x8a0/0x8a0 [rdma_cm]
    ? __kasan_kmalloc.constprop.7+0xc1/0xd0
    cm_work_handler+0x121c/0x17a0 [iw_cm]
    ? iw_cm_reject+0x190/0x190 [iw_cm]
    ? trace_hardirqs_on+0x1c/0x150
    process_one_work+0x8fb/0x16c0
    ? pwq_dec_nr_in_flight+0x320/0x320
    worker_thread+0x87/0xb40
    ? __kthread_parkme+0xd1/0x1a0
    ? process_one_work+0x16c0/0x16c0
    kthread+0x35f/0x430
    ? kthread_mod_delayed_work+0x180/0x180
    ret_from_fork+0x22/0x30

Fixes: a531975279f3 ("rdma/siw: main include file")
Fixes: f29dd55b0236 ("rdma/siw: queue pair methods")
Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
Fixes: b9be6f18cf9e ("rdma/siw: transmit path")
Fixes: 303ae1cdfdf7 ("rdma/siw: application interface")
Link: https://lore.kernel.org/r/20210108125845.1803-1-bmt@zurich.ibm.com
Reported-by: Kamal Heib <kamalheib1@gmail.com>
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-08 16:29:18 -04:00
Zheng Yongjun
90eef9f712 RDMA: Convert comma to semicolon
Replace a comma between expression statements by a semicolon.

Link: https://lore.kernel.org/r/20201214134118.4349-1-zhengyongjun3@huawei.com
Link: https://lore.kernel.org/r/20201214134146.4456-1-zhengyongjun3@huawei.com
Link: https://lore.kernel.org/r/20201214134218.4510-1-zhengyongjun3@huawei.com
Link: https://lore.kernel.org/r/20201214134243.4563-1-zhengyongjun3@huawei.com
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-07 13:53:41 -04:00
Jason Gunthorpe
a9d2e9ae95 RDMA/siw,rxe: Make emulated devices virtual in the device tree
This moves siw and rxe to be virtual devices in the device tree:

lrwxrwxrwx 1 root root 0 Nov  6 13:55 /sys/class/infiniband/rxe0 -> ../../devices/virtual/infiniband/rxe0/

Previously they were trying to parent themselves to the physical device of
their attached netdev, which doesn't make alot of sense.

My hope is this will solve some weird syzkaller hits related to sysfs as
it could be possible that the parent of a netdev is another netdev, eg
under bonding or some other syzkaller found netdev configuration.

Nesting a ib_device under anything but a physical device is going to cause
inconsistencies in sysfs during destructions.

Link: https://lore.kernel.org/r/0-v1-dcbfc68c4b4a+d6-virtual_dev_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 16:14:31 -04:00
Christoph Hellwig
5a7a9e038b RDMA/core: remove use of dma_virt_ops
Use the ib_dma_* helpers to skip the DMA translation instead.  This
removes the last user if dma_virt_ops and keeps the weird layering
violation inside the RDMA core instead of burderning the DMA mapping
subsystems with it.  This also means the software RDMA drivers now don't
have to mess with DMA parameters that are not relevant to them at all, and
that in the future we can use PCI P2P transfers even for software RDMA, as
there is no first fake layer of DMA mapping that the P2P DMA support.

Link: https://lore.kernel.org/r/20201106181941.1878556-8-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-17 15:22:07 -04:00
Jason Gunthorpe
bf3b7b7ba9 Merge branch 'for-rc' into rdma.git
From https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git

The rc RDMA branch is needed due to dependencies on the next patches.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-17 15:20:26 -04:00
Christoph Hellwig
b1e678bf29 RMDA/sw: Don't allow drivers using dma_virt_ops on highmem configs
dma_virt_ops requires that all pages have a kernel virtual address.
Introduce a INFINIBAND_VIRT_DMA Kconfig symbol that depends on !HIGHMEM
and make all three drivers depend on the new symbol.

Also remove the ARCH_DMA_ADDR_T_64BIT dependency, which has been obsolete
since commit 4965a68780c5 ("arch: define the ARCH_DMA_ADDR_T_64BIT config
symbol in lib/Kconfig")

Fixes: 551199aca1c3 ("lib/dma-virt: Add dma_virt_ops")
Link: https://lore.kernel.org/r/20201106181941.1878556-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12 13:27:41 -04:00
Jason Gunthorpe
5c4193669b RDMA/rxe,siw: Restore uverbs_cmd_mask IB_USER_VERBS_CMD_POST_SEND
These two drivers open code the call to POST_SEND and do not use the
rdma-core wrapper to do it, thus their usages was missed during the audit.

Both drivers use this as a doorbell to signal the kernel to start DMA.

Fixes: 628c02bf38aa ("RDMA: Remove uverbs cmds from drivers that don't use them")
Link: https://lore.kernel.org/r/0-v1-4608c5610afa+fb-uverbs_cmd_post_send_fix_jgg@nvidia.com
Reported-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 15:32:06 -04:00
Zhang Qilong
856c299899 RDMA/siw: Fix typo of EAGAIN not -EAGAIN in siw_cm_work_handler()
The rv cannot be 'EAGAIN' in the previous path, we should use '-EAGAIN' to
check it. For example:

Call trace:
 ->siw_cm_work_handler
	->siw_proc_mpareq
		->siw_recv_mpa_rr

Link: https://lore.kernel.org/r/20201028122509.47074-1-zhangqilong3@huawei.com
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 15:26:53 -04:00
Parav Pandit
683a9c7ed8 RDMA: Fix software RDMA drivers for dma mapping error
The commit f959dcd6ddfd ("dma-direct: Fix potential NULL pointer
dereference") made dma_mask as mandetory field to be setup even for
dma_virt_ops based dma devices. The commit in the fixes tag omitted
setting up the dma_mask on virtual devices triggering the below trace when
they were combined during the merge window.

Fix it by setting empty DMA MASK for software based RDMA devices.

  WARNING: CPU: 1 PID: 8488 at kernel/dma/mapping.c:149 dma_map_page_attrs+0x493/0x700
  CPU: 1 PID: 8488 Comm: syz-executor144 Not tainted 5.9.0-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:dma_map_page_attrs+0x493/0x700 kernel/dma/mapping.c:149
  Trace:
   dma_map_single_attrs include/linux/dma-mapping.h:279 [inline]
   ib_dma_map_single include/rdma/ib_verbs.h:3967 [inline]
   ib_mad_post_receive_mads+0x23f/0xd60 drivers/infiniband/core/mad.c:2715
   ib_mad_port_start drivers/infiniband/core/mad.c:2862 [inline]
   ib_mad_port_open drivers/infiniband/core/mad.c:3016 [inline]
   ib_mad_init_device+0x72b/0x1400 drivers/infiniband/core/mad.c:3092
   add_client_context+0x405/0x5e0 drivers/infiniband/core/device.c:680
   enable_device_and_get+0x1d5/0x3c0 drivers/infiniband/core/device.c:1301
   ib_register_device drivers/infiniband/core/device.c:1376 [inline]
   ib_register_device+0x7a7/0xa40 drivers/infiniband/core/device.c:1335
   rxe_register_device+0x46d/0x570 drivers/infiniband/sw/rxe/rxe_verbs.c:1182
   rxe_add+0x12fe/0x16d0 drivers/infiniband/sw/rxe/rxe.c:247
   rxe_net_add+0x8c/0xe0 drivers/infiniband/sw/rxe/rxe_net.c:507
   rxe_newlink drivers/infiniband/sw/rxe/rxe.c:269 [inline]
   rxe_newlink+0xb7/0xe0 drivers/infiniband/sw/rxe/rxe.c:250
   nldev_newlink+0x30e/0x540 drivers/infiniband/core/nldev.c:1555
   rdma_nl_rcv_msg+0x367/0x690 drivers/infiniband/core/netlink.c:195
   rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
   rdma_nl_rcv+0x2f2/0x440 drivers/infiniband/core/netlink.c:259
   netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
   netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
   netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1919
   sock_sendmsg_nosec net/socket.c:651 [inline]
   sock_sendmsg+0xcf/0x120 net/socket.c:671
   ____sys_sendmsg+0x6e8/0x810 net/socket.c:2353
   ___sys_sendmsg+0xf3/0x170 net/socket.c:2407
   __sys_sendmsg+0xe5/0x1b0 net/socket.c:2440
   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x443699

Link: https://lore.kernel.org/r/20201030093803.278830-1-parav@nvidia.com
Reported-by: syzbot+34dc2fea3478e659af01@syzkaller.appspotmail.com
Fixes: e0477b34d9d1 ("RDMA: Explicitly pass in the dma_device to ib_register_device")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Tested-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Zhu Yanjun <yanjunz@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 15:14:56 -04:00
Jason Gunthorpe
628c02bf38 RDMA: Remove uverbs cmds from drivers that don't use them
Allowing userspace to invoke these commands is probably going to crash
these drivers as they are not tested and not expecting to use them on a
user object.

For example pvrdma touches cq->ring_state which is not initialized for
user QPs.

These commands are effected:

- IB_USER_VERBS_CMD_REQ_NOTIFY_CQ is ibv_cmd_req_notify_cq() in
  rdma-core, only hfi1, ipath and rxe calls it.

- IB_USER_VERBS_CMD_POLL_CQ is ibv_cmd_poll_cq() in rdma-core, only
  ipath and hfi1 calls it.

- IB_USER_VERBS_CMD_POST_SEND/RECV is ibv_cmd_post_send/recv() in
  rdma-core, only ipath and hfi1 call them.

- IB_USER_VERBS_CMD_POST_SRQ_RECV is ibv_cmd_post_srq_recv() in
  rdma-core, only ipath and hfi1 calls it.

- IB_USER_VERBS_CMD_PEEK_CQ isn't even implemented anywhere

- IB_USER_VERBS_CMD_CREATE/DESTROY_AH is ibv_cmd_create/destroy_ah() in
  rdma-core, only bnxt_re, efa, hfi1, ipath, mlx5, orcrdma, and rxe call
  it.

Link: https://lore.kernel.org/r/10-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:28:00 -03:00