IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
To be used by mlx5_ib in the following patches for implementing
RAW PACKET QP.
Add mlx5_core_ prefix to alloc and delloc transport_domain since
they are exposed now.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Per user context, work with CQE version that both the user-space
and the kernel support. Report this CQE version via the response of
the alloc_ucontext command.
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Enforce working with CQE version 1 when the user supports CQE
version 1 and asked to work this way.
If the user still works with CQE version 0, then use the default
CQE version to tell the Firmware that the user still works in the
older mode.
After this patch, the kernel still reports CQE version 0.
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The rdma netlink local service registers a handler to handle RESOLVE
response and another handler to handle SET_TIMEOUT request. The first
thing these handlers do is to call netlink_capable() to check the
access right of the received skb to make sure that the sender has root
access. Under normal conditions, such responses and requests will be
directly forwarded to the handlers without going through the netlink_dump
pathway (see ibnl_rcv_msg() in drivers/infiniband/core/netlink.c).
However, a user application could send a RESOLVE request (not response)
to the local service, which will fall into the netlink_dump pathway,
where a new skb will be created without initializing the control block.
This new skb will be eventually forwarded to the local service RESOLVE
response handler. Unfortunately, netlink_capable() will cause general
protection fault if the skb's control block is not initialized. This
patch will address the problem by checking the skb first.
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
No usage after the conversion to the new CQ API.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Based on profiling, UD performance drops in case of processes
in a single client due to excess context switches when
the progress workqueue is scheduled.
This is solved by modifying the heuristic to select the
direct progress instead of the scheduling progress via
the workqueue when UD-like situations are detected in
the heuristic.
Reviewed-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Advertise RoCE v2 support in port_immutable attributes according to
the hardware's capabilities. This enables the verbs stack to use
RoCE v2 mode.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The mlx4 driver uses a special QP to implement the GSI QP. This kind
of QP allows to build the InfiniBand headers in software.
When mlx4 hardware builds the packet, it calculates the ICRC and puts
it at the end of the payload. However, this ICRC calculation depends
on the QP configuration, which is determined when the QP is modified
(roce_mode during INIT->RTR).
When receiving a packet, the ICRC verification doesn't depend on this
configuration.
Therefore, using two GSI QPs for send (one for each RoCE version) and
one GSI QP for receive are required.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
RoCEv2 packets are sent over IP/UDP protocols.
The mlx4 driver uses a type of RAW QP to send packets for QP1 and
therefore needs to build the network headers below BTH in software.
This patch adds option to build QP1 packets with IP and UDP headers if
RoCEv2 is requested.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
If the hardware supports RoCE v2, we configure the hardware UDP
port according to the RoCE v2 Annex when mlx4_ib device is added.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In order to support modify_qp for RoCE v2, we need to set
the gid_type in the QP context.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This will be used in hardware device driver when building QP or AH
contexts.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In RoCE v2 we need to choose a source UDP port, we do so by using
entropy over the source and dest QPNs.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In order to support RoCE v2, the hardware needs to be configured
to classify certain UDP packets as RoCE v2 packets and pass it
through its RoCE pipeline. This patch enables configuring this
UDP port.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
To tell hardware about a gid with type RoCEv2, software needs a new
modifier to the SET_PORT command: MLX4_SET_PORT_ROCE_ADDR. This can
replace the old method, MLX4_SET_PORT_GID_TABLE, for RoCEv1 gids.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
If the hardware supports RoCE v2 (mixed with RoCE v1) mode, we enable
it. This is necessary in order to support RoCE v2.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
IB core driver adds a property of type to struct ib_gid_attr.
The mlx4 driver should take that in consideration when modifying or
querying the hardware gid table.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Query the RoCE support from firmware using the appropriate firmware
commands. Downstream patches will read these capabilities and act
accordingly.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
We now alwasy have a per-PD local_dma_lkey available. Make use of that
fact in svc_rdma and stop registering our own MR.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
To support the server-side of an NFSv4.1 backchannel on RDMA
connections, add a transport class that enables backward
direction messages on an existing forward channel connection.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Extra resources for handling backchannel requests have to be
pre-allocated when a transport instance is created. Set up
additional fields in svcxprt_rdma to track these resources.
The max_requests fields are elements of the RPC-over-RDMA
protocol, so they should be u32. To ensure that unsigned
arithmetic is used everywhere, some other fields in the
svcxprt_rdma struct are updated.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pre-requisite to use map_xdr in the backchannel code.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Clean up.
These functions can otherwise fail, so check for page allocation
failures too.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
svc_rdma_post_recv() allocates pages for receive buffers on-demand.
It uses GFP_KERNEL so the allocator tries hard, and may sleep. But
I'm about to add a call to svc_rdma_post_recv() from a function
that may not sleep.
Since all svc_rdma_post_recv() call sites can tolerate its failure,
allow it to fail if the page allocator returns nothing. Longer term,
receive buffers, being a finite resource per-connection, should be
pre-allocated and re-used.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
To ensure this allocation cannot fail and will not sleep,
pre-allocate the req_map structures per-connection.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When the maximum payload size of NFS READ and WRITE was increased
by commit cc9a903d91 ("svcrdma: Change maximum server payload back
to RPCSVC_MAXPAYLOAD"), the size of struct svc_rdma_op_ctxt
increased to over 6KB (on x86_64). That makes allocating one of
these from a kmem_cache more likely to fail in situations when
system memory is exhausted.
Since I'm about to add a caller where this allocation must always
work _and_ it cannot sleep, pre-allocate ctxts for each connection.
Another motivation for this change is that NFSv4.x servers are
required by specification not to drop NFS requests. Pre-allocating
memory resources reduces the likelihood of a drop.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Be sure the completed ctxt is put in every path.
The xprt enqueue can take a while, so put the completed ctxt back
in circulation _before_ enqueuing the xprt.
Remove/disable debugging.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
kzalloc is used here, so setting the atomic fields to zero is
unnecessary. sc_ord is set again in handle_connect_req. The other
fields are re-initialized in svc_rdma_accept().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Previously, IPV6_DEFAULT_HOPLIMIT was used as the hop limit value for
RoCE. Fixing that by taking ip4_dst_hoplimit and ip6_dst_hoplimit as
hop limit values.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
rdma_addr_find_dmac_by_grh resolves dmac, vlan_id and if_index and
downsteram patch will also add hop_limit as an output parameter,
thus we rename it to rdma_addr_find_l2_eth_by_grh.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The IRQ_POLL_F_SCHED bit is set as long as polling is ongoing.
This means that irq_poll_sched() must proceed if this bit has
not yet been set.
Fixes: commit ea51190c03 ("irq_poll: fold irq_poll_sched_prep into irq_poll_sched").
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sparse complains about dereference before check. Fixing this by
moving the check before the dereference.
Fixes: 200298326b ('IB/core: Validate route when we init ah')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When write_gid function needs to do a sleep-able operation, it unlocks
table->rwlock and then relocks it. Sparse complains about context
imbalance.
This is safe as write_gid is always called with table->rwlock.
write_gid protects from simultaneous writes to this GID entry
by setting the GID_TABLE_ENTRY_INVALID flag.
Fixes: 9c584f0495 ('IB/core: Change per-entry lock in RoCE GID table to
one lock')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Port number is not part of ClassPortInfo attribute but is
still needed as a parameter when invoking process_mad.
To properly handle this attribute, port_num is added as a
parameter to get_counter_table and get_perf_mad was changed
not to store port_num in the attribute itself when it's
querying the ClassPortInfo attribute.
This handles issue pointed out by Matan Barak <matanb@dev.mellanox.co.il>
Fixes: 145d9c5410 ('IB/core: Display extended counter set if available')
Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Acked-by: Matan Barak <matanb@mellanox.com>
Acked-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Detected this by building the IB core with W=1. See also patch
"IB core: Fix ib_sg_to_pages()" (commit 8f5ba10ed4).
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Leon Romanovsky <leon.romanovsky@mellanox.com>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Remove the local workqueue to process mad completions and use the CQ API
instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Stop abusing wr_id and just pass the parameter explicitly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use eth_zero_addr to assign the zero address to the given address
array instead of memset when second argument is address of zero.
Signed-off-by: Lucas Tanure <tanure@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fix the following sparse warning:
drivers/infiniband/hw/mlx5/main.c:1061:29: warning: symbol 'pfn' shadows
an earlier one
drivers/infiniband/hw/mlx5/main.c:1030:21: originally declared here
Fixes: d69e3bcf79 ('IB/mlx5: Mmap the HCA's core clock register to user-space')
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The macro mlx4_foreach_non_ib_transport_port() is not used anywhere. Remove it.
Fixes: aa9a2d51a3 ("mlx4: Activate RoCE/SRIOV")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In commit dbf727de74 ("IB/core: Use GID table in AH creation and dmac
resolution") we copy source mac to mlx4_ah from the attributes of
gid at ib_ah_attr.grh.sgid_index. Now we can use it.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Hop limit value wasn't copied from attributes when ah was created.
This may influence packets for unconnected services to get dropped in
routers when endpoints are not in the same subnet.
Fixes: fa417f7b52 ("IB/mlx4: Add support for IBoE")
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Maximum number of EQE capacity per CQ was mistakenly exposed
as CQE. Fix that.
Fixes: 938fe83c8d ("net/mlx5_core: New device capabilities handling")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The h/w is designed in such a way that, if you do anything IPv6
related, a valid clip entry must be there. So take clip reference
before creating IPv6 listening servers, and then if we fail to
create server, release the clip entry.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>