IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Since IBoE is using Ethernet as its link layer, there is no central
management entity so there is need for QP0. QP1 is still needed since
it handles communications between CM agents. This patch will skip QP0
and create only QP1 for IBoE ports.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
IPoIB is IP-over-Infiniband link layer. In the case of IBoE, the link
layer is Ethernet and IP can work directly over Ethernet, so disable
IPoIB for non-IB_LINK_LAYER_INFINIBAND ports.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We tried very hard to remove all possible dev_hold()/dev_put() pairs in
network stack, using RCU conversions.
There is still an unavoidable device refcount change for every dst we
create/destroy, and this can slow down some workloads (routers or some
app servers, mmap af_packet)
We can switch to a percpu refcount implementation, now dynamic per_cpu
infrastructure is mature. On a 64 cpus machine, this consumes 256 bytes
per device.
On x86, dev_hold(dev) code :
before
lock incl 0x280(%ebx)
after:
movl 0x260(%ebx),%eax
incl fs:(%eax)
Stress bench :
(Sending 160.000.000 UDP frames,
IP route cache disabled, dual E5540 @2.53GHz,
32bit kernel, FIB_TRIE)
Before:
real 1m1.662s
user 0m14.373s
sys 12m55.960s
After:
real 0m51.179s
user 0m15.329s
sys 10m15.942s
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A process can get stuck in an uninterruptible wait in the
kernel while destroying a cm_id when iw_cm_connect() fails:
For example, When creation of a PD fails but the user continues with
an attempt to connect to the server without checking the return value,
in iw_cm_connect() a NULL qp is found so the call fails. However the
IWCM_F_CONNECT_WAIT bit is not cleared. destroy_cm_id() then waits
forever for IWCM_F_CONNECT_WAIT to be cleared.
The same problem exists on the passive side with the accept call.
Fix this by clearing the bit and waking up any waiters in the
appropriate spots.
Signed-off-by: Animesh Trivedi <atr@zurich.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We can replace our equivalent open-coded version.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix the limit on the size of max fast registration WRs that can be
posted to match hardware capabilities.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This lets the bonding driver to detect when an interface goes down.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
With commit cd6860eb ("RDMA/nes: Fix hangs on ifdown") we no longer
remove nes interfaces on ifdown. On nes_query_port(), add an
additional check of the netdev queue and report IB_PORT_DOWN if the
queue is not running.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
the eHCA driver registers a MR for all of kernel memory, but makes the
assumption that valid memory exists at KERNELBASE. This assumption
may not be true in the case of a relocatable kernel, so use KERNELBASE
+ PHYSICAL_START to get the true beginning of usable kernel memory.
cc: Joachim Fenkes <fenkes@de.ibm.com>
cc: Christoph Raisch <raisch@de.ibm.com>
cc: Hoan-Ham Hguyen <hnguyen@de.ibm.com>
Signed-off-by: Sonny Rao <sonnyrao@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Get rid of init_MUTEX[_LOCKED]() and use sema_init() instead.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Just a small cleanup. The "passive_state" variable isn't used any
more after commit dae58728dc ("RDMA/nes: Fix double CLOSE event
indication crash")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
IGMP processing is broken because the IPOIB does not set the
skb->pkt_type the right way for multicast traffic. All incoming
packets are set to PACKET_HOST which means that igmp_recv() will
ignore the IGMP broadcasts/multicasts.
This in turn means that the IGMP timers are firing and are sending
information about multicast subscriptions unnecessarily. In a large
private network this can cause traffic spikes.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
- Remove dsgl support - doesn't work in T4.
- Wrap the immediate PBL as needed when building it in the wr.
- Adjust max pbl depth allowed based on ulptx alignment requirements.
- Bump the slots per SQ to 5 to allow up to 128MB fast registers.
- Advertise fastreg support by default.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Move the connection setup/teardown paths to the workq thread removing
spin lock/irq disable requirements for these paths. This allows calls
down to the LLD for EP and QP state transition actions to be atomic
with respect to processing CPL messages coming up from the HW.
Namely, calls to rdma_init() and rdma_fini() can now be called with
the mutex held avoiding many race conditions with the abort path.
The QP spinlock is still used but only to manipulate the qp state. This
allows the fastpaths, poll, post_send, and pos_recv, to run in the
irq context.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
T4 support on-chip SQs to reduce latency. This patch adds support for
this in iw_cxgb4:
- Manage ocqp memory like other adapter mem resources.
- Allocate user mode SQs from ocqp mem if available.
- Map ocqp mem to user process using write combining.
- Map PCIE_MA_SYNC reg to user process.
Bump uverbs ABI.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add "stags" debugfs file. This is useful for examining the TPTE and
PBL entries in adapter memory. It allows scripts to dump just the
active entries.
Also clean up the "qps" file handlers and shared common code.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This helps debug cases where HW resources are depleted.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
T4 FW sends up CPL_RDMA_TERMINATE to indicate a peer TERM. This
triggers the QP moving to TERMINATE state.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
T4 incorrectly inserts TERM CQEs into the CQ. Silently ignore them.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The cxgb4_*_send() functions return NET_XMIT_ values, which are
positive integers or negative errno values. So don't treat positive
return values as an error.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The HW design requires zeroing any pad in SGLs.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In c4iw_modify_qp() error path, only use qhp->ep if ep is not already set.
Otherwise qhp->ep can be NULL and we crash.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix:
drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_alloc_fast_reg_page_list':
drivers/infiniband/hw/nes/nes_verbs.c:477: warning: cast to pointer from integer of different size
drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_post_send':
drivers/infiniband/hw/nes/nes_verbs.c:3486: warning: cast to pointer from integer of different size
drivers/infiniband/hw/nes/nes_verbs.c:3486: warning: cast to pointer from integer of different size
by printing u64 quantities by casting to unsigned long and long and
using %llx, rather than casting to void* and using %p.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch allows ports to have different link layers:
IB_LINK_LAYER_INFINIBAND or IB_LINK_LAYER_ETHERNET. This is required
for adding IBoE (InfiniBand-over-Ethernet, aka RoCE) support. For
devices that do not provide an implementation for querying the link
layer property of a port, we return a default value based on the
transport: RMA_TRANSPORT_IB nodes will return IB_LINK_LAYER_INFINIBAND
and RDMA_TRANSPORT_IWARP nodes will return IB_LINK_LAYER_ETHERNET.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix:
drivers/infiniband/hw/cxgb4/qp.c: In function ‘create_qp’:
drivers/infiniband/hw/cxgb4/qp.c:147: warning: cast from pointer to integer of different size
drivers/infiniband/hw/cxgb4/qp.c: In function ‘rdma_fini’:
drivers/infiniband/hw/cxgb4/qp.c:988: warning: cast from pointer to integer of different size
drivers/infiniband/hw/cxgb4/qp.c: In function ‘rdma_init’:
drivers/infiniband/hw/cxgb4/qp.c:1063: warning: cast from pointer to integer of different size
drivers/infiniband/hw/cxgb4/mem.c: In function ‘write_adapter_mem’:
drivers/infiniband/hw/cxgb4/mem.c:74: warning: cast from pointer to integer of different size
drivers/infiniband/hw/cxgb4/cq.c: In function ‘destroy_cq’:
drivers/infiniband/hw/cxgb4/cq.c:58: warning: cast from pointer to integer of different size
drivers/infiniband/hw/cxgb4/cq.c: In function ‘create_cq’:
drivers/infiniband/hw/cxgb4/cq.c:135: warning: cast from pointer to integer of different size
drivers/infiniband/hw/cxgb4/cm.c: In function ‘fw6_msg’:
drivers/infiniband/hw/cxgb4/cm.c:2326: warning: cast to pointer from integer of different size
by casting pointers to unsigned long instead of u64.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The HW by default has RX coalescing on. For iWARP connections, this
causes a 100ms delay in connection establishement due to the ingress
MPA Start message being stalled in HW. So explicitly turn RX
coalescing off when setting up iWARP connections.
This was causing very bad performance for NP64 gather operations using
Open MPI, due to the way it sets up connections on larger jobs.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Changing state to CLOSING when FIN is received causes A0 cards to
hang. Fix this by checking for A0 cards in FIN handling.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When the driver receives an AE for FIN received, it closes the
connection without changing the state of the connection in the
hardware to closing. By changing the state to closing, hardware will
do a normal close sequence.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
During a stress testing in a large cluster, multiple close event are
detected and BUG() is hit in the iWARP core. The cause is that the
active node gave up while waiting for an MPA response from the peer
and tried to close the connection by sending RST. The passive node
driver receives the RST but is waiting for MPA response from the user.
When the MPA accept is received, the driver offloads the connection
and sends a CLOSE event. The driver gets an AE indicating RESET
received and also sends a CLOSE event, hitting a BUG().
Fix this by correcting RESET handling and sending CLOSE events.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Setting TX pause param writes to the wrong register location causing
the adapter to hang. Correct the define used to write the reigster.
Addresses: https://bugs.openfabrics.org/show_bug.cgi?id=2116
Reported-by: Shiri Franchi <shirif@voltaire.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The max depth supported by T3 is 64K entries. This fixes a bug
introduced in commit 9918b28d ("RDMA/cxgb3: Increase the max CQ
depth") that causes stalls and possibly crashes in large MPI clusters.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (42 commits)
IB/qib: Add missing <linux/slab.h> include
IB/ehca: Drop unnecessary NULL test
RDMA/nes: Fix confusing if statement indentation
IB/ehca: Init irq tasklet before irq can happen
RDMA/nes: Fix misindented code
RDMA/nes: Fix showing wqm_quanta
RDMA/nes: Get rid of "set but not used" variables
RDMA/nes: Read firmware version from correct place
IB/srp: Export req_lim via sysfs
IB/srp: Make receive buffer handling more robust
IB/srp: Use print_hex_dump()
IB: Rename RAW_ETY to RAW_ETHERTYPE
RDMA/nes: Fix two sparse warnings
RDMA/cxgb3: Make needlessly global iwch_l2t_send() static
IB/iser: Make needlessly global iser_alloc_rx_descriptors() static
RDMA/cxgb4: Add timeouts when waiting for FW responses
IB/qib: Fix race between qib_error_qp() and receive packet processing
IB/qib: Limit the number of packets processed per interrupt
IB/qib: Allow writes to the diag_counters to be able to clear them
IB/qib: Set cfgctxts to number of CPUs by default
...
of_device is just an alias for platform_device, so remove it entirely. Also
replace to_of_device() with to_platform_device() and update comment blocks.
This patch was initially generated from the following semantic patch, and then
edited by hand to pick up the bits that coccinelle didn't catch.
@@
@@
-struct of_device
+struct platform_device
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Reviewed-by: David S. Miller <davem@davemloft.net>
Fix build failure on sparc64 which is missing the include of
<linux/slab.h> via <asm/pci.h> that x86, powerpc, ia64, etc. have.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
list_for_each_entry binds its first argument to a non-null value, and thus
any null test on the value of that argument is superfluous.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
iterator I;
expression x;
statement S,S1,S2;
@@
I(x,...) { <...
- if (x == NULL && ...) S
...> }
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix confusing indentation that makes a statement look as if it's part of
an if statement when in fact it isn't.
Reported-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Initialize tasklet before interrupts are requested to prevent
scheduling of an uninitialized tasklet.
Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>