linux

iv/linux

Author	SHA1	Message	Date
Majd Dibbiny	0b24e5ac93	IB/core: Add extended device capability flags Since all the uverbs device_cap_flags are occupied, we need a place to expose more device capabilities. This patch adds a new 64 bit device_cap_flags_ex to expose new device capabilities. The lower 32 bits will be identical to the original device_cap_flags, The upper 32 bits will be new capabilities. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:27 -04:00
Colin Ian King	73fbb4db6c	i40iw: pass hw_stats by reference rather than by value passing hw_stats by value requires a 280 byte copy so instead pass it by reference is much more efficient. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Chien Tin Tung <chien.tin.tung@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:26 -04:00
Lars-Peter Clausen	d42ed55aaf	i40iw: Remove unnecessary synchronize_irq() before free_irq() Calling synchronize_irq() right before free_irq() is quite useless. On one hand the IRQ can easily fire again before free_irq() is entered, on the other hand free_irq() itself calls synchronize_irq() internally (in a race condition free way), before any state associated with the IRQ is freed. Patch was generated using the following semantic patch: // <smpl> @@ expression irq; @@ -synchronize_irq(irq); free_irq(irq, ...); // </smpl> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Faisal Latif <faisal.latif#intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:26 -04:00
Julia Lawall	fb997816fd	i40iw: constify i40iw_vf_cqp_ops structure The i40iw_vf_cqp_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:26 -04:00
Guy Levi	37aa5c36aa	IB/mlx5: Add UARs write-combining and non-cached mapping By this patch, the user space library will be able to improve performance using appropriate ringing DoorBell method according to the memory type it asked for. Currently only one mapping command is allowed for UARs: MLX5_IB_MMAP_REGULAR_PAGE. Using this mapping, the kernel maps the UARs to write-combining (WC) if the system supports it. If the system is not supporting WC the UARs are mapped to non-cached(NC). In this case the user space library can't tell which mapping is applied. This patch adds 2 new mapping commands: MLX5_IB_MMAP_WC_PAGE and MLX5_IB_MMAP_NC_PAGE. For these commands the kernel maps exactly as requested and fails if it can't. Since there is no generic way to check if the requested memory region can be mapped as WC, driver enables conclusive WC mapping only for x86, PowerPC and ARM which support WC for the device's memory region. Signed-off-by: Guy Levy <guyle@mellanox.com> Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:03 -04:00
Matan Barak	6cbac1e4cd	IB/mlx5: Allow mapping the free running counter on PROT_EXEC The current mlx5 code disallows mapping the free running counter of mlx5 based hardwares when PROT_EXEC is set. Although this behaviour is correct, Linux does add an implicit VM_EXEC to the vm_flags if the READ_IMPLIES_EXEC bit is set in the process personality. This happens for example if the process stack is executable. This causes libmlx5 to output a warning and prevents the user from reading the free running clock. Executing the init segment of the hardware isn't a security risk (at least no more than executing a process own stack), so we just prevent writes to there. Fixes: d69e3bcf7976 ('IB/mlx5: Mmap the HCA's core clock register to user-space') Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:03 -04:00
Geliang Tang	ee71b9681d	IB/mlx4: Use list_for_each_entry_safe Simplify the code in search_relocate_mgid0_group with by using list_for_each_entry_safe(). Signed-off-by: Geliang Tang <geliangtang@163.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:03 -04:00
Mark Bloch	0f377d8625	IB/SA: Use correct free function Fixes a direct call to kfree_skb when nlmsg_free should be used. Fixes: 2ca546b92a02 ('IB/sa: Route SA pathrecord query through netlink') Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:02 -04:00
Mark Bloch	2fa2d4fb11	IB/core: Fix a potential array overrun in CMA and SA agent Fix array overrun when going over callback table. In declaration of callback table, the max size isn't provided and in registration phase, it is provided. There is potential scenario where a new operation is added and it is not supported by current client. The acceptance of such operation by ib_netlink will cause to array overrun. Fixes: 809d5fc9bf65 ("infiniband: pass rdma_cm module to netlink_dump_start") Fixes: b493d91d333e ("iwcm: common code for port mapper") Fixes: 2ca546b92a02 ("IB/sa: Route SA pathrecord query through netlink") Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:02 -04:00
Mark Bloch	1ae5ccc781	IB/core: Remove unnecessary check in ibnl_rcv_msg RDMA_NL_GET_OP is defined like this: (type & ((1 << 10) - 1)) which means op (defined as an int) can never be a negative number. Fixes: b2cbae2c2487 ('RDMA: Add netlink infrastructure') Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:01 -04:00
Mark Bloch	5ed935e861	IB/IWPM: Fix a potential skb leak In case ibnl_put_msg fails in send_nlmsg_done, the function returns with -ENOMEM without freeing. This patch fixes this behavior. Fixes: 30dc5e63d6a5 ("RDMA/core: Add support for iWARP Port Mapper user space service") Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:01 -04:00
Andy Shevchenko	faca88273b	RDMA/nes: replace custom print_hex_dump() There is no need to duplicate a lot of code that is in the kernel library for ages. Replace duplicating code by calling to print_hex_dump() directly. Note that output is slightly changed: - hex and ascii parts have just two spaces delimeter - there is no delimeter for ascii portions - file and line removed from prefix (they were redundant anyway since previous output shows same closer enough) Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:01 -04:00
Colin Ian King	aa70345369	IB/mlx4: trivial fix of spelling mistake on "argument" fix spelling mistake, argumant -> argument Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:00 -04:00
Denys Vlasenko	da74bf4aea	IB/nes: Deinline nes_free_qp_mem, save 1072 bytes This function compiles to 550 bytes of machine code. Three callsites, all in nes_create_qp. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: Faisal Latif <faisal.latif@intel.com> CC: Doug Ledford <dledford@redhat.com> CC: linux-rdma@vger.kernel.org CC: linux-kernel@vger.kernel.org Reviewed-By: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:40:00 -04:00
Tatyana Nikolova	1997412db6	RDMA/nes: Adding queue drain functions Adding sq and rq drain functions, which block until all previously posted wr-s in the specified queue have completed. A completion object is signaled to unblock the thread, when the last cqe for the corresponding queue is processed. Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Faisal Latif <faisal.latif@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:59 -04:00
Bart Van Assche	825107a237	iwcm: Fix a sparse warning Avoid that sparse complains about the comparison of s_addr with INADDR_ANY. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Faisal Latif <faisal.latif@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:59 -04:00
Bart Van Assche	54f5c9c52d	IB/srp: Fix a debug kernel crash Avoid that the following BUG() is triggered against a debug kernel: kernel BUG at include/linux/scatterlist.h:92! RIP: 0010:[<ffffffffa0467199>] [<ffffffffa0467199>] srp_map_idb+0x199/0x1a0 [ib_srp] Call Trace: [<ffffffffa04685fa>] srp_map_data+0x84a/0x890 [ib_srp] [<ffffffffa0469674>] srp_queuecommand+0x1e4/0x610 [ib_srp] [<ffffffff813f5a5e>] scsi_dispatch_cmd+0x9e/0x180 [<ffffffff813f8b07>] scsi_request_fn+0x477/0x610 [<ffffffff81298ffe>] __blk_run_queue+0x2e/0x40 [<ffffffff81299070>] blk_delay_work+0x20/0x30 [<ffffffff81071f07>] process_one_work+0x197/0x480 [<ffffffff81072239>] worker_thread+0x49/0x490 [<ffffffff810787ea>] kthread+0xea/0x100 [<ffffffff8159b632>] ret_from_fork+0x22/0x40 Fixes: f7f7aab1a5c0 ("IB/srp: Convert to new registration API") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> # v4.4+ Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:59 -04:00
Hans Westgaard Ry	e3614bc9dc	IB/ipoib: Add readout of statistics using ethtool IPoIB collects statistics of traffic including number of packets sent/received, number of bytes transferred, and certain errors. This patch makes these statistics available to be queried by ethtool. Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Tested-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:43 -04:00
Sebastian Andrzej Siewior	01690e9c70	infiniband/ulp/ipoib: remove pkey_mutex The last user of pkey_mutex was removed in db84f8803759 ("IB/ipoib: Use P_Key change event instead of P_Key polling mechanism") but the lock remained. This patch removes it. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:43 -04:00
Colin Ian King	78c49f83ee	i40iw: pass hw_stats by reference rather than by value passing hw_stats by value requires a 280 byte copy so instead pass it by reference is much more efficient. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Chien Tin Tung <chien.tin.tung@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:34 -04:00
Lars-Peter Clausen	6d6c5c1eb3	i40iw: Remove unnecessary synchronize_irq() before free_irq() Calling synchronize_irq() right before free_irq() is quite useless. On one hand the IRQ can easily fire again before free_irq() is entered, on the other hand free_irq() itself calls synchronize_irq() internally (in a race condition free way), before any state associated with the IRQ is freed. Patch was generated using the following semantic patch: // <smpl> @@ expression irq; @@ -synchronize_irq(irq); free_irq(irq, ...); // </smpl> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Faisal Latif <faisal.latif#intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:33 -04:00
Julia Lawall	2016de7ba0	i40iw: constify i40iw_vf_cqp_ops structure The i40iw_vf_cqp_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:33 -04:00
Muhammad Falak R Wani	eea570788e	staging/rdma/hfi1: use RCU_INIT_POINTER() when NULLing. It is safe to use RCU_INIT_POINTER() to NULL a pointer, instead of rcu_assign_pointer(). This results in slightly smaller/faster code. Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:20 -04:00
Ashutosh Dixit	3923979e9b	IB/hfi1: Change hfi1_init loop to preserve error returns If one iteration of the loop causes an error return and a later iteration doesn't, the later iteration causes the earlier error condition to be lost. This could result in driver probe succeeding when it should have failed. Therefore save off the error return in the loop itself rather than outside the loop. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:19 -04:00
Jianxin Xiong	bf77cc34f3	ib_pack.h: Add opcode definition for send with invalidate The opcode for "SEND Last with Invalidate" and "SEND Only with Invalidate" have been defined for RC in IBA Specification Vol 1 since Release 1.2. Add the definition to the header file in preparation of supporting these opcodes in rdmavt based drivers. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:19 -04:00
Jianxin Xiong	859b527f00	IB/hfi1: Keep SC_USER as the last send context type SC_USER needs to be the last send context type to ensure other send context types get their allocation when num_user_contexts is set to a large number. This fixes a panic when the module parameter num_user_contexts is set to 141 and larger. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:19 -04:00
Dean Luick	f036780be8	IB/hfi1: Immediately apply congestion setting MAD The handling of the congestion setting MAD packet only saved off the values, waiting for a congestion control table packet before going active. Instead, immediately apply the values. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:18 -04:00
Jakub Pawlak	cde10afa6c	IB/hfi1: Correct log message strings Remove "IB" keyword from log messages. Correct comment for thermal sensor init function. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:18 -04:00
Mike Marciniszyn	cdbff5042d	IB/rdmavt: Increase CQ callback thread priority The priority of the send engines is higher than the CQ completion thread potentially causing completions to be starved for very fast interfaces. Change the CQ kthread to match the send engine threads to minimize this delay for ULP completion processing. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:17 -04:00
Jubin John	02ba00c0bb	IB/hfi1: Fix hfi_rcvhdr tracepoint The hfi_rcvhdr tracepoint has the ctxt and eflags switched in the prototype of the trace event, compared to the args and usage of the trace function. Fix this by swapping these 2 fields in the trace event prototype. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:17 -04:00
Jubin John	63d0b4a5fc	IB/hfi1: Remove unnecessary header While running perftests, there is a significant utilization of the random number daemon. This is due to the linux/random.h header being included in qp.c and verbs.c. However, none of the functions from this header are being used in these files, so remove the unnecessary header. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:17 -04:00
Mitko Haralanov	67caea1fec	IB/hfi1: Improve performance of interval RB trees The interval RB tree management functions use handlers to store user-specific callback for the various tree operations. These handlers are put on a doubly-linked list. When a RB tree function is called, the list is searched for the handler of the particular tree. The list which holds the handlers is modified very rarely - when a handler is created and when a handler is removed. On the other hand, it is searched very often. This a perfect usage scenario for RCU. The result is a much lower overhead of traversing the list as most of the time no locking will be required. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:16 -04:00
Mike Marciniszyn	b96b040445	IB/hfi1: Fix potential panic with sdma drained mechanism The guard is backwards, potentially causing the SDMA client to panic if a wait structure was not specified. psm and verbs are not exposed to the issue, but fix the code just to be correct. Fixes: a545f5308b6c ("staging/rdma/hfi: fix CQ completion order issue") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:16 -04:00
Mike Marciniszyn	17f15bf668	IB/hfi1: Fix pio wait counter double increment The code unconditionlly increments the pio wait counter making the counter inacurate and unusable. Fixes: 14553ca11039 ("staging/rdma/hfi1: Adaptive PIO for short messages") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:15 -04:00
Dean Luick	1ebe79c948	IB/hfi1: Remove no-op QSFP reset code The RESET_N bit of the ASIC_QSFPn_OE register is not used by the hardware. Remove code that tries to use it - it does nothing. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:15 -04:00
Easwar Hariharan	27a340f6b2	IB/hfi1: Correct external device configuration shift The external device configuration was incorrectly shifted to byte 3 of the 32 bit DC_HOST_COMM_SETTINGS instead of byte 0. This patch corrects the shift and provides the cable capability information in byte 0. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:15 -04:00
Easwar Hariharan	9775a991f9	IB/hfi1: Wait for QSFP modules to initialize The function level reset in init_chip() and subsequent write of all 1s to the ASIC_QSFP registers effectively resets attached active and optical QSFP modules that pay attention to the RESET_N pin. We subsequently try to access the QSFP management interface to qualify and tune the channel and fabric SerDes before enough time (2 seconds per SFF 8679 spec for QSFP28 modules) has elapsed for the module to finish initialization. This fails and causes the failure of the channel tuning algorithm, preventing us from bringing the link up. This patch checks the port type prior to beginning channel and SerDes tuning, and if found to be QSFP, watches for the QSFP initialization complete interrupt, with a maximum timeout of 2 seconds, to allow the initialization to complete. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:14 -04:00
Easwar Hariharan	0c7f77afb7	IB/hfi1: Ignore non-temperature warnings on a downed link QSFP modules can raise an interrupt to inform us of expected conditions while the link is down, such as RX power low. Actively ignore these conditions when the link is down as they only add reporting noise. Continue reporting conditions that are valid at all times, such as temperature alarms and warnings. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:39:14 -04:00
Bart Van Assche	ba987e51a6	iw_cxgb4: Convert a __force cast __force casts should be avoided if there is a better alternative. Hence modify the comparison of s_addr with INADDR_ANY such that the __force cast is no longer necessary. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Vipul Pandya <vipul@chelsio.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:11 -04:00
Hariprasad S	64bec74a9b	RDMA/iw_cxgb4: Add arp failure handlers to send_mpa_reply/reject() These handlers when called print error message to the kernel log, but the actual handling is done by _c4iw_free_ep() and process_timeout(). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:11 -04:00
Hariprasad S	093108cb36	RDMA/iw_cxgb4: Always wake up waiter in c4iw_peer_abort_intr() Currently c4iw_peer_abort_intr() does not wake up the waiter if the endpoint state indicates we're using MPAv2 and we're currently trying to connect. This was introduced with commit 7c0a33d61187a ("RDMA/cxgb4: Don't wakeup threads for MPAv2") However, this original fix is flawed because it introduces a race that can cause a deadlock of the iwarp stack. Here is the race: ->local side sets up an active offload connection. ->local side sends MPA_START request. ->peer sends MPA_START response. ->local side ingress cpl thread begins processing the MPA_START response, but before it changes the state from MPA_REQ_SENT to FPDU_MODE: ->peer sends a RST which results in a ABORT_REQ_RSS. This triggers peer_abort_intr() which sees the state in MPA_REQ_SENT and since mpa_rev is 2, it will avoid waking up the endpoint with -ECONNRESET, assuming the stack will re-attempt the connection using MPAv1. ->Meanwhile, the cpl thread moves the state to FPDU_MODE and calls c4iw_modify_rc_qp() which calls rdma_init() which sends a RI_WR/INIT WR to firmware. But since HW sent an abort, FW correctly drops the RI_WR/INIT WR. ->So the cpl thread is stuck waiting for a reply and cannot process the ABORT_REQ_RSS cpl sitting in its input queue. Thus everything comes to a halt because no more ingress cpls are processed by the stack... The correct fix for the issue is to always do the wake up in c4iw_abort_intr() but reinitialize the wait object in c4iw_reconnect(). Fixes: 7c0a33d61187a ("RDMA/cxgb4: Don't wakeup threads for MPAv2") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:10 -04:00
Hariprasad S	4a4dd8db9d	RDMA/iw_cxgb4: Handle ret value of process_mpa_reply() in rx_data Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:10 -04:00
Hariprasad S	f86fac79af	RDMA/iw_cxgb4: atomic find and reference for listening endpoints Add get_ep_from_stid() which will atomically find and reference the endpoint struct if found. This avoids touch-after-free races between threads destroying listening endpoints and the CPL processing thread processing an incoming PASS_ACCEPT_REQ CPL. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:09 -04:00
Hariprasad S	e8667a9b35	RDMA/iw_cxgb4: Handle ULP accept/reject during ABORTING c4iw_reject() and c4iw_accept() need to handle the case where the endpoint has timed out and is in the middle of ABORTING the connection. Here is the flow that causes the BUG_ON() to fire on the server side: 1) offload connection setup and endpoint timer started 2) MPA_START request received from peer, CONNECT_REQUEST passed to ULP 3) endpoint timer fires, and process_timeout() aborts the connection, this moves the endpoint state to ABORTING until HW sends up the ABORT_RPL_RSS. 4) application exits closing the CONNECT_REQUEST cm_id. The IWCM calls c4iw_reject_cr() to destroy this connection request. 5) WHAMO: BUG_ON() because the state is ABORTING. The fix is to change c4iw_reject_cr() and c4iw_accept_cr() to fail the operation if the state is not in MPA_REQ_RCVD vs in DEAD. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:09 -04:00
Hariprasad S	ceb110a8c1	RDMA/iw_cxgb4: Release ep for for FPDU_MODE and MPA_REQ_RCVD in process_timeout ARP failure may also happen when ep in FPDU_MODE and these failures need to be handled by process_timeout(). process_timeout() also has to handle case MPA_REQ_RCVD, setting abort to 1, leading to ep resource release. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:09 -04:00
Hariprasad S	c878b706ff	RDMA/iw_cxgb4: Free skb in case of arp failure in _c4iw_free_ep() Arp failure for send_mpa_reply/reject() is handled by freeing the mpa_skb in c4iw_free_ep() before releasing ep. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:08 -04:00
Hariprasad S	944661dd97	RDMA/iw_cxgb4: atomically lookup ep and get a reference There is a race between ULP threads calling c4iw_ep_disconnect() via c4iw_modify_rc_qp() and the ingress CPL thread where the ULP thread can free the endpoint just after the ingress CPL thread finds the ep pointer in the tid table. To avoid this, we now use the hwtid_idr table for lookups instead of the LLD tid table so we can lock around insert, remove, and lookup+get_ep to avoid the race. The CPL handlers now will either find the ep ptr and have a ref on it, or not find it and they can discard the CPL. Callers of get_ep_from_tid() will have a ref on the ep if found, and thus must deref when they are done. Negative advice in peer_abort_intr() need to dereference the ep. therefore peer_abort() is scheduled to dereference the ep later. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:08 -04:00
Hariprasad S	761e19a504	RDMA/iw_cxgb4: Handle return value of c4iw_ofld_send() in abort_arp_failure() In abort_arp_failure(), the return value from c4iw_ofld_send() is ignored and thus if the CPL isn't sent, the endpoint is stuck and never gets aborted. Failure of c4iw_ofld_send() is treated as fatal error, and the ep resources are released in a safer context through process_work(). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:07 -04:00
Hariprasad S	8a70f812b1	RDMA/iw_cxgb4: in process_timeout() don't move ep state to ABORTING Moving the state to ABORTING causes the ep to get stuck because c4iw_ep_timeout() thinks the ABORT has already been done. So leave the state alone and let c4iw_ep_disconnect() do the right thing given the ep state. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:07 -04:00
Hariprasad S	caa6c9f247	RDMA/iw_cxgb4: handle return value of c4iw_l2t_send() and send_mpa_req() ->In act_open_rpl(), CPL_ERR_TCAM_FULL error handling branch, there is no handling of the return value of send_fw_act_open_req(). ->In send_fw_act_open_req(), there is no handling of return value of c4iw_l2t_send(), which may cause a ep leak and won't notify upper layers on connection establish failure. ->send_mpa_req() should act on the return from c4iw_l2t_send() and return the error to the caller. ->In case of c4iw_l2t_send() failure in send_mpa_req(), returns without starting the timer and not changing the ep state, which is further handled by act_establish() -> In act_establish()?if send_mpa_request's get_skb returns an error, may cause an ep leak. So handle return value of send_mpa_req() Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-13 19:38:07 -04:00

1 2 3 4 5 ...

589799 Commits