linux

iv/linux

Author	SHA1	Message	Date
Matan Barak	22878dbc91	IB/core: Better checking of userspace values for receive flow steering - Don't allow unsupported comp_mask values, user should check ibv_query_device to know which features are supported. - Add a check in ib_uverbs_create_flow() to verify the size passed from the user space. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 11:12:48 -07:00
Hadar Hen Zion	f77c0162a3	IB/mlx4: Add receive flow steering support Implement ib_create_flow() and ib_destroy_flow(). Translate the verbs structures provided by the user to HW structures and call the MLX4_QP_FLOW_STEERING_ATTACH/DETACH firmware commands. On the ATTACH command completion, the firmware provides a 64-bit registration ID, which is placed into struct mlx4_ib_flow that wraps the instance of struct ib_flow which is retuned to caller. Later, this reg ID is used for detaching that flow from the firmware. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-28 09:53:56 -07:00
Hadar Hen Zion	436f2ad05a	IB/core: Export ib_create/destroy_flow through uverbs Implement ib_uverbs_create_flow() and ib_uverbs_destroy_flow() to support flow steering for user space applications. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-28 09:53:14 -07:00
Igor Ivanov	400dbc9658	IB/core: Infrastructure for extensible uverbs commands Add infrastructure to support extended uverbs capabilities in a forward/backward manner. Uverbs command opcodes which are based on the verbs extensions approach should be greater or equal to IB_USER_VERBS_CMD_THRESHOLD. They have new header format and processed a bit differently. Whenever a specific IB_USER_VERBS_CMD_XXX is extended, which practically means it needs to have additional arguments, we will be able to add them without creating a completely new IB_USER_VERBS_CMD_YYY command or bumping the uverbs ABI version. This patch for itself doesn't provide the whole scheme which is also dependent on adding a comp_mask field to each extended uverbs command struct. The new header framework allows for future extension of the CMD arguments (ib_uverbs_cmd_hdr.in_words, ib_uverbs_cmd_hdr.out_words) for an existing new command (that is a command that supports the new uverbs command header format suggested in this patch) w/o bumping ABI version and with maintaining backward and formward compatibility to new and old libibverbs versions. In the uverbs command we are passing both uverbs arguments and the provider arguments. We split the ib_uverbs_cmd_hdr.in_words to ib_uverbs_cmd_hdr.in_words which will now carry only uverbs input argument struct size and ib_uverbs_cmd_hdr.provider_in_words that will carry the provider input argument size. Same goes for the response (the uverbs CMD output argument). For example take the create_cq call and the mlx4_ib provider: The uverbs layer gets libibverb's struct ibv_create_cq (named struct ib_uverbs_create_cq in the kernel), mlx4_ib gets libmlx4's struct mlx4_create_cq (which includes struct ibv_create_cq and is named struct mlx4_ib_create_cq in the kernel) and in_words = sizeof(mlx4_create_cq)/4 . Thus ib_uverbs_cmd_hdr.in_words carry both uverbs plus mlx4_ib input argument sizes, where uverbs assumes it knows the size of its input argument - struct ibv_create_cq. Now, if we wish to add a variable to struct ibv_create_cq, we can add a comp_mask field to the struct which is basically bit field indicating which fields exists in the struct (as done for the libibverbs API extension), but we need a way to tell what is the total size of the struct and not assume the struct size is predefined (since we may get different struct sizes from different user libibverbs versions). So we know at which point the provider input argument (struct mlx4_create_cq) begins. Same goes for extending the provider struct mlx4_create_cq. Thus we split the ib_uverbs_cmd_hdr.in_words to ib_uverbs_cmd_hdr.in_words which will now carry only uverbs input argument struct size and ib_uverbs_cmd_hdr.provider_in_words that will carry the provider (mlx4_ib) input argument size. Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-28 09:52:03 -07:00
Hadar Hen Zion	319a441d13	IB/core: Add receive flow steering support The RDMA stack allows for applications to create IB_QPT_RAW_PACKET QPs, which receive plain Ethernet packets, specifically packets that don't carry any QPN to be matched by the receiving side. Applications using these QPs must be provided with a method to program some steering rule with the HW so packets arriving at the local port can be routed to them. This patch adds ib_create_flow(), which allow providing a flow specification for a QP. When there's a match between the specification and a received packet, the packet is forwarded to that QP, in a the same way one uses ib_attach_multicast() for IB UD multicast handling. Flow specifications are provided as instances of struct ib_flow_spec_yyy, which describe L2, L3 and L4 headers. Currently specs for Ethernet, IPv4, TCP and UDP are defined. Flow specs are made of values and masks. The input to ib_create_flow() is a struct ib_flow_attr, which contains a few mandatory control elements and optional flow specs. struct ib_flow_attr { enum ib_flow_attr_type type; u16 size; u16 priority; u32 flags; u8 num_of_specs; u8 port; /* Following are the optional layers according to user request * struct ib_flow_spec_yyy * struct ib_flow_spec_zzz */ }; As these specs are eventually coming from user space, they are defined and used in a way which allows adding new spec types without kernel/user ABI change, just with a little API enhancement which defines the newly added spec. The flow spec structures are defined with TLV (Type-Length-Value) entries, which allows calling ib_create_flow() with a list of variable length of optional specs. For the actual processing of ib_flow_attr the driver uses the number of specs and the size mandatory fields along with the TLV nature of the specs. Steering rules processing order is according to the domain over which the rule is set and the rule priority. All rules set by user space applicatations fall into the IB_FLOW_DOMAIN_USER domain, other domains could be used by future IPoIB RFS and Ethetool flow-steering interface implementation. Lower numerical value for the priority field means higher priority. The returned value from ib_create_flow() is a struct ib_flow, which contains a database pointer (handle) provided by the HW driver to be used when calling ib_destroy_flow(). Applications that offload TCP/IP traffic can also be written over IB UD QPs. The ib_create_flow() / ib_destroy_flow() API is designed to support UD QPs too. A HW driver can set IB_DEVICE_MANAGED_FLOW_STEERING to denote support for flow steering. The ib_flow_attr enum type supports usage of flow steering for promiscuous and sniffer purposes: IB_FLOW_ATTR_NORMAL - "regular" rule, steering according to rule specification IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule, receive all Ethernet traffic which isn't steered to any QP IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but only for multicast IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic ALL_DEFAULT and MC_DEFAULT rules options are valid only for Ethernet link type. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-28 09:51:52 -07:00
Or Gerlitz	6a06a4b8cf	[SCSI] IB/iser: Add Discovery support To run discovery over iSER we need to advertize the CAP_TEXT_NEGO capability towards user space. Also need to make sure the login RX buffer is posted when SendTargets TEXT PDUs are sent. For that end, we use a setting of the ISCSI_PARAM_DISCOVERY_SESS iscsi param as an indication that this is discovery session. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2013-08-26 18:53:49 +04:00
Joe Perches	8be04b9374	treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks Don't emit OOM warnings when k.alloc calls fail when there there is a v.alloc immediately afterwards. Converted a kmalloc/vmalloc with memset to kzalloc/vzalloc. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-08-20 13:06:40 +02:00
Jim Foraker	49b8e74438	IPoIB: Fix race in deleting ipoib_neigh entries In several places, this snippet is used when removing neigh entries: list_del(&neigh->list); ipoib_neigh_free(neigh); The list_del() removes neigh from the associated struct ipoib_path, while ipoib_neigh_free() removes neigh from the device's neigh entry lookup table. Both of these operations are protected by the priv->lock spinlock. The table however is also protected via RCU, and so naturally the lock is not held when doing reads. This leads to a race condition, in which a thread may successfully look up a neigh entry that has already been deleted from neigh->list. Since the previous deletion will have marked the entry with poison, a second list_del() on the object will cause a panic: #5 [ffff8802338c3c70] general_protection at ffffffff815108c5 [exception RIP: list_del+16] RIP: ffffffff81289020 RSP: ffff8802338c3d20 RFLAGS: 00010082 RAX: dead000000200200 RBX: ffff880433e60c88 RCX: 0000000000009e6c RDX: 0000000000000246 RSI: ffff8806012ca298 RDI: ffff880433e60c88 RBP: ffff8802338c3d30 R8: ffff8806012ca2e8 R9: 00000000ffffffff R10: 0000000000000001 R11: 0000000000000000 R12: ffff8804346b2020 R13: ffff88032a3e7540 R14: ffff8804346b26e0 R15: 0000000000000246 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 #6 [ffff8802338c3d38] ipoib_cm_tx_handler at ffffffffa066fe0a [ib_ipoib] #7 [ffff8802338c3d98] cm_process_work at ffffffffa05149a7 [ib_cm] #8 [ffff8802338c3de8] cm_work_handler at ffffffffa05161aa [ib_cm] #9 [ffff8802338c3e38] worker_thread at ffffffff81090e10 #10 [ffff8802338c3ee8] kthread at ffffffff81096c66 #11 [ffff8802338c3f48] kernel_thread at ffffffff8100c0ca We move the list_del() into ipoib_neigh_free(), so that deletion happens only once, after the entry has been successfully removed from the lookup table. This same behavior is already used in ipoib_del_neighs_by_gid() and __ipoib_reap_neigh(). Signed-off-by: Jim Foraker <foraker1@llnl.gov> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Reviewed-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:57:37 -07:00
Steve Wise	09992579bc	RDMA/cxgb4: Issue RI.FINI before closing when entering TERM Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:49 -07:00
Steve Wise	a2de1499b3	RDMA/cxgb4: Advertise ~0ULL as max MR size Lustre uses a advertised max MR size of ~0ULL to indicate it should use a dma_mr. Hence advertise max MR size as ~0ULL. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:48 -07:00
Steve Wise	b298881fcf	RDMA/cxgb4: Always do GTS write if cidx_inc == CIDXINC_MASK When polling, we do a GTS update if the accumulated cidx_inc == the CQ depth / 16. However, if the CQ is large enough, Cq depth / 16 exceeds the size of the field in the GTS word. So we also need to update if cidx_inc hits CIDXINC_MASK to avoid overflowing the field. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:47 -07:00
Steve Wise	b38a0ad8ec	RDMA/cxgb4: Set arp error handler for PASS_ACCEPT_RPL messages accept_cr() failed to set the arp error handler on a reused skb. This results in a kernel crash if the arp does indeed time out. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:47 -07:00
Steve Wise	27ca34f54a	RDMA/cxgb4: Fix accounting for unsignaled SQ WRs to deal with wrap When determining how many WRs are completed with a signaled CQE, correctly deal with queue wraps. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:46 -07:00
Steve Wise	1cf24dcef4	RDMA/cxgb4: Fix QP flush logic This patch makes following fixes in QP flush logic: - correctly flushes unsignaled WRs followed by a signaled WR - supports for flushing a CQ bound to multiple QPs - resets cidx_flush if a active queue starts getting HW CQEs again - marks WQ in error when we leave RTS. This was only being done for user queues, but we need it for kernel queues too so that post_send/post_recv will start returning the appropriate error synchronously - eats unsignaled read resp CQEs. HW always inserts CQEs so we must silently discard them if the read work request was unsignaled. - handles QP flushes with pending SW CQEs. The flush and out of order completion logic has a bug where if out of order completions are flushed but not yet polled by the consumer and the qp is then flushed then we end up inserting duplicate completions. - c4iw_flush_sq() should only flush wrs that have not already been flushed. Since we already track where in the SQ we've flushed via sq.cidx_flush, just start at that point and flush any remaining. This bug only caused a problem in the presence of unsignaled work requests. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> [ Fixed sparse warning due to htonl/ntohl confusion. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:45 -07:00
Steve Wise	97d7ec0c41	RDMA/cxgb4: Handle newer firmware changes Move QP to TERMINATE instead to allow the peer to get the TERM message. This bug wasn't detectable until newer FW that moves connections out of RDMA mode as soon as an error is detected. QP can exit RTS before the last AE arrives. This was introduced by changes in the FW to kick connections out of RDMA mode as soon as an error is detected. A side effect of this is that the driver can move the QP out of RTS before the AE causing the connection to get kicked out of RDMA mode is processed. Fix for this is to always post async errors even if the QP is out of RTS. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:44 -07:00
Steve Wise	68074bb1ab	RDMA/cxgb4: Use correct bit shift macros for vlan filter tuples Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:43 -07:00
Vipul Pandya	830662f6f0	RDMA/cxgb4: Add support for active and passive open connection with IPv6 address Add new cpl messages, cpl_act_open_req6 and cpl_t5_act_open_req6, for initiating active open connections. Use LLD api cxgb4_create_server and cxgb4_create_server6 for initiating passive open connections. Similarly use cxgb4_remove_server to remove the passive open connections in place of listen_stop. Add support for iWARP over VLAN device and enable IPv6 support on VLAN device. Make use of import_ep in c4iw_reconnect. Signed-off-by: Vipul Pandya <vipul@chelsio.com> [ Fix build when IPv6 is disabled and make sure iw_cxgb4 is not built-in when ipv6 is a module. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:06 -07:00
Yishai Hadas	846be90d81	IB/core: Fixes to XRC reference counting in uverbs Added reference counting mechanism for XRC target QPs between ib_uqp_object and its ib_uxrcd_object. This prevents closing an XRC domain that is still attached to a QP. In addition, add missing code in ib_uverbs_destroy_srq() to handle ib_uxrcd_object reference counting correctly when destroying an xsrq. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:21:32 -07:00
Yishai Hadas	73c40c616a	IB/core: Add locking around event dispatching on XRC target QPs Fix a potential race when event occurrs on a target XRC QP and in the middle of reporting that on its shared qps, one of them is destroyed by user space application. Also add note for kernel consumers in ib_verbs.h that they must not destroy the QP from within the handler. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:21:32 -07:00
Yijing Wang	b29b076394	IB/qib: Clean up unnecessary MSI/MSI-X capability find PCI core will initialize device MSI/MSI-X capability in pci_msi_init_pci_dev(). So device drivers should use pci_dev->msi_cap/msix_cap to determine whether a device supports MSI/MSI-X instead of using pci_find_capability(pci_dev, PCI_CAP_ID_MSI/MSIX). Access to PCIe device config space again will consume more time. Signed-off-by: Yijing Wang <wangyijing@huawei.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:17:23 -07:00
Paul Bolle	bea25e82c6	IB/qib: Make qib_driver static struct pci_driver qib_driver is only used in qib_init.c. Remove it from qib.h and make it static in qib_init.c. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:14:51 -07:00
CQ Tang	4668e4b527	IB/qib: Improve SDMA performance 1. The code accepts chunks of messages, and splits the chunk into packets when converting packets into sdma queue entries. Adjacent packets will use user buffer pages smartly to avoid pinning the same page multiple times. 2. Instead of discarding all the work when SDMA queue is full, the work is saved in a pending queue. Whenever there are enough SDMA queue free entries, pending queue is directly put onto SDMA queue. 3. An interrupt handler is used to progress this pending queue. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: CQ Tang <cq.tang@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> [ Fixed up sparse warnings. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:14:34 -07:00
Steve Wise	24d44a391f	RDMA/cma: Add IPv6 support for iWARP Modify the type of local_addr and remote_addr fields in struct iw_cm_id from struct sockaddr_in to struct sockaddr_storage to hold IPv6 and IPv4 addresses uniformly. Change the references of local_addr and remote_addr in cxgb4, cxgb3, nes and amso drivers to match this. However to be able to actully run traffic over IPv6, low-level drivers have to add code to support this. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> [ Fix unused variable warnings when INFINIBAND_NES_DEBUG not set. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-12 12:32:31 -07:00
Naresh Gottumukkala	45e86b33ec	RDMA/ocrdma: Cache recv DB until QP moved to RTR 1) In post recv, don't ring the DB doorbell if the QP is in RTR state. Cache the DB calls, until the QP is moved to RTS state. 2) Add max_rd_sge support to dev->attr. 3) Code cleanup in alloc_pd path. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-12 11:00:51 -07:00
Naresh Gottumukkala	7b9b1a596e	RDMA/ocrdma: Remove __packed 1) Remove __packed for structures. 2) Align and pad all ABI structure to 64 bit boundaries instead of using __packed. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-12 10:59:44 -07:00
Naresh Gottumukkala	057729cb23	RDMA/ocrdma: Remove driver QP state machine Remove QP state machine in ocrdma low-level driver and use on the core IB stack's instead. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-12 10:58:38 -07:00
Naresh Gottumukkala	9c58726ba9	RDMA/ocrdma: Don't allow zero/invalid sgid usage Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-12 10:58:38 -07:00
Naresh Gottumukkala	1afc0454b6	RDMA/ocrdma: Remove redundant dev reference Remove redundant dev reference from structures: 1) ocrdma_cq. 2) ocrdma_ah. 3) ocrdma_hw_mr. 4) ocrdma_mw. 5) ocrdma_srq. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-12 10:58:38 -07:00
Naresh Gottumukkala	f99b1649db	RDMA/ocrdma: Style and redundant code cleanup Code cleanup and remove redundant code: 1) redundant initialization removed 2) braces changed as per CodingStyle. 3) redundant checks removed 4) extra braces in return statements removed. 5) removed unused pd pointer from mr. 6) reorganized get_dma_mr() 7) fixed set_av() to return error on invalid sgid index. 8) reference to ocrdma_dev removed from struct ocrdma_pd. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-12 10:58:37 -07:00
Sagi Grimberg	5587856c96	IB/iser: Introduce fast memory registration model (FRWR) Newer HCAs and Virtual functions may not support FMRs but rather a fast registration model, which we call FRWR - "Fast Registration Work Requests". This model was introduced in `00f7ec36c` ("RDMA/core: Add memory management extensions support") and works when the IB device supports the IB_DEVICE_MEM_MGT_EXTENSIONS capability. Upon creating the iser device iser will test whether the HCA supports FMRs. If no support for FMRs, check if IB_DEVICE_MEM_MGT_EXTENSIONS is supported and assign function pointers that handle fast registration and allocation of appropriate resources (fast_reg descriptors). Registration is done using posting IB_WR_FAST_REG_MR to the QP and invalidations using posting IB_WR_LOCAL_INV. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-09 17:18:10 -07:00
Sagi Grimberg	e657571b76	IB/iser: Place the fmr pool into a union in iser's IB conn struct This is preparation step for other memory registration methods to be added. In addition, change reg/unreg routines signature to indicate they use FMRs. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-09 17:18:09 -07:00
Sagi Grimberg	919fc274ef	IB/iser: Handle unaligned SG in separate function This routine will be shared with other rdma management schemes. The bounce buffer solution for unaligned SG-lists and the sg_to_page_vec routine are likely to be used for other registration schemes and not just FMR. Move them out of the FMR specific code, and call them from there. Later they will be called also from other reg methods code. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-09 17:18:09 -07:00
Sagi Grimberg	b4e155ffbb	IB/iser: Generalize rdma memory registration Currently the driver uses FMRs as the only means to register the memory pointed by SG provided by the SCSI mid-layer with the RDMA device. As preparation step for adding more methods for fast path memory registration, make the alloc/free and reg/unreg calls function pointers, which are for now just set to the existing FMR ones. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-09 17:18:09 -07:00
Shlomo Pongratz	b7f0451309	IB/iser: Accept session->cmds_max from user space Use cmds_max passed from user space to be the number of PDUs to be supported for the session instead of hard-coded ISCSI_DEF_XMIT_CMDS_MAX. This allow controlling the max number of SCSI commands for the session. Also don't ignore the qdepth passed from user space. Derive from session->cmds_max the actual number of RX buffers and FMR pool size to allocate during the connection bind phase. Since the iser transport connection is established before the iscsi session/connection are created and bound, we still use one hard-coded quantity ISER_DEF_XMIT_CMDS_MAX to compute the maximum number of work-requests to be supported by the RC QP used for the connection. The above quantity is made to be a power of two between ISCSI_TOTAL_CMDS_MIN (16) and ISER_DEF_XMIT_CMDS_MAX (512) inclusive. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-09 17:18:08 -07:00
Shlomo Pongratz	986db0d6c0	IB/iser: Restructure allocation/deallocation of connection resources This is a preparation step to a patch that accepts the number of max SCSI commands to be supported a session from user space iSCSI tools. Move the allocation of the login buffer, FMR pool and its associated page vector from iser_create_ib_conn_res() (which is called prior when we actually know how many commands should be supported) to iser_alloc_rx_descriptors() (which is called during the iscsi connection bind step where this quantity is known). Also do small refactoring around the deallocation to make that path similar to the allocation one. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-09 17:18:08 -07:00
Or Gerlitz	f91424cf5b	IB/iser: Use proper debug level value for info prints Commit `4f36388261` ("IB/iser: Move informational messages from error to info level") set info prints to be emitted at a lower debug level than warning prints, which is a bit odd. Fix that. Also move the prints on unaligned SG from warning to debug level. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-09 17:18:08 -07:00
Roland Dreier	569935db80	Merge branches 'cma', 'cxgb3', 'cxgb4', 'ipoib', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma' and 'qib' into for-next	2013-07-31 14:24:06 -07:00
Erez Shitrit	c290414169	IPoIB: Fix pkey change flow for virtualization environments IPoIB's required behaviour w.r.t to the pkey used by the device is the following: - For "parent" interfaces (e.g ib0, ib1, etc) who are created automatically as a result of hot-plug events from the IB core, the driver needs to take whatever pkey vlaue it finds in index 0, and stick to that index. - For child interfaces (e.g ib0.8001, etc) created by admin directive, the driver needs to use and stick to the value provided during its creation. In SR-IOV environment its possible for the VF probe to take place before the cloud management software provisions the suitable pkey for the VF in the paravirtualed PKEY table index 0. When this is the case, the VF IB stack will find in index 0 an invalide pkey, which is all zeros. Moreover, the cloud managment can assign the pkey value at index 0 at any time of the guest life cycle. The correct behavior for IPoIB to address these requirements for parent interfaces is to use PKEY_CHANGE event as trigger to optionally re-init the device pkey value and re-create all the relevant resources accordingly, if the value of the pkey in index 0 has changed (from invalid to valid or from valid value X to invalid value Y). This patch enhances the heavy flushing code which is triggered by pkey change event, to behave correctly for parent devices. For child devices, the code remains the same, namely chases pkey value and not index. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-31 14:23:44 -07:00
Or Gerlitz	3d790a4c26	IPoIB: Make sure child devices use valid/proper pkeys Make sure that the IB invalid pkey (0x0000 or 0x8000) isn't used for child devices. Also, make sure to always set the full membership bit for the pkey of devices created by rtnl link ops. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-31 14:23:40 -07:00
Jack Morgenstein	ef5ed4166f	IB/core: Create QP1 using the pkey index which contains the default pkey Currently, QP1 is created using pkey index 0. This patch simply looks for the index containing the default pkey, rather than hard-coding pkey index 0. This change will have no effect in native mode, since QP0 and QP1 are created before the SM configures the port, so pkey table will still be the default table defined by the IB Spec, in C10-123: "If non-volatile storage is not used to hold P_Key Table contents, then if a PM (Partition Manager) is not present, and prior to PM initialization of the P_Key Table, the P_Key Table must act as if it contains a single valid entry, at P_Key_ix = 0, containing the default partition key. All other entries in the P_Key Table must be invalid." Thus, in the native mode case, the driver will find the default pkey at index 0 (so it will be no different than the hard-coding). However, in SR-IOV mode, for VFs, the pkey table may be paravirtualized, so that the VF's pkey index zero may not necessarily be mapped to the real pkey index 0. For VFs, therefore, it is important to find the virtual index which maps to the real default pkey. This commit does the following for QP1 creation: 1. Find the pkey index containing the default pkey, and use that index if found. ib_find_pkey() returns the index of the limited-membership default pkey (0x7FFF) if the full-member default pkey is not in the table. 2. If neither form of the default pkey is found, use pkey index 0 (previous behavior). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-31 14:15:17 -07:00
Andi Shyti	618af3846b	mlx5_core: Variable may be used uninitialized In the sq_overhead() function, if qp_typ is equal to IB_QPT_RC, size will be used uninitialized. Signed-off-by: Andi Shyti <andi@etezian.org> Acked-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-31 14:12:33 -07:00
Dan Carpenter	92b0ca7cb1	IB/mlx5: Fix stack info leak in mlx5_ib_alloc_ucontext() We don't set "resp.reserved". Since it's at the end of the struct that means we don't have to copy it to the user. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-31 14:12:08 -07:00
Wei Yongjun	281d1a9211	IB/mlx5: Fix error return code in init_one() Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-31 14:12:07 -07:00
Jack Morgenstein	3eac103f83	IB/mlx4: Use default pkey when creating tunnel QPs When creating tunnel QPs for special QP tunneling, look for the default pkey in the slave's virtual pkey table. If it is present, use the real pkey index where the default pkey is located. If the default pkey is not found in the pkey table, use the real pkey index which is stored at index 0 in the slave's virtual pkey table (this is the current behavior). This change is required to support cloud computing, where the paravirtualized index of the default pkey is moved to index 1 or higher. The pkey at paravirtualized index 0 is used for the default IPoIB interface created by the VF. Its possible for the pkey value at paravirtualized index 0 to be invalid (zero) at VF probe time (pkey index 0 is mapped to real pkey index 127, which contains pkey = 0). At some point after the VF probe, the cloud computing interface at the hypervisor maps virtual index 0 for the VF to the pkey index containing the pkey that IPoIB will use in its operation. However, when the tunnel QP is created, the pkey at the slave's virtual index 0 is still mapped to the invalid pkey index, so tunnel QP creation fails. This commit causes the hypervisor to search for the default pkey in the slave's pkey table -- and this pkey is present in the table (at index > 0) at tunnel QP creation time, so that the tunnel QP creation will succeed. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-31 12:22:12 -07:00
Sean Hefty	5eb695c177	RDMA/cma: Only call cma_save_ib_info() for CM REQs Calling cma_save_ib_info() for CM SIDR REQs results in a crash accessing an invalid path record pointer. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-31 00:50:44 -07:00
Sean Hefty	e511d1ae16	RDMA/cma: Fix accessing invalid private data for UD If a application is using AF_IB with a UD QP, but does not provide any private data, we will end up accessing invalid memory. Check for this case and handle it appropriately. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-31 00:50:40 -07:00
Paul Bolle	8fb488d740	RDMA/cma: Fix gcc warning Building cma.o triggers this gcc warning: drivers/infiniband/core/cma.c: In function ‘rdma_resolve_addr’: drivers/infiniband/core/cma.c:465:23: warning: ‘port’ may be used uninitialized in this function [-Wmaybe-uninitialized] drivers/infiniband/core/cma.c:426:5: note: ‘port’ was declared here This is a false positive, as "port" will always be initialized if we're at "found". But if we assign to "id_priv->id.port_num" directly, we can drop "port". That will, obviously, silence gcc. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-30 16:11:22 -07:00
Roland Dreier	3c93f039d2	Revert "RDMA/nes: Fix compilation error when nes_debug is enabled" This reverts commit `bca1935ccd`, which removes variables nes_tcp_state_str and nes_iwarp_state_str, assuming that they aren't defined. However, they are defined within a #ifdef NES_DEBUG statement, which if enabled causes "defined but not used" compiler warning, when the variables are removed. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-30 15:48:35 -07:00
Mike Marciniszyn	b268e4db3d	IB/qib: Add err_decode() call for ring dump Commit `0b3ddf380c` ("Log all SDMA errors unconditionally") missed part of the patch. This also corrects a format warning when dma_addr_t is 32 bits on a 64 bit system. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-30 10:13:00 -07:00
Dan Carpenter	246fcdbc9d	RDMA/cxgb3: Fix stack info leak in iwch_create_cq() The "uresp.reserved" field isn't initialized on this path so it could leak uninitialized stack information to the user. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-30 10:11:33 -07:00
Dan Carpenter	604296303f	RDMA/nes: Fix info leaks in nes_create_qp() and nes_create_cq() We pass a few bytes of uninitialized stack memory to the user here. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-30 10:10:35 -07:00
Dan Carpenter	63ea374957	RDMA/ocrdma: Fix several stack info leaks A grab bag of places which don't properly initialize stack data. I removed one place which cleared ".rsvd" because it's not needed now that I have added a memset() earlier in the function. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-30 10:09:16 -07:00
Dan Carpenter	ae1fe07f3f	RDMA/cxgb4: Fix stack info leak in c4iw_create_qp() "uresp.ma_sync_key" doesn't get set on this path so we leak 8 bytes of data. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-30 10:07:56 -07:00
Roland Dreier	3606b99971	RDMA/ocrdma: Remove unused include I'd like to remove rdma/ib_cache.h some day, so let's avoid proliferating uses of it unnecessarily. Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-26 10:03:04 -07:00
Rusty Russell	8c6ffba0ed	PTR_RET is now PTR_ERR_OR_ZERO(): Replace most. Sweep of the simple cases. Cc: netdev@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-arm-kernel@lists.infradead.org Cc: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-07-15 11:25:01 +09:30
Linus Torvalds	c552441373	Main batch of InfiniBand/RDMA changes for 3.11 merge window: - AF_IB (native IB addressing) for CMA from Sean Hefty - New mlx5 driver for Mellanox Connect-IB adapters (including post merge request fixes) - SRP fixes from Bart Van Assche (including fix to first merge request) - qib HW driver updates - Resurrection of ocrdma HW driver development - uverbs conversion to create fds with O_CLOEXEC set - Other small changes and fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABCAAGBQJR30TKAAoJEENa44ZhAt0h854P/jvAhK5u+XTM5VyjAi0DKJ7P bWcsu+KxbOIFnjEdsYQl1mGP44gdO8GPZp7+JR5nDHDRpw9K76qy6QQiPbaF6Y8D cZH8Xlq4hzBfElTWBkExEemPrVUUq77j03FE9TBatdLAtEyYkgrNyqr7Ys6zVwVK ugR8nAahvnB7Jh1tsyZBBd9kfbWtXJnaGC8/Zk3Na4n4zXRAbr0DcnRF0sncTL38 VFnWbi33OQAxu5bsb2jGec/SNP3BbNwspFPjSCKqiiItRaCj13JiHhrKKvVk4RZe hIRnPH47kjLRp2/PwBo6o+gTXZuRg48VGBx4CKUTwx1nCzPPN1iz9ZOfqUv9Qwcv LX8mxC7QS/Yvud4KeEBsj6kotb80EkRF2KV5RkIKCxQiwetGD9127bZylC8ttxGw 2f6MzYtAGD4R4C10lO8N+59VugSg1xAvwsqz0a/jy2XyVHbI1ugQedzkB20x5WPY 51S08ABvtU9yIxIYrw2VEaa/5WN+XJ6+LpG9QBAGXdMLiCiiAe7n/YzyXI6AgwaW Jl/uKr6H6/jEHUHKwkyqsmbpVGPhtGWu8deyr1oYvOEP4i48gcDqMQsfMcCISrQV MeQU3hS/obykUlNeqjmMI2CXrecqSsiq0hXd4DLaSoZ2Rb4Drx2Wj6sTQLIAgL2q GBYjHWMUpZXIFHQaH7am =nZh8 -----END PGP SIGNATURE----- Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA changes from Roland Dreier: - AF_IB (native IB addressing) for CMA from Sean Hefty - new mlx5 driver for Mellanox Connect-IB adapters (including post merge request fixes) - SRP fixes from Bart Van Assche (including fix to first merge request) - qib HW driver updates - resurrection of ocrdma HW driver development - uverbs conversion to create fds with O_CLOEXEC set - other small changes and fixes * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits) mlx5: Return -EFAULT instead of -EPERM IB/qib: Log all SDMA errors unconditionally IB/qib: Fix module-level leak mlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB spec IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd() mlx5_core: Fixes for sparse warnings IB/mlx5: Make profile[] static in main.c mlx5: Fix parameter type of health_handler_t mlx5: Add driver for Mellanox Connect-IB adapters IB/core: Add reserved values to enums for low-level driver use IB/srp: Bump driver version and release date IB/srp: Make HCA completion vector configurable IB/srp: Maintain a single connection per I_T nexus IB/srp: Fail I/O fast if target offline IB/srp: Skip host settle delay IB/srp: Avoid skipping srp_reset_host() after a transport error IB/srp: Fix remove_one crash due to resource exhaustion IB/qib: New transmitter tunning settings for Dell 1.1 backplane IB/core: Fix error return code in add_port() ...	2013-07-13 12:57:21 -07:00
Roland Dreier	e04abfa243	Merge branches 'mlx5', 'qib' and 'srp' into for-next	2013-07-11 16:49:30 -07:00
Dan Carpenter	5e631a03af	mlx5: Return -EFAULT instead of -EPERM For copy_to/from_user() failure, the correct error code is -EFAULT not -EPERM. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-11 16:48:45 -07:00
Dean Luick	0b3ddf380c	IB/qib: Log all SDMA errors unconditionally This patch adds code to log SDMA errors for supportability purposes. Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-11 16:47:06 -07:00
Mike Marciniszyn	308c813b19	IB/qib: Fix module-level leak The vzalloc()'ed field physshadow is leaked on module unload. This patch adds vfree after the sibling page shadow is freed. Reported-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-11 16:46:44 -07:00
Bart Van Assche	80d5e8a235	IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline If the transport layer is offline it is more appropriate to let srp_abort() return FAST_IO_FAIL instead of SUCCESS. Reported-by: Sebastian Riemer <sebastian.riemer@profitbricks.com> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-11 16:43:48 -07:00
Linus Torvalds	6d2fa9e141	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "Lots of activity this round on performance improvements in target-core while benchmarking the prototype scsi-mq initiator code with vhost-scsi fabric ports, along with a number of iscsi/iser-target improvements and hardening fixes for exception path cases post v3.10 merge. The highlights include: - Make persistent reservations APTPL buffer allocated on-demand, and drop per t10_reservation buffer. (grover) - Make virtual LUN=0 a NULLIO device, and skip allocation of NULLIO device pages (grover) - Add transport_cmd_check_stop write_pending bit to avoid extra access of ->t_state_lock is WRITE I/O submission fast-path. (nab) - Drop unnecessary CMD_T_DEV_ACTIVE check from transport_lun_remove_cmd to avoid extra access of ->t_state_lock in release fast-path. (nab) - Avoid extra t_state_lock access in __target_execute_cmd fast-path (nab) - Drop unnecessary vhost-scsi wait_for_tasks=true usage + ->t_state_lock access in release fast-path. (nab) - Convert vhost-scsi to use modern se_cmd->cmd_kref TARGET_SCF_ACK_KREF usage (nab) - Add tracepoints for SCSI commands being processed (roland) - Refactoring of iscsi-target handling of ISCSI_OP_NOOP + ISCSI_OP_TEXT to be transport independent (nab) - Add iscsi-target SendTargets=$IQN support for in-band discovery (nab) - Add iser-target support for in-band discovery (nab + Or) - Add iscsi-target demo-mode TPG authentication context support (nab) - Fix isert_put_reject payload buffer post (nab) - Fix iscsit_add_reject* usage for iser (nab) - Fix iscsit_sequence_cmd reject handling for iser (nab) - Fix ISCSI_OP_SCSI_TMFUNC handling for iser (nab) - Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED (nab) The last five iscsi/iser-target items are CC'ed to stable, as they do address issues present in v3.10 code. They are certainly larger than I'd like for stable patch set, but are important to ensure proper REJECT exception handling in iser-target for 3.10.y" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (51 commits) iser-target: Ignore non TEXT + LOGOUT opcodes for discovery target: make queue_tm_rsp() return void target: remove unused codes from enum tcm_tmrsp_table iscsi-target: kstrtou* configfs attribute parameter cleanups iscsi-target: Fix tfc_tpg_auth_cit configfs length overflow iscsi-target: Fix tfc_tpg_nacl_auth_cit configfs length overflow iser-target: Add support for ISCSI_OP_TEXT opcode + payload handling iser-target: Rename sense_buf_[dma,len] to pdu_[dma,len] iser-target: Add vendor_err debug output target: Add (obsolete) checking for PMI/LBA fields in READ CAPACITY(10) target: Return correct sense data for IO past the end of a device target: Add tracepoints for SCSI commands being processed iser-target: Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED iscsi-target: Fix ISCSI_OP_SCSI_TMFUNC handling for iser iscsi-target: Fix iscsit_sequence_cmd reject handling for iser iscsi-target: Fix iscsit_add_reject* usage for iser iser-target: Fix isert_put_reject payload buffer post iscsi-target: missing kfree() on error path iscsi-target: Drop left-over iscsi_conn->bad_hdr target: Make core_scsi3_update_and_write_aptpl return sense_reason_t ...	2013-07-11 12:57:19 -07:00
Linus Torvalds	496322bc91	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: "This is a re-do of the net-next pull request for the current merge window. The only difference from the one I made the other day is that this has Eliezer's interface renames and the timeout handling changes made based upon your feedback, as well as a few bug fixes that have trickeled in. Highlights: 1) Low latency device polling, eliminating the cost of interrupt handling and context switches. Allows direct polling of a network device from socket operations, such as recvmsg() and poll(). Currently ixgbe, mlx4, and bnx2x support this feature. Full high level description, performance numbers, and design in commit `0a4db187a9` ("Merge branch 'll_poll'") From Eliezer Tamir. 2) With the routing cache removed, ip_check_mc_rcu() gets exercised more than ever before in the case where we have lots of multicast addresses. Use a hash table instead of a simple linked list, from Eric Dumazet. 3) Add driver for Atheros CQA98xx 802.11ac wireless devices, from Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski, Marek Puzyniak, Michal Kazior, and Sujith Manoharan. 4) Support reporting the TUN device persist flag to userspace, from Pavel Emelyanov. 5) Allow controlling network device VF link state using netlink, from Rony Efraim. 6) Support GRE tunneling in openvswitch, from Pravin B Shelar. 7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from Daniel Borkmann and Eric Dumazet. 8) Allow controlling of TCP quickack behavior on a per-route basis, from Cong Wang. 9) Several bug fixes and improvements to vxlan from Stephen Hemminger, Pravin B Shelar, and Mike Rapoport. In particular, support receiving on multiple UDP ports. 10) Major cleanups, particular in the area of debugging and cookie lifetime handline, to the SCTP protocol code. From Daniel Borkmann. 11) Allow packets to cross network namespaces when traversing tunnel devices. From Nicolas Dichtel. 12) Allow monitoring netlink traffic via AF_PACKET sockets, in a manner akin to how we monitor real network traffic via ptype_all. From Daniel Borkmann. 13) Several bug fixes and improvements for the new alx device driver, from Johannes Berg. 14) Fix scalability issues in the netem packet scheduler's time queue, by using an rbtree. From Eric Dumazet. 15) Several bug fixes in TCP loss recovery handling, from Yuchung Cheng. 16) Add support for GSO segmentation of MPLS packets, from Simon Horman. 17) Make network notifiers have a real data type for the opaque pointer that's passed into them. Use this to properly handle network device flag changes in arp_netdev_event(). From Jiri Pirko and Timo Teräs. 18) Convert several drivers over to module_pci_driver(), from Peter Huewe. 19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a O(1) calculation instead. From Eric Dumazet. 20) Support setting of explicit tunnel peer addresses in ipv6, just like ipv4. From Nicolas Dichtel. 21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet. 22) Prevent a single high rate flow from overruning an individual cpu during RX packet processing via selective flow shedding. From Willem de Bruijn. 23) Don't use spinlocks in TCP md5 signing fast paths, from Eric Dumazet. 24) Don't just drop GSO packets which are above the TBF scheduler's burst limit, chop them up so they are in-bounds instead. Also from Eric Dumazet. 25) VLAN offloads are missed when configured on top of a bridge, fix from Vlad Yasevich. 26) Support IPV6 in ping sockets. From Lorenzo Colitti. 27) Receive flow steering targets should be updated at poll() time too, from David Majnemer. 28) Fix several corner case regressions in PMTU/redirect handling due to the routing cache removal, from Timo Teräs. 29) We have to be mindful of ipv4 mapped ipv6 sockets in upd_v6_push_pending_frames(). From Hannes Frederic Sowa. 30) Fix L2TP sequence number handling bugs, from James Chapman." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits) drivers/net: caif: fix wrong rtnl_is_locked() usage drivers/net: enic: release rtnl_lock on error-path vhost-net: fix use-after-free in vhost_net_flush net: mv643xx_eth: do not use port number as platform device id net: sctp: confirm route during forward progress virtio_net: fix race in RX VQ processing virtio: support unlocked queue poll net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit Documentation: Fix references to defunct linux-net@vger.kernel.org net/fs: change busy poll time accounting net: rename low latency sockets functions to busy poll bridge: fix some kernel warning in multicast timer sfc: Fix memory leak when discarding scattered packets sit: fix tunnel update via netlink dt:net:stmmac: Add dt specific phy reset callback support. dt:net:stmmac: Add support to dwmac version 3.610 and 3.710 dt:net:stmmac: Allocate platform data only if its NULL. net:stmmac: fix memleak in the open method ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available net: ipv6: fix wrong ping_v6_sendmsg return value ...	2013-07-09 18:24:39 -07:00
Roland Dreier	0eba551148	Merge branches 'af_ib', 'cxgb4', 'misc', 'mlx5', 'ocrdma', 'qib' and 'srp' into for-next	2013-07-08 11:22:11 -07:00
Roland Dreier	da183c7af8	IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd() The macro get_unused_fd() is used to allocate a file descriptor with default flags. Those default flags (0) can be "unsafe": O_CLOEXEC must be used by default to not leak file descriptor across exec(). Replace calls to get_unused_fd() in uverbs with calls to get_unused_fd_flags(O_CLOEXEC). Inheriting uverbs fds across exec() cannot be used to do anything useful. Based on a patch/suggestion from Yann Droneaud <ydroneaud@opteya.com>. Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-08 11:15:45 -07:00
Roland Dreier	ad32b95f82	IB/mlx5: Make profile[] static in main.c Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-08 10:32:32 -07:00
Eli Cohen	e126ba97db	mlx5: Add driver for Mellanox Connect-IB adapters The driver is comprised of two kernel modules: mlx5_ib and mlx5_core. This partitioning resembles what we have for mlx4, except that mlx5_ib is the pci device driver and not mlx5_core. mlx5_core is essentially a library that provides general functionality that is intended to be used by other Mellanox devices that will be introduced in the future. mlx5_ib has a similar role as any hardware device under drivers/infiniband/hw. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> [ Merge in coccinelle fixes from Fengguang Wu <fengguang.wu@intel.com>. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-08 10:32:24 -07:00
Nicholas Bellinger	ca40d24eb8	iser-target: Ignore non TEXT + LOGOUT opcodes for discovery This patch adds a check in isert_rx_opcode() to ignore non TEXT + LOGOUT opcodes when SessionType=Discovery has been negotiated. Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-07-07 18:37:02 -07:00
Joern Engel	b79fafac70	target: make queue_tm_rsp() return void The return value wasn't checked by any of the callers. Assuming this is correct behaviour, we can simplify some code by not bothering to generate it. nab: Add srpt_queue_data_in() + srpt_queue_tm_rsp() nops around srpt_queue_response() void return Signed-off-by: Joern Engel <joern@logfs.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-07-07 18:36:53 -07:00
Nicholas Bellinger	adb54c2931	iser-target: Add support for ISCSI_OP_TEXT opcode + payload handling This patch adds isert_handle_text_cmd() to handle incoming ISCSI_OP_TEXT PDU processing, along with isert_put_text_rsp() for posting ISCSI_OP_TEXT_RSP ib_send_wr response. It copies ISCSI_OP_TEXT payload using unsolicited payload at &iser_rx_desc->data[0] into iscsi_cmd->text_in_ptr for usage with outgoing isert_put_text_rsp() -> iscsit_build_text_rsp() v2 changes: - Let iscsit_build_text_rsp() determine any extra padding Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-07-07 18:36:48 -07:00
Nicholas Bellinger	dbbc5d1107	iser-target: Rename sense_buf_[dma,len] to pdu_[dma,len] Now that these two variables are used for REJECT payloads as well as SCSI response sense payloads, rename them to something that makes more sense. Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-07-07 18:36:47 -07:00
Nicholas Bellinger	c5a2adbfcb	iser-target: Add vendor_err debug output Add output for ib_wc.vendor_err in isert_cq_[t,r]x_work(), which is useful for debugging future issues. Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-07-07 18:36:46 -07:00
Nicholas Bellinger	b2cb96494d	iser-target: Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED This patch addresses a bug where RDMA_CM_EVENT_DISCONNECTED may occur before the connection shutdown has been completed by rx/tx threads, that causes isert_free_conn() to wait indefinately on ->conn_wait. This patch allows isert_disconnect_work code to invoke rdma_disconnect when isert_disconnect_work() process context is started by client session reset before isert_free_conn() code has been reached. It also adds isert_conn->conn_mutex protection for ->state within isert_disconnect_work(), isert_cq_comp_err() and isert_free_conn() code, along with isert_check_state() for wait_event usage. (v2: Add explicit iscsit_cause_connection_reinstatement call during isert_disconnect_work() to force conn reset) Cc: stable@vger.kernel.org # 3.10+ Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-07-07 18:35:56 -07:00
Nicholas Bellinger	186a964701	iscsi-target: Fix ISCSI_OP_SCSI_TMFUNC handling for iser This patch adds target_get_sess_cmd reference counting for iscsit_handle_task_mgt_cmd(), and adds a target_put_sess_cmd() for the failure case. It also fixes a bug where ISCSI_OP_SCSI_TMFUNC type commands where leaking iscsi_cmd->i_conn_node and eventually triggering an OOPs during struct isert_conn shutdown. Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-07-06 22:01:23 -07:00
Nicholas Bellinger	561bf15892	iscsi-target: Fix iscsit_sequence_cmd reject handling for iser This patch moves ISCSI_OP_REJECT failures into iscsit_sequence_cmd() in order to avoid external iscsit_reject_cmd() reject usage for all PDU types. It also updates PDU specific handlers for traditional iscsi-target code to not reset the session after posting a ISCSI_OP_REJECT during setup. (v2: Fix CMDSN_LOWER_THAN_EXP for ISCSI_OP_SCSI to call target_put_sess_cmd() after iscsit_sequence_cmd() failure) Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-07-06 21:59:54 -07:00
Nicholas Bellinger	ba15991408	iscsi-target: Fix iscsit_add_reject* usage for iser This patch changes iscsit_add_reject() + iscsit_add_reject_from_cmd() usage to not sleep on iscsi_cmd->reject_comp to address a free-after-use usage bug in v3.10 with iser-target code. It saves ->reject_reason for use within iscsit_build_reject() so the correct value for both transport cases. It also drops the legacy fail_conn parameter usage throughput iscsi-target code and adds two iscsit_add_reject_cmd() and iscsit_reject_cmd helper functions, along with various small cleanups. (v2: Re-enable target_put_sess_cmd() to be called from iscsit_add_reject_from_cmd() for rejects invoked after target_get_sess_cmd() has been called) Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-07-06 21:59:53 -07:00
Nicholas Bellinger	3df8f68aaf	iser-target: Fix isert_put_reject payload buffer post This patch adds the missing isert_put_reject() logic to post a outgoing payload buffer to hold the 48 bytes of original PDU header request payload for the rejected cmd. It also fixes ISTATE_SEND_REJECT handling in isert_response_completion() -> isert_do_control_comp() code, and drops incorrect iscsi_cmd_t->reject_comp usage. Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-07-06 21:59:35 -07:00
Linus Torvalds	74b9272bbe	Device tree updates for v3.11 This branch contains the following changes: - Removal of CONFIG_OF_DEVICE, it is always enabled by CONFIG_OF - Remove #ifdef from linux/of_platform.h to increase compiler syntax coverage - Bug fix for address decoding on Bimini and js2x powerpc platforms. - miscellaneous binding changes One note on the above. The binding changes going in from all kinds of different trees has gotten rather out of hand. I picked up some during this cycle, but even going though my tree isn't a great fit. Ian Campbell has prototyped splitting the bindings and .dtb files into a separate repository. The plan is to migrate to using that sometime in the next few kernel releases which should get rid of a lot of the churn on binding docs and .dts files. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJR1fP3AAoJEEFnBt12D9kB3IIP/0Q5ctMespiJ50+ThjGsaR3m sUbQkMK46uL/oupXaJT2ybX2PxLN5LpgvO9rPt77hblOoL0+wZt+j9G0pLy1qZQZ aHprH9SrpGJv6F0SFbHp/+D/m9vESPv+zwYzL9TvrOALvCD7OSZ7tHLaoF7Y1ADM QnZa7pta3Owpu5NsGXaTXLpaZzfXzfWzf4PDzv2FsAIDbtuVJZGJZ7sJVO7Z0r+K KCY85uKJ4VOHY0onBVlM6uoCnopOi2XMMkyxYvR28lL2Kiv2b3np46jG3zX1EZH5 Qxdu85QZn2oio9iaTeYKK8bG9aRIRsXnzCnF2s68n2rQlEtPpWKN9Lj2AS/KJ+Ig obFTOFDHmxt1F4GIA0/HIPkDvRd7GTIwgwYYubEMi44E3Mae0N+xzkIRE41vYP7s 8zaNHbjAjsYjplsvN5gTPxxiU/ta24a5bl7Ont2zmOjAbXCsDajm4NCKZRJ3lb2f FHNsS1zHGmqgJ9zt13GQabo/Tp4t3KwTzBirPQsDokRO4eoL6klcS3GCRv82VWC0 dLnzu92hXcyXgh7mX2sj6sRBSwNygxMn4ZsZJklle38/LynvtrzT72BOZjghS+Vh l553uDInjSJ3IBrXnClPoyObcu50cmsBBgsK39FzU+MF9mcCHmkHQiT52zM6ZW3M wwY1OfcZk3XaT7akcVu6 =CndB -----END PGP SIGNATURE----- Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux Pull device tree updates from Grant Likely: "This branch contains the following changes: - Removal of CONFIG_OF_DEVICE, it is always enabled by CONFIG_OF - Remove #ifdef from linux/of_platform.h to increase compiler syntax coverage - Bug fix for address decoding on Bimini and js2x powerpc platforms. - miscellaneous binding changes One note on the above. The binding changes going in from all kinds of different trees has gotten rather out of hand. I picked up some during this cycle, but even going though my tree isn't a great fit. Ian Campbell has prototyped splitting the bindings and .dtb files into a separate repository. The plan is to migrate to using that sometime in the next few kernel releases which should get rid of a lot of the churn on binding docs and .dts files" * tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux: of: Fix address decoding on Bimini and js2x machines of: remove CONFIG_OF_DEVICE usb: chipidea: depend on CONFIG_OF instead of CONFIG_OF_DEVICE of: remove of_platform_driver ibmebus: convert of_platform_driver to platform_driver driver core: move to_platform_driver to platform_device.h mfd: DT bindings for the palmas family MFD ARM: dts: omap3-devkit8000: fix NAND memory binding of/base: fix typos of: remove #ifdef from linux/of_platform.h	2013-07-04 15:51:45 -07:00
Linus Torvalds	80cc38b163	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "The usual stuff from trivial tree" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits) treewide: relase -> release Documentation/cgroups/memory.txt: fix stat file documentation sysctl/net.txt: delete reference to obsolete 2.4.x kernel spinlock_api_smp.h: fix preprocessor comments treewide: Fix typo in printk doc: device tree: clarify stuff in usage-model.txt. open firmware: "/aliasas" -> "/aliases" md: bcache: Fixed a typo with the word 'arithmetic' irq/generic-chip: fix a few kernel-doc entries frv: Convert use of typedef ctl_table to struct ctl_table sgi: xpc: Convert use of typedef ctl_table to struct ctl_table doc: clk: Fix incorrect wording Documentation/arm/IXP4xx fix a typo Documentation/networking/ieee802154 fix a typo Documentation/DocBook/media/v4l fix a typo Documentation/video4linux/si476x.txt fix a typo Documentation/virtual/kvm/api.txt fix a typo Documentation/early-userspace/README fix a typo Documentation/video4linux/soc-camera.txt fix a typo lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment ...	2013-07-04 11:40:58 -07:00
Kees Cook	02aa2a3763	drivers: avoid format string in dev_set_name Calling dev_set_name with a single paramter causes it to be handled as a format string. Many callers are passing potentially dynamic string content, so use "%s" in those cases to avoid any potential accidents, including wrappers like device_create*() and bdi_register(). Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-07-03 16:07:41 -07:00
Vu Pham	e8ca413558	IB/srp: Bump driver version and release date Signed-off-by: Vu Pham <vu@mellanox.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-01 10:42:02 -07:00
Bart Van Assche	4b5e5f41c8	IB/srp: Make HCA completion vector configurable Several InfiniBand HCAs allow configuring the completion vector per CQ. This allows spreading the workload created by IB completion interrupts over multiple MSI-X vectors and hence over multiple CPU cores. In other words, configuring the completion vector properly not only allows reducing latency on an initiator connected to multiple SRP targets but also allows improving throughput. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-01 10:40:55 -07:00
Bart Van Assche	96fc248a4c	IB/srp: Maintain a single connection per I_T nexus An SRP target is required to maintain a single connection between initiator and target. This means that if the 'add_target' attribute is used to create a second connection to a target, the first connection will be logged out and that the SCSI error handler will kick in. The SCSI error handler will cause the SRP initiator to reconnect, which will cause I/O over the second connection to fail. Avoid such ping-pong behavior by disabling relogins. If reconnecting manually is necessary, that is possible by deleting and recreating an rport via sysfs. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Sebastian Riemer <sebastian.riemer@profitbricks.com> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-01 10:40:07 -07:00
Bart Van Assche	99e1c1398f	IB/srp: Fail I/O fast if target offline If reconnecting failed we know that no command completion will be received anymore. Hence let the SCSI error handler fail such commands immediately. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-01 10:37:13 -07:00
Bart Van Assche	2742c1dadd	IB/srp: Skip host settle delay The SRP initiator implements host reset by reconnecting to the SRP target. That means that communication with the target is possible as soon as host reset finished. Hence skip the host settle delay. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sebastian Riemer <sebastian.riemer@profitbricks.com> Reviewed-by: Christoph Hellwig <hch@infradead.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-27 16:44:39 -07:00
Bart Van Assche	086f44f588	IB/srp: Avoid skipping srp_reset_host() after a transport error The SCSI error handler assumes that the transport layer is operational if an eh_abort_handler() returns SUCCESS. Hence srp_abort() only should return SUCCESS if sending the ABORT TASK task management function succeeded. This patch avoids the SCSI error handler skipping the srp_reset_host() call after a transport layer error. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-27 16:44:39 -07:00
Dotan Barak	1fe0cb8488	IB/srp: Fix remove_one crash due to resource exhaustion If the add_one callback fails during driver load no resources are allocated so there isn't a need to release any resources. Trying to clean the resource may lead to the following kernel panic: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffffa0132331>] srp_remove_one+0x31/0x240 [ib_srp] RIP: 0010:[<ffffffffa0132331>] [<ffffffffa0132331>] srp_remove_one+0x31/0x240 [ib_srp] Process rmmod (pid: 4562, threadinfo ffff8800dd738000, task ffff8801167e60c0) Call Trace: [<ffffffffa024500e>] ib_unregister_client+0x4e/0x120 [ib_core] [<ffffffffa01361bd>] srp_cleanup_module+0x15/0x71 [ib_srp] [<ffffffff810ac6a4>] sys_delete_module+0x194/0x260 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Sebastian Riemer <sebastian.riemer@profitbricks.com> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-27 16:44:38 -07:00
Mitko Haralanov	22baa407f9	IB/qib: New transmitter tunning settings for Dell 1.1 backplane The Dell blade chassis got an updated backplane which requires new transmitter tuning settings. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-26 09:27:50 -07:00
Nicholas Bellinger	778de36896	iscsi/isert-target: Refactor ISCSI_OP_NOOP RX handling This patch refactors ISCSI_OP_NOOP handling within iscsi-target in order to handle iscsi_nopout payloads in a transport specific manner. This includes splitting existing iscsit_handle_nop_out() into iscsit_setup_nop_out() and iscsit_process_nop_out() calls, and makes iscsit_handle_nop_out() be only used internally by traditional iscsi socket calls. Next update iser-target code to use new callers and add FIXME for the handling iscsi_nopout payloads. Also fix reject response handling in iscsit_setup_nop_out() to use proper iscsit_add_reject_from_cmd(). v2: Fix uninitialized iscsit_handle_nop_out() payload_length usage (Fengguang) v3: Remove left-over dead code in iscsit_setup_nop_out() (DanC) Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-06-24 22:35:51 -07:00
Wei Yongjun	80b15043e3	IB/core: Fix error return code in add_port() Fix to return -ENOMEM in the add_port() error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-24 13:52:22 -07:00
Wei Yongjun	c94e15c5cb	RDMA/ocrdma: Fix error return code in ocrdma_set_create_qp_rq_cmd() Fix to return -ENOMEM in the alloc dma coherent error case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-24 13:50:45 -07:00
Mike Marciniszyn	1dd173b01f	IB/qib: Add qp_stats debug file This adds a seq_file iterator for reporting the QP hash table when the qp_stats file is read. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-21 17:19:52 -07:00
Mike Marciniszyn	17db3a92c1	IB/qib: Add per-context stats interface This patch adds a debugfs stats interface for per kernel contexts packet counts. The code uses the opcode stats count and eliminates the counter in the context. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-21 17:19:51 -07:00
Mike Marciniszyn	ddb8876589	IB/qib: Convert opcode counters to per-context This fix changes the opcode relative counters for receive to per context. Profiling has shown that when mulitple contexts are being used there is a lot of cache activity associated with these counters. The code formerly kept these counters per port, but only provided the interface to read per HCA. This patch converts the read of counters to per HCA and adds the debugfs hooks to be able to read the file as a sequence of opcodes. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-21 17:19:50 -07:00
Mike Marciniszyn	85caafe307	IB/qib: Optimize CQ callbacks The current workqueue implemention has the following performance deficiencies on QDR HCAs: - The CQ call backs tend to run on the CPUs processing the receive queues - The single thread queue isn't optimal for multiple HCAs This patch adds a dedicated per HCA bound thread to process CQ callbacks. Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-21 17:19:49 -07:00
Ramkrishna Vepa	c804f07248	IB/qib: Add dual-rail NUMA awareness for PSM processes The driver currently selects a HCA based on the algorithm that PSM chooses, contexts within a HCA or across. The HCA can also be chosen by the user. Either way, this patch assigns a CPU on the NUMA node local to the selected HCA. This patch also tries to select the HCA closest to the NUMA node of the CPU assigned via taskset to PSM process. If this HCA is unusable then another unit is selected based on the algorithm that is currently enforced or selected by PSM - round robin context selection 'within' or 'across' HCA's. Fixed a bug wherein contexts are setup on the NUMA node on which the processes are opened (setup_ctxt()) and not on the NUMA node that the driver recommends the CPU on. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com> Signed-off-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-21 17:19:49 -07:00
Ramkrishna Vepa	e0f30baca1	IB/qib: Add optional NUMA affinity This patch adds context relative numa affinity conditioned on the module parameter numa_aware. The qib_ctxtdata has an additional node_id member and qib_create_ctxtdata() has an addition node_id parameter. The allocations within the hdr queue and eager queue setup routines now take this additional member and adjust allocations as necesary. PSM will pass the either current numa node or the node closest to the HCA depending on numa_aware. Verbs will always use the node closest to the HCA. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com> Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-21 17:19:48 -07:00
Vinit Agnihotri	ab4a13d69b	IB/qib: Update minor version number External PSM repositories have advanced the minor number for a variety of reasons. The driver needs to increase to avoid warnings. Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-21 17:19:47 -07:00
Mike Marciniszyn	f7cf9a618b	IB/qib: Remove atomic_inc_not_zero() from QP RCU Follow Documentation/RCU/rcuref.txt guidance in removing atomic_inc_not_zero() from QP RCU implementation. This patch also removes an unneeded synchronize_rcu() in the add path. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-21 17:19:46 -07:00
Mike Marciniszyn	8469ba39a6	IB/qib: Add DCA support This patch adds DCA cache warming for systems that support DCA. The code uses cpu affinity notification to react to an affinity change from a user mode program like irqbalance and (re-)program the chip accordingly. This notification avoids reading the current cpu on every interrupt. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> [ Add Kconfig dependency on SMP && GENERIC_HARDIRQS to avoid failure to build due to undefined struct irq_affinity_notify. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-21 17:19:38 -07:00

1 2 3 4 5 ...

3533 Commits