IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This addresses a problem where NFS client writes over IPoIB connected
mode may deadlock on memory allocation/writeback.
The problem is not directly memory reclamation. There is an indirect
dependency between network filesystems writing back pages and
ipoib_cm_tx_init() due to how a kworker is used. Page reclaim cannot
make forward progress until ipoib_cm_tx_init() succeeds and it is
stuck in page reclaim itself waiting for network transmission.
Ordinarily this situation may be avoided by having the caller use
GFP_NOFS but ipoib_cm_tx_init() does not have that information.
To address this, take a general approach and add a new QP creation
flag that tells the low-level hardware driver to use GFP_NOIO for the
memory allocations related to the new QP.
Use the new flag in the ipoib connected mode path, and if the driver
doesn't support it, re-issue the QP creation without the flag.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fix the usnic and thw qib drivers to err when QP creation flags that
they don't understand are provided.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
It is not possible to build only the drivers/infiniband/hw/ (or ulp/)
subdirectory with command such as:
$ make ARCH=x86_64 O=./obj-x86_64/ drivers/infiniband/hw/
This fails with following error messages:
make[2]: Nothing to be done for `all'.
make[2]: Nothing to be done for `relocs'.
CHK include/config/kernel.release
Using /home/ydroneaud/src/linux as source for kernel
GEN /home/ydroneaud/src/linux/obj-x86_64/Makefile
CHK include/generated/uapi/linux/version.h
CHK include/generated/utsrelease.h
CALL /home/ydroneaud/src/linux/scripts/checksyscalls.sh
/home/ydroneaud/src/linux/scripts/Makefile.build:44: /home/ydroneaud/src/linux/drivers/infiniband/hw/Makefile: No such file or directory
make[2]: *** No rule to make target `/home/ydroneaud/src/linux/drivers/infiniband/hw/Makefile'. Stop.
make[1]: *** [drivers/infiniband/hw/] Error 2
make: *** [sub-make] Error 2
This patch creates a Makefile in hw/ and ulp/ and moves each
corresponding parts of drivers/infiniband/Makefile in the new
Makefiles.
It should not break build except if some hw/ drivers or ulp/ were
allowed previously to be built while CONFIG_INFINIBAND is set to 'n',
but according to drivers/infiniband/Kconfig, it's not possible. So it
should be safe to apply.
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The “affinity hint” mechanism is used by the user space
daemon, irqbalancer, to indicate a preferred CPU mask for irqs.
Irqbalancer can use this hint to balance the irqs between the
cpus indicated by the mask.
We wish the HCA to preferentially map the IRQs it uses to numa cores
close to it. To accomplish this, we use cpumask_set_cpu_local_first(), that
sets the affinity hint according the following policy:
First it maps IRQs to “close” numa cores. If these are exhausted, the
remaining IRQs are mapped to “far” numa cores.
Signed-off-by: Yuval Atias <yuvala@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The i386 ABI disagrees with most other ABIs regarding alignment of
data types larger than 4 bytes: on most ABIs a padding must be added
at end of the structures, while it is not required on i386.
So for most ABI struct c4iw_create_cq_resp gets implicitly padded
to be aligned on a 8 bytes multiple, while for i386, such padding
is not added.
The tool pahole can be used to find such implicit padding:
$ pahole --anon_include \
--nested_anon_include \
--recursive \
--class_name c4iw_create_cq_resp \
drivers/infiniband/hw/cxgb4/iw_cxgb4.o
Then, structure layout can be compared between i386 and x86_64:
+++ obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 11:43:05.547432195 +0100
--- obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100
@@ -14,9 +13,8 @@ struct c4iw_create_cq_resp {
__u32 size; /* 28 4 */
__u32 qid_mask; /* 32 4 */
- /* size: 36, cachelines: 1, members: 6 */
- /* last cacheline: 36 bytes */
+ /* size: 40, cachelines: 1, members: 6 */
+ /* padding: 4 */
+ /* last cacheline: 40 bytes */
};
This ABI disagreement will make an x86_64 kernel try to write past the
buffer provided by an i386 binary.
When boundary check will be implemented, the x86_64 kernel will refuse
to write past the i386 userspace provided buffer and the uverbs will
fail.
If the structure is on a page boundary and the next page is not
mapped, ib_copy_to_udata() will fail and the uverb will fail.
This patch adds an explicit padding at end of structure
c4iw_create_cq_resp, and, like 92b0ca7cb149 ("IB/mlx5: Fix stack info
leak in mlx5_ib_alloc_ucontext()"), makes function c4iw_create_cq()
not writting this padding field to userspace. This way, x86_64 kernel
will be able to write struct c4iw_create_cq_resp as expected by
unpatched and patched i386 libcxgb4.
Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Fixes: cfdda9d764362 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC")
Fixes: e24a72a3302a6 ("RDMA/cxgb4: Fix four byte info leak in c4iw_create_cq()")
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
SRP defines pr_fmt(fmt) to be "PFX fmt", and then includes a bunch of
header files before it gets around to defining PFX. This causes
problems if any of the header files do a pr_... and use pr_fmt().
Fix this by using KBUILD_MODNAME instead of the private PFX.
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Avoid leaking a kref count in ib_umad_open() if port->ib_dev == NULL
or if nonseekable_open() fails.
Avoid leaking a kref count, that sm_sem is kept down and also that the
IB_PORT_SM capability mask is not cleared in ib_umad_sm_open() if
nonseekable_open() fails.
Since container_of() never returns NULL, remove the code that tests
whether container_of() returns NULL.
Moving the kref_get() call from the start of ib_umad_*open() to the
end is safe since it is the responsibility of the caller of these
functions to ensure that the cdev pointer remains valid until at least
when these functions return.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@vger.kernel.org>
[ydroneaud@opteya.com: rework a bit to reduce the amount of code changed]
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
[ nonseekable_open() can't actually fail, but.... - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
This commit adds the sysfs interface for enabling QP0 on VFs for
selected VF/port.
By default, no VFs are enabled for QP0 operation.
To enable QP0 operation on a VF/port, under
/sys/class/infiniband/mlx4_x/iov/<b:d:f>/ports/x there are two new entries:
- smi_enabled (read-only). Indicates whether smi is currently
enabled for the indicated VF/port
- enable_smi_admin (rw). Used by the admin to request that smi
capability be enabled or disabled for the indicated VF/port.
0 = disable, 1 = enable.
The requested enablement will occur at the next reset of the
VF (e.g. driver restart on the VM which owns the VF).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This commit adds the infrastructure for enabling selected VFs to
operate SMI (QP0) MADs without restriction.
Additionally, for these enabled VFs, their QP0 proxy and tunnel QPs
are MLX QPs. As such, they operate over VL15. Therefore, they are
not affected by "credit" problems or changes in the VLArb table (which
may shut down VL0).
Non-enabled VFs may only create UD proxy QP0 qps (which are forced by
the hypervisor to send packets using the q-key it assigns and places
in the qp-context). Thus, non-enabled VFs will not pose a security
risk. The hypervisor discards any privileged MADs it receives from
these non-enabled VFs.
By default, all VFs are NOT enabled, and must explicitly be enabled
by the administrator.
The sysfs interface which operates the VF enablement infrastructure
is provided in the next commit.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Currently, VFs in SRIOV VFs are denied QP0 access. The main reason
for this decision is security, since Subnet Management Datagrams
(SMPs) are not restricted by network partitioning and may affect the
physical network topology. Moreover, even the SM may be denied access
from portions of the network by setting management keys unknown to the
SM.
However, it is desirable to grant SMI access to certain privileged
VFs, so that certain network management activities may be conducted
within virtual machines instead of the hypervisor.
This commit does the following:
1. Create QP0 tunnel QPs for all VFs.
2. Discard SMI mads sent-from/received-for non-privileged VFs in the
hypervisor MAD multiplex/demultiplex logic. SMI mads from/for
privileged VFs are allowed to pass.
3. MAD_IFC wrapper changes/fixes. For non-privileged VFs, only
host-view MAD_IFC commands are allowed, and only for SMI LID-Routed
GET mads. For privileged VFs, there are no restrictions.
This commit does not allow privileged VFs as yet. To determine if a VF
is privileged, it calls function mlx4_vf_smi_enabled(). This function
returns 0 unconditionally for now.
The next two commits allow defining and activating privileged VFs.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
mlx4_ib_modify_port is invoked in IB for resetting the Q_Key violations
counters and for modifying the IB port capability flags.
For example, when opensm is started up on the hypervisor,
mlx4_ib_modify_port is called to set the port's IsSM flag.
In multifunction mode, the SET_PORT command used in this flow should
be wrapped (so that the PF port capability flags are also tracked,
thus enabling the aggregate of all the VF/PF capability flags to be
tracked properly).
The procedure mlx4_SET_PORT() in main.c is also renamed to mlx4_ib_SET_PORT()
to differentiate it from procedure mlx4_SET_PORT() in port.c.
mlx4_ib_SET_PORT() is used exclusively by mlx4_ib_modify_port().
Finally, the CM invokes ib_modify_port() to set the IsCMSupported flag
even when running over RoCE. Therefore, when RoCE is active,
mlx4_ib_modify_port should return OK unconditionally (since the
capability flags and qkey violations counter are not relevant).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patches changes user visible function names containing "qlogic"
in module init and cleanup.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We know that "reset_tpt_entry" is false on this side of the if else
statement so there is no need to check again.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit 297e0dad7206 ("IB/mlx4: Handle Ethernet L2 parameters for IP
based GID addressing") introduced a bug where is_mcast is now no
longer initialized on the non-multicast condition and so it can be
any random value from the stack. This issue was detected by cppcheck:
[drivers/infiniband/hw/mlx4/ah.c:103]: (error) Uninitialized
variable: is_mcast
Simple fix is to initialise is_mcast to zero.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Time comparisons must use time_after / time_before to avoid problems
when jiffies wraps.
Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We need to cast wr_id to unsigned long before casting to a pointer.
This fixes:
drivers/infiniband/hw/mlx5/mr.c: In function 'mlx5_umr_cq_handler':
>> drivers/infiniband/hw/mlx5/mr.c:724:13: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
context = (struct mlx5_ib_umr_context *)wc.wr_id;
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Prepends copyright and license to usnic_uiom_interval_tree.c
Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch addresses an issue where the legacy diagpacket is sent in
from the user, but the driver operates on only the extended
diagpkt. This patch specifically initializes the extended diagpkt
based on the legacy packet.
Cc: <stable@vger.kernel.org>
Reported-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The code used a literal 1 in dispatching an IB_EVENT_PKEY_CHANGE.
As of the dual port qib QDR card, this is not necessarily correct.
Change to use the port as specified in the call.
Cc: <stable@vger.kernel.org>
Reported-by: Alex Estrin <alex.estrin@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The cpl_abort_req struct has several reserved members which need to be
cleared to avoid disclosing kernel information. I have added a memset()
so now it matches the cxgb4 version of this function.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The i386 ABI disagrees with most other ABIs regarding alignment of
data type larger than 4 bytes: on most ABIs a padding must be added at
end of the structures, while it is not required on i386.
So for most ABIs struct mlx5_ib_create_srq gets implicitly padded to be
aligned on a 8 bytes multiple, while for i386, such padding is not
added.
Tool pahole could be used to find such implicit padding:
$ pahole --anon_include \
--nested_anon_include \
--recursive \
--class_name mlx5_ib_create_srq \
drivers/infiniband/hw/mlx5/mlx5_ib.o
Then, structure layout can be compared between i386 and x86_64:
+++ obj-i386/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-28 11:43:07.386413682 +0100
--- obj-x86_64/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-27 13:06:17.788472721 +0100
@@ -69,7 +68,6 @@ struct mlx5_ib_create_srq {
__u64 db_addr; /* 8 8 */
__u32 flags; /* 16 4 */
- /* size: 20, cachelines: 1, members: 3 */
- /* last cacheline: 20 bytes */
+ /* size: 24, cachelines: 1, members: 3 */
+ /* padding: 4 */
+ /* last cacheline: 24 bytes */
};
ABI disagreement will make an x86_64 kernel try to read past
the buffer provided by an i386 binary.
When boundary check will be implemented, the x86_64 kernel will
refuse to read past the i386 userspace provided buffer and the
uverb will fail.
Anyway, if the structure lay in memory on a page boundary and
next page is not mapped, ib_copy_from_udata() will fail and the
uverb will fail.
This patch makes create_srq_user() takes care of the input
data size to handle the case where no padding was provided.
This way, x86_64 kernel will be able to handle struct mlx5_ib_create_srq
as sent by unpatched and patched i386 libmlx5.
Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Fixes: e126ba97dba9e ("mlx5: Add driver for Mellanox Connect-IB adapter")
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The i386 ABI disagrees with most other ABIs regarding alignment of
data type larger than 4 bytes: on most ABIs a padding must be added at
end of the structures, while it is not required on i386.
So for most ABI struct mlx5_ib_create_cq get padded to be aligned on a
8 bytes multiple, while for i386, such padding is not added.
The tool pahole can be used to find such implicit padding:
$ pahole --anon_include \
--nested_anon_include \
--recursive \
--class_name mlx5_ib_create_cq \
drivers/infiniband/hw/mlx5/mlx5_ib.o
Then, structure layout can be compared between i386 and x86_64:
+++ obj-i386/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-28 11:43:07.386413682 +0100
--- obj-x86_64/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-27 13:06:17.788472721 +0100
@@ -34,9 +34,8 @@ struct mlx5_ib_create_cq {
__u64 db_addr; /* 8 8 */
__u32 cqe_size; /* 16 4 */
- /* size: 20, cachelines: 1, members: 3 */
- /* last cacheline: 20 bytes */
+ /* size: 24, cachelines: 1, members: 3 */
+ /* padding: 4 */
+ /* last cacheline: 24 bytes */
};
This ABI disagreement will make an x86_64 kernel try to read past the
buffer provided by an i386 binary.
When boundary check will be implemented, a x86_64 kernel will refuse
to read past the i386 userspace provided buffer and the uverb will
fail.
Anyway, if the structure lies in memory on a page boundary and next
page is not mapped, ib_copy_from_udata() will fail when trying to read
the 4 bytes of padding and the uverb will fail.
This patch makes create_cq_user() takes care of the input data size to
handle the case where no padding is provided.
This way, x86_64 kernel will be able to handle struct
mlx5_ib_create_cq as sent by unpatched and patched i386 libmlx5.
Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Fixes: e126ba97dba9e ("mlx5: Add driver for Mellanox Connect-IB adapter")
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Instead of having the UMR context part of each memory region, allocate
a struct on the stack. This allows queuing multiple UMRs that access
the same memory region.
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
For user QPs, the creation process does not currently initialize the fields:
* qp->rq.offset
* qp->sq.offset
* qp->sq.wqe_shift
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The patch stores iova, pd and size during mr creation and after UMRs
that modify them. It removes the unused access flags field.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
For memory regions that are allocated using reg_umr, the suffix of
mlx5_core_create_mkey isn't being called. Instead the creation is
completed in a callback function (reg_mr_callback). This means that
these MRs aren't being added to the MR radix tree. Add them in the
callback.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
If ib_post_send fails when posting the UMR work request in reg_umr,
the code doesn't release the temporary pas buffer allocated, and
doesn't dma_unmap it.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Some DIF implementations (SCSI initiator/target) may want to use different
input/output values for application tag and/or reference tag. So in
case memory/wire domain values don't match HW must not copy them.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
No need for repetition format pattern in case the data and protection
are already interleaved in the memory domain since the pattern
already exists. A single key entry is sufficient and may save some
extra fetch ops.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When the data and protection are interleaved in the memory domain, no
need to expand the mkey total length.
At the moment no Linux user works (iSER initiator & target) in
interleaved mode. This may change in the future as for SCSI
pass-through devices there is no real point in target performing
de-interleaving and re-interleaving of the protection data in the PT
stage. Regardless, signature verbs support this mode.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Logging messages need terminating newlines to avoid possible message
interleaving. Add them.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In some circumstances (multiple targets), RDMA_CM ESTABLISHED event
and ep_disconnect may race. In this case, the iser connection state
may transition to UP (after ep_disconnect transitioned it to
TERMINATING), while the connection is being torn down.
Upon RDMA_CM event ESTABLISHED we allow iser connection state to
transition to UP only from PENDING. We also make sure to protect this
state change (done under the connection lock).
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
iSER relies on refcounting to manage iser connections establishment
and teardown.
Following commit 39ff05dbbbdb ("IB/iser: Enhance disconnection logic
for multi-pathing"), iser connection maintain 3 references:
- iscsi_endpoint (at creation stage)
- cma_id (at connection request stage)
- iscsi_conn (at bind stage)
We can avoid taking explicit refcounts by correctly serializing iser
teardown flows (graceful and non-graceful).
Our approach is to trigger a scheduled work to handle ordered teardown
by gracefully waiting for 2 cleanup stages to complete:
1. Cleanup of live pending tasks indicated by iscsi_conn_stop completion
2. Flush errors processing
Each completed stage will notify a waiting worker thread when it is
done to allow teardwon continuation.
Since iSCSI connection establishment may trigger endpoint disconnect
without a successful endpoint connect, we rely on the iscsi <-> iser
binding (.conn_bind) to learn about the teardown policy we should take
wrt cleanup stages.
Since all cleanup worker threads are scheduled (release_wq) in
.ep_disconnect it is safe to assume that when module_exit is called,
all cleanup workers are already scheduled. Thus proper module unload
shall flush all scheduled works before allowing safe exit, to
guarantee no resources got left behind.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Conflicts:
drivers/net/bonding/bond_alb.c
drivers/net/ethernet/altera/altera_msgdma.c
drivers/net/ethernet/altera/altera_sgdma.c
net/ipv6/xfrm6_output.c
Several cases of overlapping changes.
The xfrm6_output.c has a bug fix which overlaps the renaming
of skb->local_df to skb->ignore_df.
In the Altera TSE driver cases, the register access cleanups
in net-next overlapped with bug fixes done in net.
Similarly a bug fix to send ALB packets in the bonding driver using
the right source address overlaps with cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"It looks like a sizeble collection but this is nearly 3 weeks of bug
fixing while you were away.
1) Fix crashes over IPSEC tunnels with NAT, the latter can reroute
the packet through a non-IPSEC protected path and the code has to
be able to handle SKBs attached to routes lacking an attached xfrm
state. From Steffen Klassert.
2) Fix OOPSs in ipv4 and ipv6 ipsec layers for unsupported
sub-protocols, also from Steffen Klassert.
3) Set local_df on fragmented netfilter skbs otherwise we won't be
able to forward successfully, from Florian Westphal.
4) cdc_mbim ipv6 neighbour code does __vlan_find_dev_deep without
holding RCU lock, from Bjorn Mork.
5) local_df test in ip_may_fragment is inverted, from Florian
Westphal.
6) jme driver doesn't check for DMA mapping failures, from Neil
Horman.
7) qlogic driver doesn't calculate number of TX queues properly, from
Shahed Shaikh.
8) fib_info_cnt can drift irreversibly positive if we fail to
allocate the fi->fib_metrics array, from Sergey Popovich.
9) Fix use after free in ip6_route_me_harder(), also from Sergey
Popovich.
10) When SYSCTL is disabled, we don't handle local_port_range and
ping_group_range defaults properly at all, from Cong Wang.
11) Unaccelerated VLAN tagged frames improperly handled by cdc_mbim
driver, fix from Bjorn Mork.
12) cassini driver needs nested lock annotations for TX locking, from
Emil Goode.
13) On init error ipv6 VTI driver can unregister pernet ops twice,
oops. Fix from Mahtias Krause.
14) If macvlan device is down, don't propagate IFF_ALLMULTI changes,
from Peter Christensen.
15) Missing NULL pointer check while parsing netlink config options in
ip6_tnl_validate(). From Susant Sahani.
16) Fix handling of neighbour entries during ipv6 router reachability
probing, from Duan Jiong.
17) x86 and s390 JIT address randomization has some address
calculation bugs leading to crashes, from Alexei Starovoitov and
Heiko Carstens.
18) Clear up those uglies with nop patching and net_get_random_once(),
from Hannes Frederic Sowa.
19) Option length miscalculated in ip6_append_data(), fix also from
Hannes Frederic Sowa.
20) A while ago we fixed a race during device unregistry when a
namespace went down, turns out there is a second place that needs
similar protection. From Cong Wang.
21) In the new Altera TSE driver multicast filtering isn't working,
disable it and just use promisc mode until the cause is found.
From Vince Bridgers.
22) When we disable router enabling in ipv6 we have to flush the
cached routes explicitly, from Duan Jiong.
23) NBMA tunnels should not cache routes on the tunnel object because
the key is variable, from Timo Teräs.
24) With stacked devices GRO information in skb->cb[] can be not setup
properly, make sure it is in all code paths. From Eric Dumazet.
25) Really fix stacked vlan locking, multiple levels of nesting with
intervening non-vlan devices are possible. From Vlad Yasevich.
26) Fallback ipip tunnel device's mtu is not setup properly, from
Steffen Klassert.
27) The packet scheduler's tcindex filter can crash because we
structure copy objects with list_head's inside, oops. From Cong
Wang.
28) Fix CHECKSUM_COMPLETE handling for ipv6 GRE tunnels, from Eric
Dumazet.
29) In some configurations 'itag' in __mkroute_input() can end up
being used uninitialized because of how fib_validate_source()
works. Fix it by explitly initializing itag to zero like all the
other fib_validate_source() callers do, from Li RongQing"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
batman: fix a bogus warning from batadv_is_on_batman_iface()
ipv4: initialise the itag variable in __mkroute_input
bonding: Send ALB learning packets using the right source
bonding: Don't assume 802.1Q when sending alb learning packets.
net: doc: Update references to skb->rxhash
stmmac: Remove unbalanced clk_disable call
ipv6: gro: fix CHECKSUM_COMPLETE support
net_sched: fix an oops in tcindex filter
can: peak_pci: prevent use after free at netdev removal
ip_tunnel: Initialize the fallback device properly
vlan: Fix build error wth vlan_get_encap_level()
can: c_can: remove obsolete STRICT_FRAME_ORDERING Kconfig option
MAINTAINERS: Pravin Shelar is Open vSwitch maintainer.
bnx2x: Convert return 0 to return rc
bonding: Fix alb mode to only use first level vlans.
bonding: Fix stacked device detection in arp monitoring
macvlan: Fix lockdep warnings with stacked macvlan devices
vlan: Fix lockdep warning with stacked vlan devices.
net: Allow for more then a single subclass for netif_addr_lock
net: Find the nesting level of a given device by type.
...
Pull scsi target fixes from Nicholas Bellinger:
"This series include:
- Close race between iser-target network portal shutdown + accepting
new connection logins (sagi)
- Fix free-after-use regression in tcm_fc post conversion to
percpu-ida pre-allocation (nab)
- Explicitly disable Immediate + Unsolicited Data for iser-target
connections when T10-PI is enabled (sagi + nab)
- Allow pi_prot_type + emulate_write_cache attributes to be set to
zero regardless of backend support (andy)
- memory leak fix (mikulas)"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: fix memory leak on XCOPY
target: Don't allow setting WC emulation if device doesn't support
iscsi-target: Disable Immediate + Unsolicited Data with ISER Protection
tcm_fc: Fix free-after-use regression in ft_free_cmd
iscsi-target: Change BUG_ON to REJECT in iscsit_process_nop_out
Target/iscsi,iser: Avoid accepting transport connections during stop stage
Target/iser: Fix iscsit_accept_np and rdma_cm racy flow
Target/iser: Fix wrong connection requests list addition
target: Allow non-supporting backends to set pi_prot_type to 0
In case user chose to set T10-PI enable on the target while
the IB device does not support it, gracefully reject the request.
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
disconnected_handler works are scheduled on system_wq.
When attempting to unload, first make sure all works
have cleaned up.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
There are 4 RDMA_CM events that all basically mean that
the user should teardown the IB connection:
- DISCONNECTED
- ADDR_CHANGE
- DEVICE_REMOVAL
- TIMEWAIT_EXIT
Only in DISCONNECTED/ADDR_CHANGE it makes sense to
call rdma_disconnect (send DREQ/DREP to our initiator).
So we keep the same teardown handler for all of them
but only indicate calling rdma_disconnect for the relevant
events.
This patch also removes redundant debug prints for each single
event.
v2 changes:
- Call isert_disconnected_handler() for DEVICE_REMOVAL (Or + Sag)
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Certain HCA types (e.g. Connect-IB) and certain configurations (e.g.
ConnectX VF) support fast registration but not FMR. Hence add fast
registration support.
In function srp_rport_reconnect(), move the the srp_finish_req()
loop from after to before the srp_create_target_ib() call. This is
needed to avoid that srp_finish_req() tries to queue any
invalidation requests for rkeys associated with the old queue pair
on the newly allocated queue pair. Invoking srp_finish_req() before
the queue pair has been reallocated is safe since srp_claim_req()
handles completions correctly that arrive after srp_finish_req()
has been invoked.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The next patch will cause the renamed variables to be shared between
the code for FMR and for FR memory registration. Make the names of
these variables independent of the memory registration mode. This
patch does not change any functionality. The start of this patch was
the changes applied via the following shell command:
sed -i.orig 's/SRP_FMR_SIZE/SRP_MAX_PAGES_PER_MR/g; \
s/fmr_page_mask/mr_page_mask/g;s/fmr_page_size/mr_page_size/g; \
s/fmr_page_shift/mr_page_shift/g;s/fmr_max_size/mr_max_size/g; \
s/max_pages_per_fmr/max_pages_per_mr/g;s/nfmr/nmdesc/g; \
s/fmr_len/dma_len/g' drivers/infiniband/ulp/srp/ib_srp.[ch]
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Allocate one FMR pool per SRP connection instead of one SRP pool
per HCA. This improves scalability of the SRP initiator.
Only request the SCSI mid-layer to retry a SCSI command after a
temporary mapping failure (-ENOMEM) but not after a permanent
mapping failure. This avoids that SCSI commands are retried
indefinitely if a permanent memory mapping failure occurs.
Tell the SCSI mid-layer to reduce queue depth temporarily in the
unlikely case where an application is queuing many requests with
more than max_pages_per_fmr sg-list elements.
For FMR pool allocation, base the max_pages_per_fmr parameter on
the HCA memory registration limit. Only try to allocate an FMR
pool if FMR is supported.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Add a kernel module parameter that enables memory registration also for SG-lists
that can be processed without memory registration. This makes it easier for kernel
developers to test the memory registration code.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Avoid that the kernel-doc tool warns about missing argument
descriptions for the ib_srp.[ch] source files.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Avoid that the loops that iterate over the request ring can encounter
a pointer to a SCSI command in req->scmnd that is no longer associated
with that request. If the function srp_unmap_data() is invoked twice
for a SCSI command that is not in flight then that would cause
ib_fmr_pool_unmap() to be invoked with an invalid pointer as argument,
resulting in a kernel oops.
Reported-by: Sagi Grimberg <sagig@mellanox.com>
Reference: http://thread.gmane.org/gmane.linux.drivers.rdma/19068/focus=19069
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>