IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
A small series to clean up the mlx5 mkey code across the mlx5_core and
InfiniBand.
* branch 'mlx5_mkey':
RDMA/mlx5: Attach ndescs to mlx5_ib_mkey
RDMA/mlx5: Move struct mlx5_core_mkey to mlx5_ib
RDMA/mlx5: Replace struct mlx5_core_mkey by u32 key
RDMA/mlx5: Remove pd from struct mlx5_core_mkey
RDMA/mlx5: Remove size from struct mlx5_core_mkey
RDMA/mlx5: Remove iova from struct mlx5_core_mkey
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Generalize the use of ndescs by adding it to mlx5_ib_mkey.
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Move mlx5_core_mkey struct to mlx5_ib, as the mlx5_core doesn't use it
at this point.
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
In mlx5_core and vdpa there is no use of mlx5_core_mkey members except
for the key itself.
As preparation for moving mlx5_core_mkey to mlx5_ib, the occurrences of
struct mlx5_core_mkey in all modules except for mlx5_ib are replaced by
a u32 key.
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
There is no read of mkey->pd, only write. Remove it.
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
mkey->size is already stored in ibmr->length, no need to store it here.
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
iova is already stored in ibmr->iova, no need to store it here.
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Control IRQ is the first IRQ vector. This complicates handling of
completion irqs as we need to offset them by one.
in the next patch, there are scenarios where completion and control EQs
will share the same irq. for example: functions with single IRQ. To ease
such scenarios, we shift control IRQ to the end of the irq array.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
UID field was added to alloc_uar and dealloc_uar PRM command, to specify
DevX UID for UAR. This change enables firmware validating user access to
its own UAR resources.
For the kernel allocated UARs the UID will stay 0 as of today.
Signed-off-by: Meir Lichtinger <meirl@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
An important error case regression fixes in mlx5:
- Wrong size used when computing the error path smaller allocation request
leads to corruption
- Confusing but ultimately harmless alignment mis-calculation
- Static checker warnings:
Null pointer subtraction in qib
kcalloc in bnxt_re
Missing static on global variable in hfi1
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmE4qagACgkQOG33FX4g
mxoueg//SfLuxirMifDHOpH2Db5vC2a/BwYW3AuT78zAXzlUmmEWYIUw1fbJA2vK
vt/fsjaxW2FBsluiIgV8J961Mlrj+/Y1KPxbV+76CKFoBkaYcjEJcIajGXkm71r5
bXviUuFEHV2810Nxs6QEitMPyEP3sODmcK5EoKIuifVfjhdcJ3XrLbJb0NA5PKk1
voD2zpAfhumiAzjZ2q3rJgpCv08WWqts30DzAKdii1asdG9tDF8WBS9BJuPC7MHF
B2AmYVTgwhsepCAUaYGathKxHblqakRXF3gjOIAB4TKuZEv0OwGVzH3flFbukg5L
icEw/Ai0H2GM7e3yT3BI3QDoXMg52DfCeChgwttkNsJJidjlc7cUCwaaBCJ1oVo/
860foW9YFA14ftAlBt2BtvfNfOaqYnmz4tmavVNxqW7NiZCqT/+uyV5BSDPD5b5M
Xtsa8NK/6u65rz92GtNpVYd8W4ZfjOCAc8cWj6+ELeSDsU92sFWBWalWsl59Hq+t
AdCUi25uwUAlLPiPKRlYxSXoCCOG80iDgrj8sntIGoVYT0nr4ITVv5VmmVWNQLPw
V5xfDhCmALrNlp1nCBRsmtBHA/Legqu82/lWpkl12WZGOytU/jTEb2gLWPDglwr+
BWmaq3Us/g36NX9h/GOvqD5tZCogzkjHvWLW0biMRebDyO1DW1Y=
=Sg6A
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"I don't usually send a second PR in the merge window, but the fix to
mlx5 is significant enough that it should start going through the
process ASAP. Along with it comes some of the usual -rc stuff that
would normally wait for a -rc2 or so.
Summary:
Important error case regression fixes in mlx5:
- Wrong size used when computing the error path smaller allocation
request leads to corruption
- Confusing but ultimately harmless alignment mis-calculation
Static checker warning fixes:
- NULL pointer subtraction in qib
- kcalloc in bnxt_re
- Missing static on global variable in hfi1"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/hfi1: make hist static
RDMA/bnxt_re: Prefer kcalloc over open coded arithmetic
IB/qib: Fix null pointer subtraction compiler warning
RDMA/mlx5: Fix xlt_chunk_align calculation
RDMA/mlx5: Fix number of allocated XLT entries
The XLT chunk alignment depends on ent_size not sizeof(ent_size) aka
sizeof(size_t). The incoming ent_size is either 8 or 16, so the
miscalculation when 16 is required is only an over-alignment and
functional harmless.
Fixes: 8010d74b9965 ("RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()")
Link: https://lore.kernel.org/r/20210908081849.7948-2-schnelle@linux.ibm.com
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In commit 8010d74b9965b ("RDMA/mlx5: Split the WR setup out of
mlx5_ib_update_xlt()") the allocation logic was split out of
mlx5_ib_update_xlt() and the logic was changed to enable better OOM
handling. Sadly this change introduced a miscalculation of the number of
entries that were actually allocated when under memory pressure where it
can actually become 0 which on s390 lets dma_map_single() fail.
It can also lead to corruption of the free pages list when the wrong
number of entries is used in the calculation of sg->length which is used
as argument for free_pages().
Fix this by using the allocation size instead of misusing get_order(size).
Cc: stable@vger.kernel.org
Fixes: 8010d74b9965 ("RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()")
Link: https://lore.kernel.org/r/20210908081849.7948-1-schnelle@linux.ibm.com
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
- Various cleanup and small features for rtrs
- kmap_local_page() conversions
- Driver updates and fixes for: efa, rxe, mlx5, hfi1, qed, hns
- Cache the IB subnet prefix
- Rework how CRC is calcuated in rxe
- Clean reference counting in iwpm's netlink
- Pull object allocation and lifecycle for user QPs to the uverbs core
code
- Several small hns features and continued general code cleanups
- Fix the scatterlist confusion of orig_nents/nents introduced in an
earlier patch creating the append operation
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmEudRgACgkQOG33FX4g
mxraJA//c6bMxrrTVrzmrtrkyYD4tYWE8RDfgvoyZtleZnnEOJeunCQWakQrpJSv
ukSnOGCA3PtnmRMdV54f/11YJ/7otxOJodSO7jWsIoBrqG/lISAdX8mn2iHhrvJ0
dIaFEFPLy0WqoMLCJVIYIupR0IStVHb/mWx0uYL4XnnoYKyt7f7K5JMZpNWMhDN2
ieJw0jfrvEYm8pipWuxUvB16XARlzAWQrjqLpMRI+jFRpbDVBY21dz2/LJvOJPrA
LcQ+XXsV/F659ibOAGm6bU4BMda8fE6Lw90B/gmhSswJ205NrdziF5cNYHP0QxcN
oMjrjSWWHc9GEE7MTipC2AH8e36qob16Q7CK+zHEJ+ds7R6/O/8XmED1L8/KFpNA
FGqnjxnxsl1y27mUegfj1Hh8PfoDp2oVq0lmpEw0CYo4cfVzHSMRrbTR//XmW628
Ie/mJddpFK4oLk+QkSNjSLrnxOvdTkdA58PU0i84S5eUVMNm41jJDkxg2J7vp0Zn
sclZsclhUQ9oJ5Q2so81JMWxu4JDn7IByXL0ULBaa6xwQTiVEnyvSxSuPlflhLRW
0vI2ylATYKyWkQqyX7VyWecZJzwhwZj5gMMWmoGsij8bkZhQ/VaQMaesByzSth+h
NV5UAYax4GqyOQ/tg/tqT6e5nrI1zof87H64XdTCBpJ7kFyQ/oA=
=ZwOe
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"This is quite a small cycle, no major series stands out. The HNS and
rxe drivers saw the most activity this cycle, with rxe being broken
for a good chunk of time. The significant deleted line count is due to
a SPDX cleanup series.
Summary:
- Various cleanup and small features for rtrs
- kmap_local_page() conversions
- Driver updates and fixes for: efa, rxe, mlx5, hfi1, qed, hns
- Cache the IB subnet prefix
- Rework how CRC is calcuated in rxe
- Clean reference counting in iwpm's netlink
- Pull object allocation and lifecycle for user QPs to the uverbs
core code
- Several small hns features and continued general code cleanups
- Fix the scatterlist confusion of orig_nents/nents introduced in an
earlier patch creating the append operation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (90 commits)
RDMA/mlx5: Relax DCS QP creation checks
RDMA/hns: Delete unnecessary blank lines.
RDMA/hns: Encapsulate the qp db as a function
RDMA/hns: Adjust the order in which irq are requested and enabled
RDMA/hns: Remove RST2RST error prints for hw v1
RDMA/hns: Remove dqpn filling when modify qp from Init to Init
RDMA/hns: Fix QP's resp incomplete assignment
RDMA/hns: Fix query destination qpn
RDMA/hfi1: Convert to SPDX identifier
IB/rdmavt: Convert to SPDX identifier
RDMA/hns: Bugfix for incorrect association between dip_idx and dgid
RDMA/hns: Bugfix for the missing assignment for dip_idx
RDMA/hns: Bugfix for data type of dip_idx
RDMA/hns: Fix incorrect lsn field
RDMA/irdma: Remove the repeated declaration
RDMA/core/sa_query: Retry SA queries
RDMA: Use the sg_table directly and remove the opencoded version from umem
lib/scatterlist: Fix wrong update of orig_nents
lib/scatterlist: Provide a dedicated function to support table append
RDMA/hns: Delete unused hns bitmap interface
...
From Maor Gottlieb
====================
Fix the use of nents and orig_nents in the sg table append helpers. The
nents should be used by the DMA layer to store the number of DMA mapped
sges, the orig_nents is the number of CPU sges.
Since the sg append logic doesn't always create a SGL with exactly
orig_nents entries store a total_nents as well to allow the table to be
properly free'd and reorganize the freeing logic to share across all the
use cases.
====================
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* 'sg_nents':
RDMA: Use the sg_table directly and remove the opencoded version from umem
lib/scatterlist: Fix wrong update of orig_nents
lib/scatterlist: Provide a dedicated function to support table append
In order to create DCS QPs, we don't need to rely on both
log_max_dci_stream_channels and log_max_dci_errored_streams capabilities.
Fixes: 11656f593a86 ("RDMA/mlx5: Add DCS offload support")
Link: https://lore.kernel.org/r/3e7b3363fd73686176cc584295e86832a7cf99b2.1630320354.git.leonro@nvidia.com
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This allows using the normal sg_table APIs and makes all the code
cleaner. Remove sgt, nents and nmapd from ib_umem.
Link: https://lore.kernel.org/r/20210824142531.3877007-4-maorg@nvidia.com
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Saeed Mahameed says:
====================
This pulls mlx5-next branch which includes patches already reviewed on
net-next and rdma mailing lists.
1) mlx5 single E-Switch FDB for lag
2) IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
3) Add DCS caps & fields support
We need this in net-next as multiple features are dependent on the
single FDB feature.
====================
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* mellanox/mlx5-next:
net/mlx5: Lag, Create shared FDB when in switchdev mode
net/mlx5: E-Switch, add logic to enable shared FDB
net/mlx5: Lag, move lag destruction to a workqueue
net/mlx5: Lag, properly lock eswitch if needed
net/mlx5: Add send to vport rules on paired device
net/mlx5: E-Switch, Add event callback for representors
net/mlx5e: Use shared mappings for restoring from metadata
net/mlx5e: Add an option to create a shared mapping
net/mlx5: E-Switch, set flow source for send to uplink rule
RDMA/mlx5: Add shared FDB support
{net, RDMA}/mlx5: Extend send to vport rules
RDMA/mlx5: Fill port info based on the relevant eswitch
net/mlx5: Lag, add initial logic for shared FDB
net/mlx5: Return mdev from eswitch
IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
Fix the below crash when deleting a slave from the unaffiliated list
twice. First time when the slave is bound to the master and the second
when the slave is unloaded.
Fix it by checking if slave is unaffiliated (doesn't have ib device)
before removing from the list.
RIP: 0010:mlx5r_mp_remove+0x4e/0xa0 [mlx5_ib]
Call Trace:
auxiliary_bus_remove+0x18/0x30
__device_release_driver+0x177/x220
device_release_driver+0x24/0x30
bus_remove_device+0xd8/0x140
device_del+0x18a/0x3e0
mlx5_rescan_drivers_locked+0xa9/0x210 [mlx5_core]
mlx5_unregister_device+0x34/0x60 [mlx5_core]
mlx5_uninit_one+0x32/0x100 [mlx5_core]
remove_one+0x6e/0xe0 [mlx5_core]
pci_device_remove+0x36/0xa0
__device_release_driver+0x177/0x220
device_driver_detach+0x3c/0xa0
unbind_store+0x113/0x130
kernfs_fop_write_iter+0x110/0x1a0
new_sync_write+0x116/0x1a0
vfs_write+0x1ba/0x260
ksys_write+0x5f/0xe0
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: 93f8244431ad ("RDMA/mlx5: Convert mlx5_ib to use auxiliary bus")
Link: https://lore.kernel.org/r/17ec98989b0ba88f7adfbad68eb20bce8d567b44.1628587493.git.leonro@nvidia.com
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
can and ieee802154.
Current release - regressions:
- r8169: fix ASPM-related link-up regressions
- bridge: fix flags interpretation for extern learn fdb entries
- phy: micrel: fix link detection on ksz87xx switch
- Revert "tipc: Return the correct errno code"
- ptp: fix possible memory leak caused by invalid cast
Current release - new code bugs:
- bpf: add missing bpf_read_[un]lock_trace() for syscall program
- bpf: fix potentially incorrect results with bpf_get_local_storage()
- page_pool: mask the page->signature before the checking, avoid
dma mapping leaks
- netfilter: nfnetlink_hook: 5 fixes to information in netlink dumps
- bnxt_en: fix firmware interface issues with PTP
- mlx5: Bridge, fix ageing time
Previous releases - regressions:
- linkwatch: fix failure to restore device state across suspend/resume
- bareudp: fix invalid read beyond skb's linear data
Previous releases - always broken:
- bpf: fix integer overflow involving bucket_size
- ppp: fix issues when desired interface name is specified via netlink
- wwan: mhi_wwan_ctrl: fix possible deadlock
- dsa: microchip: ksz8795: fix number of VLAN related bugs
- dsa: drivers: fix broken backpressure in .port_fdb_dump
- dsa: qca: ar9331: make proper initial port defaults
Misc:
- bpf: add lockdown check for probe_write_user helper
- netfilter: conntrack: remove offload_pickup sysctl before 5.14 is out
- netfilter: conntrack: collect all entries in one cycle,
heuristically slow down garbage collection scans
on idle systems to prevent frequent wake ups
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmEVb/AACgkQMUZtbf5S
Irvlzw//XGDHNNPPOueHVhYK50+WiqPMxezQ5nbnG6uR6JtPyirMNTgzST8rQRsu
HmQy8/Oi6bK5rbPC9iDtKK28ba6Ldvu1ic8lTkuWyNNthG/pZGJJQ+Pg7dmkd7te
soJGZKnTbNWwbgGOFbfw9rLRuzWsjQjQ43vxTMjjNnpOwNxANuNR1GN0S/t8e9di
9BBT8jtgcHhtW5jRMHMNWHk+k8aeyIZPxjl9fjzzsMt7meX50DFrCJgf8bKkZ5dA
W2b/fzUyMqVQJpgmIY4ktFmR4mV382pWOOs6rl+ppSu+mU/gpTuYCofF7FqAUU5S
71mzukW6KdOrqymVuwiTXBlGnZB370aT7aUU5PHL/ZkDJ9shSyVRcg/iQa40myzn
5wxunZX936z5f84bxZPW1J5bBZklba8deKPXHUkl5RoIXsN2qWFPJpZ1M0eHyfPm
ZdqvRZ1IkSSFZFr6FF374bEqa88NK1wbVKUbGQ+yn8abE+HQfXQR9ZWZa1DR1wkb
rF8XWOHjQLp/zlTRnj3gj3T4pEwc5L1QOt7RUrYfI36Mh7iUz5EdzowaiEaDQT6/
neThilci1F6Mz4Uf65pK4TaDTDvj1tqqAdg3g8uneHBTFARS+htGXqkaKxP6kSi+
T/W4woOqCRT6c0+BhZ2jPRhKsMZ5kR1vKLUVBHShChq32mDpn6g=
=hzDl
-----END PGP SIGNATURE-----
Merge tag 'net-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from netfilter, bpf, can and
ieee802154.
The size of this is pretty normal, but we got more fixes for 5.14
changes this week than last week. Nothing major but the trend is the
opposite of what we like. We'll see how the next week goes..
Current release - regressions:
- r8169: fix ASPM-related link-up regressions
- bridge: fix flags interpretation for extern learn fdb entries
- phy: micrel: fix link detection on ksz87xx switch
- Revert "tipc: Return the correct errno code"
- ptp: fix possible memory leak caused by invalid cast
Current release - new code bugs:
- bpf: add missing bpf_read_[un]lock_trace() for syscall program
- bpf: fix potentially incorrect results with bpf_get_local_storage()
- page_pool: mask the page->signature before the checking, avoid dma
mapping leaks
- netfilter: nfnetlink_hook: 5 fixes to information in netlink dumps
- bnxt_en: fix firmware interface issues with PTP
- mlx5: Bridge, fix ageing time
Previous releases - regressions:
- linkwatch: fix failure to restore device state across
suspend/resume
- bareudp: fix invalid read beyond skb's linear data
Previous releases - always broken:
- bpf: fix integer overflow involving bucket_size
- ppp: fix issues when desired interface name is specified via
netlink
- wwan: mhi_wwan_ctrl: fix possible deadlock
- dsa: microchip: ksz8795: fix number of VLAN related bugs
- dsa: drivers: fix broken backpressure in .port_fdb_dump
- dsa: qca: ar9331: make proper initial port defaults
Misc:
- bpf: add lockdown check for probe_write_user helper
- netfilter: conntrack: remove offload_pickup sysctl before 5.14 is
out
- netfilter: conntrack: collect all entries in one cycle,
heuristically slow down garbage collection scans on idle systems to
prevent frequent wake ups"
* tag 'net-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
vsock/virtio: avoid potential deadlock when vsock device remove
wwan: core: Avoid returning NULL from wwan_create_dev()
net: dsa: sja1105: unregister the MDIO buses during teardown
Revert "tipc: Return the correct errno code"
net: mscc: Fix non-GPL export of regmap APIs
net: igmp: increase size of mr_ifc_count
MAINTAINERS: switch to my OMP email for Renesas Ethernet drivers
tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets
net: pcs: xpcs: fix error handling on failed to allocate memory
net: linkwatch: fix failure to restore device state across suspend/resume
net: bridge: fix memleak in br_add_if()
net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge
net: bridge: fix flags interpretation for extern learn fdb entries
net: dsa: sja1105: fix broken backpressure in .port_fdb_dump
net: dsa: lantiq: fix broken backpressure in .port_fdb_dump
net: dsa: lan9303: fix broken backpressure in .port_fdb_dump
net: dsa: hellcreek: fix broken backpressure in .port_fdb_dump
bpf, core: Fix kernel-doc notation
net: igmp: fix data-race in igmp_ifc_timer_expire()
net: Fix memory leak in ieee802154_raw_deliver
...
The CQ destroy is performed based on the IRQ number that is stored in
cq->irqn. That number wasn't set explicitly during CQ creation and as
expected some of the API users of mlx5_core_create_cq() forgot to update
it.
This caused to wrong synchronization call of the wrong IRQ with a number
0 instead of the real one.
As a fix, set the IRQ number directly in the mlx5_core_create_cq() and
update all users accordingly.
Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Fixes: ef1659ade359 ("IB/mlx5: Add DEVX support for CQ events")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Shared FDB allows to create a single RDMA device that holds representors
from both eswitches. As shared FDB is only active when both uplink
representors are enslaved there is a single RDMA port that represents
both uplinks.
The number of ports is the number of vports on both eswitches minus one
as we only need 1 port for both uplinks.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In shared FDB there is only one eswitch which is active and it receives
traffic from all representors and all vports in the HCA.
While the Ethernet representor will always reside on its native PF
the IB representor will not. Extend send to vport rule creation to
support such flows. Need to account for source vport that sends the
traffic (on which the representors resides) and the target eswitch
the traffic which reach.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In shared FDB a single RDMA device can have representors that are
connected to two different eswitches. Use the right eswitch when
preparing the response to userspace.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
XRC_TGT QPs are created through kernel verbs and don't have udata at all.
Fixes: 6eefa839c4dd ("RDMA/mlx5: Protect from kernel crash if XRC_TGT doesn't have udata")
Fixes: e383085c2425 ("RDMA/mlx5: Set ECE options during QP create")
Link: https://lore.kernel.org/r/b68228597e730675020aa5162745390a2d39d3a2.1628014762.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
There is no real value in bypassing IB/core APIs for creating standard
objects with standard types. The open-coded variant didn't have any
restrack task management calls and caused to such objects to be not
present when running rdmatoool.
Link: https://lore.kernel.org/r/f745590e5fb7d56f90fdb25f64ee3983ba17e1e4.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Convert QP object to follow IB/core general allocation scheme. That
change allows us to make sure that restrack properly kref the memory.
Link: https://lore.kernel.org/r/48e767124758aeecc433360ddd85eaa6325b34d9.1627040189.git.leonro@nvidia.com
Reviewed-by: Gal Pressman <galpress@amazon.com> #efa
Tested-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> #rdma and core
Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Starting from commit 2b1f747071c5 ("RDMA/core: Allow drivers to disable
restrack DB") the restrack is able to handle non-standard QP types either.
That change allows us to rewrite custom QP calls to their IB/core
counterparts, so we will use general QP creation flow even for the driver
QP types.
Link: https://lore.kernel.org/r/51682ab82298748941f38bd23ee3bf77ef1cab7b.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The dev->devr.mutex was intended to protect GSI QP pointer change in the
struct mlx5_ib_port_resources when it is accessed from the
pkey_change_work. However that pointer isn't changed during the runtime
and once IB/core adds MAD, it stays stable.
Link: https://lore.kernel.org/r/6e338c561033df20d92e1371fc6a7a0d93aad945.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In the driver release flow, we are ensuring that notifier is disabled and
no new works can be added to pkey_change_handler. It means that we can
cancel that handler before destroying resources to make sure that our
unwind routine is symmetrical to the allocation one.
Link: https://lore.kernel.org/r/f2b1ea1bad952e4e7a48a6f731de9e0344986b29.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Fixing a typo that causes a cache entry to shrink immediately after adding
to it new MRs if the entry size exceeds the high limit. In doing so, the
cache misses its purpose to prevent the creation of new mkeys on the
runtime by using the cached ones.
Fixes: b9358bdbc713 ("RDMA/mlx5: Fix locking in MR cache work queue")
Link: https://lore.kernel.org/r/fcb546986be346684a016f5ca23a0567399145fa.1627370131.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
is_apu_thread_cq() used to detect CQs which are attached to APU
threads. This was extended to support other elements as well,
so the function was renamed to is_apu_cq().
c_eqn_or_apu_element was extended from 8 bits to 32 bits, which wan't
reflected when the APU support was first introduced.
Acked-by: Michael S. Tsirkin <mst@redhat.com> # vdpa
Signed-off-by: Tal Gilboa <talgi@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
DCS is an offload to SW load balancing of DC initiator work requests.
A single DCI can be connected to only one target at the time and can't
start new connection until the previous work request is completed. This
limitation will cause to delay when the initiator process needs to
transfer data to multiple targets at the same time. The SW solution is to
use a process that handling and spreading the work request on many DCIs
according to destinations.
This feature is an offload to this process and coming to reduce the load
from the CPU and improve the performance.
Link: https://lore.kernel.org/r/491c2c2afdb5b07de7f03eab3f93cf0704549dbc.1624258894.git.leonro@nvidia.com
Reviewed-by: Meir Lichtinger <meirl@nvidia.com>
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This patch isolates DCI QP creation logic to separate function, so this
change will reduce complexity when adding new features to DCI QP without
interfering with other QP types.
The code was copied from create_user_qp() while taking only DCI relevant bits.
Link: https://lore.kernel.org/r/b4530bdd999349c59691224f016ff1efb5dc3b92.1624258894.git.leonro@nvidia.com
Reviewed-by: Meir Lichtinger <meirl@nvidia.com>
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This PR contains a replacement driver for Intel iWarp hardware. This new
driver supports the old ethernet hardware and also newer chips that can do
ROCE. Otherwise this contains the typical mix of patches:
- Driver updates and cleanups for bnxt_re, cxgb4, mlx4, and mlx5
- Many static checker driven code clean ups, including a wide refcount_t
conversion
- Several series for the hns driver, more HIP09 HW capabilities, migration
to new HW register manipulators, and code cleanups
- Minor fixes and improvements in srp, rts, and cm
- Improvements throughout for sysfs related code to use DEVICE_ATTR_*,
make the ib_port sysfs first-class, and overall use sysfs APIs properly
- Intel's new irdma driver replacing i40iw
- rxe general clean ups and Memory Window support
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmDcunQACgkQOG33FX4g
mxqSBA//dsZi/UzpzgU+YqyMFmUp04wd2/iCYzOcCViNPQZCyCARbGaMXI4kMa4s
8dM5xU76OnCuNSnXHaIwvHC3CdN9GUm08j9eWY7syvAiKtXCjzv7qmCVfBw35UyK
IXKfXh57toTSSAIfxw8yKc97QgaDSJ2zQ34fXkoE0AvTlfyN6pHQe9ef/Ca0ejS4
awUGYVG/oilLXrEHcSSAv5UoX6hOUje6jqqRgp5jmZTI3g7SlIPL8mWgXBkHAYmd
kDX7lBd09CKo2bmR071/kF6xUzvbCg1tmeE6lZze7gE+aKlBkZcvCBe1RAh3sBzK
ysLfON5GGw5qnkMaY8j5h3sgWvi3qTTEW+jCAmmVi/6z4PF47mvmVVn+/pZc3y2e
PqH43cunhwS0KuoUJ5Sd48J/UvabrvdbCNZrjCGCpt45EF4VwKxYMh74Bf0ABEQS
i9eKR/+wyHG6Uv1U37fIXsqa8yUttl9aV/s88s8irn4xhG8ygBLZgeVQNeGUfvdV
1W0XLEjRmKFezC1FhiPOz7CLIgL3BfSU1V+S7p0Gvb6ijZqyZTfRUaWbaD3KJpRT
1kwzE4qp6IbJMEqgQH/lq0xBzvzF48FPvBslX5kwlm0phQRrMCwMVIafutpu395q
ySeStEvsTVfz/JUHL3ZaEJyTRjAvPL0lXLH80XUpgWk9GzsksOM=
=wyqt
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"This contains a replacement driver for Intel iWarp hardware. This new
driver supports the old ethernet hardware and also newer chips that
can do ROCE.
Other than that, this contains the typical mix of patches:
- Driver updates and cleanups for bnxt_re, cxgb4, mlx4, and mlx5
- Many static checker driven code clean ups, including a wide
refcount_t conversion
- Several series for the hns driver, more HIP09 HW capabilities,
migration to new HW register manipulators, and code cleanups
- Minor fixes and improvements in srp, rts, and cm
- Improvements throughout for sysfs related code to use
DEVICE_ATTR_*, make the ib_port sysfs first-class, and overall use
sysfs APIs properly
- Intel's new irdma driver replacing i40iw
- rxe general clean ups and Memory Window support"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (211 commits)
RDMA/core: Always release restrack object
RDMA/mlx5: Don't access NULL-cleared mpi pointer
RDMA/irdma: Fix potential overflow expression in irdma_prm_get_pbles
RDMA/irdma: Check contents of user-space irdma_mem_reg_req object
RDMA/rxe: Missing unlock on error in get_srq_wqe()
RDMA/cma: Fix rdma_resolve_route() memory leak
RDMA/core/sa_query: Remove unused argument
RDMA/cma: Fix incorrect Packet Lifetime calculation
RDMA/cma: Protect RMW with qp_mutex
RDMA/cma: Remove unnecessary INIT->INIT transition
RDMA/hns: Add window selection field of congestion control
RDMA/hfi1: Remove use of kmap()
RDMA/irdma: Remove use of kmap()
RDMA/bnxt_re: Fix uninitialized struct bit field rsvd1
IB/isert: Align target max I/O size to initiator size
RDMA/hns: Fix incorrect vlan enable bit in QPC
MAINTAINERS: Update Broadcom RDMA maintainers
RDMA/irdma: Use the queried port attributes
RDMA/rxe: Fix redundant skb_put_zero
RDMA/rxe: Fix extra copy in prepare_ack_packet
...
Aharon Landau says:
====================
In case device supports only real-time timestamp, the kernel will fail to
create QP despite rdma-core requested such timestamp type.
It is because device returns free-running timestamp, and the conversion
from free-running to real-time is performed in the user space.
This series fixes it, by returning real-time timestamp.
====================
* mlx5_realtime_ts:
RDMA/mlx5: Support real-time timestamp directly from the device
RDMA/mlx5: Refactor get_ts_format functions to simplify code
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmDPuyMeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGvxgH/RKvSuRPwkJ2Jcp9
VLi5kCbqtJlYLq6tB6peSJ8otKgxkcRwC0pIY4LlYIAWYboktLQ5RKp/9nB2h2FN
aMZUMu6AI/lVJyFMI5MnKnJIUiUq+WXR3lSSlw68vwFLFdzqUZFNq+bqeiVvnIy1
yqA6naj24Tu/RbYffQoPvdSJcU2SLXRMxwD8HRGiU2d51RaFsOvsZvF+P5TVcsEV
ZmttJeER21CaI/A809eqaFmyGrUOcZZK9roZEbMwanTZOMw18biEsLu/UH4kBX01
JC4+RlGxcWjQ5YNZgChsgoOK/CHzc6ITztTntdeDWAvwZjQFzV7pCy4/3BWne3O+
5178yHM=
=o8cN
-----END PGP SIGNATURE-----
Merge tag 'v5.13-rc7' into rdma.git for-next
Linux 5.13-rc7
Needed for dependencies in following patches. Merge conflict in rxe_cmop.c
resolved by compining both patches.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently, if the user asks for a real-time timestamp, the device will
return a free-running one, and the timestamp will be translated to
real-time in the user-space.
When the device supports only real-time timestamp and not free-running,
the creation of the QP will fail even though the user needs supported the
real-time one. To prevent this, we will return the real-time timestamp
directly from the device.
Link: https://lore.kernel.org/r/c6cfc8e6f038575c5c2de6505830f7e74e4de80d.1623829775.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
QPC, SQC and RQC timestamp formats and capabilities are always equal
because they represent general hardware support. So instead of code
duplication, let's merge them into general enum and logic.
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Relaxed Ordering is a capability that can only benefit users that support
it. All kernel ULPs should support Relaxed Ordering, as they are designed
to read data only after observing the CQE and use the DMA API correctly.
Hence, implicitly enable Relaxed Ordering by default for MR transfers in
kernel ULPs.
Link: https://lore.kernel.org/r/b7e820aab7402b8efa63605f4ea465831b3b1e5e.1623236426.git.leonro@nvidia.com
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Trivial conflicts in net/can/isotp.c and
tools/testing/selftests/net/mptcp/mptcp_connect.sh
scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c
to include/linux/ptp_clock_kernel.h in -next so re-apply
the fix there.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is being used to implement both the port and device global stats,
which is causing some confusion in the drivers. For instance EFA and i40iw
both seem to be misusing the device stats.
Split it into two ops so drivers that don't support one or the other can
leave the op NULL'd, making the calling code a little simpler to
understand.
Link: https://lore.kernel.org/r/1955c154197b2a159adc2dc97266ddc74afe420c.1623427137.git.leonro@nvidia.com
Tested-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The mlx5_ib_bind_slave_port() doesn't remove multiport device from the
unaffiliated list, but mlx5_ib_unbind_slave_port() did it. This unbalanced
flow caused to the situation where mlx5_ib_unaffiliated_port_list was
changed during iteration.
Fixes: 32f69e4be269 ("{net, IB}/mlx5: Manage port association for multiport RoCE")
Link: https://lore.kernel.org/r/2726e6603b1e6ecfe76aa5a12a063af72173bcf7.1622477058.git.leonro@nvidia.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Whenever users provided affinity for an EQ creation request, map the
EQ to a matching IRQ.
Matching IRQ=IRQ with the same affinity and type (completion/control) of
the EQ created.
This mapping is being done in agressive dedicated IRQ allocation scheme,
which described bellow.
First, we check whether there is a matching IRQ that his min threshold
is not exhausted.
- min_eqs_threshold = 3 for control EQ.
- min_eqs_threshold = 1 for completion EQ.
In case no matching IRQ was found, try to request a new IRQ.
In case we can't request a new IRQ, reuse least-used matching IRQ.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The users of EQ are running their code on different CPUs and with
various affinity patterns. Move the cpumask setting close to their
actual usage.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The function init_cq_frag_buf() can be called to initialize the current CQ
fragments buffer cq->buf, or the temporary cq->resize_buf that is filled
during CQ resize operation.
However, the offending commit started to use function get_cqe() for
getting the CQEs, the issue with this change is that get_cqe() always
returns CQEs from cq->buf, which leads us to initialize the wrong buffer,
and in case of enlarging the CQ we try to access elements beyond the size
of the current cq->buf and eventually hit a kernel panic.
[exception RIP: init_cq_frag_buf+103]
[ffff9f799ddcbcd8] mlx5_ib_resize_cq at ffffffffc0835d60 [mlx5_ib]
[ffff9f799ddcbdb0] ib_resize_cq at ffffffffc05270df [ib_core]
[ffff9f799ddcbdc0] llt_rdma_setup_qp at ffffffffc0a6a712 [llt]
[ffff9f799ddcbe10] llt_rdma_cc_event_action at ffffffffc0a6b411 [llt]
[ffff9f799ddcbe98] llt_rdma_client_conn_thread at ffffffffc0a6bb75 [llt]
[ffff9f799ddcbec8] kthread at ffffffffa66c5da1
[ffff9f799ddcbf50] ret_from_fork_nospec_begin at ffffffffa6d95ddd
Fix it by getting the needed CQE by calling mlx5_frag_buf_get_wqe() that
takes the correct source buffer as a parameter.
Fixes: 388ca8be0037 ("IB/mlx5: Implement fragmented completion queue (CQ)")
Link: https://lore.kernel.org/r/90a0e8c924093cfa50a482880ad7e7edb73dc19a.1623309971.git.leonro@nvidia.com
Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>