IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Add support for the HW feature of multi-packet WQE in XDP
xmit flow.
The conventional TX descriptor (WQE, Work Queue Element) serves
a single packet. Our HW has support for multi-packet WQE (MPWQE)
in which a single descriptor serves multiple TX packets.
This reduces both the PCI overhead and the CPU cycles wasted on
writing them.
In this patch we add support for the HW feature, which is supported
starting from ConnectX-5.
Performance:
Tested packet rate for UDP 64Byte multi-stream over ConnectX-5 NICs.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
XDP_TX:
We see a huge gain on single port ConnectX-5, and reach the 100 Mpps
milestone.
* Single-port HCA:
Before: 70 Mpps
After: 100 Mpps (+42.8%)
* Dual-port HCA:
Before: 51.7 Mpps
After: 57.3 Mpps (+10.8%)
* In both cases we tested traffic on one port and for now On Dual-port HCAs
we see only small gain, we are working to overcome this bottleneck, but
for the moment only with experimental firmware on dual port HCAs we can
reach the wanted numbers as seen on Single-port HCAs.
XDP_REDIRECT:
Redirect from (A) ConnectX-5 to (B) ConnectX-5.
Due to a setup limitation, (A) and (B) are on different NUMA nodes,
so absolute performance numbers are not optimal.
Note:
Below is the transmit rate of (B), not the redirect rate of (A)
which is in some cases higher.
* (B) is single-port:
Before: 77 Mpps
After: 90 Mpps (+16.8%)
* (B) is dual-port:
Before: 61 Mpps
After: 72 Mpps (+18%)
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Each xdp_wqe_info instance describes the number of data-segments
and WQEBBs of the WQE.
This is useful for a downstream patch that adds support for
Multi-Packet TX WQE feature.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This provides infrastructure to have multiple xdp_info instances
for the same consumer index.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Instead of calculating the control segment to be used upon an
XDP xmit doorbell, save it in SQ structure.
Nullify when no pending doorbell.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Do not ignore the CQE opcode.
This helps expose issues and debug them.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Do not maintain an SQ state bit to indicate whether an
XDP SQ serves redirect operations.
Instead, rely on the fact that such an XDP SQ doesn't reside
in an RQ instance, while the others do.
This info is not known to the XDP SQ functions themselves,
and they rely on their callers to distinguish between the cases.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
At the end of the RQ polling loop, some XDP-related operations
might be required. Before checking them one by one, check if
an XDP program is even loaded.
Combine all the checks and operations in a single function in xdp files.
This saves unnecessary checks for non-XDP flows.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The opcode indicates about the error reason.
Printing it helps in debug.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-12-21
The following pull-request contains BPF updates for your *net-next* tree.
There is a merge conflict in test_verifier.c. Result looks as follows:
[...]
},
{
"calls: cross frame pruning",
.insns = {
[...]
.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
.errstr_unpriv = "function calls to other bpf functions are allowed for root only",
.result_unpriv = REJECT,
.errstr = "!read_ok",
.result = REJECT,
},
{
"jset: functional",
.insns = {
[...]
{
"jset: unknown const compare not taken",
.insns = {
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
BPF_FUNC_get_prandom_u32),
BPF_JMP_IMM(BPF_JSET, BPF_REG_0, 1, 1),
BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0),
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
.errstr_unpriv = "!read_ok",
.result_unpriv = REJECT,
.errstr = "!read_ok",
.result = REJECT,
},
[...]
{
"jset: range",
.insns = {
[...]
},
.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
.result_unpriv = ACCEPT,
.result = ACCEPT,
},
The main changes are:
1) Various BTF related improvements in order to get line info
working. Meaning, verifier will now annotate the corresponding
BPF C code to the error log, from Martin and Yonghong.
2) Implement support for raw BPF tracepoints in modules, from Matt.
3) Add several improvements to verifier state logic, namely speeding
up stacksafe check, optimizations for stack state equivalence
test and safety checks for liveness analysis, from Alexei.
4) Teach verifier to make use of BPF_JSET instruction, add several
test cases to kselftests and remove nfp specific JSET optimization
now that verifier has awareness, from Jakub.
5) Improve BPF verifier's slot_type marking logic in order to
allow more stack slot sharing, from Jiong.
6) Add sk_msg->size member for context access and add set of fixes
and improvements to make sock_map with kTLS usable with openssl
based applications, from John.
7) Several cleanups and documentation updates in bpftool as well as
auto-mount of tracefs for "bpftool prog tracelog" command,
from Quentin.
8) Include sub-program tags from now on in bpf_prog_info in order to
have a reliable way for user space to get all tags of the program
e.g. needed for kallsyms correlation, from Song.
9) Add BTF annotations for cgroup_local_storage BPF maps and
implement bpf fs pretty print support, from Roman.
10) Fix bpftool in order to allow for cross-compilation, from Ivan.
11) Update of bpftool license to GPLv2-only + BSD-2-Clause in order
to be compatible with libbfd and allow for Debian packaging,
from Jakub.
12) Remove an obsolete prog->aux sanitation in dump and get rid of
version check for prog load, from Daniel.
13) Fix a memory leak in libbpf's line info handling, from Prashant.
14) Fix cpumap's frame alignment for build_skb() so that skb_shared_info
does not get unaligned, from Jesper.
15) Fix test_progs kselftest to work with older compilers which are less
smart in optimizing (and thus throwing build error), from Stanislav.
16) Cleanup and simplify AF_XDP socket teardown, from Björn.
17) Fix sk lookup in BPF kselftest's test_sock_addr with regards
to netns_id argument, from Andrey.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When doing indirect access in the Ocelot chip, a command is setup,
issued and then we need to poll until the result is ready. The polling
timeout is specified in milliseconds in the datasheet and not in
register access attempts.
It is not a bug on the currently supported platform, but we observed
that the code does not work properly on other platforms that we want to
support as the timing requirements there are different.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This series adds some misc updates and the support for tunnels over VLAN
tc offloads.
From Miroslav Lichvar, patches #1,2
1) Update timecounter at least twice per counter overflow
2) Extend PTP gettime function to read system clock
From Gavi Teitz, patch #3
3) Increase VF representors' SQ size to 128
From Eli Britstein and Or Gerlitz, patches #4-10
4) Adds the capability to support tunnels over VLAN device.
Patch 4 avoids crash for TC flow with egress upper devices
Patch 5 refactors tunnel routing devs into a helper function
Patch 6 avoids crash for TC encap flows with vlan on underlay
Patches 7-8 refactor encap tunnel header preparing code.
Patch 9 adds support for building VLAN tagged ETH header.
Patch 10 adds support for tunnel routing to VLAN device.
From Aviv, patches 11,12 to fix earlier VF lag series
5) Fix query_nic_sys_image_guid() error during init
6) Fix LAG requirement when CONFIG_MLX5_ESWITCH is off
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJcG5O8AAoJEEg/ir3gV/o+v0AH/ja1bJ6PkAlw2bCRMIuI6HK/
1vh9n8D74tIOAlvUi6QiLOgJ7CLdAgiAVFppDWzJSBUsY2XNAtRdYvLDDt9hGafO
QGysBWNVcX2aUVp+pLDCCVEYBWyyIzW416CWHx2IUgdAg9S6cvJK6P/81wd+l4Zp
8YlPstEUANP/JKZxHHjSelMcnY+Bj0JrDzuyyyaQdwmcHo5I7Ht0tNex14yFfFbl
gd7YvfPr1PtovPX2w9hMt3y3ml4mommB+jtxc0+59D+A7680hBOpAnH5pONms9rv
OKKKV8KqpxL1m32/TGbd7XSG+llLJwsSM6RtD4oUdiPA4iCiVDVdreP5vac52C4=
=0QQ5
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2018-12-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2018-12-19
This series adds some misc updates and the support for tunnels over VLAN
tc offloads.
From Miroslav Lichvar, patches #1,2
1) Update timecounter at least twice per counter overflow
2) Extend PTP gettime function to read system clock
From Gavi Teitz, patch #3
3) Increase VF representors' SQ size to 128
From Eli Britstein and Or Gerlitz, patches #4-10
4) Adds the capability to support tunnels over VLAN device.
Patch 4 avoids crash for TC flow with egress upper devices
Patch 5 refactors tunnel routing devs into a helper function
Patch 6 avoids crash for TC encap flows with vlan on underlay
Patches 7-8 refactor encap tunnel header preparing code.
Patch 9 adds support for building VLAN tagged ETH header.
Patch 10 adds support for tunnel routing to VLAN device.
From Aviv, patches 11,12 to fix earlier VF lag series
5) Fix query_nic_sys_image_guid() error during init
6) Fix LAG requirement when CONFIG_MLX5_ESWITCH is off
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
VID 1 is not reserved anymore, so remove the check that prevented the
creation of VLAN devices with this VID over mlxsw ports.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need to abuse VID 1 anymore and we can instead use VID 4095
as the default VLAN, which will be configured on the port throughout its
lifetime.
The OVS join / leave functions are changed to enable VIDs 1-4094
(inclusive) instead of 2-4095. This because VID 4095 is now the default
VLAN instead of 1.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VLAN entries on a port can be associated with either a bridge VLAN or a
router port. Before the VLAN entry is destroyed these associations need
to be cleaned up.
Currently, this is always invoked from the function which destroys the
VLAN entry, but next patch is going to skip the destruction of the
default entry when a port in unlinked from a LAG.
The above does not mean that the associations should not be cleaned up,
so add a helper that will be invoked from both call sites.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subsequent patches will need to access the default port VLAN. Since this
VLAN will exist throughout the lifetime of the port, simply store it in
the port's struct.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function allows flushing all the existing VLAN entries on a port. It
is invoked when a port is destroyed and when it is unlinked from a LAG.
In the latter case, when moving to the new default VLAN, there will not
be a need to destroy the default VLAN entry.
Therefore, add an argument that allows to control whether the default
port VLAN should be destroyed or not. Currently it is always set to
'true'.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the driver does not set the port's PVID when initializing a
new port. This is because the driver is using VID 1 as PVID which is the
firmware default.
Subsequent patches are going to change the PVID the driver is setting
when initializing a new port.
Prepare for that by explicitly setting the port's PVID.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subsequent patches are going to replace the current default VID (1) with
VLAN_N_VID - 1 (4095).
Prepare for this conversion by replacing the hard-coded '1' with a
define.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In symmetric routing, the only two members in the VLAN corresponding to
the L3 VNI are the router port and the VXLAN tunnel.
In case the VXLAN device is already enslaved to the bridge and only
later the VLAN interface is configured, the tunnel will not be
offloaded.
The reason for this is that when the router interface (RIF)
corresponding to the VLAN interface is configured, it calls the core
fid_get() API which does not check if NVE should be enabled on the FID.
Instead, call into the bridge code which will check if NVE should be
enabled on the FID.
This effectively means that the same code path is used to retrieve a FID
when either a local port or a router port joins the FID.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2018-12-20
This series contains updates to e100, igb, ixgbe, i40e and ice drivers.
I replaced spinlocks for mutex locks to reduce the latency on CPU0 for
igb when updating the statistics. This work was based off a patch
provided by Jan Jablonsky, which was against an older version of the igb
driver.
Jesus adjusts the receive packet buffer size from 32K to 30K when
running in QAV mode, to stay within 60K for total packet buffer size for
igb.
Vinicius adds igb kernel documentation regarding the CBS algorithm and
its implementation in the i210 family of NICs.
YueHaibing from Huawei fixed the e100 driver that was potentially
passing a NULL pointer, so use the kernel macro IS_ERR_OR_NULL()
instead.
Konstantin Khorenko fixes i40e where we were not setting up the
neigh_priv_len in our net_device, which caused the driver to read beyond
the neighbor entry allocated memory.
Miroslav Lichvar extends the PTP gettime() to read the system clock by
adding support for PTP_SYS_OFFSET_EXTENDED ioctl in i40e.
Young Xiao fixed the ice driver to only enable NAPI on q_vectors that
actually have transmit and receive rings.
Kai-Heng Feng fixes an igb issue that when placed in suspend mode, the
NIC does not wake up when a cable is plugged in. This was due to the
driver not setting PME during runtime suspend.
Stephen Douthit enables the ixgbe driver allow DSA devices to use the
MII interface to talk to switches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the mii_bus callbacks to address the entire clause 22/45 address
space. Enables userspace to poke switch registers instead of a single
PHY address.
The ixgbe firmware may be polling PHYs in a way that is not protected by
the mii_bus lock. This isn't new behavior, but as Andrew Lunn pointed
out there are more addresses available for conflicts.
Signed-off-by: Stephen Douthit <stephend@silicom-usa.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Most dsa devices expect a 'struct mii_bus' pointer to talk to switches
via the MII interface.
While this works for dsa devices, it will not work safely with Linux
PHYs in all configurations since the firmware of the ixgbe device may
be polling some PHY addresses in the background.
Signed-off-by: Stephen Douthit <stephend@silicom-usa.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
I210 ethernet card doesn't wakeup when a cable gets plugged. It's
because its PME is not set.
Since commit 42eca23021 ("PCI: Don't touch card regs after runtime
suspend D3"), if the PCI state is saved, pci_pm_runtime_suspend() stops
calling pci_finish_runtime_suspend(), which enables the PCI PME.
To fix the issue, let's not to save PCI states when it's runtime
suspend, to let the PCI subsystem enables PME.
Fixes: 42eca23021 ("PCI: Don't touch card regs after runtime suspend D3")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If ice driver has q_vectors w/ active NAPI that has no rings,
then this will result in a divide by zero error. To correct it
I am updating the driver code so that we only support NAPI on
q_vectors that have 1 or more rings allocated to them.
See commit 13a8cd191a ("i40e: Do not enable NAPI on q_vectors
that have no rings") for detail.
Signed-off-by: Young Xiao <YangX92@hotmail.com>
Acked-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl.
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Out of bound read reported by KASan.
i40iw_net_event() reads unconditionally 16 bytes from
neigh->primary_key while the memory allocated for
"neighbour" struct is evaluated in neigh_alloc() as
tbl->entry_size + dev->neigh_priv_len
where "dev" is a net_device.
But the driver does not setup dev->neigh_priv_len and
we read beyond the neigh entry allocated memory,
so the patch in the next mail fixes this.
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fix a static code checker warning:
drivers/net/ethernet/intel/e100.c:1349
e100_load_ucode_wait() warn: passing zero to 'PTR_ERR'
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Lots of conflicts, by happily all cases of overlapping
changes, parallel adds, things of that nature.
Thanks to Stephen Rothwell, Saeed Mahameed, and others
for their guidance in these resolutions.
Signed-off-by: David S. Miller <davem@davemloft.net>
Section 4.5.9 of the datasheet says that the total size of all packet
buffers combined (TxPB 0 + 1 + 2 + 3 + RxPB + BMC2OS + OS2BMC) must not
exceed 60KB. Today we are configuring a total of 62KB, so reduce the
RxPB from 32KB to 30KB in order to respect that.
The choice of changing RxPBSIZE here is mainly because it seems more
correct to give more priority to the transmit packet buffers over the
receiver ones when running in Qav mode. Also, the BMC2OS and OS2BMC
sizes are already too short.
Signed-off-by: Jesus Sanchez-Palencia <jesus.s.palencia@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change is based off of the work and suggestion of Jan Jablonsky
<jan.jablonsky@thalesgroup.com>.
The Watchdog workqueue in igb driver is scheduled every 2s for each
network interface. That includes updating a statistics protected by
spinlock. Function igb_update_stats in this case will be protected
against preemption. According to number of a statistics registers
(cca 60), processing this function might cause additional cpu load
on CPU0.
In case of statistics spinlock may be replaced with mutex, which
reduce latency on CPU0.
CC: Bernhard Kaindl <bernhard.kaindl@thalesgroup.com>
CC: Jan Jablonsky <jan.jablonsky@thalesgroup.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The top word of the constant can only have bits set if sign
extension set it to all-1, therefore we don't really have to
mask the top half of the register. We can just OR it into
the result as is.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The verifier will now understand the JSET instruction, so don't
mark the dead branch in the JIT as noop. We won't generate any
code, anyway.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
For a little better performance on faster machines and faster link
speeds.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Older firmware only supports 16-bit flow handle, because of which the
number of flows that can be offloaded can’t scale beyond a point.
Newer firmware supports 64-bit flow handle enabling the host to scale
upto millions of flows. With the new 64-bit flow handle support, driver
has to query flow stats in a different way compared to the older approach.
This patch adds support for 64-bit flow handle and new way to query
flow stats.
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If RDMA is supported on the 57500 chip, increase context memory
allocations for the resources used by RDMA.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the new functions bnxt_alloc_ctx_pg_tbls()/bnxt_free_ctx_pg_tbls()
to allocate and free pages for context memory. The new functions
will handle the different levels of paging support and allocate/free
the pages accordingly using the existing functions.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To support level 2 context page memory structures, enhance the
bnxt_ring_mem_info structure with a "depth" field to specify the page
level and add a flag to specify using full pages for L1 and L2 page
tables. This is needed to support RDMA functionality on 57500 chips
since RDMA requires more context memory.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Earlier, some of the firmware commands (ex: CFA_FLOW_*) which are processed
by KONG processor were sent to the CHIMP processor from the host. This
approach was taken as there was no direct message channel to KONG.
CHIMP in turn used to send them to KONG. Newer firmware supports a new
message channel which the host can send messages directly to the KONG
processor.
This patch adds support for required changes needed in the driver
to support direct KONG message channel. This speeds up flow related
messages sent to the firmware for CLS_FLOWER offload.
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These routines will be enhanced in the subsequent patch to
return the 2nd firmware comm. channel's hwrm response address &
sequence id respectively.
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Typecast hwrm_cmd_resp_addr to (u8 *) from (void *) before doing
arithmetic.
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for adding a 2nd communication channel to firmware.
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set hwrm_intr_seq_id value to its inverted value instead of
HWRM_SEQ_INVALID, when an hwrm completion of type
CMPL_BASE_TYPE_HWRM_DONE is received. This will enable us to use
the complete 16-bit sequence ID space.
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The major changes are in the flow offload firmware APIs.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If CONFIG_MLX5_ESWITCH is not defined, test for SR-IOV being disabled,
instead of calling e-switch LAG prereq routine.
Since LAG with SRIOV is allowed only when switchdev mode is on.
Fixes: eff849b2c6 ("net/mlx5: Allow/disallow LAG according to pre-req only")
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
vport system image guid should be queried using vport nic API for
Ethernet ports, and vport hca API for Infiniband ports.
Fixes: fadd59fc50 ("net/mlx5: Introduce inter-device communication mechanism")
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Generate encap header depending on the routed device to support
native/tagged Ethernet header.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Support generation of native or tagged Ethernet header for encap
header, depending on provided net device.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Change the order to first route IPv4/6 and return if error. Only after
successful route continue to allocate an encap header, with no
functional change.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In tunnel encap we prepare the encap header for IPv4/6 cases, in two
separate functions. For ETH header generation the code is almost
duplicated.
Move the ETH header generation code from IPv4/6 functions to a helper
function, with no functional change.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently we don't support nor fail attempts to offload encap flows routed
to vlan device on the underlay network. We wrongly consider a vlan underlay
device to be on the same e-switch b/c the switchdev ID is retrieved recursively.
Add explicit check for that and fail such attempts.
Also align to a more strict check for the ingress and the underlay devices
to practically be on the same eswitch.
Fixes: ce99f6b97f ('net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels')
Fixes: 3e621b19b0 ('net/mlx5e: Support TC encapsulation offloads with upper devices')
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>