1727 Commits

Author SHA1 Message Date
Jacob Keller
be664cbefc i40e/i40evf: spread CPU affinity hints across online CPUs only
Currently, when setting up the IRQ for a q_vector, we set an affinity
hint based on the v_idx of that q_vector. Meaning a loop iterates on
v_idx, which is an incremental value, and the cpumask is created based
on this value.

This is a problem in systems with multiple logical CPUs per core (like in
simultaneous multithreading (SMT) scenarios). If we disable some logical
CPUs, by turning SMT off for example, we will end up with a sparse
cpu_online_mask, i.e., only the first CPU in a core is online, and
incremental filling in q_vector cpumask might lead to multiple offline
CPUs being assigned to q_vectors.

Example: if we have a system with 8 cores each one containing 8 logical
CPUs (SMT == 8 in this case), we have 64 CPUs in total. But if SMT is
disabled, only the 1st CPU in each core remains online, so the
cpu_online_mask in this case would have only 8 bits set, in a sparse way.

In general case, when SMT is off the cpu_online_mask has only C bits set:
0, 1*N, 2*N, ..., C*(N-1)  where
C == # of cores;
N == # of logical CPUs per core.
In our example, only bits 0, 8, 16, 24, 32, 40, 48, 56 would be set.

Instead, we should only assign hints for CPUs which are online. Even
better, the kernel already provides a function, cpumask_local_spread()
which takes an index and returns a CPU, spreading the interrupts across
local NUMA nodes first, and then remote ones if necessary.

Since we generally have a 1:1 mapping between vectors and CPUs, there
is no real advantage to spreading vectors to local CPUs first. In order
to avoid mismatch of the default XPS hints, we'll pass -1 so that it
spreads across all CPUs without regard to the node locality.

Note that we don't need to change the q_vector->affinity_mask as this is
initialized to cpu_possible_mask, until an actual affinity is set and
then notified back to us.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-06 08:11:31 -07:00
Mitch Williams
64615b5418 i40e: add private flag to control source pruning
By default, our devices do source pruning, that is, they drop receive
packets that have the source MAC matching one of the receive filters.
Unfortunately, this breaks ARP monitoring in channel bonding, as the
bonding driver expects devices to receive ARPs containing their own
source address.

Add an ethtool private flag to control this feature.

Also, remove the netif_running() check when we process our private
flags. It's OK to reset when the device is closed and in most cases we
need the reset the apply these changes.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-06 08:11:31 -07:00
Rami Rosen
ec2f25d203 i40e: fix a typo in i40e_pf documentation
This patch fixes a typo in i40e_pf object documentation; num_req_vfs
refers to the number of VFs requested for the PF.

Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-06 08:11:31 -07:00
Scott Peterson
ab243ec940 i40e: Stop dropping 802.1ad tags - eth proto 0x88a8
Enable i40e to pass traffic with VLAN tags using the 802.1ad ethernet
protocol ID (0x88a8).

This requires NIC firmware providing version 1.7 of the API. With
older NIC firmware 802.1ad tagged packets will continue to be dropped.

No VLAN offloads nor RSS are supported for 802.1ad VLANs.

Signed-off-by: Scott Peterson <scott.d.peterson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:36 -07:00
Alan Brady
c53d11f669 i40e: fix client notify of VF reset
Currently there is a bug in which the PF driver fails to inform clients
of a VF reset which then causes clients to leak resources.  The bug
exists because we were incorrectly checking the I40E_VF_STATE_PRE_ENABLE
bit.

When a VF is first init we go through a reset to initialize variables
and allocate resources but we don't want to inform clients of this first
reset since the client isn't fully enabled yet so we set a state bit
signifying we're in a "pre-enabled" client state.  During the first
reset we should be clearing the bit, allowing all following resets to
notify the client of the reset when the bit is not set.  This patch
fixes the issue by negating the 'test_and_clear_bit' check to accurately
reflect the behavior we want.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:36 -07:00
Alan Brady
41d0a4d0c8 i40e: fix handling of vf_states variable
Currently we inappropriately clear the vf_states variable with a null
assignment.  This is problematic because we should be using atomic
bitops on this variable and we don't actually want to clear all the
flags.  We should just clear the ones we know we want to clear.
Additionally remove the I40E_VF_STATE_FCOEENA bit because it is no
longer being used.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Jacob Keller
d43d60e5eb i40e: ensure reset occurs when disabling VF
It is possible although rare that we may not reset when
i40e_vc_disable_vf() is called. This can lead to some weird
circumstances with some values not being properly set. Modify
i40e_reset_vf() to return a code indicating whether it reset or not.

Now, i40e_vc_disable_vf() can wait until a reset actually occurs. If it
fails to free up within a reasonable time frame we'll display a warning
message.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Jacob Keller
f18d20218a i40e: make use of i40e_vc_disable_vf
Replace i40e_vc_notify_vf_reset and i40e_reset_vf with a call to
i40e_vc_disable_vf which does this exact thing. This matches similar
code patterns throughout the driver.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Jacob Keller
eeeddbb806 i40e: drop i40e_pf *pf from i40e_vc_disable_vf()
It's never used, and the vf structure could get back to the PF if
necessary. Lets just drop the extra unneeded parameter.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Jacob Keller
ba4e003d29 i40e: don't hold spinlock while resetting VF
When we refactored handling of the PVID in commit 9af52f60b2d9
("i40e: use (add|rm)_vlan_all_mac helper functions when changing PVID")
we introduced a scheduling while atomic regression.

This occurred because we now held the spinlock across a call to
i40e_reset_vf(), which results in a usleep_range() call that triggers
a scheduling while atomic bug. This was rare as it only occurred if the
user configured a VLAN on a VF and also attempted to reconfigure the VF
from the host system with a port VLAN.

We do need to hold the lock while calling i40e_is_vsi_in_vlan(), but we
should not be holding it while we reset the VF.

We'll fix this by introducing a separate helper function
i40e_vsi_has_vlans which checks whether we have a PVID and whether the
VSI has configured VLANs. This helper function will manage its own need
for the mac_filter_hash_lock.

Then, we can move the acquiring of the spinlock until after we reset the
VF, which ensures that we do not sleep while holding the lock.

Using a separate function like this makes the code more clear and is
easier to read than attempting to release and re-acquire the spinlock
when we reset the VF.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Mariusz Stachura
00f6c2f5e2 i40e: use admin queue for setting LEDs behavior
Instead of accessing register directly, use newly added AQC in
order to blink LEDs. Introduce and utilize a new flag to prevent
excessive API version checking.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Filip Sadowski
9c0e5caf63 i40e: Add support for 'ethtool -m'
This patch adds support for 'ethtool -m' command which displays
information about (Q)SFP+ module plugged into NIC's cage.

Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Filip Sadowski
d60bcc7980 i40e: Fix reporting of supported link modes
This patch fixes incorrect reporting of supported link modes on some NICs.

Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Christophe JAILLET
54902349ee i40e: Fix a potential NULL pointer dereference
If 'kzalloc()' fails, a NULL pointer will be dereferenced.
Return an error code (-ENOMEM) instead.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Lihong Yang
5872866e16 i40e: remove logically dead code
This patch removes the !vf condition check that cannot be
true in i40e_ndo_set_vf_trust function

Detected by CoverityScan, CID 1397531 Logically dead code

Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Shannon Nelson
e50d5751c8 i40e: limit lan queue count in large CPU count machine
When a machine has more CPUs than queue pairs, e.g. 512 cores, the
counting gets a little funky and turns off Flow Director with the
message:
  not enough queues for Flow Director. Flow Director feature is disabled

This patch limits the number of lan queues initially allocated to
be sure we have some left for FD and other features.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:34 -07:00
Mitch Williams
22b96551f2 i40e: refactor FW version checking
The i40e driver now supports two different devices with two different
firmware versions. So be smart about how we handle these. Move the FW
version macros to the appropriate header file, and add a convenience
macro that checks the version based on the device. Then use this macro
to check whether or not the driver can use the new link info API.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:51:02 -07:00
Alan Brady
a3f5aa9073 i40e: Enable VF to negotiate number of allocated queues
Currently the PF allocates a default number of queues for each VF and
cannot be changed.  This patch enables the VF to request a different
number of queues allocated to it.  This patch also adds a new virtchnl
op and capability flag to facilitate this negotiation.

After the PF receives a request message, it will set a requested number
of queues for that VF.  Then when the VF resets, its VSI will get a new
number of queues allocated to it.

This is a best effort request and since we only allocate a guaranteed
default number, if the VF tries to ask for more than the guaranteed
number, there may not be enough in HW to accommodate it unless other
queues for other VFs are freed. It should also be noted decreasing the
number queues allocated to a VF to below the default will NOT enable the
allocation of more than 32 VFs per PF and will not free queues guaranteed
to each VF by default.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:51:01 -07:00
Jacob Keller
b980c0634f i40e: shutdown all IRQs and disable MSI-X when suspended
On some platforms with a large number of CPUs, we will allocate many IRQ
vectors. When hibernating, the system will attempt to migrate all of the
vectors back to CPU0 when shutting down all the other CPUs. It is
possible that we have so many vectors that it cannot re-assign them to
CPU0. This is even more likely if we have many devices installed in one
platform.

The end result is failure to hibernate, as it is not possible to
shutdown the CPUs. We can avoid this by disabling MSI-X and clearing our
interrupt scheme when the device is suspended. A more ideal solution
would be some method for the stack to properly handle this for all
drivers, rather than on a case-by-case basis for each driver to fix
itself.

However, until this more ideal solution exists, we can do our part and
shutdown our IRQs during suspend, which should allow systems with
a large number of CPUs to safely suspend or hibernate.

It may be worth investigating if we should shut down even further when
we suspend as it may make the path cleaner, but this was the minimum fix
for the hibernation issue mentioned here.

Testing-hints:
  This affects systems with a large number of CPUs, and with multiple
  devices enabled. Without this change, those platforms are unable to
  hibernate at all.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:51:01 -07:00
Jacob Keller
5c49922880 i40e: prevent service task from running while we're suspended
Although the service task does check the suspended status before
running, it might already be part way through running when we go to
suspend. Lets ensure that the service task is stopped and will not be
restarted again until we finish resuming. This ensures that service task
code does not cause strange interactions with the suspend/resume
handlers.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:51:01 -07:00
Jacob Keller
401586c2b9 i40e: don't clear suspended state until we finish resuming
When handling suspend and resume callbacks we want to make sure that (a)
we don't suspend again if we're already suspended and (b) we don't
resume again if we're already resuming. Lets make sure we test_and_set
the __I40E_SUSPENDED bit in i40e_suspend which ensures that a suspend
call when already suspended will exit early. Additionally, if
__I40E_SUSPENDED is not set when we begin resuming, exit early as well.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:51:00 -07:00
Jacob Keller
0e5d3da400 i40e: use newer generic PM support instead of legacy PM callbacks
Stop using the old legacy PM support, since we now have stable support
for the newer generic PM callbacks.

This has several advantages. First, we no longer have to manage our
own pci_save_state() and power changes, as it's preferred to have the
PCI stack do this. Second, these routines get called for both hibernate
and suspend to ram, so we can have the driver properly handle all the
suspend/resume flows that it needs to.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:51:00 -07:00
Jacob Keller
c17401a1dd i40e: use separate state bit for miscellaneous IRQ setup
We currently (mis)use the __I40E_RECOVERY_PENDING bit to determine when
we should actually request a new IRQ in i40e_setup_misc_vector().

This led to a design mistake where we open-coded the re-setup of the
miscellaneous vector in i40e_resume() instead of using the function
provided. If we did not open-code this and instead tried to use the
i40e_setup_misc_vector() function, it would lead to never reallocating
the IRQ.

This would lead to a second i40e_suspend() call failing to free the
vector due to a NULL pointer dereference.

A future patch is going to re-work how the i40e_suspend() and
i40e_resume() flows work to clear all IRQ vectors, which would require
us to use i40e_setup_misc_vector() directly. Since during this time the
__I40E_RECOVERY_PENDING bit is set, we'll never re-allocate the vector.

Rather than leaving the open-coded setup in i40e_resume() lets just fix
the problem properly in i40e_setup_misc_vector().

Introduce a new state bit which indicates when the IRQ has been
assigned, which will be set when i40e_setup_misc_vector is first called.
This ultimately resolves the issue of re-requesting the vector, without
overloading the __I40E_RECOVERY_PENDING state. This ensures that the
suspend/resume cycle can use the setup function instead of open-coding
the re-request during resume.

Additionally, since the only callers of i40e_stop_misc_vector also want
to free it, move this code directly into the function to avoid
duplication. Due to the new functionality, rename it to
i40e_free_misc_vector().

This lets us drop the extra calls to free and re-enable the vector
during i40e_suspend() and i40e_resume(). We don't need to call
i40e_setup_misc_Vector() in i40e_resume() because it gets called by the
i40e_rebuild() call.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:51:00 -07:00
Mariusz Stachura
0dc8692e91 i40e: fix for flow director counters not wrapping as expected
An errata with GLQF_PCNT causes it to not wrap as expected. This
can cause an error in flow director statistics. This patch resets
affected counters just after reading.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:50:59 -07:00
Mariusz Stachura
e04ea00217 i40e: relax warning message in case of version mismatch
Fortville and Fort Park devices are often on different firmware release
schedules. This change relaxes the minor version warning message,
so it is only displayed for older FW warning version for old
firmware Fortville 3 or earlier.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:50:59 -07:00
Sudheer Mogilappagari
3fded4663b i40e: simplify member variable accesses
This commit replaces usage of vsi->back in i40e_print_link_message()
(which is actually a PF pointer) with temp variable.

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:50:59 -07:00
Sudheer Mogilappagari
9a03449d3e i40e: Fix link down message when interface is brought up
i40e_print_link_message() is intended to compare new
link state with current link state and print log message
only if the new state is different from current state.

However in current driver the new state does not get updated
when link is going down because of the if condition. When an
interface is brought down, vsi->state is set to I40E_VSI_DOWN
in i40e_vsi_close() and later i40e_print_link_message() does
not get invoked in i40e_link_event due to if condition. Hence
link down message doesn't appear when link is going down. The
down state is seen  later during i40e_open() and old state
gets printed. The actual link state doesn't get updated in
i40e_close() or i40e_open() but when i40e_handle_link_event is
called inside i40e_clean_adminq_subtask.

This change allows i40e_print_link_message() to be called when
interface is going down and keeps the state information updated.

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:50:59 -07:00
Sudheer Mogilappagari
16badf758b i40e: Fix unqualified module message while bringing link up
In current driver, when ifconfig ethx up is done, the link state
doesn't transition to UP inside i40e_open(). It changes after AQ
command response is handled in i40e_handle_link_event().

When pf->hw.phy.link_info.link_info is DOWN inside i40e_open(),
The state is transient and invalid. So log message gets printed
based on incorrect info (i.e link_info and an_info).

This commit removes check for unqualified module inside
i40e_up_complete(). The existing check in i40e_handle_link_event()
logs the error message based on correct link state information.

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:50:59 -07:00
Jacob Keller
2b634bb068 i40e/i40evf: rename bytes_per_int to bytes_per_usec
This value is not calculating bytes_per_int, which would actually just
be bytes/ITR_COUNTDOWN_START, but rather it's calculating bytes/usecs.

Rename the variable for clarity so that future developers understand
what the value is actually calculating.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:50:58 -07:00
Daniel Borkmann
de8f3a83b0 bpf: add meta pointer for direct access
This work enables generic transfer of metadata from XDP into skb. The
basic idea is that we can make use of the fact that the resulting skb
must be linear and already comes with a larger headroom for supporting
bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work
on a similar principle and introduce a small helper bpf_xdp_adjust_meta()
for adjusting a new pointer called xdp->data_meta. Thus, the packet has
a flexible and programmable room for meta data, followed by the actual
packet data. struct xdp_buff is therefore laid out that we first point
to data_hard_start, then data_meta directly prepended to data followed
by data_end marking the end of packet. bpf_xdp_adjust_head() takes into
account whether we have meta data already prepended and if so, memmove()s
this along with the given offset provided there's enough room.

xdp->data_meta is optional and programs are not required to use it. The
rationale is that when we process the packet in XDP (e.g. as DoS filter),
we can push further meta data along with it for the XDP_PASS case, and
give the guarantee that a clsact ingress BPF program on the same device
can pick this up for further post-processing. Since we work with skb
there, we can also set skb->mark, skb->priority or other skb meta data
out of BPF, thus having this scratch space generic and programmable
allows for more flexibility than defining a direct 1:1 transfer of
potentially new XDP members into skb (it's also more efficient as we
don't need to initialize/handle each of such new members). The facility
also works together with GRO aggregation. The scratch space at the head
of the packet can be multiple of 4 byte up to 32 byte large. Drivers not
yet supporting xdp->data_meta can simply be set up with xdp->data_meta
as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out,
such that the subsequent match against xdp->data for later access is
guaranteed to fail.

The verifier treats xdp->data_meta/xdp->data the same way as we treat
xdp->data/xdp->data_end pointer comparisons. The requirement for doing
the compare against xdp->data is that it hasn't been modified from it's
original address we got from ctx access. It may have a range marking
already from prior successful xdp->data/xdp->data_end pointer comparisons
though.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 13:36:44 -07:00
David S. Miller
66bed8465a Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2017-09-05

This series contains fixes for i40e only.

These two patches fix an issue where our nvmupdate tool does not work on RHEL 7.4
and newer kernels, in fact, the use of the nvmupdate tool on newer kernels can
cause the cards to be non-functional unless these patches are applied.

Anjali reworks the locking around accessing the NVM so that NVM acquire timeouts
do not occur which was causing the failed firmware updates.

Jake correctly updates the wb_desc when reading the NVM through the AdminQ.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05 20:03:40 -07:00
Jacob Keller
3c8f3e96af i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq
When introducing the functions to read the NVM through the AdminQ, we
did not correctly mark the wb_desc.

Fixes: 7073f46e443e ("i40e: Add AQ commands for NVM Update for X722", 2015-06-05)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-05 17:52:46 -07:00
Anjali Singhai Jain
09f79fd49d i40e: avoid NVM acquire deadlock during NVM update
X722 devices use the AdminQ to access the NVM, and this requires taking
the AdminQ lock. Because of this, we lock the AdminQ during
i40e_read_nvm(), which is also called in places where the lock is
already held, such as the firmware update path which wants to lock once
and then unlock when finished after performing several tasks.

Although this should have only affected X722 devices, commit
96a39aed25e6 ("i40e: Acquire NVM lock before reads on all devices",
2016-12-02) added locking for all NVM reads, regardless of device
family.

This resulted in us accidentally causing NVM acquire timeouts on all
devices, causing failed firmware updates which left the eeprom in
a corrupt state.

Create unsafe non-locked variants of i40e_read_nvm_word and
i40e_read_nvm_buffer, __i40e_read_nvm_word and __i40e_read_nvm_buffer
respectively. These variants will not take the NVM lock and are expected
to only be called in places where the NVM lock is already held if
needed.

Since the only caller of i40e_read_nvm_buffer() was in such a path,
remove it entirely in favor of the unsafe version. If necessary we can
always add it back in the future.

Additionally, we now need to hold the NVM lock in i40e_validate_checksum
because the call to i40e_calc_nvm_checksum now assumes that the NVM lock
is held. We can further move the call to read I40E_SR_SW_CHECKSUM_WORD
up a bit so that we do not need to acquire the NVM lock twice.

This should resolve firmware updates and also fix potential raise that
could have caused the driver to report an invalid NVM checksum upon
driver load.

Reported-by: Stefan Assmann <sassmann@kpanic.de>
Fixes: 96a39aed25e6 ("i40e: Acquire NVM lock before reads on all devices", 2016-12-02)
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-05 17:48:22 -07:00
Jacob Keller
742c987575 i40e/i40evf: avoid dynamic ITR updates when polling or low packet rate
The dynamic ITR algorithm depends on a calculation of usecs which
assumes that the interrupts have been firing constantly at the interrupt
throttle rate. This is not guaranteed because we could have a low packet
rate, or have been polling in software.

We'll estimate whether this is the case by using jiffies to determine if
we've been too long. If the time difference of jiffies is larger we are
guaranteed to have an incorrect calculation. If the time difference of
jiffies is smaller we might have been polling some but the difference
shouldn't affect the calculation too much.

This ensures that we don't get stuck in BULK latency during certain rare
situations where we receive bursts of packets that force us into NAPI
polling.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 16:15:24 -07:00
Jacob Keller
0a2c7722be i40e/i40evf: remove ULTRA latency mode
Since commit c56625d59726 ("i40e/i40evf: change dynamic interrupt
thresholds") a new higher latency ITR setting called I40E_ULTRA_LATENCY
was added with a cryptic comment about how it was meant for adjusting Rx
more aggressively when streaming small packets.

This mode was attempting to calculate packets per second and then kick
in when we have a huge number of small packets.

Unfortunately, the ULTRA setting was kicking in for workloads it wasn't
intended for including single-thread UDP_STREAM workloads.

This wasn't caught for a variety of reasons. First, the ip_defrag
routines were improved somewhat which makes the UDP_STREAM test still
reasonable at 10GbE, even when dropped down to 8k interrupts a second.
Additionally, some other obvious workloads appear to work fine, such
as TCP_STREAM.

The number 40k doesn't make sense for a number of reasons. First, we
absolutely can do more than 40k packets per second. Second, we calculate
the value inline in an integer, which sometimes can overflow resulting
in using incorrect values.

If we fix this overflow it makes it even more likely that we'll enter
ULTRA mode which is the opposite of what we want.

The ULTRA mode was added originally as a way to reduce CPU utilization
during a small packet workload where we weren't keeping up anyways. It
should never have been kicking in during these other workloads.

Given the issues outlined above, let's remove the ULTRA latency mode. If
necessary, a better solution to the CPU utilization issue for small
packet workloads will be added in a future patch.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 16:12:15 -07:00
Jacob Keller
6d9777298b i40e: invert logic for checking incorrect cpu vs irq affinity
In commit 96db776a3682 ("i40e/vf: fix interrupt affinity bug")
we added some code to force exit of polling in case we did
not have the correct CPU. This is important since it was possible for
the IRQ affinity to be changed while the CPU is pegged at 100%. This can
result in the polling routine being stuck on the wrong CPU until
traffic finally stops.

Unfortunately, the implementation, "if the CPU is correct, exit as
normal, otherwise, fall-through to the end-polling exit" is incredibly
confusing to reason about. In this case, the normal flow looks like the
exception, while the exception actually occurs far away from the if
statement and comment.

We recently discovered and fixed a bug in this code because we were
incorrectly initializing the affinity mask.

Re-write the code so that the exceptional case is handled at the check,
rather than having the logic be spread through the regular exit flow.
This does end up with minor code duplication, but the resulting code is
much easier to reason about.

The new logic is identical, but inverted. If we are running on a CPU not
in our affinity mask, we'll exit polling. However, the code flow is much
easier to understand.

Note that we don't actually have to check for MSI-X, because in the MSI
case we'll only have one q_vector, but its default affinity mask should
be correct as it includes all CPUs when it's initialized. Further, we
could at some point add code to setup the notifier for the non-MSI-X
case and enable this workaround for that case too, if desired, though
there isn't much gain since its unlikely to be the common case.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 16:10:48 -07:00
Jacob Keller
759dc4a7e6 i40e: initialize our affinity_mask based on cpu_possible_mask
On older kernels a call to irq_set_affinity_hint does not guarantee that
the IRQ affinity will be set. If nothing else on the system sets the IRQ
affinity this can result in a bug in the i40e_napi_poll() routine where
we notice that our interrupt fired on the "wrong" CPU according to our
internal affinity_mask variable.

This results in a bug where we continuously tell NAPI to stop polling to
move the interrupt to a new CPU, but the CPU never changes because our
affinity mask does not match the actual mask setup for the IRQ.

The root problem is a mismatched affinity mask value. So lets initialize
the value to cpu_possible_mask instead. This ensures that prior to the
first time we get an IRQ affinity notification we'll have the mask set
to include every possible CPU.

We use cpu_possible_mask instead of cpu_online_mask since the former is
almost certainly never going to change, while the later might change
after we've made a copy.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 16:09:03 -07:00
Jacob Keller
9254c0e34e i40e: move enabling icr0 into i40e_update_enable_itr
If we don't have MSI-X enabled, we handle interrupts on all icr0. This
is a special case, so let's move the conditional into
i40e_update_enable_itr() in order to make i40e_napi_poll easier to
read about.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 16:07:13 -07:00
Jacob Keller
ba4460d45a i40e: remove workaround for resetting XPS
Since commit 3ffa037d7f78 ("i40e: Set XPS bit mask to zero in DCB mode")
we've tried to reset the XPS settings by building a custom
empty CPU mask.

This workaround is not necessary because we're not really removing the
XPS setting, but simply setting it so that no CPU is valid.

Second, we shorten the code further by using zalloc_cpumask_var instead
of a separate call to bitmap_zero().

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 16:06:02 -07:00
Carolyn Wyborny
19279235be i40e: Fix for unused value issue found by static analysis
This patch fixes an issue where an error return value is
set, but without an immediate exit, the value can be overwritten
by the following code execution.  The condition  at this point
is not fatal, so remove the error assignment and comment the
intent for future code maintainers

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 16:02:16 -07:00
Mariusz Stachura
68e49702a1 i40e: 25G FEC status improvements
This patch improves the system log message. The log message will
be expanded to include the FEC mode the FW requested before link
was established.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 16:01:03 -07:00
Mariusz Stachura
8774370d26 i40e/i40evf: support for VF VLAN tag stripping control
This patch gives VF capability to control VLAN tag stripping via
ethtool. As rx-vlan-offload was fixed before, now the VF is able to
change it using "ethtool --offload <IF> rxvlan on/off" settings.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 15:47:43 -07:00
Jacob Keller
8c9eb350aa i40e: force VMDQ device name truncation
In new versions of GCC since 7.x a new warning exists which warns when
a string is truncated before all of the format can be completed.

When we setup VMDQ netdev names we are copying a pre-existing interface
name which could be up to 15 characters in length. Since we also add
4 bytes, v, the literal %, the d and a \0 null, we would overrun the
available size unless snprintf truncated for us.

The snprintf call will of course truncate on the end, so lets instead
modify the code to force truncation of the copied netdev name by
4 characters, to create enough space for the 4 bytes we're adding.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 15:44:04 -07:00
Akeem G Abodunrin
e53b382f3a i40e: Use correct flag to enable egress traffic for unicast promisc
Albeit, we usually set true promiscuous mode for both multicast and
unicast at the same time - however, it is possible to set it
individually, so using allmulti flag which is only for allmulticast might
caused unwanted behavior in mirroring egress traffic promiscuous for
unicast in VF.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 15:43:53 -07:00
Jacob Keller
b5d5504aa1 i40e: prevent snprintf format specifier truncation
Increase the size of the prefix buffer so that it can hold enough
characters for every possible input. Although 20 is enough for all
expected inputs, it is possible for the values to be larger than
expected, resulting in a possibly truncated string. Additionally, lets
use sizeof(prefix) in order to ensure we use the correct size if we need
to change the array length in the future.

New versions of GCC starting at 7 now include warnings to prevent
truncation unless you handle the return code. At most 27 bytes can be
written here, so lets just increase the buffer size even if for all
expected hw->bus.* values we only needed 20.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 15:43:41 -07:00
Mariusz Stachura
ed601f6601 i40e: Store the requested FEC information
Store information about FEC modes, that were requested. It will be used
in printing link status information function and this way there is no
need to call admin queue there.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 15:43:34 -07:00
Sudheer Mogilappagari
167d52edc4 i40e: Update state variable for adminq subtask
During NVM update, state machine gets into unrecoverable state because
i40e_clean_adminq_subtask can get scheduled after the admin queue
command but before other state variables are updated. This causes
incorrect input to i40e_nvmupd_check_wait_event and state transitions
don't happen.

This fix updates the state variables so that adminq_subtask will have
accurate information whenever it gets scheduled.

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 15:42:53 -07:00
Sudheer Mogilappagari
2bf01935ec i40e: synchronize nvmupdate command and adminq subtask
During NVM update, state machine gets into unrecoverable state because
i40e_clean_adminq_subtask can get scheduled after the admin queue
command but before other state variables are updated. This causes
incorrect input to i40e_nvmupd_check_wait_event and state transitions
don't happen.

This issue existed before but surfaced after commit 373149fc99a0
("i40e: Decrease the scope of rtnl lock")

This fix adds locking around admin queue command and update of
state variables so that adminq_subtask will have accurate information
whenever it gets scheduled.

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25 14:52:52 -07:00
Alan Brady
06b2decd92 i40e: prevent changing ITR if adaptive-rx/tx enabled
Currently the driver allows the user to change (or even disable)
interrupt moderation if adaptive-rx/tx is enabled when this should
not be the case.

Adaptive RX/TX will not respect the user's ITR settings so
allowing the user to change it is weird.  This bug would also
allow the user to disable interrupt moderation with adaptive-rx/tx
enabled which doesn't make much sense either.

This patch makes it such that if adaptive-rx/tx is enabled, the user
cannot make any manual adjustments to interrupt moderation.  It also
makes it so that if ITR is disabled but adaptive-rx/tx is then
enabled, ITR will be re-enabled.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25 14:52:46 -07:00
Jacob Keller
7e4d01e7d3 i40e: use cpumask_copy instead of direct assignment
According to the header file cpumask.h, we shouldn't be directly copying
a cpumask_t, since its a bitmap and might not be copied correctly. Lets
use the provided cpumask_copy() function instead.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25 14:52:42 -07:00