Commit Graph

706724 Commits

Author SHA1 Message Date
Marcelo Ricardo Leitner
2fc019f790 sctp: introduce sctp_chunk_stream_no
Add a helper to fetch the stream number from a given chunk.

Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 16:27:28 -07:00
Marcelo Ricardo Leitner
f952be79ce sctp: introduce struct sctp_stream_out_ext
With the stream schedulers, sctp_stream_out will become too big to be
allocated by kmalloc and as we need to allocate with BH disabled, we
cannot use __vmalloc in sctp_stream_init().

This patch moves out the stats from sctp_stream_out to
sctp_stream_out_ext, which will be allocated only when the application
tries to sendmsg something on it.

Just the introduction of sctp_stream_out_ext would already fix the issue
described above by splitting the allocation in two. Moving the stats
to it also reduces the pressure on the allocator as we will ask for less
memory atomically when creating the socket and we will use GFP_KERNEL
later.

Then, for stream schedulers, we will just use sctp_stream_out_ext.

Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 16:27:28 -07:00
Marcelo Ricardo Leitner
1fdb8d8fef sctp: factor out stream->in allocation
There is 1 place allocating it and another reallocating. Move such
procedures to a common function.

v2: updated changelog

Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 16:27:28 -07:00
Marcelo Ricardo Leitner
e090abd0d8 sctp: factor out stream->out allocation
There is 1 place allocating it and 2 other reallocating. Move such
procedures to a common function.

Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 16:27:28 -07:00
Marcelo Ricardo Leitner
1ae2eaaa22 sctp: silence warns on sctp_stream_init allocations
As SCTP supports up to 65535 streams, that can lead to very large
allocations in sctp_stream_init(). As Xin Long noticed, systems with
small amounts of memory are more prone to not have enough memory and
dump warnings on dmesg initiated by user actions. Thus, silence them.

Also, if the reallocation of stream->out is not necessary, skip it and
keep the memory we already have.

Reported-by: Xin Long <lucien.xin@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 16:27:28 -07:00
David S. Miller
af14827fa3 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2017-10-03

This series contains updates to fm10k only.

Jake provides majority of the changes in this series, starting with using
fm10k_prepare_for_reset() if we lose PCIe link.  Before we would detach
the device and close the netdev, which left a lot of items still active,
such as the Tx/Rx resources.  This could cause problems where register
reads would return potentially invalid values and would result in unknown
driver behavior, so call fm10k_prepare_for_reset() much like we do for
suspend/resume cycles.  This will attempt to shutdown as much as possible
to prevent possible issues.  Then replaced the PCI specific legacy power
management hooks with the new generic power management hooks for both
suspend and hibernate.  Introduced a workqueue item which monitors a
queue of MAC and VLAN requests since a large number of MAC address or
VLAN updates at once can overload the mailbox with too many messages at
once.  Fixed a cppcheck warning by properly declaring the min_rate and
max_rate variables in the declaration and definition for .ndo_set_vf_bw,
rather than using "unused" for the minimum rates.

Joe Perches fixes the backward logic when using net_ratelimit().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 16:25:05 -07:00
Florian Westphal
6c5570016b net: core: decouple ifalias get/set from rtnl lock
Device alias can be set by either rtnetlink (rtnl is held) or sysfs.

rtnetlink hold the rtnl mutex, sysfs acquires it for this purpose.
Add an extra mutex for it and use rcu to protect concurrent accesses.

This allows the sysfs path to not take rtnl and would later allow
to not hold it when dumping ifalias.

Based on suggestion from Eric Dumazet.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 15:56:01 -07:00
Mahesh Bandewar
4d2c0cda07 bonding: speed/duplex update at NETDEV_UP event
Some NIC drivers don't have correct speed/duplex settings at the
time they send NETDEV_UP notification and that messes up the
bonding state. Especially 802.3ad mode which is very sensitive
to these settings. In the current implementation we invoke
bond_update_speed_duplex() when we receive NETDEV_UP, however,
ignore the return value. If the values we get are invalid
(UNKNOWN), then slave gets removed from the aggregator with
speed and duplex set to UNKNOWN while link is still marked as UP.

This patch fixes this scenario. Also 802.3ad mode is sensitive to
these conditions while other modes are not, so making sure that it
doesn't change the behavior for other modes.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 14:32:25 -07:00
Dan Carpenter
b5c7d4e54c mlxsw: spectrum: Add missing error code on allocation failure
We accidentally return success if the kmalloc_array() call fails.

Fixes: 0e14c7777a ("mlxsw: spectrum: Add the multicast routing hardware logic")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 10:26:58 -07:00
Dan Carpenter
b508e0b6e4 mlxsw: spectrum: Fix check for IS_ERR() instead of NULL
mlxsw_afa_block_create() doesn't return error pointers, it returns NULL
on error.

Fixes: 0e14c7777a ("mlxsw: spectrum: Add the multicast routing hardware logic")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 10:26:58 -07:00
Colin Ian King
360cc342c9 net: dsa: mt7530: make functions mt7530_phy_write static
The function mt7530_phy_write is local to the source and does not need to
be in global scope, so make it static.

Cleans up sparse warnings:
symbol 'mt7530_phy_write' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 10:20:12 -07:00
Colin Ian King
161ae6b04d net: dsa: lan9303: make functions lan9303_mdio_phy_{read|write} static
The functions lan9303_mdio_phy_write and lan9303_mdio_phy_read are local
to the source and do not need to be in global scope, so make them static.

Cleans up sparse warnings:
symbol 'lan9303_mdio_phy_write' was not declared. Should it be static?
symbol 'lan9303_mdio_phy_read' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 10:20:12 -07:00
David S. Miller
da885b614a Merge branch 'mlxsw-mc-route-offload'
Jiri Pirko says:

====================
mlxsw: Add support for partial multicast route offload

Yotam says:

Previous patchset introduced support for offloading multicast MFC routes to
the Spectrum hardware. As described in that patchset, no partial offloading
is supported, i.e if a route has one output interface which is not a valid
offloadable device (e.g. pimreg device, dummy device, management NIC), the
route is trapped to the CPU and the forwarding is done in slow-path.

Add support for partial offloading of multicast routes, by letting the
hardware to forward the packet to all the in-hardware devices, while the
kernel ipmr module will continue forwarding to all other interfaces.

Similarly to the bridge, the kernel ipmr module will forward a marked
packet to an interface only if the interface has a different parent ID than
the packet's ingress interfaces.

The first patch introduces the offload_mr_fwd_mark skb field, which can be
used by offloading drivers to indicate that a packet had already gone
through multicast forwarding in hardware, similarly to the offload_fwd_mark
field that indicates that a packet had already gone through L2 forwarding
in hardware.

Patches 2 and 3 change the ipmr module to not forward packets that had
already been forwarded by the hardware, i.e. packets that are marked with
offload_mr_fwd_mark and the ingress VIF shares the same parent ID with the
egress VIF.

Patches 4, 5, 6 and 7 add the support in the mlxsw Spectrum driver for trap
and forward routes, while marking the trapped packets with the
offload_mr_fwd_mark.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 10:06:31 -07:00
Yotam Gigi
f60c254998 mlxsw: spectrum: mr: Support trap-and-forward routes
Add the support of trap-and-forward route action in the multicast routing
offloading logic. A route will be set to trap-and-forward action if one (or
more) of its output interfaces is not offload-able, i.e. does not have a
valid Spectrum RIF.

This way, a route with mixed output VIFs list, which contains both
offload-able and un-offload-able devices can go through partial offloading
in hardware, and the rest will be done in the kernel ipmr module.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 10:06:30 -07:00
Yotam Gigi
607feadef8 mlxsw: spectrum: mr_tcam: Add trap-and-forward multicast route
In addition to the current multicast route actions, which include trap
route action and a forward route action, add the trap-and-forward multicast
route action, and implement it in the multicast routing hardware logic.

To implement that, add a trap-and-forward ACL action as the last action in
the route flexible action set. The used trap is the ACL2 trap, which marks
the packets with offload_mr_forward_mark, to prevent the packet from being
forwarded again by the kernel.

Note: At that stage the offloading logic does not support trap-and-forward
multicast routes. This patch adds the support only in the hardware logic.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 10:06:30 -07:00
Yotam Gigi
a0040c8c93 mlxsw: spectrum: Add trap for multicast trap-and-forward routes
When a multicast route is configured with trap-and-forward action, the
packets should be marked with skb->offload_mr_fwd_mark, in order to prevent
the packets from being forwarded again by the kernel ipmr module.

Due to this, it is not possible to use the already existing multicast trap
(MLXSW_TRAP_ID_ACL1) as the packet should be marked differently. Add the
MLXSW_TRAP_ID_ACL2 which is for trap-and-forward multicast routes, and set
the offload_mr_fwd_mark skb field in its handler.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 10:06:30 -07:00
Yotam Gigi
2678724355 mlxsw: acl: Introduce ACL trap and forward action
Use trap/discard flex action to implement trap and forward. The action will
later be used for multicast routing, as the multicast routing mechanism is
done using ACL flexible actions in Spectrum hardware. Using that action, it
will be possible to implement a trap-and-forward route.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 10:06:30 -07:00
Yotam Gigi
a5bc9294d7 ipv4: ipmr: Don't forward packets already forwarded by hardware
Change the ipmr module to not forward packets if:
 - The packet is marked with the offload_mr_fwd_mark, and
 - Both input interface and output interface share the same parent ID.

This way, a packet can go through partial multicast forwarding in the
hardware, where it will be forwarded only to the devices that share the
same parent ID (AKA, reside inside the same hardware). The kernel will
forward the packet to all other interfaces.

To do this, add the ipmr_offload_forward helper, which per skb, ingress VIF
and egress VIF, returns whether the forwarding was offloaded to hardware.
The ipmr_queue_xmit frees the skb and does not forward it if the result is
a true value.

All the forwarding path code compiles out when the CONFIG_NET_SWITCHDEV is
not set.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 10:06:30 -07:00
Yotam Gigi
5d8b3e69fc ipv4: ipmr: Add the parent ID field to VIF struct
In order to allow the ipmr module to do partial multicast forwarding
according to the device parent ID, add the device parent ID field to the
VIF struct. This way, the forwarding path can use the parent ID field
without invoking switchdev calls, which requires the RTNL lock.

When a new VIF is added, set the device parent ID field in it by invoking
the switchdev_port_attr_get call.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 10:06:30 -07:00
Yotam Gigi
abf4bb6b63 skbuff: Add the offload_mr_fwd_mark field
Similarly to the offload_fwd_mark field, the offload_mr_fwd_mark field is
used to allow partial offloading of MFC multicast routes.

Switchdev drivers can offload MFC multicast routes to the hardware by
registering to the FIB notification chain. When one of the route output
interfaces is not offload-able, i.e. has different parent ID, the route
cannot be fully offloaded by the hardware. Examples to non-offload-able
devices are a management NIC, dummy device, pimreg device, etc.

Similar problem exists in the bridge module, as one bridge can hold
interfaces with different parent IDs. At the bridge, the problem is solved
by the offload_fwd_mark skb field.

Currently, when a route cannot go through full offload, the only solution
for a switchdev driver is not to offload it at all and let the packet go
through slow path.

Using the offload_mr_fwd_mark field, a driver can indicate that a packet
was already forwarded by hardware to all the devices with the same parent
ID as the input device. Further patches in this patch-set are going to
enhance ipmr to skip multicast forwarding to devices with the same parent
ID if a packets is marked with that field.

The reason why the already existing "offload_fwd_mark" bit cannot be used
is that a switchdev driver would want to make the distinction between a
packet that has already gone through L2 forwarding but did not go through
multicast forwarding, and a packet that has already gone through both L2
and multicast forwarding.

For example: when a packet is ingressing from a switchport enslaved to a
bridge, which is configured with multicast forwarding, the following
scenarios are possible:
 - The packet can be trapped to the CPU due to exception while multicast
   forwarding (for example, MTU error). In that case, it had already gone
   through L2 forwarding in the hardware, thus A switchdev driver would
   want to set the skb->offload_fwd_mark and not the
   skb->offload_mr_fwd_mark.
 - The packet can also be trapped due to a pimreg/dummy device used as one
   of the output interfaces. In that case, it can go through both L2 and
   (partial) multicast forwarding inside the hardware, thus a switchdev
   driver would want to set both the skb->offload_fwd_mark and
   skb->offload_mr_fwd_mark.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellaox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 10:06:30 -07:00
Arjun Vynipadath
a047fbae23 cxgb4: Update comment for min_mtu
We have lost a comment for minimum mtu value set for netdevice with
'commit d894be57ca ("ethernet: use net core MTU range checking in
more drivers"). Updating it accordingly.

Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 09:55:41 -07:00
Jacob Keller
3e256ac5b1 fm10k: fix mis-ordered parameters in declaration for .ndo_set_vf_bw
We've had support for setting both a minimum and maximum bandwidth via
.ndo_set_vf_bw since commit 883a9ccbae ("fm10k: Add support for SR-IOV
to driver", 2014-09-20).

Likely because we do not support minimum rates, the declaration
mis-ordered the "unused" parameter, which causes warnings when analyzed
with cppcheck.

Fix this warning by properly declaring the min_rate and max_rate
variables in the declaration and definition (rather than using
"unused"). Also rename "rate" to max_rate so as to clarify that we only
support setting the maximum rate.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-03 09:00:04 -07:00
Jacob Keller
87be98927e fm10k: prefer %s and __func__ for diagnostic prints
Don't hard code the function names in the diagnostic output when these
reset related routines fail. Instead, use %s and __func__ so that future
refactors don't need to change the print outs.

Additionally, while we are here, add missing function header comments
for the new reset_prepare and reset_done function handlers.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-03 08:55:20 -07:00
Joe Perches
c0ad8ef3df fm10k: Fix misuse of net_ratelimit()
Correct the backward logic using !net_ratelimit()

Miscellanea:

o Add a blank line before the error return label

Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-03 08:51:27 -07:00
Jacob Keller
ef57ab791c fm10k: bump version number
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-03 08:49:52 -07:00
Jacob Keller
1f5c27e528 fm10k: use the MAC/VLAN queue for VF<->PF MAC/VLAN requests
Now that we have a working MAC/VLAN queue for handling MAC/VLAN messages
from the netdev, replace the default handler for the VF<->PF messages.
This new handler is very similar to the default code, but uses the
MAC/VLAN queue instead of sending the message directly. Unfortunately we
can't easily re-use the default code, so we'll just replace the entire
function.

This ensures that a VF requesting a large number of VLANs or MAC
addresses does not start a reset cycle, as explained in the commit which
introduced the message queue.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Ngai-mint Kwan <ngai-mint.kwan@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-03 08:39:17 -07:00
Jacob Keller
fc9173682d fm10k: introduce a message queue for MAC/VLAN messages
Under some circumstances, when dealing with a large number of MAC
address or VLAN updates at once, the fm10k driver, particularly the VFs
can overload the mailbox with too many messages at once.

This results in a mailbox timeout, which causes the driver to initiate
a reset. During the reset, we re-send all the same messages that
originally caused the timeout. This results in a cycle of resets each
triggering a future reset.

To fix or avoid this, we introduce a workqueue item which monitors
a queue of MAC and VLAN requests. These requests are queued to the end
of the list, and we process as a FIFO periodically.

Initially we only handle requests for the netdev, but we do handle
unicast MAC addresses, multicast MAC addresses, and update VLAN
requests.

A future patch will add support to use this queue for handling MAC
update requests from the VF<->PF mailbox.

The MAC/VLAN work item will keep checking to make sure that each request
does not overflow the mailbox and cause a timeout. If it might, then the
work item will reschedule itself a short time later. This avoids any
reset cycle, since we never send the message if the mailbox is not
ready.

As an alternative, we tried increasing the mailbox message FIFO, but
this just delays the problem and results in needless memory waste on the
system. Our new message queue is dynamically allocated so only uses as
much memory as it needs. Additionally, it need not be contiguous like
the Tx and Rx FIFOs.

Note that this patch chose to only create a queue for MAC and VLAN
messages, since these are the only messages sent in a large enough
volume to cause the reset loop. Other messages are very unlikely to
overflow the mailbox Tx FIFO so easily.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-03 08:25:36 -07:00
Jacob Keller
8249c47c6b fm10k: use generic PM hooks instead of legacy PCIe power hooks
Replace the PCI specific legacy power management hooks with the new
generic power management hooks which work properly for both suspend and
hibernate. The new generic system is better and properly handles the
lower level PCIe power management rather than forcing the driver to
handle it.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-03 08:19:33 -07:00
Jacob Keller
b4fcd43661 fm10k: use spinlock to implement mailbox lock
Lets not re-invent the locking wheel. Remove our bitlock and use
a proper spinlock instead.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-03 08:12:44 -07:00
Jacob Keller
0b40f45748 fm10k: prepare_for_reset() when we lose PCIe Link
If we lose PCIe link, such as when an unannounced PFLR event occurs, or
when a device is surprise removed, we currently detach the device and
close the netdev. This unfortunately leaves a lot of things still
active, such as the msix_mbx_pf IRQ, and Tx/Rx resources.

This can cause problems because the register reads will return
potentially invalid values which may result in unknown driver behavior.

Begin the process of resetting using fm10k_prepare_for_reset(), much in
the same way as the suspend and resume cycle does. This will attempt to
shutdown as much as possible, in order to prevent possible issues.

A naive implementation for this has issues, because there are now
multiple flows calling the reset logic and setting a reset bit. This
would cause problems, because the "re-attach" routine might call
fm10k_handle_reset() prior to the reset actually finishing. Instead,
we'll add state bits to indicate which flow actually initiated the
reset.

For the general reset flow, we'll assume that if someone else is
resetting that we do not need to handle it at all, so it does not need
its own state bit. For the suspend case, we will simply issue a warning
indicating that we are attempting to recover from this case when
resuming.

For the detached subtask, we'll simply refuse to re-attach until we've
actually initiated a reset as part of that flow.

Finally, we'll stop attempting to manage the mailbox subtask when we're
detached, since there's nothing we can do if we don't have a PCIe
address.

Overall this produces a much cleaner shutdown and recovery cycle for
a PCIe surprise remove event.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-03 08:06:44 -07:00
David S. Miller
4efac6ff4d Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2017-10-02

This series contains updates to i40e and i40evf.

Shannon Nelson fixes an issue where when a machine has more CPUs than
queue pairs, the counting gets a "little funky" and turns off Flow
Director.  So to correct it, limit the number of LAN queues initially
allocated to be sure there are some left for Flow Director and other
features.

Lihong cleans up dead code by removing a condition check which cannot
ever be true.

Christophe Jaillet fixes a potential NULL pointer dereference, which
could happen if kzalloc() fails.

Filip corrects the reporting of supported link modes, which was incorrect
for some NICs.  Added support for 'ethtool -m' command, which displays
information about QSFP+ modules.

Mariusz adds functions to read/write the LED registers to control the
LEDS, instead of accessing the registers directly whenever the LEDs
need to be controlled.

Jake fixes a regression where we introduced a scheduling while atomic,
so introduce a separate helper function which will manage its own need
for the mac_filter_hash_lock.  Also cleaned up the "PF" parameter in
i40e_vc_disable_vf() since it is never used and is not needed.  Fixed
a rare case where it is possible that a reset does not occur when
i40e_vc_disable_vf() is called, so modify i40e_reset_vf() to return a
bool to indicate whether it reset or not so that i40e_vc_disable_vf()
can wait until a reset actually occurs.

Alan adds the ability for the VF to request more or less underlying
allocated queues from the PF.  Fixes the incorrect method for clearing
the vf_states variable with a NULL assignment, when we should be
using atomic bitops since we don't actually want to clear all the
flags.  Fixed a resource leak, where the PF driver fails to inform
clients of a VF reset because we were incorrectly checking the
I40E_VF_STATE_PRE_ENABLE bit.

Mitch converts i40evf_map_rings_to_vectors() to a void function since
it cannot fail and allows us to clean up the checks for the function
return value.

Scott enables the driver(s) to pass traffic with VLAN tags using the
802.1ad Ethernet protocol.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-02 15:16:03 -07:00
Scott Peterson
ab243ec940 i40e: Stop dropping 802.1ad tags - eth proto 0x88a8
Enable i40e to pass traffic with VLAN tags using the 802.1ad ethernet
protocol ID (0x88a8).

This requires NIC firmware providing version 1.7 of the API. With
older NIC firmware 802.1ad tagged packets will continue to be dropped.

No VLAN offloads nor RSS are supported for 802.1ad VLANs.

Signed-off-by: Scott Peterson <scott.d.peterson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:36 -07:00
Alan Brady
c53d11f669 i40e: fix client notify of VF reset
Currently there is a bug in which the PF driver fails to inform clients
of a VF reset which then causes clients to leak resources.  The bug
exists because we were incorrectly checking the I40E_VF_STATE_PRE_ENABLE
bit.

When a VF is first init we go through a reset to initialize variables
and allocate resources but we don't want to inform clients of this first
reset since the client isn't fully enabled yet so we set a state bit
signifying we're in a "pre-enabled" client state.  During the first
reset we should be clearing the bit, allowing all following resets to
notify the client of the reset when the bit is not set.  This patch
fixes the issue by negating the 'test_and_clear_bit' check to accurately
reflect the behavior we want.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:36 -07:00
Alan Brady
41d0a4d0c8 i40e: fix handling of vf_states variable
Currently we inappropriately clear the vf_states variable with a null
assignment.  This is problematic because we should be using atomic
bitops on this variable and we don't actually want to clear all the
flags.  We should just clear the ones we know we want to clear.
Additionally remove the I40E_VF_STATE_FCOEENA bit because it is no
longer being used.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Mitch Williams
1b7b7596ae i40e: make i40evf_map_rings_to_vectors void
This function cannot fail, so why is it returning a value? And why are
we checking it? Why shouldn't we just make it void? Why is this commit
message made up of only questions?

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Alan Brady
5b36e8d04b i40evf: Enable VF to request an alternate queue allocation
Currently the VF gets a default number of allocated queues from HW on
init and it could choose to enable or disable those allocated queues.
This makes it such that the VF can request more or less underlying
allocated queues from the PF.

First the VF negotiates the number of queues it wants that can be
supported by the PF and if successful asks for a reset.  During reset
the PF will reallocate the HW queues for the VF and will then remap the
new queues.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Jacob Keller
d43d60e5eb i40e: ensure reset occurs when disabling VF
It is possible although rare that we may not reset when
i40e_vc_disable_vf() is called. This can lead to some weird
circumstances with some values not being properly set. Modify
i40e_reset_vf() to return a code indicating whether it reset or not.

Now, i40e_vc_disable_vf() can wait until a reset actually occurs. If it
fails to free up within a reasonable time frame we'll display a warning
message.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Jacob Keller
f18d20218a i40e: make use of i40e_vc_disable_vf
Replace i40e_vc_notify_vf_reset and i40e_reset_vf with a call to
i40e_vc_disable_vf which does this exact thing. This matches similar
code patterns throughout the driver.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Jacob Keller
eeeddbb806 i40e: drop i40e_pf *pf from i40e_vc_disable_vf()
It's never used, and the vf structure could get back to the PF if
necessary. Lets just drop the extra unneeded parameter.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Jacob Keller
ba4e003d29 i40e: don't hold spinlock while resetting VF
When we refactored handling of the PVID in commit 9af52f60b2
("i40e: use (add|rm)_vlan_all_mac helper functions when changing PVID")
we introduced a scheduling while atomic regression.

This occurred because we now held the spinlock across a call to
i40e_reset_vf(), which results in a usleep_range() call that triggers
a scheduling while atomic bug. This was rare as it only occurred if the
user configured a VLAN on a VF and also attempted to reconfigure the VF
from the host system with a port VLAN.

We do need to hold the lock while calling i40e_is_vsi_in_vlan(), but we
should not be holding it while we reset the VF.

We'll fix this by introducing a separate helper function
i40e_vsi_has_vlans which checks whether we have a PVID and whether the
VSI has configured VLANs. This helper function will manage its own need
for the mac_filter_hash_lock.

Then, we can move the acquiring of the spinlock until after we reset the
VF, which ensures that we do not sleep while holding the lock.

Using a separate function like this makes the code more clear and is
easier to read than attempting to release and re-acquire the spinlock
when we reset the VF.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Mariusz Stachura
00f6c2f5e2 i40e: use admin queue for setting LEDs behavior
Instead of accessing register directly, use newly added AQC in
order to blink LEDs. Introduce and utilize a new flag to prevent
excessive API version checking.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Filip Sadowski
9c0e5caf63 i40e: Add support for 'ethtool -m'
This patch adds support for 'ethtool -m' command which displays
information about (Q)SFP+ module plugged into NIC's cage.

Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Filip Sadowski
d60bcc7980 i40e: Fix reporting of supported link modes
This patch fixes incorrect reporting of supported link modes on some NICs.

Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Christophe JAILLET
54902349ee i40e: Fix a potential NULL pointer dereference
If 'kzalloc()' fails, a NULL pointer will be dereferenced.
Return an error code (-ENOMEM) instead.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Lihong Yang
5872866e16 i40e: remove logically dead code
This patch removes the !vf condition check that cannot be
true in i40e_ndo_set_vf_trust function

Detected by CoverityScan, CID 1397531 Logically dead code

Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:35 -07:00
Shannon Nelson
e50d5751c8 i40e: limit lan queue count in large CPU count machine
When a machine has more CPUs than queue pairs, e.g. 512 cores, the
counting gets a little funky and turns off Flow Director with the
message:
  not enough queues for Flow Director. Flow Director feature is disabled

This patch limits the number of lan queues initially allocated to
be sure we have some left for FD and other features.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-02 12:46:34 -07:00
David S. Miller
d9601be13c Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2017-10-02

This series contains updates to fm10k only.

Jake provides all but one of the changes in this series.  Most are small
fixes, starting with ensuring prompt transmission of messages queued up
after each VF message is received and handled.  Fix a possible race
condition between the watchdog task and the processing of mailbox
messages by just checking whether the mailbox is still open.  Fix a
couple of GCC v7 warnings, including misspelled "fall through" comments
and warnings about possible truncation of calls to snprintf().  Cleaned
up a convoluted bitshift and read for the PFVFLRE register.  Fixed a
potential divide by zero when finding the proper r_idx.

Markus Elfring fixes an issue which was found using Coccinelle, where
we should have been using seq_putc() instead of seq_puts().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-02 11:59:34 -07:00
David S. Miller
c4b3630aff Merge branch 'Thunderbolt-networking'
Mika Westerberg says:

====================
Thunderbolt networking

In addition of tunneling PCIe, Display Port and USB traffic, Thunderbolt
allows connecting two hosts (domains) over a Thunderbolt cable. It is
possible to tunnel arbitrary data packets over such connection using
high-speed DMA rings available in the Thunderbolt host controller.

In order to discover Thunderbolt services the other host supports, there is
a software protocol running on top of the automatically configured control
channel (ring 0). This protocol is called XDomain discovery protocol and it
uses XDomain properties to describe the host (domain) and the services it
supports.

Once both sides have agreed what services are supported they can enable
high-speed DMA rings to transfer data over the cable.

This series adds support for the XDomain protocol so that we expose each
remote connection as Thunderbolt XDomain device and each service as
Thunderbolt service device. On top of that we create an API that allows
writing drivers for these services and finally we provide an example
Thunderbolt service driver that creates virtual ethernet inferface that
allows tunneling networking packets over Thunderbolt cable. The API could
be used for creating other future Thunderbolt services, such as tunneling
SCSI over Thunderbolt, for example.

The XDomain protocol and networking support is also available in macOS and
Windows so this makes it possible to connect Linux to macOS and Windows as
well.

The patches are based on previous Thunderbolt networking patch series by
Amir Levy and Michael Jamet, that can be found here:

  https://lwn.net/Articles/705998/

The main difference to that patch series is that we have the XDomain
protocol running in the kernel now so there is no need for a separate
userspace daemon.

Note this does not affect the existing functionality, so security levels
and NVM firmware upgrade continue to work as before (with the small
exception that now sysfs also shows the XDomain connections and services in
addition to normal Thunderbolt devices). It is also possible to connect up
to 5 Thunderbolt devices and then another host, and the network driver
works exactly the same.

This is third version of the patch series. The previous versions can be
be found here:

  v2: https://lkml.org/lkml/2017/9/25/225
  v1: https://lwn.net/Articles/734019/

Changes from the v2:

  * Add comment regarding calculation of interrupt throttling value
  * Add UUIDs as strings in comments on top of each declaration
  * Add a patch removing __packed from existing ICM messages. They are all
    32-bit aligned and should pack fine without the __packed.
  * Move adding MAINTAINERS entries to a separate patches
  * Added Michael and Yehezkel to be maintainers of the network driver
  * Remove __packed from the new ICM messages. They should pack fine as
    well without it.
  * Call register_netdev() after all other initialization is done in the
    network driver.
  * Use build_skb() instead of copying. We allocate order 1 page here to
    leave room for SKB shared info required by build_skb(). However, we do
    not leave room for full NET_SKB_PAD because the NHI hardware does not
    cope well if a frame crosses 4kB boundary. According comments in
    __build_skb() that should still be fine.
  * Added Reviewed-by tag from Andy.

Changes from the v1:

  * Add include/linux/thunderbolt.h to MAINTAINERS
  * Correct Linux version and date of new sysfs entries in
    Documentation/ABI/testing/sysfs-bus-thunderbolt
  * Move network driver from drivers/thunderbolt/net.c to
    drivers/net/thunderbolt.c and update it to follow coding style in
    drivers/net/*.
  * Add MAINTAINERS entry for the network driver
  * Minor cleanups

In case someone wants to try this out, the last patch adds documentation
how the networking driver can be used. In short, if you connect Linux to a
macOS or Windows, everything is done automatically (as those systems have
the networking service enabled by default). For Linux to Linux connection
one host needs to load the networking driver first (so that the other side
can locate the networking service and load the corresponding driver).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-02 11:24:42 -07:00
Mika Westerberg
c024297e7b MAINTAINERS: Add entry for Thunderbolt network driver
I will be maintaining the Thunderbolt network driver along with Michael
and Yehezkel.

Signed-off-by: Michael Jamet <michael.jamet@intel.com>
Signed-off-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-02 11:24:42 -07:00
Amir Levy
e69b6c02b4 net: Add support for networking over Thunderbolt cable
ThunderboltIP is a protocol created by Apple to tunnel IP/ethernet
traffic over a Thunderbolt cable. The protocol consists of configuration
phase where each side sends ThunderboltIP login packets (the protocol is
determined by UUID in the XDomain packet header) over the configuration
channel. Once both sides get positive acknowledgment to their login
packet, they configure high-speed DMA path accordingly. This DMA path is
then used to transmit and receive networking traffic.

This patch creates a virtual ethernet interface the host software can
use in the same way as any other networking interface. Once the
interface is brought up successfully network packets get tunneled over
the Thunderbolt cable to the remote host and back.

The connection is terminated by sending a ThunderboltIP logout packet
over the configuration channel. We do this when the network interface is
brought down by user or the driver is unloaded.

Signed-off-by: Amir Levy <amir.jer.levy@intel.com>
Signed-off-by: Michael Jamet <michael.jamet@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-02 11:24:42 -07:00