107388 Commits

Author SHA1 Message Date
Willem de Bruijn
f859a44847 tcp: allow zerocopy with fastopen
Accept MSG_ZEROCOPY in all the TCP states that allow sendmsg. Remove
the explicit check for ESTABLISHED and CLOSE_WAIT states.

This requires correctly handling zerocopy state (uarg, sk_zckey) in
all paths reachable from other TCP states. Such as the EPIPE case
in sk_stream_wait_connect, which a sendmsg() in incorrect state will
now hit. Most paths are already safe.

Only extension needed is for TCP Fastopen active open. This can build
an skb with data in tcp_send_syn_data. Pass the uarg along with other
fastopen state, so that this skb also generates a zerocopy
notification on release.

Tested with active and passive tcp fastopen packetdrill scripts at
1747eef03d

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25 22:41:08 -08:00
Peter Oskolkov
d4289fcc9b net: IP6 defrag: use rbtrees for IPv6 defrag
Currently, IPv6 defragmentation code drops non-last fragments that
are smaller than 1280 bytes: see
commit 0ed4229b08c1 ("ipv6: defrag: drop non-last frags smaller than min mtu")

This behavior is not specified in IPv6 RFCs and appears to break
compatibility with some IPv6 implemenations, as reported here:
https://www.spinics.net/lists/netdev/msg543846.html

This patch re-uses common IP defragmentation queueing and reassembly
code in IPv6, removing the 1280 byte restriction.

Signed-off-by: Peter Oskolkov <posk@google.com>
Reported-by: Tom Herbert <tom@herbertland.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25 21:37:11 -08:00
Peter Oskolkov
c23f35d19d net: IP defrag: encapsulate rbtree defrag code into callable functions
This is a refactoring patch: without changing runtime behavior,
it moves rbtree-related code from IPv4-specific files/functions
into .h/.c defrag files shared with IPv6 defragmentation code.

Signed-off-by: Peter Oskolkov <posk@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25 21:37:11 -08:00
David S. Miller
30e5c2c6bf net: Revert devlink health changes.
This reverts the devlink health changes from 9/17/2019,
Jiri wants things to be designed differently and it was
agreed that the easiest way to do this is start from the
beginning again.

Commits reverted:

cb5ccfbe73b389470e1dc11061bb185ef4bc9aec
880ee82f0313453ec5a6cb122866ac057263066b
c7af343b4e33578b7de91786a3f639c8cfa0d97b
ff253fedab961b22117a73ab808fcfa9e6852b50
6f9d56132eb6d2603d4273cfc65bed914ec47acb
fcd852c69d776c0f46c8f79e8e431e5cc6ddc7b7
8a66704a13d9713593342e29b4f0c19762f5746b
12bd0dcefe88782ac1c9fff632958dd1b71d27e5
aba25279c10094c5c97d09c3491ca86d00b4ad5e
ce019faa70f81555fa17ebc1d5a03651f2e7e15a
b8c45a033acc607201588f7665ba84207e5149e0

And the follow-on build fix:

o33a0efa4baecd689da9474ce0e8b673eb6931c60

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25 10:53:23 -08:00
Priyaranjan Jha
78dc70ebaa tcp_bbr: adapt cwnd based on ack aggregation estimation
Aggregation effects are extremely common with wifi, cellular, and cable
modem link technologies, ACK decimation in middleboxes, and LRO and GRO
in receiving hosts. The aggregation can happen in either direction,
data or ACKs, but in either case the aggregation effect is visible
to the sender in the ACK stream.

Previously BBR's sending was often limited by cwnd under severe ACK
aggregation/decimation because BBR sized the cwnd at 2*BDP. If packets
were acked in bursts after long delays (e.g. one ACK acking 5*BDP after
5*RTT), BBR's sending was halted after sending 2*BDP over 2*RTT, leaving
the bottleneck idle for potentially long periods. Note that loss-based
congestion control does not have this issue because when facing
aggregation it continues increasing cwnd after bursts of ACKs, growing
cwnd until the buffer is full.

To achieve good throughput in the presence of aggregation effects, this
algorithm allows the BBR sender to put extra data in flight to keep the
bottleneck utilized during silences in the ACK stream that it has evidence
to suggest were caused by aggregation.

A summary of the algorithm: when a burst of packets are acked by a
stretched ACK or a burst of ACKs or both, BBR first estimates the expected
amount of data that should have been acked, based on its estimated
bandwidth. Then the surplus ("extra_acked") is recorded in a windowed-max
filter to estimate the recent level of observed ACK aggregation. Then cwnd
is increased by the ACK aggregation estimate. The larger cwnd avoids BBR
being cwnd-limited in the face of ACK silences that recent history suggests
were caused by aggregation. As a sanity check, the ACK aggregation degree
is upper-bounded by the cwnd (at the time of measurement) and a global max
of BW * 100ms. The algorithm is further described by the following
presentation:
https://datatracker.ietf.org/meeting/101/materials/slides-101-iccrg-an-update-on-bbr-work-at-google-00

In our internal testing, we observed a significant increase in BBR
throughput (measured using netperf), in a basic wifi setup.
- Host1 (sender on ethernet) -> AP -> Host2 (receiver on wifi)
- 2.4 GHz -> BBR before: ~73 Mbps; BBR after: ~102 Mbps; CUBIC: ~100 Mbps
- 5.0 GHz -> BBR before: ~362 Mbps; BBR after: ~593 Mbps; CUBIC: ~601 Mbps

Also, this code is running globally on YouTube TCP connections and produced
significant bandwidth increases for YouTube traffic.

This is based on Ian Swett's max_ack_height_ algorithm from the
QUIC BBR implementation.

Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 22:27:27 -08:00
Nikolay Aleksandrov
949e7cea0c bonding: count master 3ad stats separately
I made a dumb mistake when I summed up the slave stats, obviously slaves
can come and go which would make the master stats unreliable.
Count and export the master stats separately.

Fixes: a258aeacd7f0 ("bonding: add support for xstats and export 3ad stats")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 22:18:48 -08:00
Heiner Kallweit
434a4315b9 net: phy: change phy_start_interrupts to phy_request_interrupt
Now that we enable the interrupts in phy_start() we don't have to do it
before. Therefore remove enabling interrupts from phy_start_interrupts()
and rename this function to reflect the changed functionality.

v2:
- improve warning to clearly state that we fall back to polling

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 22:15:15 -08:00
Yangbo Lu
19df7510d5 ptp: add debugfs support for ptp_qoriq
This patch is to add debugfs support for ptp_qoriq. Current debugfs
supports to control fiper1/fiper2 loopback mode. If the loopback mode
is enabled, the fiper1/fiper2 pulse is looped back into trigger1/
trigger2 input. This is very useful for validating hardware and driver
without external hardware. Below is an example to enable fiper1 loopback.

echo 1 > /sys/kernel/debug/2d10e00.ptp_clock/fiper1-loopback

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 20:21:57 -08:00
Yangbo Lu
6815d8b092 ptp_qoriq: support external trigger stamp FIFO
The external trigger stamp FIFO was introduced as a new feature
for QorIQ 1588 timer IP block. This patch is to support it by
adding a new dts property "fsl,extts-fifo". Any QorIQ 1588 timer
supporting this feature is required to add this property in its
dts node.

In addition, the FIFO should be cleaned up before enabling external
trigger interrupts. Otherwise, there will be interrupts immediately
just after enabling external trigger interrupts.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 20:21:57 -08:00
Linus Lüssing
4b3087c7e3 bridge: Snoop Multicast Router Advertisements
When multiple multicast routers are present in a broadcast domain then
only one of them will be detectable via IGMP/MLD query snooping. The
multicast router with the lowest IP address will become the selected and
active querier while all other multicast routers will then refrain from
sending queries.

To detect such rather silent multicast routers, too, RFC4286
("Multicast Router Discovery") provides a standardized protocol to
detect multicast routers for multicast snooping switches.

This patch implements the necessary MRD Advertisement message parsing
and after successful processing adds such routers to the internal
multicast router list.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 17:18:09 -08:00
Linus Lüssing
4effd28c12 bridge: join all-snoopers multicast address
Next to snooping IGMP/MLD queries RFC4541, section 2.1.1.a) recommends
to snoop multicast router advertisements to detect multicast routers.

Multicast router advertisements are sent to an "all-snoopers"
multicast address. To be able to receive them reliably, we need to
join this group.

Otherwise other snooping switches might refrain from forwarding these
advertisements to us.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 17:18:08 -08:00
Linus Lüssing
ba5ea61462 bridge: simplify ip_mc_check_igmp() and ipv6_mc_check_mld() calls
This patch refactors ip_mc_check_igmp(), ipv6_mc_check_mld() and
their callers (more precisely, the Linux bridge) to not rely on
the skb_trimmed parameter anymore.

An skb with its tail trimmed to the IP packet length was initially
introduced for the following three reasons:

1) To be able to verify the ICMPv6 checksum.
2) To be able to distinguish the version of an IGMP or MLD query.
   They are distinguishable only by their size.
3) To avoid parsing data for an IGMPv3 or MLDv2 report that is
   beyond the IP packet but still within the skb.

The first case still uses a cloned and potentially trimmed skb to
verfiy. However, there is no need to propagate it to the caller.
For the second and third case explicit IP packet length checks were
added.

This hopefully makes ip_mc_check_igmp() and ipv6_mc_check_mld() easier
to read and verfiy, as well as easier to use.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 17:18:08 -08:00
Nikolay Aleksandrov
a258aeacd7 bonding: add support for xstats and export 3ad stats
This patch adds support for extended statistics (xstats) call to the
bonding. The first user would be the 3ad code which counts the following
events:
 - LACPDU Rx/Tx
 - LACPDU unknown type Rx
 - LACPDU illegal Rx
 - Marker Rx/Tx
 - Marker response Rx/Tx
 - Marker unknown type Rx

All of these are exported via netlink as separate attributes to be
easily extensible as we plan to add more in the future.
Similar to how the bridge and other xstats exports, the structure
inside is:
 [ IFLA_STATS_LINK_XSTATS ]
   -> [ LINK_XSTATS_TYPE_BOND ]
        -> [ BOND_XSTATS_3AD ]
             -> [ 3ad stats attributes ]

With this structure it's easy to add more stat types later.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 12:04:14 -08:00
Nikolay Aleksandrov
267c095aa2 bonding: add 3ad stats
Count the following types of 3ad packets per slave:
 - rx/tx lacpdu
 - rx/tx marker
 - rx/tx marker response
 - rx illegal lacpdus (right now counted on wrong length)
 - rx unknown lacpdu type
 - rx unknown marker type

The counters are using atomic64 since this is not fast path.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 12:04:14 -08:00
Cong Wang
856c395cfa net: introduce a knob to control whether to inherit devconf config
There have been many people complaining about the inconsistent
behaviors of IPv4 and IPv6 devconf when creating new network
namespaces.  Currently, for IPv4, we inherit all current settings
from init_net, but for IPv6 we reset all setting to default.

This patch introduces a new /proc file
/proc/sys/net/core/devconf_inherit_init_net to control the
behavior of whether to inhert sysctl current settings from init_net.
This file itself is only available in init_net.

As demonstrated below:

Initial setup in init_net:
 # cat /proc/sys/net/ipv4/conf/all/rp_filter
 2
 # cat /proc/sys/net/ipv6/conf/all/accept_dad
 1

Default value 0 (current behavior):
 # ip netns del test
 # ip netns add test
 # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter
 2
 # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad
 0

Set to 1 (inherit from init_net):
 # echo 1 > /proc/sys/net/core/devconf_inherit_init_net
 # ip netns del test
 # ip netns add test
 # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter
 2
 # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad
 1

Set to 2 (reset to default):
 # echo 2 > /proc/sys/net/core/devconf_inherit_init_net
 # ip netns del test
 # ip netns add test
 # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter
 0
 # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad
 0

Set to a value out of range (invalid):
 # echo 3 > /proc/sys/net/core/devconf_inherit_init_net
 -bash: echo: write error: Invalid argument
 # echo -1 > /proc/sys/net/core/devconf_inherit_init_net
 -bash: echo: write error: Invalid argument

Reported-by: Zhu Yanjun <Yanjun.Zhu@windriver.com>
Reported-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 11:07:21 -08:00
David S. Miller
fa7f3a8d56 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Completely minor snmp doc conflict.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-21 14:41:32 -08:00
Linus Torvalds
7d0ae236ed Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix endless loop in nf_tables, from Phil Sutter.

 2) Fix cross namespace ip6_gre tunnel hash list corruption, from
    Olivier Matz.

 3) Don't be too strict in phy_start_aneg() otherwise we might not allow
    restarting auto negotiation. From Heiner Kallweit.

 4) Fix various KMSAN uninitialized value cases in tipc, from Ying Xue.

 5) Memory leak in act_tunnel_key, from Davide Caratti.

 6) Handle chip errata of mv88e6390 PHY, from Andrew Lunn.

 7) Remove linear SKB assumption in fou/fou6, from Eric Dumazet.

 8) Missing udplite rehash callbacks, from Alexey Kodanev.

 9) Log dirty pages properly in vhost, from Jason Wang.

10) Use consume_skb() in neigh_probe() as this is a normal free not a
    drop, from Yang Wei. Likewise in macvlan_process_broadcast().

11) Missing device_del() in mdiobus_register() error paths, from Thomas
    Petazzoni.

12) Fix checksum handling of short packets in mlx5, from Cong Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (96 commits)
  bpf: in __bpf_redirect_no_mac pull mac only if present
  virtio_net: bulk free tx skbs
  net: phy: phy driver features are mandatory
  isdn: avm: Fix string plus integer warning from Clang
  net/mlx5e: Fix cb_ident duplicate in indirect block register
  net/mlx5e: Fix wrong (zero) TX drop counter indication for representor
  net/mlx5e: Fix wrong error code return on FEC query failure
  net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames
  tools: bpftool: Cleanup license mess
  bpf: fix inner map masking to prevent oob under speculation
  bpf: pull in pkt_sched.h header for tooling to fix bpftool build
  selftests: forwarding: Add a test case for externally learned FDB entries
  selftests: mlxsw: Test FDB offload indication
  mlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky
  net: bridge: Mark FDB entries that were added by user as such
  mlxsw: spectrum_fid: Update dummy FID index
  mlxsw: pci: Return error on PCI reset timeout
  mlxsw: pci: Increase PCI SW reset timeout
  mlxsw: pci: Ring CQ's doorbell before RDQ's
  MAINTAINERS: update email addresses of liquidio driver maintainers
  ...
2019-01-21 12:52:31 +13:00
Linus Torvalds
bb617b9b45 virtio, vhost: fixes, cleanups
fixes and cleanups all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcPTc7AAoJECgfDbjSjVRpOEgH/Ahdx7VMYJtFsdmoJKiwhB7M
 jRRi9R903V9H87vl1BXy6dutHw+WONJtm6FSZ1ayNWlVmUmWS6vci+IUErr2uDrv
 KSG+dJMQLlF7t1dnLRwlLazvGa4/58+u0J459uKPQ5ckqwV5wXPjUS5Z0xF3ldxM
 Twz6vhYRGKCUc10YZm/WmsjlLROgaNtRya10PzAGVmXPzbCpvJfiojKWJER+Eigq
 JxWynTCm/YvIk824Ls9cDBVkDvb8GPS3blVbFnusR+D3ktvX7vLDPOsErGn4umVS
 nUm3/WiQALB9fKer+SsgcEGVh+fa06KIITK+IBblULmrAIT3CJdJp70UJBjfdTM=
 =DCkE
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio/vhost fixes and cleanups from Michael Tsirkin:
 "Fixes and cleanups all over the place"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost/scsi: Use copy_to_iter() to send control queue response
  vhost: return EINVAL if iovecs size does not match the message size
  virtio-balloon: tweak config_changed implementation
  virtio: don't allocate vqs when names[i] = NULL
  virtio_pci: use queue idx instead of array idx to set up the vq
  virtio: document virtio_config_ops restrictions
  virtio: fix virtio_config_ops description
2019-01-21 07:37:16 +13:00
Linus Torvalds
315a6d850a include/linux/compiler*.h changes:
- A fix for OPTIMIZER_HIDE_VAR
     From Michael S. Tsirkin
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAlxEkZkACgkQGXyLc2ht
 IW2ksQ/8C2rfoTcKKeNxjpgV2XQ0HV6DsacEeQw+0VSBTzA2OPSiV61Dfm6/qywp
 Y7DyjrosteIOShZhxSdg6M04uDEtb9m8GaWQitW3ewfmbnXmRX5OB3CI5vFF1/nS
 LOj/Bd6FRqkIM6b0b7MWp/J0hQMnSNuElZp+yqlJyy1YuivfXN3vu8iDsHDhlyAs
 OY4SqNmmlaUtSlRaTgJsSt28AFJ4CSgJqziKZux17xzjrstXg1p9BhcnZVzcmjeY
 tcGFptbpUEmrcF2iqR8weaWkJSizgfsI60USrRZwLKM+i1NyOmnk9AWtSxOblNZZ
 0z4QslkZG1/7rtmHOn6qVGcWsic+AINbrzSeBReEg8G/f/P/XI7yRJmQAaQWqzOD
 ByEYoCp6U7gmQY6QiLLwq9d3VTHxV9d6PeC6gqEDM5ifrTIdOwNbL0MPvpb/UOlC
 1IC/RpHOqAwWKTaYvpoutXzw9kG/TXG/yvdphTsStxOSnXeEXntdwmd0CDLKG6sG
 y4xmEqU51KUoQ1UsX++dhxxxR4H7O6WcUmcFGXcrrhAD+x0N7Rd9g+lSDCpxW9yC
 sIzr2aaPpaZMD40gAHb4vlXR+MqJdIFAAJ4xI1oaU+zSuxkEUn05xR6OgVgAdMLp
 jT1SbI0XpbviakV6mwquAND7HOKWP/eZ7NLf6LlJIp0wrkBZRrQ=
 =2ce7
 -----END PGP SIGNATURE-----
mergetag object 99e309b6ed75fab4a43afd9e523441ecc5a1f511
 type commit
 tag clang-format-for-linus-v5.0-rc3
 tagger Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> 1547999145 +0100
 
 clang-format changes:
 
   - Update of the for_each macro list
     From Jason Gunthorpe
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAlxEmAYACgkQGXyLc2ht
 IW3AZQ//byEzF7HPogmwAQDA8cs6Rg2GSP3ohPsGavvqmd6AegXMqAfV9VNzFRBr
 tekhJuPufen6TjFkD+1Tlk+dvsXZXb/n/wYQWAJWm5SgOUfzh6Monprhe1JofLz0
 pQ5xIAKtbpsuS6TmjWttXMKWN7aTenNQIPyGXxZziKe653yLYdISuYKDqo9SbLQX
 QaXi911REjZHSUXdHrsmGyidci6HizJZkqCVB7zWrmB3+ygMTzo8x6HPTeIBOAdX
 zEHGDdKQEPKO7y6Jyh5GkCzpCKWqSgdgXsI16eyKsPYymkARaqgMYtCPHTBvZ3e1
 DkpCUg2BEMEDeftEFa9ysNOWppQTw9xDVuk6BO0T8YYeXnLlo9CWb6Dl7YRnoO63
 0nsdvmHRkDKP93Hs9Zn3kZRVvy1EgOeIkfD+gK6sJpibyzJZRFGAwC3ysP/ERDVx
 Lb25tdluWaxKZQwepqC472fiwX1V65YrLX66gUGfF5JIJqYDjeoOl+lgVb8L6Ped
 sjYKO8uf2D9ZPRpsXgecx9u+Fy94P0fPTEm76vo5z1HBMAldihrQnw1U9ZNsvjBr
 HiWIB6ccP/chDN+wtoI/lQGKgqjM6EYWJpts/NkPHvA1d0BUEPJ7/tHTFmUZ0c6z
 DxdcjX/g4Bu/rSyIJaeosdcKNgFm+maHWQX+L+YV9yE1uGTzdcE=
 =mM3e
 -----END PGP SIGNATURE-----

Merge tags 'compiler-attributes-for-linus-v5.0-rc3' and 'clang-format-for-linus-v5.0-rc3' of git://github.com/ojeda/linux

Pull misc clang fixes from Miguel Ojeda:

  - A fix for OPTIMIZER_HIDE_VAR from Michael S Tsirkin

  - Update clang-format with the latest for_each macro list from Jason
    Gunthorpe

* tag 'compiler-attributes-for-linus-v5.0-rc3' of git://github.com/ojeda/linux:
  include/linux/compiler*.h: fix OPTIMIZER_HIDE_VAR

* tag 'clang-format-for-linus-v5.0-rc3' of git://github.com/ojeda/linux:
  clang-format: Update .clang-format with the latest for_each macro list
2019-01-21 07:23:42 +13:00
Cong Wang
5954894ba3 net_sched: add performance counters for basic filter
Similar to u32 filter, it is useful to know how many times
we reach each basic filter and how many times we pass the
ematch attached to it.

Sample output:

filter protocol arp pref 49152 basic chain 0
filter protocol arp pref 49152 basic chain 0 handle 0x1  (rule hit 3 success 3)
	action order 1: gact action pass
	 random type none pass val 0
	 index 1 ref 1 bind 1 installed 81 sec used 4 sec
	Action statistics:
	Sent 126 bytes 3 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19 16:05:42 -08:00
Linus Torvalds
5d5c303ea0 A few MIPS fixes for 5.0:
- Fix IPI handling for Lantiq SoCs, which was broken by changes made
   back in v4.12.
 
 - Enable OF/DT serial support in ath79_defconfig to give us working
   serial by default.
 
 - Fix 64b builds for the Jazz platform.
 
 - Set up a struct device for the BCM47xx SoC to allow BCM47xx drivers to
   perform DMA again following the major DMA mapping changes made in
   v4.19.
 
 - Disable MSI on Cavium Octeon systems when the pcie_disable command
   line parameter introduced in v3.3 is used, in order to avoid
   inadvetently accessing PCIe controller registers despite the command
   line.
 
 - Fix a build failure for Cavium Octeon kernels with kexec enabled,
   introduced in v4.20.
 
 - Fix a regression in the behaviour of semctl/shmctl/msgctl IPC syscalls
   for kernels including n32 support but not o32 support caused by some
   cleanup in v3.19.
 -----BEGIN PGP SIGNATURE-----
 
 iIsEABYIADMWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCXEJhqRUccGF1bC5idXJ0
 b25AbWlwcy5jb20ACgkQPqefrLV1AN2aWwEA4ZExeZQi+g9oPNII/jd9wbLKU4Eq
 xjl/+NdzPVu+pP4A/AuG5hsEMFIgS2U0k2js7kNMHCzoV9Ky2m3kdbSNHvQI
 =AqoC
 -----END PGP SIGNATURE-----

Merge tag 'mips_fixes_5.0_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Paul Burton:

 - Fix IPI handling for Lantiq SoCs, which was broken by changes made
   back in v4.12.

 - Enable OF/DT serial support in ath79_defconfig to give us working
   serial by default.

 - Fix 64b builds for the Jazz platform.

 - Set up a struct device for the BCM47xx SoC to allow BCM47xx drivers
   to perform DMA again following the major DMA mapping changes made in
   v4.19.

 - Disable MSI on Cavium Octeon systems when the pcie_disable command
   line parameter introduced in v3.3 is used, in order to avoid
   inadvetently accessing PCIe controller registers despite the command
   line.

 - Fix a build failure for Cavium Octeon kernels with kexec enabled,
   introduced in v4.20.

 - Fix a regression in the behaviour of semctl/shmctl/msgctl IPC
   syscalls for kernels including n32 support but not o32 support caused
   by some cleanup in v3.19.

* tag 'mips_fixes_5.0_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: OCTEON: fix kexec support
  mips: fix n32 compat_ipc_parse_version
  Disable MSI also when pcie-octeon.pcie_disable on
  MIPS: BCM47XX: Setup struct device for the SoC
  MIPS: jazz: fix 64bit build
  MIPS: ath79: Enable OF serial ports in the default config
  MIPS: lantiq: Use CP0_LEGACY_COMPARE_IRQ
  MIPS: lantiq: Fix IPI interrupt handling
2019-01-20 10:33:18 +12:00
Linus Torvalds
26caabbcd7 libnvdimm v5.0-rc3
* Fix driver initialization crash due to the inability to report an
   'error' state for a DIMM's security capability.
 * Build warning fix for little-endian ARM64 builds
 * Fix a potential race between the EDAC driver's usage of the NFIT
   SMBIOS id for a DIMM and the driver shutdown path.
 * A small collection of one-line benign cleanups for duplicate variable
   assignments, a duplicate header include and a mis-typed function
   argument.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJcQ4vDAAoJEB7SkWpmfYgCwrYP/iCZZVWz7TlP8u73alEXtw4G
 gPPQHb+n0Y5DQOMZCu1jFfKvkZzcIq4Wp6277Ht6DEREzoobyswRM8G1EKOwbzUM
 J9D0Hm8XCy9Z6v3MpH/VASMowjQkClL/5xy8rlJdZMjprjDFqq3t4M/sBARsaeZv
 xldAcNdbAQWq6HEebLZXja/AF8Ex0o59Z/w3oJ1Ds98mGPjGPUejV9cnpgCOHGZc
 rfB9DZdEKf4CUEqZmlRI253yhR4iHtzKJQGMadQVhcLdO/Jjc8DKxozOgxQOuWQu
 SvAv6t+tymGzPCUJg+pFROwemn3mB2fh3XMaueEY1biCZcj67Lml0H/zTewezsI7
 CZ4ZUPypc9L50GIUYCyPkNuB7E6jXmMD6Vn7EEx1GwQTlZmRQ8WbTELLGGsNUfvI
 55TqT0IbcnUhOXqKbaYql1Y7Z/yEmiBI1f5VdB9zXQKZ66dcgfgfVqLfhw7HUmmu
 j6J3xAtvcYX8x2EA75fdfr5mKUyrr1UAYG6bkActll0mRIHK4u2+ME4E4wmhZidG
 PlZV0sIy6EMDTbTyFP1N0P9FsG4DmK6mTATXFn/CqYMoW3TTuAlsWxhdtGMzhpur
 See6W4yJ0pT3wmzzPcOgPI4erntb7dQirwDjclw4qj/DnUegPKxP8UxXzFBH3gFV
 /wYYqXWh0cXa/Jq9YX3L
 =HCIs
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-fixes-5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:
 "A crash fix, a build warning fix, a miscellaneous small cleanups.

  In case anyone is looking for them, there was a regression caught by
  testing that caused two patches to be dropped from this update.  Those
  patches have been reworked and will soak for another week / re-target
  5.0-rc4.

   - Fix driver initialization crash due to the inability to report an
     'error' state for a DIMM's security capability.

   - Build warning fix for little-endian ARM64 builds

   - Fix a potential race between the EDAC driver's usage of the NFIT
     SMBIOS id for a DIMM and the driver shutdown path.

   - A small collection of one-line benign cleanups for duplicate
     variable assignments, a duplicate header include and a mis-typed
     function argument"

* tag 'libnvdimm-fixes-5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm/security: Fix nvdimm_security_state() state request selection
  acpi/nfit: Remove duplicate set nd_set in acpi_nfit_init_interleave_set()
  acpi/nfit: Fix race accessing memdev in nfit_get_smbios_id()
  libnvdimm/dimm: Fix security capability detection for non-Intel NVDIMMs
  nfit: Mark some functions as __maybe_unused
  ACPI/nfit: delete the function to_acpi_nfit_desc
  ACPI/nfit: delete the redundant header file
2019-01-20 10:24:30 +12:00
Jakub Kicinski
59c28058fa net: netlink: add helper to retrieve NETLINK_F_STRICT_CHK
Dumps can read state of the NETLINK_F_STRICT_CHK flag from
a field in the callback structure.  For non-dump GET requests
we need a way to access the state of that flag from a socket.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19 10:09:58 -08:00
Camelia Groza
3e64cf7a43 net: phy: phy driver features are mandatory
Since phy driver features became a link_mode bitmap, phy drivers that
don't have a list of features configured will cause the kernel to crash
when probed.

Prevent the phy driver from registering if the features field is missing.

Fixes: 719655a14971 ("net: phy: Replace phy driver features u32 with link_mode bitmap")
Reported-by: Scott Wood <oss@buserror.net>
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19 10:03:08 -08:00
Toke Høiland-Jørgensen
5f2939d933 sch_api: Change signature of qdisc_tree_reduce_backlog() to use ints
There are now several places where qdisc_tree_reduce_backlog() is called
with a negative number of packets (to signal an increase in number of
packets in the queue). Rather than rely on overflow behaviour, change the
function signature to use signed integers to communicate this usage to
people reading the code.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19 09:53:18 -08:00
Eran Ben Elisha
12bd0dcefe devlink: Add health dump {get,clear} commands
Add devlink health dump commands, in order to run an dump operation
over a specific reporter.

The supported operations are dump_get in order to get last saved
dump (if not exist, dump now) and dump_clear to clear last saved
dump.

It is expected from driver's callback for diagnose command to fill it
via the buffer descriptors API. Devlink will parse it and convert it to
netlink nla API in order to pass it to the user.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18 14:51:23 -08:00
Eran Ben Elisha
8a66704a13 devlink: Add health diagnose command
Add devlink health diagnose command, in order to run a diagnose
operation over a specific reporter.

It is expected from driver's callback for diagnose command to fill it
via the buffer descriptors API. Devlink will parse it and convert it to
netlink nla API in order to pass it to the user.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18 14:51:23 -08:00
Eran Ben Elisha
fcd852c69d devlink: Add health recover command
Add devlink health recover command to the uapi, in order to allow the user
to execute a recover operation over a specific reporter.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18 14:51:23 -08:00
Eran Ben Elisha
6f9d56132e devlink: Add health set command
Add devlink health set command, in order to set configuration parameters
for a specific reporter.
Supported parameters are:
- graceful_period: Time interval between auto recoveries (in msec)
- auto_recover: Determines if the devlink shall execute recover upon
		receiving error for the reporter

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18 14:51:22 -08:00
Eran Ben Elisha
ff253fedab devlink: Add health get command
Add devlink health get command to provide reporter/s data for user space.
Add the ability to get data per reporter or dump data from all available
reporters.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18 14:51:22 -08:00
Eran Ben Elisha
c7af343b4e devlink: Add health report functionality
Upon error discover, every driver can report it to the devlink health
mechanism via devlink_health_report function, using the appropriate
reporter registered to it. Driver can pass error specific context which
will be delivered to it as part of the dump / recovery callbacks.

Once an error is reported, devlink health will do the following actions:
* A log is being send to the kernel trace events buffer
* Health status and statistics are being updated for the reporter instance
* Object dump is being taken and stored at the reporter instance (as long
  as there is no other dump which is already stored)
* Auto recovery attempt is being done. depends on:
  - Auto Recovery configuration
  - Grace period vs. time since last recover

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18 14:51:22 -08:00
Eran Ben Elisha
880ee82f03 devlink: Add health reporter create/destroy functionality
Devlink health reporter is an instance for reporting, diagnosing and
recovering from run time errors discovered by the reporters.
Define it's data structure and supported operations.
In addition, expose devlink API to create and destroy a reporter.
Each devlink instance will hold it's own reporters list.

As part of the allocation, driver shall provide a set of callbacks which
will be used the devlink in order to handle health reports and user
commands related to this reporter. In addition, driver is entitled to
provide some priv pointer, which can be fetched from the reporter by
devlink_health_reporter_priv function.

For each reporter, devlink will hold a metadata of statistics,
buffers and status.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18 14:51:22 -08:00
Eran Ben Elisha
cb5ccfbe73 devlink: Add health buffer support
Devlink health buffer is a mechanism to pass descriptors between drivers
and devlink. The API allows the driver to add objects, object pair,
value array (nested attributes), value and name.

Driver can use this API to fill the buffers in a format which can be
translated by the devlink to the netlink message.

In order to fulfill it, an internal buffer descriptor is defined. This
will hold the data and metadata per each attribute and by used to pass
actual commands to the netlink.

This mechanism will be later used in devlink health for dump and diagnose
data store by the drivers.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18 14:51:22 -08:00
Cong Wang
f88c19aab5 net_sched: add hit counter for matchall
Although matchall always matches packets, however, it still
relies on a protocol match first. So it is still useful to have
such a counter for matchall. Of course, unlike u32, every time
we hit a matchall filter, it is always a success, so we don't
have to distinguish them.

Sample output:

filter protocol 802.1Q pref 100 matchall chain 0
filter protocol 802.1Q pref 100 matchall chain 0 handle 0x1
  not_in_hw (rule hit 10)
	action order 1: vlan  pop continue
	 index 1 ref 1 bind 1 installed 40 sec used 1 sec
	Action statistics:
	Sent 836 bytes 10 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

Reported-by: Martin Olsson <martin.olsson+netdev@sentorsecurity.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18 14:13:50 -08:00
Heiner Kallweit
bb658ab7b8 net: phy: remove phy_stop_interrupts
Interrupts have been disabled in phy_stop() already. So we can remove
phy_stop_interrupts() and free the interrupt in phy_disconnect()
directly.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18 14:12:25 -08:00
Ross Lagerwall
6c57f04580 net: Fix usage of pskb_trim_rcsum
In certain cases, pskb_trim_rcsum() may change skb pointers.
Reinitialize header pointers afterwards to avoid potential
use-after-frees. Add a note in the documentation of
pskb_trim_rcsum(). Found by KASAN.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18 14:05:14 -08:00
Yafang Shao
340a6f3d2d tcp: declare tcp_mmap() only when CONFIG_MMU is set
Since tcp_mmap() is defined when CONFIG_MMU is set.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18 14:03:53 -08:00
Heiner Kallweit
e302c2a5fe net: phy: remove state PHY_CHANGELINK
Since recent changes to the phylib state machine state PHY_CHANGELINK
isn't used any longer. Therefore let's remove it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18 13:57:20 -08:00
Linus Torvalds
2a8cbf2a02 fbdev fixes for v5.0-rc3:
- fix stack memory leak in omap2fb driver (Vlad Tsyrklevich)
 
 - fix OF node name handling v4.20 regression in offb driver (Rob Herring)
 
 - convert CONFIG_FB_LOGO_CENTER config option added in v5.0-rc1 into
   a kernel parameter (Peter Rosin)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJcQcZZAAoJEH4ztj+gR8IL/7wQAKbpzwH5WnxTvZzRIacnHcoS
 BAzlgG4QSYqr2h09BRbsgi5GlrWKdMkk0AH7q/jkVfcmaJIkaibS755LnH7imumy
 +y8OGPYlNq0ys/F2wV4gYr+yZJ+FCplZA0Nl4DoZxjG9kTw3/Akayh/RbnzEgKkU
 LvNse8sP/ksON74p9AzBEtp9VLUL3QcyqksN0jtse/7UmLcL+o+j8kWPqwM2XRQ6
 XUSpIvhVhcl/l4zT5feMy2x0TCZ8GPLcjKDcevvGypPlMbDr9jPdnYDCU5SIqCsM
 gOl4Uiuhnd4Amg5eOWgYxzBnmGFWwqNjYLDNUmuPy95NaIDeIpQw/QY3+GAo+BDh
 O6u8BKFbUhy5RVQF9s/wP2p8HXUy29oUpNuorowFUd46fHsBvd6hfb1/diwKy6iB
 2HXaEplIWcEk34hH7uM0gUrJ+57YQAv30TmLGBH+zyhfgpb8OyHSVpDqmRWBBP8s
 HU/YGwQMBQE/lGqoAq5ku6Q8Ex4kJ9GSBJELKyiwyElB5eZNX6gSnMv5iURS3tlZ
 Wh1hTutmsktVW5+ndlAzXcRX0SbvfoLfWiTaAXdKZv17+7uaBoZvsnc7bLahOOVU
 xY5WAYSgqWEVX03Owb2QM8sJsrIIrQ4w44SAb4UNNDw1o1HCVDkhrHq5uG68k0na
 +UskmI44TNIA5ZIQiMC4
 =O8jy
 -----END PGP SIGNATURE-----

Merge tag 'fbdev-v5.0-rc3' of git://github.com/bzolnier/linux

Pull fbdev fixes from Bartlomiej Zolnierkiewicz:

 - fix stack memory leak in omap2fb driver (Vlad Tsyrklevich)

 - fix OF node name handling v4.20 regression in offb driver (Rob
   Herring)

 - convert CONFIG_FB_LOGO_CENTER config option added in v5.0-rc1 into a
   kernel parameter (Peter Rosin)

* tag 'fbdev-v5.0-rc3' of git://github.com/bzolnier/linux:
  fbdev: fbmem: convert CONFIG_FB_LOGO_CENTER into a cmd line option
  fbdev: offb: Fix OF node name handling
  omap2fb: Fix stack memory disclosure
2019-01-19 05:43:05 +12:00
YueHaibing
8b59bfe83c qed: remove duplicated include from qed_if.h
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 21:57:45 -08:00
Linus Torvalds
d7393226d1 First 5.0 rc pull request
Not much so far, but I'm feeling like the 2nd PR -rc will be larger than
 this. We have the usual batch of bugs and two fixes to code merged this cycle.
 
 - Restore valgrind support for the ioctl verbs interface merged this window,
   and fix a missed error code on an error path from that conversion
 
 - A user reported crash on obsolete mthca hardware
 
 - pvrdma was using the wrong command opcode toward the hypervisor
 
 - NULL pointer crash regression when dumping rdma-cm over netlink
 
 - Be conservative about exposing the global rkey
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlxBTeMACgkQOG33FX4g
 mxrOIQ//YdZdU9J825DM4ppH/MWRoPgayI+cca5sW2EG/nkgsvFJoiVDDK5/ka1g
 ge5Q21ZLMSPCBR0Iu/e/JOq6fJI4fsbcJGZURbyKgRZqyCBCf6qJbhiZKifpQMVb
 w7RP8kRFRdaiQzkAYfZSv9TP93JLvTDLg6zZ74r4vc8YphIzkI410v568hs6FiVu
 MIcb53pBWUswpCAnBVB+54sw+phJyjd02kmY4xTlWmiEzwHBb0JQ+Kps72/G0IWy
 0vOlDI1UjwqoDfThzyT7mcXqnSbXxg/e8EecMpyFzlorQyxgZ5TsJgQ8ubSYxuiQ
 7+dZ4rsdoZD++3MGtpmqDMQzKSPb989WzJT8WLp5oSw4ryAXeJJ+tys/APLtvPkf
 EgKgVyEqfxMDXn02/ENwDPpZyKLZkhcHFLgvfYmxtlDvtai/rvTLmzV1mptEaxlF
 +2pwSQM4/E/8qrLglN9kdFSfjBMb7Bvd2NYQqZ9vah2omb7gPsaTEEpVw6l/E0NX
 oOxFKPEzb0nP9KmJmwO8KLCvcrruuRL8kpmhc6sQMQJ6z0h4hmZrHF5EZZH92g0p
 maHyrx66vqw/Yl+TLvAb/T6FV1ax5c1TauiNErAjnag2wgVWW42Q7lQzSFLFI8su
 GU8oRlbIclDQ/1bszsf0IShq0r9G17+2n6yyTX39rj62YioiDlI=
 =ymZq
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes frfom Jason Gunthorpe:
 "Not much so far. We have the usual batch of bugs and two fixes to code
  merged this cycle:

   - Restore valgrind support for the ioctl verbs interface merged this
     window, and fix a missed error code on an error path from that
     conversion

   - A user reported crash on obsolete mthca hardware

   - pvrdma was using the wrong command opcode toward the hypervisor

   - NULL pointer crash regression when dumping rdma-cm over netlink

   - Be conservative about exposing the global rkey"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/uverbs: Mark ioctl responses with UVERBS_ATTR_F_VALID_OUTPUT
  RDMA/mthca: Clear QP objects during their allocation
  RDMA/vmw_pvrdma: Return the correct opcode when creating WR
  RDMA/cma: Add cm_id restrack resource based on kernel or user cm_id type
  RDMA/nldev: Don't expose unsafe global rkey to regular user
  RDMA/uverbs: Fix post send success return value in case of error
2019-01-18 17:17:20 +12:00
Petr Machata
6685987c29 switchdev: Add extack argument to call_switchdev_notifiers()
A follow-up patch will enable vetoing of FDB entries. Make it possible
to communicate details of why an FDB entry is not acceptable back to the
user.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 15:18:47 -08:00
Petr Machata
4c59b7d160 vxlan: Add extack to switchdev operations
There are four sources of VXLAN switchdev notifier calls:

- the changelink() link operation, which already supports extack,
- ndo_fdb_add() which got extack support in a previous patch,
- FDB updates due to packet forwarding,
- and vxlan_fdb_replay().

Extend vxlan_fdb_switchdev_call_notifiers() to include extack in the
switchdev message that it sends, and propagate the argument upwards to
the callers. For the first two cases, pass in the extack gotten through
the operation. For case #3, pass in NULL.

To cover the last case, extend vxlan_fdb_replay() to take extack
argument, which might come from whatever operation necessitated the FDB
replay.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 15:18:47 -08:00
Petr Machata
87b0984ebf net: Add extack argument to ndo_fdb_add()
Drivers may not be able to support certain FDB entries, and an error
code is insufficient to give clear hints as to the reasons of rejection.

In order to make it possible to communicate the rejection reason, extend
ndo_fdb_add() with an extack argument. Adapt the existing
implementations of ndo_fdb_add() to take the parameter (and ignore it).
Pass the extack parameter when invoking ndo_fdb_add() from rtnl_fdb_add().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 15:18:47 -08:00
David Herrmann
f5dd3d0c96 net: introduce SO_BINDTOIFINDEX sockopt
This introduces a new generic SOL_SOCKET-level socket option called
SO_BINDTOIFINDEX. It behaves similar to SO_BINDTODEVICE, but takes a
network interface index as argument, rather than the network interface
name.

User-space often refers to network-interfaces via their index, but has
to temporarily resolve it to a name for a call into SO_BINDTODEVICE.
This might pose problems when the network-device is renamed
asynchronously by other parts of the system. When this happens, the
SO_BINDTODEVICE might either fail, or worse, it might bind to the wrong
device.

In most cases user-space only ever operates on devices which they
either manage themselves, or otherwise have a guarantee that the device
name will not change (e.g., devices that are UP cannot be renamed).
However, particularly in libraries this guarantee is non-obvious and it
would be nice if that race-condition would simply not exist. It would
make it easier for those libraries to operate even in situations where
the device-name might change under the hood.

A real use-case that we recently hit is trying to start the network
stack early in the initrd but make it survive into the real system.
Existing distributions rename network-interfaces during the transition
from initrd into the real system. This, obviously, cannot affect
devices that are up and running (unless you also consider moving them
between network-namespaces). However, the network manager now has to
make sure its management engine for dormant devices will not run in
parallel to these renames. Particularly, when you offload operations
like DHCP into separate processes, these might setup their sockets
early, and thus have to resolve the device-name possibly running into
this race-condition.

By avoiding a call to resolve the device-name, we no longer depend on
the name and can run network setup of dormant devices in parallel to
the transition off the initrd. The SO_BINDTOIFINDEX ioctl plugs this
race.

Reviewed-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 14:55:51 -08:00
Vakul Garg
692d7b5d1f tls: Fix recvmsg() to be able to peek across multiple records
This fixes recvmsg() to be able to peek across multiple tls records.
Without this patch, the tls's selftests test case
'recv_peek_large_buf_mult_recs' fails. Each tls receive context now
maintains a 'rx_list' to retain incoming skb carrying tls records. If a
tls record needs to be retained e.g. for peek case or for the case when
the buffer passed to recvmsg() has a length smaller than decrypted
record length, then it is added to 'rx_list'. Additionally, records are
added in 'rx_list' if the crypto operation runs in async mode. The
records are dequeued from 'rx_list' after the decrypted data is consumed
by copying into the buffer passed to recvmsg(). In case, the MSG_PEEK
flag is used in recvmsg(), then records are not consumed or removed
from the 'rx_list'.

Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 14:20:40 -08:00
Florian Fainelli
5db5ea995f net: phy: Add helpers to determine if PHY driver is generic
We are already checking in phy_detach() that the PHY driver is of
generic kind (1G or 10G) and we are going to make use of that in the SFP
layer as well for 1000BaseT SFP modules, so expose helper functions to
return that information.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 11:33:17 -08:00
Florian Fainelli
8cfb5faf32 net: dsa: Include platform_data header file
b53 and mv88e6xxx support passing platform_data, and now that we have
split the platform_data portion from the main net/dsa.h header file,
include only the relevant parts.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 11:31:24 -08:00
Florian Fainelli
ecfc937210 net: dsa: Split platform data to header file
Instead of having net/dsa.h contain both the internal switch tree/driver
structures, split the relevant platform_data parts into
include/linux/platform_data/dsa.h and make that header be included by
net/dsa.h in order not to break any setup. A subsequent set of patches
will update code including net/dsa.h to include only the platform_data
header.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 11:31:24 -08:00
Linus Torvalds
a3a80255d5 AFS fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAXECcyPu3V2unywtrAQIpGw//ctcGHg2sfUEra17pvlKEbpZe9bMeJxp6
 UY2YR5gPpiYMmvNe8hLl3I68b43h03jGOx+KmqowqX7anNq3o2nMy0pDbuGmKtuS
 5NmIOECAml8k2uPSpacAF2s6TsxB2lTDYwZdyeuRmZ4scOTujNby33RlijGIxX4s
 87WJFRuCacm9I1KkiKKn4PWoYGjDdsZ7rsDyEeBmQ/MiKOSLG4QP5XuNr4X9zFMX
 r8uF3N8h/NzJWefEirc2DPFfiWLJqkyclq9tgsTB1Z3l+x5u/MHnIg3rZpZhH0uC
 GhGWjlGYqTxOwzYCaOOsNIDRF4rAGPi3lzuJXjONnhvbOO7DCGJ+Mo2obxkAqLL6
 PrtFuQvgXIOl/k8y6AdckuPPu/OMHT1hyY0PQXmTGhHAAfPP3RHoPl7owEjmAbRg
 hvRkFDSIKZ4Kr1nKP4vwaJYEKtxUQrkOwZKmN6ve31ZeJsyrH12MsCgWvp7oQwRJ
 fVbk8DWVRtYzy4RaO3Xr+0WfD+03dDi6KKCPPiC2gtNKOO+1Kco/EtIqPi6SXg0m
 ee/mFOkRsmEh++iNxS58qLxH37On6GSYOElSIMN0NJDNA2TzLrvidXlMSOTD1hj4
 n6gL38E3br/CIimKPYm87qi6yC59CAtrJCulYiPOoMc20eaEUP4DIXN8yjgjLNQp
 Hj5M9GRTwXk=
 =cUrU
 -----END PGP SIGNATURE-----

Merge tag 'afs-fixes-20190117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull AFS fixes from David Howells:
 "Here's a set of fixes for AFS:

   - Use struct_size() for kzalloc() size calculation.

   - When calling YFS.CreateFile rather than AFS.CreateFile, it is
     possible to create a file with a file lock already held. The
     default value indicating no lock required is actually -1, not 0.

   - Fix an oops in inode/vnode validation if the target inode doesn't
     have a server interest assigned (ie. a server that will notify us
     of changes by third parties).

   - Fix refcounting of keys in file locking.

   - Fix a race in refcounting asynchronous operations in the event of
     an error during request transmission. The provision of a dedicated
     function to get an extra ref on a call is split into a separate
     commit"

* tag 'afs-fixes-20190117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Fix race in async call refcounting
  afs: Provide a function to get a ref on a call
  afs: Fix key refcounting in file locking code
  afs: Don't set vnode->cb_s_break in afs_validate()
  afs: Set correct lock type for the yfs CreateFile
  afs: Use struct_size() in kzalloc()
2019-01-18 06:27:24 +12:00